
What is ChatGPT?
The name ChatGPT reflects three key concepts: Generative, Pre-Training, and Transformer.
Generative
Generative: the model produces brand-new text by learning statistical patterns from existing data. When ChatGPT answers a question, the reply is generated piece by piece, a word or a few characters at a time. Each unit produced during generation (a character, a word, or a subword such as a word root in English) is called a token.
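As a rough illustration, here is a minimal tokenization sketch that simply splits on whitespace. This is an assumption made for clarity: real systems like GPT use subword tokenizers (e.g., byte-pair encoding), so actual token boundaries are finer than whole words.

```python
def tokenize(text):
    # Naive whitespace tokenizer: each word becomes one token.
    # Real LLM tokenizers split rarer words into subword pieces.
    return text.split()

tokens = tokenize("It's a beautiful day today")
print(tokens)  # ["It's", 'a', 'beautiful', 'day', 'today']
```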

Pre-Training
This refers to pre-training the model. As a simple analogy, suppose we ask a student who knows no English to translate and summarize a technical article written in English. We would first have to teach the student the 26 letters of the alphabet, vocabulary and grammar, and other basics, and then the technical background of the article, before the task could be completed. By contrast, a student already proficient in English would only need a general grasp of the article's technical content and could summarize it right away.
This is where pre-training comes in: general-purpose abilities are trained in advance. In AI, these abilities are encoded in the model's parameters, which are adjusted continuously during training.
If the parameters underlying these general capabilities are trained ahead of time, then a specific scenario only requires light parameter fine-tuning, dramatically reducing the computational cost of each new task.
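The pretrain-then-fine-tune idea can be sketched with a toy example. This is a deliberately simplified stand-in, not how GPT is actually trained: we "pre-train" a bigram next-word table on a larger general corpus, then "fine-tune" it by adding counts from a small domain corpus, so only a handful of entries change.

```python
from collections import defaultdict, Counter

def train(corpus, table=None):
    # Count how often each word follows each other word (a bigram table).
    # Passing an existing table continues training from it ("fine-tuning").
    table = table if table is not None else defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table

# "Pre-training" on general text builds broad statistics.
general = ["the cat sat on the mat", "the dog sat on the rug"]
model = train(general)

# "Fine-tuning": a small domain corpus only nudges a few entries.
domain = ["the model sat in memory"]
model = train(domain, model)

print(model["sat"]["on"])  # 2 (learned in pre-training)
print(model["sat"]["in"])  # 1 (added by fine-tuning)
```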
Transformer
This is the core architecture of ChatGPT, a neural network model. It will be explained in detail later.
In summary, ChatGPT is a generative, pre-trained neural network model that simulates human conversation.
ChatGPT Core Task
The core task of ChatGPT is to generate the next piece of content that reads as natural human writing. The implementation logic: based on statistical patterns learned from a vast amount of human-written text (web pages, digitized books, and so on), it predicts the content likely to come next.
Word-by-Word Prediction
When using ChatGPT, if you look closely, you will notice that it produces its answer word by word (or token by token). This is exactly the nature of ChatGPT: predicting the next word or phrase from the context.
For example, suppose we want ChatGPT to complete the sentence "It's a beautiful day today":
- Given the input "It's a", the next token might be "beautiful", "sunny", or "great"; the model chooses the candidate with the highest probability in context.
- Given the longer input "It's a beautiful day", candidates might include "today", "outside", or "ahead", and again the continuation with the highest probability in context wins.
Because ChatGPT has learned from a vast amount of existing human knowledge, it can make such predictions across many domains. This is, in essence, what the Transformer model does, although the actual mechanism is far more complex.
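The word-by-word process can be sketched as repeatedly picking the most probable next token from a probability table. The table below is hand-written for illustration; a real model computes these probabilities with a neural network over the entire context, not just the last word.

```python
# Hypothetical next-token probabilities, invented for illustration.
NEXT_TOKEN_PROBS = {
    "It's": {"a": 0.6, "not": 0.2, "been": 0.2},
    "a": {"beautiful": 0.5, "sunny": 0.3, "great": 0.2},
    "beautiful": {"day": 0.7, "morning": 0.2, "view": 0.1},
    "day": {"today": 0.6, "outside": 0.4},
}

def generate(prompt, max_tokens=10):
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # no statistics for this context: stop generating
        # Greedy decoding: always take the highest-probability token.
        tokens.append(max(candidates, key=candidates.get))
    return " ".join(tokens)

print(generate("It's"))  # It's a beautiful day today
```

Real systems usually sample from the distribution rather than always taking the top token, which is why the same prompt can yield different answers.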
Reinforcement Learning with Human Feedback (RLHF)
Reinforcement Learning with Human Feedback (RLHF) is a method used to train the GPT-3.5 family of models. It consists of three main steps and is designed to optimize the quality of a language model's output through human feedback.
Steps of RLHF
- Train a base language model with supervised learning: the model is first trained on a large amount of labeled data.
- Collect comparison data and train a reward model on human preferences: generate multiple outputs for the same prompt, have humans rank their quality, and train a reward model to predict those quality scores.
- Optimize the language model against the reward model with reinforcement learning: use the reward model's scores to steer the language model toward outputs that better match human preferences.
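The second step can be sketched as fitting a scorer so that human-preferred outputs score higher than rejected ones. The toy version below is a hedged illustration: it uses a single hand-picked feature (response length) and a perceptron-style pairwise update, whereas a real reward model is a neural network over the full text.

```python
def feature(response):
    # Assumption for illustration: the only feature is word count.
    return len(response.split())

def train_reward_model(comparisons, lr=0.1, epochs=100):
    # comparisons: list of (preferred, rejected) response pairs
    # ranked by humans. Learn a weight so preferred outscores rejected.
    weight = 0.0
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            margin = weight * (feature(preferred) - feature(rejected))
            if margin < 1.0:  # preferred should clearly outscore rejected
                weight += lr * (feature(preferred) - feature(rejected))
    return weight

comparisons = [
    ("It's a beautiful day to go out", "It's a beautiful day"),
    ("Sunny with a light breeze today", "I don't know"),
]
w = train_reward_model(comparisons)
reward = lambda response: w * feature(response)
print(reward("It's a beautiful day to go out") > reward("I don't know"))  # True
```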
Example of RLHF
Suppose we want to train an LLM that generates high-quality dialogue. The specific RLHF steps are as follows:
- Pre-training and fine-tuning: Pre-train and fine-tune the LLM using a large amount of dialog data so that it can generate coherent dialog text.
- Generate multiple outputs:
- Provide the LLM with a prompt, e.g., “What’s the weather like today?”
- LLM generates multiple responses:
- Response 1: It’s a beautiful day.
- Response 2: I don’t know, I didn’t check the weather forecast.
- Response 3: It’s a beautiful day to go out.
- Human evaluation: Have a human evaluate the quality of these responses and assign a score to each one.
- Train a reward model: Use the data from these human assessments to train a reward model. The reward model learns how to predict the quality scores of LLM-generated text.
- Reinforcement learning loop: create a reinforcement learning loop in which a copy of the LLM acts as the RL agent and is optimized against the reward model.
In this way, RLHF can significantly improve the quality of the LLM’s output, making its generated text more consistent with human preferences and expectations.
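The worked example above can be sketched end to end with a best-of-n loop: generate candidate replies, score each with the reward model, and keep the highest-scoring one. This is a hedged simplification; real RLHF optimizes the model's weights with policy-gradient methods such as PPO rather than just filtering outputs, and the scoring rule here is invented for illustration.

```python
def reward(response):
    # Stand-in reward model with hypothetical learned preferences:
    # longer, concrete replies score higher; unhelpful hedges are
    # penalized. Invented for illustration, not a trained model.
    score = len(response.split())
    if "don't know" in response:
        score -= 10
    return score

def generate_candidates(prompt):
    # Stand-in for the LLM: canned candidate replies to the prompt,
    # mirroring the three responses in the example above.
    return [
        "It's a beautiful day.",
        "I don't know, I didn't check the weather forecast.",
        "It's a beautiful day to go out.",
    ]

def best_of_n(prompt):
    # Score every candidate with the reward model; keep the best one.
    return max(generate_candidates(prompt), key=reward)

print(best_of_n("What's the weather like today?"))  # It's a beautiful day to go out.
```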