Original Link
https://zhuanlan.zhihu.com/p/624793654
Main Text
(Experts, please point out any significant inaccuracies.)
First, let’s explain one-shot. Suppose the company’s access control system uses facial recognition. You provide only one photo, and the system can then recognize you from different angles: this is one-shot. One-shot can be understood as fine-tuning a model with just one example. In facial recognition scenarios, one-shot is very common.
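The access control example can be sketched in a few lines. This is a minimal sketch, not the method any particular system uses: instead of fine-tuning, it assumes a pretrained face encoder has already turned each photo into an embedding vector, and it recognizes a person by comparing a new embedding against the single enrolled one. The vectors, the names `gallery` and `recognize`, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recognize(query, gallery, threshold=0.8):
    """Return the enrolled name most similar to `query`,
    or None if no enrolled embedding reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# One enrolled embedding per person: the single "shot".
# These toy 3-dimensional vectors stand in for real encoder outputs.
gallery = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}

# Hypothetical embeddings of new photos taken at the door.
side_view = [0.85, 0.2, 0.25]   # Alice from a different angle
stranger = [0.1, 0.1, 0.9]      # someone who was never enrolled

print(recognize(side_view, gallery))
print(recognize(stranger, gallery))
```

The design point is that enrolling a new person only adds one entry to `gallery`; the encoder itself does not have to be retrained.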
Zero-shot and few-shot bring us back to the NLP scenario. Training a GPT model on Wikipedia, news, and similar text, then using it directly for dialogue tasks without any task-specific examples, is zero-shot. Then, after finding that the model generated quite a lot of nonsense, people labeled a small amount of high-quality data and fed it in: this is few-shot.
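To make the difference concrete, here is a minimal sketch of how the two settings can look at the prompt level, assuming the few labeled examples are placed directly in the model's input (one common reading of few-shot). The function name `build_prompt`, the "Input:/Output:" layout, and the translation task are illustrative assumptions, and no real model is called.

```python
def build_prompt(task, query, examples=()):
    """Build a text prompt for a language model.

    With no examples the prompt is zero-shot; with a handful of
    labeled (input, output) pairs it is few-shot.
    """
    lines = [task]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
    # The query comes last, with the output left for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

task = "Translate English to French."

# Zero-shot: only the task description and the query.
zero_shot = build_prompt(task, "cheese")

# Few-shot: the same query, preceded by a few labeled examples.
few_shot = build_prompt(
    task,
    "cheese",
    examples=[("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
)

print(zero_shot)
print("---")
print(few_shot)
```

The only difference between the two prompts is the handful of labeled pairs, which is where the human labeling effort goes.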
The development history of ChatGPT runs from zero-shot to few-shot. (Excerpted from Mu Shen’s paper reading series)
- Background. Before GPT-3, GPT and BERT were competitors following two different technical routes.
- GPT-2 is zero-shot. Its performance did not surpass BERT, but the authors still wanted to publish a paper, so they made zero-shot the selling point (a methodological innovation), i.e., completely unsupervised learning. The paper title: Language Models are Unsupervised Multitask Learners.
- GPT-3 is few-shot. Its performance was better than BERT’s, so there was no longer any need to look for an academic selling point. Also, zero-shot is indeed not very cost-effective for products, so it switched to few-shot, meaning some people did labeling. (In the paper itself, the few examples are supplied in the prompt, without updating the model’s weights.) The paper title: Language Models are Few-Shot Learners.
- ChatGPT is RLHF. After GPT-3, the question became: what exactly should the “shot” in few-shot be (that is, which data should be labeled)? They combined it with reinforcement learning, that is, reinforcement learning from human feedback, commonly known as RLHF. This is the core technology of ChatGPT.
The essence of the RLHF method is how to align the machine’s knowledge with human knowledge. It pioneered a new direction called alignment, which many major players, including OpenAI, are now pursuing.
Note: The “alignment” here is completely different from the alignment in facial recognition.