Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Thesis: Chain-of-thought prompting elicits chain-of-thought reasoning, an emergent ability of large-scale LLMs. It significantly improves reasoning capabilities, which in turn improves accuracy on arithmetic, commonsense, and symbolic reasoning tasks.
- Methods:
- Chain-of-thought prompting template
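A minimal sketch of what the prompting template looks like in practice: worked exemplars that include intermediate reasoning steps are prepended to the new question. The exemplar text is the tennis-ball example from the paper; the helper name `build_cot_prompt` is illustrative, not from the paper.

```python
# One few-shot exemplar: question + chain of thought + final answer.
# Taken from the paper's canonical arithmetic example.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str, exemplars=(COT_EXEMPLAR,)) -> str:
    """Prepend <question, chain of thought, answer> exemplars, then
    append the new question so the model continues with its own steps."""
    return "\n".join(exemplars) + f"\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
```

The same template is reused across tasks; only the exemplars change per task family.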
- Contribution:
- Chain-of-thought prompting technique
- CoT makes debugging easier since the model provides the intermediate steps it used to arrive at the final answer; this also helps interpretability
- CoT uses more computation to solve intermediate/smaller tasks before getting the final answer
- CoT prompt works across different LLMs
- There is no meaningful difference in CoT prompting performance based on the number of examples, their order, or their type
- Takeaways: We can improve LLM performance on reasoning tasks using few-shot CoT prompting w/o the need for training from scratch or fine-tuning. This holds especially for large-scale LLMs, since chain-of-thought reasoning is an emergent ability: such models acquire better semantic understanding and reasoning capabilities with scale
- Improvements:
- CoT prompting works only for sufficiently large LLMs; smaller LLMs may see smaller gains or none at all
- We need a task-specific prompt template whose examples and intermediate steps (chains of thought) are carefully chosen to realize the performance boost
- There is decent variance in performance across different choices of CoT prompting examples, which requires more human effort to realize the full potential of this approach. Even with this variance, CoT prompting still performed better than standard few-shot prompting. So prompt engineering STILL MATTERS
- There is no guarantee that the intermediate steps generated by the model are correct, and incorrect steps may lead to the wrong answer
- We don’t know whether the chain of thought the model produces reflects genuine reasoning
- Inference cost is higher for large-scale LLMs with few-shot CoT prompting (longer prompts and longer generations)
- Notes:
- Chain-of-thought is a series of intermediate steps
- Standard few-shot prompting (without intermediate steps) doesn’t help on reasoning tasks
- CoT reasoning abilities improve with scale
#nlp #llm #agents