Chain-of-Thought Reasoning without Prompting
- Thesis: Provides an unsupervised method to elicit the reasoning capabilities of LLMs in the decoding space. Instead of relying on CoT prompting, which is task-specific and requires humans to keep iterating on and optimizing the prompt to yield the intended results, CoT-decoding explores the top-k alternative tokens to find the best CoT path: for each of the top-k tokens at the first decoding step, decoding continues greedily (a sketch follows below).
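A minimal sketch of this decoding procedure, assuming a HuggingFace causal LM (the model name, prompt format, and default \(k\) are illustrative, not prescribed by the paper):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works for the sketch; Mistral-7B is one of the models evaluated in the paper.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

def cot_decode(question: str, k: int = 10, max_new_tokens: int = 128) -> list[str]:
    """Branch over the top-k tokens at the first decoding step, then decode each branch greedily."""
    prompt = f"Q: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Step 1: instead of keeping only the argmax, keep the top-k candidates for the first token.
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    top_k_ids = torch.topk(next_token_logits, k).indices

    # Step 2: append each candidate first token and continue with plain greedy decoding.
    paths = []
    for token_id in top_k_ids:
        branch_ids = torch.cat([inputs.input_ids, token_id.view(1, 1)], dim=-1)
        out = model.generate(
            branch_ids,
            attention_mask=torch.ones_like(branch_ids),
            do_sample=False,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        paths.append(
            tokenizer.decode(out[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True)
        )
    return paths  # rank/aggregate these by the answer-confidence score (see Notes below)
```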
- Methods:
- Models: PaLM-2, Mistral-7B, and Gemma-7B
- Contribution:
- Elicit LLMs’ inherent reasoning capabilities w/o prompting by simply changing the decoding process
- Avoid task-specific human priors that force the LLM to solve a task in a prescribed way via few-shot CoT prompting
- Improve the model’s confidence in the final answer by traversing the top-k alternative decoding paths
- Takeaways: We can leverage the reasoning capabilities of LLMs, both pre-trained and instruction-tuned, by operating in the decoding space, which requires no human intervention or extensive resources to tune such models for reasoning-intensive tasks
- Improvements:
- CoT-decoding adds computational cost at inference time (it decodes k continuations instead of one)
- The paper branches only at the first token to explore possible paths
- It is harder to use CoT-decoding in the case of open-ended answers
- The gains from CoT-decoding start to drop as tasks get harder; LLMs still struggle with tasks that require \(\geq 3\) manipulation steps. One possible explanation is that the pre-training data distribution consists mostly of simple-to-medium difficulty tasks
- Notes:
- LLM reasoning is typically elicited by prompting, which comes in various forms:
- CoT with few-shot examples
- Zero-shot prompting with detailed instructions on how to perform the task, asking the model to show the steps it used to reach the answer
- Training or instruction fine-tuning with CoT data
- LLMs struggle with reasoning when greedy decoding is used
- A large difference between the probabilities of the top two tokens at the \(t^{th}\) decoding step of the \(k^{th}\) decoding path indicates that the LLM is highly confident in the answer at that step
- The answer’s confidence is the average of this top-two probability difference over the time steps of the answer span (see the formula after this list)
- The model’s own probability is not a reliable indicator for the confidence/correctness of the answer
- LLMs typically rush directly into problem solving when asked math/commonsense reasoning questions, which results in the wrong answer most of the time. This can be partially fixed with prompting, but it doesn’t yield results as good as CoT-decoding
- Branching at the first token leads to more diverse paths than branching at later stages of the decoding process
- It is recommended to aggregate the top-k paths to get stable results (see the aggregation sketch after this list)
- There is a very high correlation between the answer confidence and the existence of CoT paths in the answer
- Models’ inherent reasoning varies with task difficulty; they show weaker reasoning on harder tasks
- Decoding over the top-k alternative tokens reveals the existence of CoT reasoning paths, which correlates with the model’s confidence in the final answer
- Both pre-trained and instruction-tuned models showed improvements in accuracy from CoT-decoding
- Even though instruction-tuned models have seen many CoT examples during instruction fine-tuning, they still tend to jump directly to problem solving; CoT-decoding therefore still boosts their performance tremendously
- CoT-decoding closes the gap between pre-trained and instruction-tuned models in terms of reasoning capabilities
- \(k\) has a significant effect on pre-trained models but a negligible effect on instruction-tuned models after \(k = 2\). For pre-trained models, the best paths may show up lower in the top-\(k\) ranking, whereas instruction-tuned models are already trained to bring the best paths to the top
- Combining CoT prompting with CoT-decoding yields even better performance by eliciting more reasoning capabilities
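For reference, the answer-confidence score mentioned in the notes above, i.e. the average margin between the probabilities of the top two tokens over the answer tokens of the \(k^{th}\) path, can be written as

\[
\Delta_{k,\text{answer}} = \frac{1}{|\text{answer}|} \sum_{x_t \in \text{answer}} \big( p(x_t^{1} \mid x_{<t}) - p(x_t^{2} \mid x_{<t}) \big),
\]

where \(x_t^{1}\) and \(x_t^{2}\) are the most and second-most probable tokens at decoding step \(t\). A hedged sketch of scoring and aggregating the branches returned by `cot_decode` above (the per-step margin lists and the final-token answer extraction are simplifying assumptions for illustration, not the paper’s exact answer-span parsing):

```python
from collections import defaultdict

def answer_confidence(step_margins: list[float]) -> float:
    """Delta for one path: mean top-1 vs top-2 probability margin over the answer tokens."""
    return sum(step_margins) / len(step_margins)

def aggregate_paths(paths: list[str], margins: list[list[float]]) -> str:
    """Sum Delta over paths that decode to the same final answer and return the best answer."""
    scores = defaultdict(float)
    for path, step_margins in zip(paths, margins):
        answer = path.strip().split()[-1]  # crude answer extraction, illustrative only
        scores[answer] += answer_confidence(step_margins)
    return max(scores, key=scores.get)
```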
#nlp #llm #agents