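A Simple Way to Automatically Enhance LLM Answers

Simple CoT + RL

How can we give a large language model a human-like ability to think?

The CoT in this post is really just a simple way of using the model's own capabilities to generate a reasoning chain automatically. On top of that, the model reviews its own output and regenerates the answer from that feedback to improve quality; this second part loosely resembles reinforcement learning (RL).

The core logic is to simulate the human thinking process: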

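Start with divergent thinking (where might the answer be hiding? Which aspects relate to the question?)
Then think step by step (break the problem down: how many steps does it take to put an elephant in a fridge?)
Draw on existing knowledge (use the results of the divergent thinking above, or run a RAG search against an external knowledge base)
Integrate the answer and self-verify it (review)
If the answer is not good enough, keep iterating until a final answer emerges.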
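
The loop described above can be sketched in a few lines. This is only an illustrative skeleton, not the script from the Code section: the function name `think_and_refine` and the five callables passed into it are hypothetical placeholders for whatever LLM calls (or RAG lookups) are actually used, and the stub lambdas at the bottom exist only so the sketch runs without an API key.

```python
from typing import Callable, List, Tuple

def think_and_refine(
    question: str,
    brainstorm: Callable[[str], str],                 # divergent thinking
    decompose: Callable[[str, str], List[str]],       # split into steps
    answer_step: Callable[[str, str, str], str],      # answer one step using the notes
    review: Callable[[str, str], Tuple[int, str]],    # returns (score, feedback)
    revise: Callable[[str, str, str], str],           # rewrite the draft from feedback
    max_rounds: int = 3,
    good_enough: int = 4,
) -> str:
    notes = brainstorm(question)                      # 1. where could the answer be?
    steps = decompose(question, notes)                # 2. break the problem down
    parts = [answer_step(question, step, notes) for step in steps]  # 3. use what we know
    draft = "\n".join(parts)                          # 4. integrate the partial answers
    for _ in range(max_rounds):                       # 5. review, revise, repeat
        score, feedback = review(question, draft)
        if score >= good_enough:
            break
        draft = revise(question, draft, feedback)
    return draft

# Stub callables standing in for real model calls (assumptions for the demo only).
result = think_and_refine(
    "What does face the music mean?",
    brainstorm=lambda q: "idiom, consequences",
    decompose=lambda q, notes: ["meaning", "example"],
    answer_step=lambda q, step, notes: f"[{step}]",
    review=lambda q, draft: (5, "") if "revised" in draft else (2, "add detail"),
    revise=lambda q, draft, feedback: draft + " (revised)",
)
print(result)
```

Passing the model calls in as parameters keeps the control flow separate from the prompts, so the same loop can be exercised with stubs or with real sessions.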

Results

Question: What does face the music mean?

Direct question - gpt-4o

“Face the music” is an idiomatic expression that means to confront the consequences of one’s actions, especially when those consequences are unpleasant or difficult. It implies taking responsibility and dealing with the reality of a situation, rather than avoiding or escaping it. For example, if someone procrastinated on a project and then has to face their boss to explain why it’s not done, they are facing the music.

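Simplified thinking pipeline - gpt-4o-mini
Notably, even with the weaker gpt-4o-mini model, the answer optimized by this method ==already surpasses the answer gpt-4o outputs directly==. The response took 21 seconds.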

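Full thinking pipeline - gpt-4o-mini
If a longer run time and more token consumption are acceptable, answer quality can improve further. The full version took 33 seconds on this question.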

Approach

The autoCoT + RL pipeline I originally had in mind works as follows. Suppose the user asks a main question:

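Question generation
- The model generates three similar questions related to the main question
- The model answers these three questions
Reasoning summary
- The model summarizes a common approach from the three similar questions and their answers
- The model picks the best of the three answers and explains why it is the best
- Based on the common approach above and the pattern of the best answer, the model generates a final_outline
Reply generation
- The model splits the question into n steps according to the outline
- The model answers each step separately
- The step answers are merged into one overall reply
Reply iteration
- Another session (or another model) reviews the overall reply, rates its quality along several dimensions, and gives a score
- The model revises the reply based on that review
- If the reply is still not good enough, iterate again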

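This does produce a fairly high-quality reply. But if CoT is used as a plugin, performance matters a great deal: generating more text for the LLM to reference and iterate on may give better results, yet it severely increases generation time.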

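Moreover, iterating on the reply several times can sometimes make it worse, so the number of iterations should stay small (usually two or three at most). The reason is that the model tends to write long-winded answers, which dilutes the information, lowers the density of useful content, and buries the key points (even though the review has the generated outline as a reference). In addition, the review model's suggestions are sometimes not specific enough, which also hurts the quality of the review.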
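
One possible guard against this kind of degradation is to cap the number of rounds and keep whichever draft scored best, so a weaker revision never replaces a stronger earlier one. The sketch below is an assumption for illustration and is not what the script in the Code section does; `refine_keep_best`, `review`, and `revise` are hypothetical names, with `review` expected to return a `(score, feedback)` pair.

```python
from typing import Callable, Tuple

def refine_keep_best(
    draft: str,
    review: Callable[[str], Tuple[int, str]],   # returns (score, feedback)
    revise: Callable[[str, str], str],          # rewrites a draft from feedback
    max_rounds: int = 2,                        # keep the cap small, as noted above
) -> Tuple[str, int]:
    best = draft
    best_score, feedback = review(best)
    for _ in range(max_rounds):
        candidate = revise(best, feedback)
        score, candidate_feedback = review(candidate)
        # Only accept the revision if the reviewer scores it strictly higher.
        if score > best_score:
            best, best_score, feedback = candidate, score, candidate_feedback
    return best, best_score

# Stub reviewer: padding lowers the score, mimicking a diluted answer.
scores = {"short": 4, "short padded": 3}
print(refine_keep_best("short", lambda d: (scores[d], "fb"), lambda d, fb: d + " padded"))
```

The trade-off is one extra review call per round, which costs time and tokens, so this only makes sense when the review step is cheap relative to generation.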

So in practice a simplified generation flow performs better while still keeping answer quality reasonably good.

Simplified approach

A CoT + RL flow with some of the steps removed still works well. Specifically:

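Question decomposition
- Ask session A to analyze the question, split it into several parts based on semantics, scenario, and so on, and give an approach for answering each part
  (for performance, and so that the answer does not become fragmented and long-winded, no more than 5 steps is recommended)
Reply generation
- Create several sessions with the outline and the question as context, and ask the model to output the answer for each step
- Merge all the answers into the first reply.
Review the reply
- Review the reply and iterate on the answer.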
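
Because each step is answered in its own session with the same outline and question as context, the step calls are independent of one another. The sketch below shows one way to exploit that by issuing them concurrently; this is an assumption on my part, since the script in the Code section answers the steps one after another. `answer_steps_in_parallel` and `ask` are hypothetical names, where `ask` stands for any function that sends a prompt to a fresh session and returns the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_STEPS = 5  # the recommended upper bound on steps from the list above

def answer_steps_in_parallel(
    question: str,
    outline_steps: List[str],
    ask: Callable[[str], str],
) -> str:
    steps = outline_steps[:MAX_STEPS]
    outline = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))

    def answer_one(index: int) -> str:
        # Every session sees the full outline plus the main question.
        prompt = (
            f"Outline:\n{outline}\n"
            f"Main question: {question}\n"
            f"Write PART {index} of the answer in one paragraph."
        )
        return ask(prompt)

    # pool.map returns results in input order, so the parts stay in outline order.
    with ThreadPoolExecutor(max_workers=len(steps) or 1) as pool:
        parts = list(pool.map(answer_one, range(1, len(steps) + 1)))
    return "\n".join(parts)

# Stub session that just echoes which part it was asked for.
echo_part = lambda prompt: prompt.split("PART ")[1].split(" ")[0]
print(answer_steps_in_parallel("q", ["meaning", "origin", "example"], echo_part))
```

Threads are enough here because the work is network-bound; whether the API's rate limits allow five simultaneous requests depends on the account and is not something this sketch checks.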

Code

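# V1.4
# Note: this script uses the legacy OpenAI Python SDK interface (openai<1.0);
# openai.ChatCompletion and openai.error were removed in 1.0.
import functools
import openai
import re
import os
import time
# Configure the OpenAI API
openai.api_base = "yourapibase"
openai.api_key = "apikey"

def timing_decorator(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"{func.__name__}: the model thought for {end_time - start_time:.4f} seconds")
        return result
    return wrapper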

def get_response_from_model(messages):
    response = openai.ChatCompletion.create(
        messages=messages,
        model = "gpt-4o-mini",
        temperature=0.7,
        max_tokens=800
    )
    return response.choices[0].message['content'].strip()

# Create the log directory
log_dir = "./log"
os.makedirs(log_dir, exist_ok=True)

# Generate similar questions
def generate_similar_questions(user_message):
    # Build the message list
    messages = [
        {"role": "system", "content": "Generate three similar problems based on the input, please output these three similar problems directly without any extra nonsense, one per line, without serial number and any markdown syntax, only plain text"},
        {"role": "user", "content": f"Generate three similar problems directly without any extra nonsense, one per line, without any markdown syntax. Now, generate three similar problems according to '{user_message}'"}
    ]

    try:
        # Get the generated text
        text = get_response_from_model(messages)
        return text
    except openai.error.OpenAIError as e:
        print(f"OpenAI API error received: {e}")
        return None

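# Validate and save the questions
def process_and_save_questions(text, user_message, max_retries=2):
    # Case 1: exactly three lines, one question per line
    if text and text.count("\n") == 2:
        questions = text.split("\n")
    # Case 2: exactly three "?" characters; split right after each "?"
    elif text and text.count("?") == 3:
        # Drop the empty trailing piece that re.split leaves after the last "?"
        questions = [q for q in re.split(r'(?<=\?)', text) if q.strip()]
    elif max_retries > 0:
        # Regenerate the questions, then validate the new text as well
        print("Failed to detect three valid questions. Regenerating...")
        new_text = generate_similar_questions(user_message)
        return process_and_save_questions(new_text, user_message, max_retries - 1)
    else:
        print("Failed to detect three valid questions. Giving up.")
        return None

    # Save each question to its own log file
    for i, question in enumerate(questions, 1):
        with open(f"./log/build_{i}.log", "w", encoding='utf-8') as log_file:
            log_file.write(question.strip())
    return questions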

# Answer the questions
def answer_questions(questions):
    answers = []
    for i, question in enumerate(questions, 1):
        messages = [
            {"role": "user", "content": f"Analysis and Explain the following question step by step: {question}"}
        ]
        try:
            answer = get_response_from_model(messages)
            answers.append(answer)
            with open(f"./log/build_{i}.log", "a", encoding='utf-8') as log_file:
                log_file.write("\nAnswer:\n" + answer)
        except openai.error.OpenAIError as e:
            print(f"OpenAI API error received: {e}")
    return answers

# Analyze the reasoning
def extract_best_answer_number(text):
    lines = text.strip().split('\n')
    best_answer_line = lines[-1]
    match = re.search(r'\b([1-3])\b', best_answer_line)
    if match:
        return int(match.group(1)) - 1  # indices start at 0
    return None

def generate_detailed_outline(user_message, initial_outline_and_best):
    prompt = (
        f"Based on the following preliminary outline and original question, generate a detailed outline consisting of steps 1 through 5:\n\n"
        f"Original question: {user_message}\n\n"
        f"Here's the related question and reference\n{initial_outline_and_best}\n\n"
        f"Please output a detailed outline in the form of a sequential table, with each step concise.\n"
        f"Steps: How many steps in total (Provide it in the end)"
    )

    messages = [
        {"role": "system", "content": "You are an organised analyst who can generate clear and detailed outlines with serial numbers. Provide the number of steps in the end"},
        {"role": "user", "content": prompt}
    ]

    try:
        detailed_outline = get_response_from_model(messages)
        with open("./log/build_detailed_outline.log", "w", encoding='utf-8') as log_file:
            log_file.write(detailed_outline)
        return detailed_outline.strip()

    except openai.error.OpenAIError as e:
        print(f"OpenAI API error received: {e}")
        return None

  
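def analyze_common_outline(questions, answers, user_message):
    """
    Analyze the answers to the three questions, find their common reasoning,
    pick the best answer, and generate a detailed outline.
    Args:
    - questions: list of the three similar questions
    - answers: list of the answers to those questions
    - user_message: the original main question
    Returns:
    - detailed_outline: the detailed reasoning outline
    - best_answer: the content of the best answer
    """
    # Build a combined message containing the questions and answers
    combined_message = "\n".join([f"Question {i+1}: {q}\nAnswer {i+1}: {a}" for i, (q, a) in enumerate(zip(questions, answers))])
    print(f"combined:\n{combined_message}")
    # Build the prompt; the index of the best answer must appear on the last line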
    prompt = (
        f"{combined_message}\n\n"
        "1. Please summarise the common reasoning of the three responses above and list them step-by-step\n"
        "2. Based on the clarity and completeness of the responses, the best one is selected and only the serial number (1, 2 or 3) of that response is output on the last line"
    )

    messages = [
        {"role": "system", "content": "Find out the core step by step"},
        {"role": "user", "content": prompt}
    ]

    try:
        # Get the model's response
        outline_and_best = get_response_from_model(messages)
        with open("./log/build_outline.log", "w", encoding='utf-8') as log_file:
            log_file.write(outline_and_best)

        # Use the helper to extract the index of the best answer
        best_answer_index = extract_best_answer_number(outline_and_best)
        if best_answer_index is not None and 0 <= best_answer_index < len(answers):
            best_answer = answers[best_answer_index]

            # Extract the outline (everything except the last line)
            outline = '\n'.join(outline_and_best.strip().split('\n')[:-1]).strip()
        else:
            print("Could not parse the index of the best answer; using the first answer as the reference.")
            outline = '\n'.join(outline_and_best.strip().split('\n')).strip()
            best_answer = answers[0]

        # Generate the detailed outline
        detailed_outline = generate_detailed_outline(user_message, outline_and_best)
        print(f"detailed_outline:\n{detailed_outline}")
        return detailed_outline, best_answer
    except openai.error.OpenAIError as e:
        print(f"OpenAI API error received: {e}")

        return None, None

  
# Mapping from Chinese numerals to Arabic numerals (0 to 10 only)
CHINESE_NUMERAL_MAP = {
    '零': 0,
    '一': 1,
    '二': 2,
    '两': 2,
    '三': 3,
    '四': 4,
    '五': 5,
    '六': 6,
    '七': 7,
    '八': 8,
    '九': 9,
    '十': 10
}

# Mapping from English number words to Arabic numerals (0 to 10 only)
ENGLISH_NUMERAL_MAP = {
    'zero': 0,
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9,
    'ten': 10
}

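def chinese_to_arabic(chinese_num):
    """
    Convert a Chinese numeral to an Arabic numeral.
    Only the numerals in CHINESE_NUMERAL_MAP (0 to 10) are supported.
    """
    return CHINESE_NUMERAL_MAP.get(chinese_num, None)

def english_to_arabic(english_num):
    """
    Convert an English number word to an Arabic numeral.
    Only zero through ten are supported.
    """
    return ENGLISH_NUMERAL_MAP.get(english_num.lower(), None)

def parse_steps(outline):
    """
    Parse the number of steps from the outline; Chinese and English phrasings are supported.
    """
    # Split into lines and walk them in reverse
    lines = outline.strip().split("\n")[::-1]
    # Define the regular expression pattern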
    step_pattern = re.compile(
        r'(?:(?:Total\s*)?Steps?\s*:\s*(\d+))|'             # "Total Steps: n" or "Steps: n" (the form the outline prompt asks for)
        r'(?:Step\s*(\d+))|'                                # "Step n"
        r'(?:There\s+are\s+(\d+)\s+steps?)|'                # "There are n step(s)"
        r'(?:There\s+are\s+(\d+)\s+sub-problems?)|'         # "There are n sub-problem(s)"
        r'(?:总共有\s*(\d+)\s*个?步骤)|'                    # "总共有n个步骤" (n steps in total)
        r'(?:\s*(\d+)\s*步)|'                                # "n步" (n steps)
        r'(?:\s*([一二三四五六七八九十])\s*个?步骤)|'        # "五个步骤" (Chinese numeral + steps)
        r'(?:\s*([a-zA-Z]+)\s*steps?)|'                    # "five steps", "five step", etc.
        r'(?:\s*(\d+)\s*个?步骤)',                          # "5个步骤" (digit + steps)
        re.IGNORECASE
    )

    for line in lines:
        # Match the line against the combined pattern
        match = step_pattern.search(line)
        if match:
            # Walk through all capture groups
            for group in match.groups():
                if group:
                    # Is it a Chinese numeral?
                    if re.fullmatch(r'[一二三四五六七八九十]', group):
                        num_steps = chinese_to_arabic(group)
                        if num_steps is not None:
                            return num_steps

                    # Is it an English number word?
                    elif re.fullmatch(r'[a-zA-Z]+', group):
                        num_steps = english_to_arabic(group)
                        if num_steps is not None:
                            return num_steps
                    else:
                        # Try to convert an Arabic numeral to an integer
                        try:
                            num_steps = int(group)
                            if 1 <= num_steps <= 10:
                                return num_steps

                        except ValueError:
                            continue

        else:
            # The pattern did not match: check whether the line holds only a number or a number word
            # Does it contain only an Arabic numeral?
            single_num_match = re.fullmatch(r'\s*(\d+)\s*', line)
            if single_num_match:
                try:
                    num = int(single_num_match.group(1))
                    if 1 <= num <= 10:
                        return num

                except ValueError:
                    pass

            # Does it contain only an English number word?
            single_eng_match = re.fullmatch(r'\s*([a-zA-Z]+)\s*', line)
            if single_eng_match:
                eng_num = single_eng_match.group(1)
                num_steps = english_to_arabic(eng_num)
                if num_steps is not None:
                    return num_steps

            # Does it contain only a Chinese numeral followed by "步" (steps)?
            single_chi_match = re.fullmatch(r'\s*([一二三四五六七八九十])\s*步', line)
            if single_chi_match:
                chi_num = single_chi_match.group(1)
                num_steps = chinese_to_arabic(chi_num)
                if num_steps is not None:
                    return num_steps
    # Nothing matched: return 0
    return 0

def main_question_answer_and_review(outline, user_message, reference_answer, max_iterations=3, threshold=4):
    """
    - max_iterations: maximum number of iterations
    - threshold: score threshold; a score >= threshold counts as a good answer
    """
    current_iteration = 0
    continue_iteration = True
    final_answer = None
    first_iteration_score = None
    num_steps = parse_steps(outline)
    print(f"The question will be answered in {num_steps} step(s)")
    
    while continue_iteration and current_iteration < max_iterations:
        current_iteration += 1
        print(f"Iteration {current_iteration}: preparing the answer to the main question")
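        main_answer = ""
        if current_iteration > 1 and final_answer:
            # Later rounds review the revised answer from the previous round
            # instead of regenerating a fresh draft from the outline
            main_answer = final_answer
        elif num_steps > 3:
            # Generate the answer step by step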
            step_answers = []
            for i in range(1, num_steps + 1):
                prompt = (
                    f"Here is the outline of the question: {outline}\n"
                    f"Here is the main question: {user_message}\n"
                    f"Now please generate PART {i} of the main question according to the outline, You don't need to explicitly declare which step this is."
                )

                messages = [
                    {"role": "system", "content": "You should only write it in one paragraph, since it is just a part of answer"},
                    {"role": "user", "content": prompt}
                ]

                try:
                    step_answer = get_response_from_model(messages)
                    print(step_answer)
                    step_answers.append(f"{step_answer}")
                except openai.error.OpenAIError as e:
                    print(f"OpenAI API error received: {e}")
                    break

            # Merge the answers for all steps
            main_answer = "\n".join(step_answers)
        else:
            # Generate the complete answer in one call
            prompt = (
                f"{user_message}\n"
                f"Think about this question and answer it step-by-step according to the following outline:\n{outline}\n"
                f"You only need to think in terms of the logic of that framework, not necessarily follow it explicitly."
            )

            messages = [
                {"role": "user", "content": prompt}
            ]

            try:
                main_answer = get_response_from_model(messages)
                print("Main Question Answer:", main_answer)
            except openai.error.OpenAIError as e:
                print(f"OpenAI API error received: {e}")
                break

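        # Write this iteration's answer to a log file in the log directory
        log_filename = os.path.join(log_dir, f"answer_{current_iteration}.log")
        try:
            with open(log_filename, 'w', encoding='utf-8') as log_file:
                log_file.write(main_answer)
            print(f"Answer written to {log_filename}")
        except IOError as e:
            print(f"Error while writing the log file: {e}")
        # Have a separate session score the answer against the criteria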
        review_message = (
            f"Here is the answer to the question '{user_message}':\n{main_answer}\n"
            f"Please evaluate the quality of this answer based on the following criteria:\n"
            f"1. Accuracy: Is the answer correct and free of factual errors?\n"
            f"2. Completeness: Does the answer cover all aspects of the question?\n"
            f"3. Clarity: Is the answer clearly and logically presented?\n"
            f"4. Depth: Does the answer provide sufficient depth and detail?\n"
            f"5. General Score: Is this answer good enough?"
        )

        messages = [
            {"role": "system", "content": "Provide detailed feedback, and evaluate the answer (Scoring from one to five, an integer) based on the provided criteria in the end."},
            {"role": "user", "content": review_message}
        ]

        try:
            review = get_response_from_model(messages)
            print("Review:", review)
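            # Parse the score
            score = extract_average_score_from_review(review)
            if score is None:
                print("Could not parse the score; continuing to iterate by default.")
                continue_iteration = True
            else:
                if current_iteration == 1:
                    first_iteration_score = score
                    if score == 5:
                        print(f"Excellent answer (score: {score}); consolidating it.")
                        # The output may be repetitive or disorganized; tidy it up to raise information density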
                        modify_message = (
                            f"According to the question {user_message}:\nHere is the answer:\n{main_answer}\n"
                            f"Please polish the answer to be more well-organized, while retaining all critical information"
                        )

                        messages = [
                            {"role": "user", "content": modify_message}
                        ]
                        
                        improved_answer = get_response_from_model(messages)
                        final_answer = improved_answer
                        continue_iteration = False

                    elif score == 4:
                        print(f"Good answer (score: {score}); running one more iteration.")
                        if max_iterations > 1:
                            continue_iteration = True
                        else:
                            final_answer = main_answer
                            continue_iteration = False
                    else:
                        print(f"Poor answer (score: {score}); further improvement is needed.")
                else:
                    if score == 4 and first_iteration_score == 4:
                        print(f"Scored 4 both times (score: {score}); stopping iteration.")
                        final_answer = main_answer
                        continue_iteration = False
                    elif score >= threshold:
                        print(f"Good answer (score: {score}); stopping iteration.")
                        final_answer = main_answer
                        continue_iteration = False
                    else:
                        print(f"Poor answer (score: {score}); further improvement is needed.")
                if continue_iteration:
                    # Revise the answer based on the review
                    modify_message = (
                        f"Here is the original answer to the question '{user_message}':\n{main_answer}\n"
                        f"Here is the review of the answer:\n{review}\n"
                        f"Please improve the original answer based on the review suggestions."
                    )

                    messages = [
                        {"role": "user", "content": modify_message}
                    ]

                    improved_answer = get_response_from_model(messages)
                    print("Improved Answer:\n", improved_answer)
                    final_answer = improved_answer
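                    # Update the reference answer to the improved answer
                    reference_answer = improved_answer

        except openai.error.OpenAIError as e:
            print(f"OpenAI API error received: {e}")
            break

    if final_answer:
        print("Final Improved Answer:", final_answer)
    else:
        print("Failed to generate a valid answer.")
    # Return the answer as well, so callers can use it (None if nothing was generated)
    return final_answer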

def extract_average_score_from_review(review_text):
    """
    Assumes the review uses one of the following formats:
    1. Accuracy: 4
    2. Completeness: 5
    3. Clarity: 4
    4. Depth: 5
    Overall Score: 4
    or
    Overall Score: four of five
    or
    Overall Score: 4/5
    """
    word_to_num = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}
    # Walk the review lines from last to first
    for line in reversed(review_text.strip().split('\n')):
        # First look for "Overall Score", "Average Score", or "General Score"
        # ("General Score" is the label used in the review prompt)
        match = re.search(r'(Overall|Average|General)\s+Score:\s*(\d+|one|two|three|four|five)(?:\s*of\s*(?:one|two|three|four|five)|/\d+)?', line, re.IGNORECASE)
        if match:
            score = match.group(2).lower()
            return int(score) if score.isdigit() else word_to_num.get(score)

        # Check the "x of y" format
        match = re.search(r'(\d|one|two|three|four|five)\s+of\s+(one|two|three|four|five)', line.lower())
        if match:
            score = match.group(1)
            return int(score) if score.isdigit() else word_to_num.get(score)
        # Check the "x/y" format
        match = re.search(r'(\d)/(\d)', line)
        if match and int(match.group(2)) == 5:
            return int(match.group(1))

        # Check for a standalone digit (1-5)
        match = re.search(r'\b[1-5]\b', line)
        if match:
            return int(match.group(0))
    return None

# Run the adaptive chain-of-reasoning flow
@timing_decorator
def run_chain_of_reasoning(user_message, enable_lite_mode=False):
    if enable_lite_mode:
        print("Lite mode enabled: processing the main question directly and generating a detailed outline.")
        # Generate an outline for the question
        prompt = (
            f"Please break down the following problem into sub-problems or steps and indicate what needs to be addressed in each one\n\n"
            f"Question: {user_message}\n\n"
            f"Please provide sub-problems or steps and related key point to answer this question. "
            f"Normally it should be no more than 5 steps or sub-problems; an easy one deserves fewer steps. "
            f"The number of sub-problems or steps should be stated only in the last line."
        )

        messages = [
            {"role": "system", "content": "You are an organized analyst who can decompose questions into detailed reasoning steps and all aspects."},
            {"role": "user", "content": prompt}
        ]

        try:
            detailed_outline = get_response_from_model(messages)
            with open("./log/build_detailed_outline.log", "w", encoding='utf-8') as log_file:
                log_file.write(detailed_outline)
            print(f"detailed_outline:\n{detailed_outline}")

            # Go straight to answering and reviewing the main question
            main_question_answer_and_review(detailed_outline, user_message, reference_answer=None)
        except openai.error.OpenAIError as e:
            print(f"OpenAI API error received: {e}")
    else:
        print("Standard mode enabled: generating similar questions and selecting the best answer.")
        similar_questions_text = generate_similar_questions(user_message)
        if similar_questions_text:
            questions = process_and_save_questions(similar_questions_text, user_message)
            if questions:
                answers = answer_questions(questions)
                outline, best_answer = analyze_common_outline(questions, answers, user_message)
                if outline and best_answer:
                    main_question_answer_and_review(outline, user_message, best_answer)

                else:
                    print("Could not generate an outline or select the best answer.")
        else:
            print("Failed to generate similar questions.")

if __name__ == "__main__":
    # Read the user's question
    user_message = input("Enter your question:\n")

    # Choose the run mode
    mode = input("Choose a mode (standard/lite): ").strip().lower()
    enable_lite_mode = mode == "lite"

    # Run the chain of reasoning
    run_chain_of_reasoning(user_message, enable_lite_mode=enable_lite_mode)

Kevin's Blog
