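It is well known that an LLM's built-in knowledge is limited. One way to help it answer better is retrieval-augmented generation (RAG): retrieve relevant material and context, and give the LLM that extra information.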

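In a chatbot specifically, there are two sides to this. First, the LLM knows little about specialized domains, so it needs to look things up in an external knowledge base, or use function calling to fetch up-to-date information from the web. Second, as the chat history grows, the LLM struggles to keep all of it in view. To avoid attending to one part at the expense of another, we need to find the history entries most relevant to the user's question (the query) and pass those to the LLM.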

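For example, during a chat the user might ask "Who is OpenAI's newest head of safety?" (which requires a web search), or "Yesterday we talked about Vincent van Gogh's painting The Starry Night; tell me more about his state of mind when he painted it" (which requires finding the relevant part of the chat history).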

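RAG can retrieve relevant information from the knowledge base and the chat history based on the query, so the LLM can reason and answer better. The implementation raises a number of practical problems, though. One of them is how to run retrieval for a follow-up question in the middle of a conversation.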

user: What new products were announced at the Apple event?
bot: The iPhone 16 series, Apple Watch 10, and AirPods 4.
user: How much do the phones cost?

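Retrieving from the knowledge base with the follow-up question alone is clearly not ideal, because the question is missing necessary detail (here, which phones). You could instead concatenate the last few turns and retrieve with all of them, but that brings in a lot of noise: if the topic changed during those turns, retrieval will return plenty of content that is not directly relevant.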
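To make the concatenation baseline concrete, here is a minimal sketch. The helper name `concat_recent_turns` and the `max_turns` parameter are hypothetical, chosen only for illustration; the post does not prescribe how many turns to join.

```python
def concat_recent_turns(history, user_query, max_turns=4):
    """Naive baseline: join the last `max_turns` messages with the new query."""
    # Guard against max_turns == 0, because history[-0:] would return everything.
    recent = history[-max_turns:] if max_turns > 0 else []
    parts = [message["content"] for message in recent]
    parts.append(user_query)
    return " ".join(parts)


history = [
    {"role": "user", "content": "What new products were announced at the Apple event?"},
    {"role": "assistant", "content": "The iPhone 16 series, Apple Watch 10, and AirPods 4."},
]
retrieval_query = concat_recent_turns(history, "How much do the phones cost?")
print(retrieval_query)
```

The resulting string does contain "iPhone 16", but it also carries "Apple Watch 10" and "AirPods 4", which is exactly the kind of noise that can pull in unrelated documents.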

Compressing the chat history

One solution is to compress the chat history together with the user's query into a single short query. That query can be a full sentence or just a search term.

[history] + query -> "what is the price of iPhone 16 series?" # or
[history] + query -> "iPhone 16 series price"

The condensed query is then passed to RAG for retrieval, and the retrieved results are handed to the LLM to produce the answer.
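The flow described here (condense, retrieve, answer) can be sketched as a small function. This is only an outline under assumptions: `answer_follow_up` and its three callable parameters are hypothetical names, and the stand-in functions below replace a real LLM and a real vector store so that the sketch runs on its own.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def answer_follow_up(
    history: List[Message],
    user_query: str,
    condense: Callable[[List[Message], str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[List[Message], str, List[str]], str],
) -> str:
    # Step 1: compress history + query into a short standalone search query.
    search_query = condense(history, user_query)
    # Step 2: retrieve with the condensed query instead of the raw follow-up.
    documents = retrieve(search_query)
    # Step 3: let the LLM answer the original question using the retrieved text.
    return generate(history, user_query, documents)


# Stand-in components (assumptions, not real model or database calls).
def fake_condense(history, user_query):
    return "iPhone 16 series price"


def fake_retrieve(search_query):
    return [f"document matching '{search_query}'"]


def fake_generate(history, user_query, documents):
    return f"Answer based on {len(documents)} document(s)."


reply = answer_follow_up(
    [], "How much do the phones cost?", fake_condense, fake_retrieve, fake_generate
)
print(reply)
```

Passing the three steps in as callables keeps the sketch independent of any particular LLM client or retrieval backend.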

Implementation

Generating a query from the chat history is a natural job for an LLM, and a simple prompt is enough:

Prompt

prompt = '''Given a context of recent chat history, summarize the user's query as a search term. Return ONLY this paraphrase.

Chat history:
{chat_history}

User query:
{user_query}
'''
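As a sketch of how this prompt might be filled in and sent to a model: the helpers `format_history`, `build_summarizer_prompt`, and `summarize_query` are hypothetical, and `llm` is assumed to be any callable that takes a prompt string and returns the completion as a string. The template is repeated so the block runs on its own.

```python
PROMPT_TEMPLATE = '''Given a context of recent chat history, summarize the user's query as a search term. Return ONLY this paraphrase.

Chat history:
{chat_history}

User query:
{user_query}
'''


def format_history(history):
    # Render each message as "role: content", one message per line.
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)


def build_summarizer_prompt(history, user_query):
    return PROMPT_TEMPLATE.format(
        chat_history=format_history(history), user_query=user_query
    )


def summarize_query(history, user_query, llm):
    # `llm` is a hypothetical callable: prompt string in, completion string out.
    return llm(build_summarizer_prompt(history, user_query)).strip()


history = [
    {"role": "user", "content": "iPhone 16 just came out"},
    {"role": "assistant", "content": "That's nice, what do you think about it?"},
]
# A stub stands in for the model so the example runs without network access.
stub_llm = lambda prompt_text: " iPhone 16 features versus price\n"
print(summarize_query(history, "really?", stub_llm))
```

In practice `stub_llm` would be replaced by a call to whichever chat-completion client the chatbot already uses.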

Comparing results

history = [
    {"role": "user", "content": "iPhone 16 just came out"},
    {"role": "assistant", "content": "That's nice, what do you think about it?"},
    {"role": "user", "content": "I think it's overpriced, iPhone 16 Pro starts at $1299, iphone 16 promax starts at $1599 and iPhone 16 standard starts at $999"},
    {"role": "assistant", "content": "The price is a bit high, but the features are impressive. It has a new chip, a new display, and a new camera system."}
]
query = 'really?'

Retrieving with the raw query directly returns:

# cosine similarity, text
0.64616966 You're welcome! I'm glad I could clarify the phrase for you. Remember, idioms can be tricky, but they're a fun way to add flavor to our language. Keep practicing, and you'll be a pro in no time!
0.59912086 I think there might be a small mistake there! You meant to say "you think so", correct? The phrase "you two think so" implies that there are two people thinking, but in this conversation, it's just us having a chat.
...

Running the summarizer first produces the compressed query "iPhone 16 features versus price". Retrieving with that compressed query returns:

# cosine similarity, text
0.78006 That's a subjective question! As a literature teacher, I focus more on the features and functionality that suit my needs, rather than comparing brands. I'm happy with my Samsung phone's performance, but I'm sure some iPhone users would have different opinions ...
0.75120413 When choosing a phone, I generally look for a device that offers the features and specifications that best suit my needs...
...

The comparison shows that the text retrieved with the compressed query is more relevant to the conversation.
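The scores above are cosine similarities between the query and each candidate text. The post does not say which embedding model or vector store produced them, so the following is only a toy sketch: `toy_embed` uses word counts as a stand-in for a real embedding model, and `search` and the two-entry `corpus` are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of the norms.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, corpus, top_k=2):
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(doc)), doc) for doc in corpus]
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


corpus = [
    "iphone 16 pro price and features compared",
    "idioms are a fun way to add flavor to language",
]
for score, text in search("iphone 16 features versus price", corpus):
    print(f"{score:.3f} {text}")
```

With this toy embedding, the raw query "really?" shares no words with either entry and scores zero against both, while the compressed query ranks the phone-related text first. A real embedding model behaves less starkly, but the ranking logic is the same.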

Potential issues

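Generating the query by compressing the chat history trims unnecessary history. However, when the history is long and the latest query has little to do with the most recent turns, the compressed query may fail to capture the user's intent. For example, the user mentions the iPhone 16, and dozens of turns later suddenly asks: "You mentioned the new iPhone earlier. What colors does it come in?" If only a sliding window of recent turns is passed to the summarizer, that window holds a limited amount of information, and by this point the specific model (16) may already have dropped out of it.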
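The sliding-window failure mode is easy to reproduce in a few lines. This is an illustrative sketch only: the `sliding_window` helper and the window size of 4 are assumptions, since the post does not state how the window is implemented.

```python
def sliding_window(history, window_size=4):
    """Keep only the most recent `window_size` messages."""
    return history[-window_size:] if window_size > 0 else []


history = [{"role": "user", "content": "iPhone 16 just came out"}]
# Simulate dozens of unrelated turns after the phone was mentioned.
for i in range(30):
    history.append({"role": "user", "content": f"unrelated message {i}"})
    history.append({"role": "assistant", "content": f"unrelated reply {i}"})

window = sliding_window(history)
mentions_model = any("iPhone 16" in m["content"] for m in window)
print(mentions_model)  # False: the model number is no longer in the window
```

Whatever the summarizer produces from this window cannot contain "16", because the only message that mentioned it has already been dropped.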

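In addition, compared with retrieving with the raw query directly, the extra history-compression step adds another LLM call, which further increases response latency.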

References:

Is there a way to use RAG with converasation?
