Building a Chatbot

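In earlier chapters we covered in detail how to build prompts in LangChain, but we left ChatPromptTemplate unexplained. In this note we build a more complex example: a chatbot that "remembers" the conversation history and can carry on a multi-turn conversation.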

"Conversation History" in LangChain

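An LLM is actually stateless; it has no "memory" of its own. It appears to remember what was said earlier in a multi-turn conversation only because the application layer sends the entire conversation history to it with every request. The following example shows how to pass conversation history in LangChain.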

from langchain.globals import set_debug, set_verbose
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama

set_debug(True)
set_verbose(True)

prompt_template = ChatPromptTemplate.from_messages([
    ('human', 'Hi, my name is Aiko.'),
    ('ai', 'Hi, Aiko.'),
    ('human', 'What is my name?'),
])

llm = ChatOllama(model='llama3.1:8b', temperature=1)

chain = prompt_template | llm | StrOutputParser()

result = chain.invoke({})
print(result)

Output:

Your name is Aiko! Nice to meet you!

As the output shows, the LLM answered based on the conversation history we supplied.
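To make the statelessness concrete, here is a minimal library-free sketch. The function fake_llm is a hypothetical stand-in for a real model, not part of LangChain: it only sees the messages passed in a single call and keeps nothing between calls.

```python
import re


def fake_llm(messages):
    """Hypothetical stateless 'model': it answers using only the messages it is given."""
    for role, text in messages:
        match = re.search(r"my name is (\w+)", text)
        if role == "human" and match:
            return f"Your name is {match.group(1)}."
    return "I don't know your name."


history = [
    ("human", "Hi, my name is Aiko."),
    ("ai", "Hi, Aiko."),
]
question = ("human", "What is my name?")

# Without the history, the function has nothing to go on.
without_history = fake_llm([question])
# With the history prepended, the same stateless function can answer.
with_history = fake_llm(history + [question])

print(without_history)
print(with_history)
```

The only difference between the two calls is what the caller sends; the "memory" lives entirely in the application.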

ChatPromptTemplate: the chat prompt template

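The previous example used ChatPromptTemplate. It is also a prompt template, but unlike the single-message prompt templates covered in the previous chapter, ChatPromptTemplate holds a list of messages and can therefore carry conversation history. With it, multi-turn conversations with an LLM are easy to implement.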

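A ChatPromptTemplate is made up of "messages". Besides human and ai messages, most LLMs also support a system message, which is typically used to set global rules for the conversation. The code below is an example that instructs the LLM to add emoji to its replies to liven things up.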

prompt_template = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful AI assistant. You always add emojis to your responses.'),
    ('human', 'Hi, my name is Aiko.'),
    ('ai', 'Hi, Aiko.'),
    ('human', 'What is my name?'),
])

With the system prompt added, the output might look like this:

Your name is Aiko 🙋‍♀️! 😊

Messages in a ChatPromptTemplate can contain placeholders, so the template can be assembled dynamically. Here is an example.

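prompt_template = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful AI assistant. You always add emojis to your responses. You always respond in {language}.'),
    ('human', 'Hi, my name is Aiko.'),
    ('ai', 'Hi, Aiko.'),
    ('human', 'What is my name?'),
])

# Rebuild the chain so that it uses the new template.
chain = prompt_template | llm | StrOutputParser()

# Placeholder values are supplied at invocation time.
result = chain.invoke({"language": "Spanish"})
print(result)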

The values for the placeholders must be passed in when the chain is invoked.
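Conceptually, filling a placeholder is plain string substitution applied to each message. The sketch below illustrates the idea with str.format; render_messages is a hypothetical helper written for this note, not a LangChain function and not how LangChain is implemented internally.

```python
def render_messages(template, **variables):
    """Hypothetical helper: fill {placeholders} in every (role, text) pair."""
    return [(role, text.format(**variables)) for role, text in template]


template = [
    ("system", "You always respond in {language}."),
    ("human", "What is my name?"),
]

# The variable is supplied at call time, just as with chain.invoke({...}).
rendered = render_messages(template, language="Spanish")
print(rendered[0])
```

Messages without placeholders pass through unchanged, which is why the fixed history lines can sit next to a parameterized system message.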

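In fact, tuples such as ('system', '...') above are shorthand defined by LangChain. The system, human, and ai messages correspond to the SystemMessage, HumanMessage, and AIMessage types in LangChain. Note that message objects are inserted into the prompt as-is, so {placeholder} syntax inside them is not filled in.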

from langchain_core.messages import AIMessage, HumanMessage

prompt_template = ChatPromptTemplate.from_messages([
    HumanMessage('Hi, my name is Aiko.'),
    AIMessage('Hi, Aiko.'),
    HumanMessage('What is my name?'),
])

The SystemMessage, HumanMessage, and AIMessage entries can also be replaced with yet another form. The code below uses the SystemMessagePromptTemplate class; unlike before, the system message here is generated from a PromptTemplate.

from langchain.globals import set_debug, set_verbose
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama

set_debug(True)
set_verbose(True)

llm = ChatOllama(model='llama3.1:8b', temperature=1)

prompt_template = PromptTemplate.from_template(
    "You are a helpful AI assistant. You always add emojis to your responses. You always respond in {language}."
)
chat_prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate(prompt=prompt_template),
    ('human', 'Hi, my name is Aiko.'),
    ('ai', 'Hi, Aiko.'),
    ('human', 'What is my name?'),
])

chain = chat_prompt_template | llm | StrOutputParser()
result = chain.invoke({"language": "Spanish"})
print(result)

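This is how PromptTemplate and ChatPromptTemplate can be used together. Earlier we also covered FewShotPromptTemplate, PipelinePromptTemplate, and the Mustache template engine; these templating techniques can all be combined with ChatPromptTemplate to assemble more complex templates.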

Maintaining "Conversation History" on the Server

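The earlier code hard-coded a list as the conversation history, which is not how real applications work: the history between us and the AI has to be updated continuously. In addition, for an LLM service to talk to several people at once, we need a session mechanism so that different people in different sessions have separate histories. We can implement this ourselves, and LangChain also provides a wrapper for it.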

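The in-memory history class used below, ChatMessageHistory, lives in the langchain_community package (RunnableWithMessageHistory itself is imported from langchain_core), so we need to install that package first.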

pip install langchain_community

Below is the code for an AI chatbot that supports multi-turn conversation.

from langchain.globals import set_debug, set_verbose
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

set_debug(True)
set_verbose(True)

chat_history = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the conversation history for the given session ID."""
    if session_id not in chat_history:
        chat_history[session_id] = ChatMessageHistory()
    return chat_history[session_id]


prompt_template = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful AI assistant. You always add emojis to your responses.'),
    MessagesPlaceholder(variable_name='history')
])

model = ChatOllama(model='llama3.1:8b', temperature=1)

chain = prompt_template | model | StrOutputParser()

chain_with_history = RunnableWithMessageHistory(chain, get_session_history, input_messages_key='history')

while True:
    message = input()
    if message == 'exit':
        break
    result = chain_with_history.invoke({'history': [('human', message)]},
                                       config={'configurable': {'session_id': 'abc123'}})
    print(result)

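In this code all conversation history is kept in the chat_history dict; in real development we would probably store it in a database instead. Either way, the logic for looking up the history is encapsulated in the get_session_history function, which takes a session ID and returns the history of that conversation so far.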
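As a rough illustration of moving the history out of a dict, here is a minimal sketch that stores messages in SQLite using only the standard library. The class name SqliteHistoryStore, its methods, and the table layout are all assumptions made for this note; it is not a LangChain class.

```python
import sqlite3


class SqliteHistoryStore:
    """Hypothetical store: one row per message, grouped by session ID."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add_message(self, session_id, role, content):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def get_messages(self, session_id):
        # Order by insertion id so the conversation comes back in sequence.
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return [(role, content) for role, content in rows]


store = SqliteHistoryStore()
store.add_message("abc123", "human", "Hi, my name is Aiko.")
store.add_message("abc123", "ai", "Hi, Aiko.")
store.add_message("xyz789", "human", "Hello!")
print(store.get_messages("abc123"))
```

To use something like this with RunnableWithMessageHistory, it would presumably have to be wrapped in a BaseChatMessageHistory subclass that get_session_history returns; that wrapper is left out here.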

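After assembling the chain, we wrap it in a RunnableWithMessageHistory object, which adds the session mechanism to the whole chain. We can then use this object for multi-turn conversation with history. In invoke(), besides the current input message we also pass a session ID; calls with the same session ID belong to the same conversation.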

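That said, LangChain's session layer has its problems. It is wrapped too deeply and is inflexible to use, while in real development there is a lot of room to manipulate and optimize the conversation history, so this implementation may not meet our needs; it is covered here for reference only. In practice it is perfectly fine not to use it: passing the full conversation history that we maintain ourselves to the chain as it was before being wrapped in RunnableWithMessageHistory has the same effect.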
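A minimal sketch of maintaining the history ourselves might look like the following. It keeps one list per session and trims it to a sliding window before each call, which is one example of the kind of optimization mentioned above. The names chat, echo_chain, and MAX_TURNS are assumptions for illustration; echo_chain is a hypothetical stand-in for the un-wrapped chain's invoke method.

```python
from collections import defaultdict

MAX_TURNS = 2  # assumed limit: keep only the last 2 human/ai exchanges

histories = defaultdict(list)  # session_id -> list of (role, text)


def echo_chain(inputs):
    """Hypothetical stand-in for chain.invoke; reports how many messages it saw."""
    return f"I received {len(inputs['history'])} message(s)."


def chat(session_id, user_text, invoke=echo_chain):
    """Append the user message, call the chain with a trimmed window, store the reply."""
    history = histories[session_id]
    history.append(("human", user_text))
    # Send the last MAX_TURNS exchanges plus the new human message.
    window = history[-(MAX_TURNS * 2 + 1):]
    reply = invoke({"history": window})
    history.append(("ai", reply))
    return reply


replies = [chat("abc123", f"message {i}") for i in range(4)]
print(replies)
```

With the chain from this section, the same function could presumably be called as chat('abc123', message, invoke=chain.invoke), since that chain takes a 'history' list of messages and returns a string.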

Streaming Output

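Generating a long passage of text usually takes an LLM a while, and a chatbot is a fairly real-time application, so we usually stream the output to avoid keeping the user waiting. LangChain wraps streaming in a convenient method; here is an example.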

from langchain.globals import set_debug, set_verbose
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

set_debug(True)
set_verbose(True)

chat_history = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the conversation history for the given session ID."""
    if session_id not in chat_history:
        chat_history[session_id] = ChatMessageHistory()
    return chat_history[session_id]


prompt_template = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful AI assistant. You always add emojis to your responses.'),
    MessagesPlaceholder(variable_name='history')
])

model = ChatOllama(model='llama3.1:8b', temperature=1)

chain = prompt_template | model | StrOutputParser()
chain_with_history = RunnableWithMessageHistory(chain, get_session_history, input_messages_key='history')

while True:
    message = input()
    if message == 'exit':
        break
    for response in chain_with_history.stream({'history': [('human', message)]},
                                              config={'configurable': {'session_id': 'abc123'}}):
        print(response, end='')
    print('')

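In this code, the method we call on the chain changes from invoke() to stream(); iterating over its return value reads the LLM output as a stream. In real development, if we are writing a web service, streamed messages are usually delivered via Server-Sent Events (SSE); see the chapters on web frameworks for details, which are not covered here.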
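For a rough idea of what the SSE side involves, the sketch below wraps text chunks in SSE "data:" frames using plain Python. Both fake_stream and to_sse are hypothetical names for this note; fake_stream stands in for the chunks that chain_with_history.stream(...) would yield, and no web framework is involved.

```python
def fake_stream():
    """Hypothetical stand-in for chain.stream(...); yields text chunks."""
    yield from ["Your name ", "is Aiko", "!\nNice to meet you!"]


def to_sse(chunks):
    """Wrap each text chunk in a Server-Sent Events frame."""
    for chunk in chunks:
        # A payload containing newlines needs one "data:" field per line.
        lines = chunk.split("\n")
        # A blank line terminates each event.
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


frames = list(to_sse(fake_stream()))
for frame in frames:
    print(repr(frame))
```

In a real web service these frames would be written to a response served with the text/event-stream content type; how that is done depends on the framework.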

Author: Gacfox