结构化输出

目前主流的大语言模型（LLM）输出内容格式都是Markdown，这是因为LLM的训练数据通常包含大量Markdown格式的文本，模型会倾向于生成类似格式的内容，但有时我们需要LLM生成一些其它格式的内容，例如CSV、JSON等，传统方式是使用提示词引导大语言模型返回对应格式的文本，对于这种方式LangChain提供了对应的解析器来解析数据。

此外，对于一些最新的模型，其实还有更稳定的JSON输出方式，例如OpenAI在API层直接支持了基于JSON Schema的结构化输出，这种方式要比直接用提示词引导模型输出JSON更加稳定。LangChain 1.x中内置了对于该结构化输出方式的支持，此外如果Provider不支持结构化输出还会自动Fallback到通过工具调用“模拟”结构化输出。

提示词引导实现结构化输出

StrOutputParser 解析文本

数据抽取组件StrOutputParser用于从LLM的返回中解析文本，不过LLM的返回其实本来就是文本，因此它基本什么都没做，只是把LLM返回的文本原样输出。

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="qwen3:30b-a3b",
    base_url="http://localhost:11434/v1/",
    api_key="dummy",
    temperature=1,
    top_p=1,
    max_tokens=16384,
    timeout=120,
    max_retries=6
)

prompt_template = PromptTemplate.from_template("你好")

chain = prompt_template | model | StrOutputParser()

result = chain.invoke({})
print(result)

一般来说，它抽取的结果都是纯文本或Markdown文本。

JsonOutputParser 解析JSON

JsonOutputParser可以解析LLM返回的JSON文本，不过传统方式中让LLM输出JSON信息需要给出适当的提示，下面是一个例子。

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="qwen3:30b-a3b",
    base_url="http://localhost:11434/v1/",
    api_key="dummy",
    temperature=1,
    top_p=1,
    max_tokens=16384,
    timeout=120,
    max_retries=6
)

prompt_template = PromptTemplate.from_template(
    "Print a json object with random name and age. Print JSON only, DO NOT include any other text.")

chain = prompt_template | model | JsonOutputParser()

result = chain.invoke({})
print(result)

例子中我们使用了JsonOutputParser()，它能够将LLM返回的JSON文本自动解析为嵌套的字典格式。

注意：JSON输出相比普通文本更困难，qwen3:30b-a3b是一个可以在PC上运行的效果较好的LLM，但如果你使用的是其它能力更弱的模型或者提示词不是特别明确，LLM可能并不总是按预期工作，它可能输出嵌套在Markdown里的JSON或者是完全错误的内容，此时JsonOutputParser()将解析失败并抛出错误。实际开发中，我们最好编写完善的异常处理机制，处理LLM未正确返回JSON的情况。

使用Pydantic类型约束输出JSON

前面我们让LLM输出的JSON数据没有明确指出类型约束，LLM可能输出任意JSON，这有时不符合我们的需求。LangChain还支持根据Pydantic模型创建类型约束的提示词，引导LLM输出正确的字段信息，下面是一个例子。

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

model = ChatOpenAI(
    model="qwen3:30b-a3b",
    base_url="http://localhost:11434/v1/",
    api_key="dummy",
    temperature=1,
    top_p=1,
    max_tokens=16384,
    timeout=120,
    max_retries=6
)


class Student(BaseModel):
    name: str = Field(description='Student name')
    age: int = Field(description='Student age')


parser = JsonOutputParser(pydantic_object=Student)

prompt_template = PromptTemplate.from_template(
    template="""
    Print a json object with random name and age. Print JSON only, DO NOT include any other text.
    format instructions:
    {format_instructions}
    """,
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

chain = prompt_template | model | parser

result = chain.invoke({})
print(result)

代码中我们创建了一个Pydantic模型Student，在JsonOutputParser对象上，我们调用了get_format_instructions()方法，它能自动生成和类型相关的提示词，引导LLM输出正确的内容，调用“Chain”后也能自动根据类型约束解析数据。当然，它的本质还是模型直接输出JSON文本，这非常依赖LLM的能力。代码中的parser.get_format_instructions()会生成一系列引导模型输出JSON的提示词，通过LangSmith我们可以看到自动生成了类似如下的提示词文本。

STRICT OUTPUT FORMAT:
- Return only the JSON value that conforms to the schema. Do not include any additional text, explanations, headings, or separators.
- Do not wrap the JSON in Markdown or code fences (no ``` or ```json).
- Do not prepend or append any text (e.g., do not write "Here is the JSON:").
- The response must be a single top-level JSON value exactly as required by the schema (object/array/etc.), with no trailing commas or comments.

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema (shown in a code block for readability only — do not include any backticks or Markdown in your output):

{"properties": {"name": {"description": "Student name", "title": "Name", "type": "string"}, "age": {"description": "Student age", "title": "Age", "type": "integer"}}, "required": ["name", "age"]}

前面解析JSON时我们使用过Pydantic模型进行类型约束，但输出的还是字典类型数据，LangChain也支持直接输出Pydantic模型对象，我们使用PydanticOutputParser即可，其余用法和之前的JsonOutputParser完全一致。

parser = PydanticOutputParser(pydantic_object=Student)

LangChain 1.x对结构化输出的支持

下面例子中，我们使用LangChain 1.x实现了结构化输出，它会返回我们指定的Pydantic模型实例。

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

model = ChatOpenAI(
    model="qwen3:30b-a3b",
    base_url="http://localhost:11434/v1/",
    api_key="dummy",
    temperature=1,
    top_p=1,
    max_tokens=16384,
    timeout=120,
    max_retries=6
)


class Student(BaseModel):
    name: str = Field(description='Student name')
    age: int = Field(description='Student age')


model = model.with_structured_output(Student)

prompt_template = PromptTemplate.from_template("Print a json object with random name and age.")

chain = prompt_template | model

result = chain.invoke({})
print(result)

运行代码后，我们可以通过LangSmith观察其与之前通过提示词引导方式实现的区别。

作者：Gacfox