Spring AI Framework: Comprehensive Study Notes

Published 2022/09/12, updated 2026/05/09


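1. Introduction to Spring AI

1.1 What Is Spring AI

Spring AI is an application framework in the Spring ecosystem built for AI engineering. Version 1.0 GA was released in May 2025 and version 1.1 GA in November 2025. Its goal is to bring Spring design principles (portability, modularity, POJO-driven development) to the AI domain.

The core challenge Spring AI addresses: connecting enterprise data and APIs with AI models.

Spring AI draws inspiration from Python projects such as LangChain and LlamaIndex, but it is not a direct port; it is an AI development framework designed for Java developers.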

Website: https://spring.io/projects/spring-ai

Documentation: https://docs.spring.io/spring-ai/reference/

GitHub：https://github.com/spring-projects/spring-ai

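1.2 Core Features at a Glance

Feature	Description
Chat Completion	Synchronous and streaming chat, compatible with the major AI providers
Embedding	Vector representation of text
Text to Image	Image generation from text (DALL-E, Stability AI, etc.)
Audio Transcription	Speech to text
Text to Speech	Text to speech
Moderation	Content moderation
Structured Output	Maps AI output to Java POJOs
Vector Store	Supports 20+ vector databases
Tool/Function Calling	Tool calling to connect external APIs
RAG	Retrieval Augmented Generation
Chat Memory	Conversation memory management
Advisors API	Middleware pattern for intercepting and enhancing AI interactions
MCP	Model Context Protocol integration
Observability	Metrics, tracing, and logging
ETL Pipeline	Document extract-transform-load pipeline
Model Evaluation	Evaluates AI output to guard against hallucination
Agentic Patterns	Agent patterns (Chain, Routing, Parallel, etc.)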

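1.3 Version History

Version	Date	Milestone
0.8.x	2024	Early milestone releases
1.0.0 GA	2025-05	First general availability release
1.0.1	2025-07	Bug fixes
1.1.0 GA	2025-11	MCP integration, prompt caching, recursive advisors, reasoning mode
1.1.1	2025-12	13 new features, 16 bug fixes
2.0.0-M1	2025-12	Based on Spring Boot 4.0 + Spring Framework 7.0 + Jakarta EE 11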

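2. Quick Start

2.1 Requirements

JDK 17+ (21 recommended)
Spring Boot 3.4.x or 3.5.x for Spring AI 1.x
Maven or Gradle

2.2 Creating a Project

Option 1: Spring Initializr

Go to https://start.spring.io/ and select these dependencies:

Spring Web
Spring AI (pick the starter for your model provider)

Option 2: Create a Maven project manually

Core pom.xml configuration: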

  
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.4.3</version>
    </parent>

    <properties>
        <java.version>17</java.version>
        <spring-ai.version>1.0.0</spring-ai.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <!-- Spring Boot Web -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Spring AI OpenAI (starter artifacts were renamed to spring-ai-starter-model-* as of 1.0) -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>
    </dependencies>
</project>

Tip: manage versions with spring-ai-bom so that the individual starters cannot drift apart.

2.3 Hello World

application.yml:

  
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7

Controller：

  
@RestController
public class HelloController {

    private final ChatClient chatClient;

    // Spring AI auto-configuration provides the ChatClient.Builder bean
    public HelloController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/hello")
    public String hello(@RequestParam(defaultValue = "Tell me a joke") String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}

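Start the application and open http://localhost:8080/hello to try it.

Practical tip: never hard-code the API key. Supply it through an environment variable such as ${OPENAI_API_KEY} or through a configuration center.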

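3. Core Concepts

3.1 Models

An AI model is an algorithm that processes and generates information. Spring AI groups models by input and output type:

Input	Output	Model type	Examples
Text	Text	Chat Completion	GPT-4o, Claude, Qwen
Text	Vector	Embedding	text-embedding-ada-002
Text	Image	Text to Image	DALL-E 3, Stable Diffusion
Audio	Text	Audio Transcription	Whisper
Text	Audio	Text to Speech	OpenAI TTS
Text	Label	Moderation	OpenAI Moderation

Spring AI model hierarchy (simplified):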

Model (top-level interface)
  ├── ChatModel (chat models)
  │     ├── OpenAiChatModel
  │     ├── OllamaChatModel
  │     ├── AnthropicChatModel
  │     └── ZhiPuAiChatModel, and others
  ├── EmbeddingModel (embedding models)
  ├── ImageModel (image models)
  ├── Audio models (transcription and text-to-speech)
  └── ModerationModel (moderation models)

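3.2 Prompts

A prompt is the language input that steers an AI model toward a particular output. In Spring AI a Prompt is not just a string; it is a structured object that holds multiple Messages.

Message roles:

Role	Description
System	System instructions that set the AI's behavior and context
User	User input
Assistant	The AI's reply
Tool	The result of a tool call

Example: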

  
Prompt prompt = new Prompt(
    List.of(
        new SystemMessage("你是一个专业的Java开发助手，回答要简洁精准。"),
        new UserMessage("解释一下Spring AI的ChatClient")
    )
);
ChatResponse response = chatModel.call(prompt);

3.3 Prompt Templates

Spring AI uses the StringTemplate engine to manage prompt templates, playing a role similar to the View in Spring MVC:

  
// Template definition
String template = "Tell me a {adjective} joke about {content}";

// Render it with PromptTemplate
PromptTemplate promptTemplate = new PromptTemplate(template);
Map<String, Object> variables = Map.of("adjective", "funny", "content", "Java");
Prompt prompt = promptTemplate.create(variables);

Using a template with ChatClient:

  
String response = chatClient.prompt()
    .user(u -> u.text("Tell me a {adjective} joke about {content}")
                  .param("adjective", "funny")
                  .param("content", "Java"))
    .call()
    .content();

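3.4 Embeddings

An embedding converts text, images, or other content into a vector (array) of floating-point numbers that captures its semantics. Text similarity is then judged by the distance between vectors.

  
// Call the embedding model
EmbeddingResponse response = embeddingModel.embedForResponse(
    List.of("Hello world", "你好世界")
);

// Get the vector of the first input
float[] embedding = response.getResult().getOutput();
// e.g. [0.0023, -0.0145, 0.0378, ...]

In plain terms: each text becomes a point in a coordinate space, and texts with similar meaning end up close together.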
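
To make "distance between vectors" concrete, here is a minimal sketch of cosine similarity, one common measure for comparing embeddings. The class name and the three toy vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and this helper is not part of Spring AI.

```java
// Hypothetical helper (not a Spring AI class): cosine similarity of two embedding vectors.
final class CosineSimilarityDemo {

    /** Returns the cosine of the angle between a and b, a value in [-1, 1]. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("A zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up toy vectors standing in for real embeddings
        float[] cat = {0.9f, 0.1f, 0.0f};
        float[] kitten = {0.8f, 0.2f, 0.1f};
        float[] invoice = {0.0f, 0.1f, 0.9f};
        // Semantically close texts should score higher than unrelated ones
        System.out.println(cosine(cat, kitten) > cosine(cat, invoice));
    }
}
```

A score near 1 means the vectors point in nearly the same direction; a score near 0 means they are unrelated.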

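3.5 Tokens

A token is the basic unit in which an AI model processes text:

In English, 1 token ≈ 0.75 words
In Chinese, 1 token ≈ 1-2 characters
Tokens cost money: API calls are billed by token count
Context window: the maximum number of tokens a model can process in a single request

Model	Context window
GPT-4o	128K
Claude 3.5	200K
Qwen-Max	128K
GPT-3.5	4K (outdated)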
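
The rules of thumb above can be turned into a rough budget check. The sketch below assumes English text, uses the 0.75 words-per-token approximation from this section, and treats "128K" as 128,000 tokens; the class and method names are hypothetical, and a real application should count tokens with the model's own tokenizer.

```java
// Hypothetical helper: rough token estimate and context-window check.
final class TokenBudget {

    /** Estimates tokens for English text using the 1 token ≈ 0.75 words rule of thumb. */
    static int estimateEnglishTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        int words = trimmed.split("\\s+").length;
        return (int) Math.ceil(words / 0.75);
    }

    /** True if the prompt plus the tokens reserved for the answer fit into the window. */
    static boolean fitsContext(int promptTokens, int reservedForAnswer, int contextWindow) {
        return promptTokens + reservedForAnswer <= contextWindow;
    }

    public static void main(String[] args) {
        int tokens = estimateEnglishTokens("Tell me a funny joke about Java");
        System.out.println("Estimated tokens: " + tokens);
        System.out.println("Fits a 128K window: " + fitsContext(tokens, 1_000, 128_000));
    }
}
```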

3.6 Structured Output

AI models return plain strings by default. Structured Output maps that output to Java POJOs:

  
// Define the output structure
record ActorFilms(String actor, List<String> movies) {}

// Map the response automatically
ActorFilms films = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    .call()
    .entity(ActorFilms.class);

// Generic collections are supported as well
List<ActorFilms> filmList = chatClient.prompt()
    .user("Generate the filmography of 5 movies for Tom Hanks and Bill Murray.")
    .call()
    .entity(new ParameterizedTypeReference<List<ActorFilms>>() {});

Native structured output (requires model support):

  
ActorFilms films = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    // ENABLE_NATIVE_STRUCTURED_OUTPUT is passed directly as the advisor customizer
    .advisors(AdvisorParams.ENABLE_NATIVE_STRUCTURED_OUTPUT)
    .call()
    .entity(ActorFilms.class);

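Enterprise tip: structured output is the key to integrating with business systems. It lets data returned by the AI map directly onto business objects, with no hand-written parsing.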

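4. ChatClient API in Detail

4.1 ChatClient Overview

ChatClient is a fluent API for communicating with AI models. Its design is inspired by WebClient and RestClient, and it is the most commonly used API in Spring AI.

  
// Create a ChatClient with defaults
ChatClient chatClient = ChatClient.create(chatModel);

// Or customize it through the builder (an alternative to the line above);
// the builder method for the default system prompt is defaultSystem(...)
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultSystem("你是一个专业的技术助手")
    .defaultAdvisors(...)
    .build();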

4.2 Synchronous Calls

  
// Basic call: returns a String
String content = chatClient.prompt()
    .user("What is Spring AI?")
    .call()
    .content();

// Return the full ChatResponse, including metadata
ChatResponse chatResponse = chatClient.prompt()
    .user("What is Spring AI?")
    .call()
    .chatResponse();

// Return an entity object
record BookInfo(String title, String author, int year) {}
BookInfo book = chatClient.prompt()
    .user("推荐一本Java书籍")
    .call()
    .entity(BookInfo.class);

4.3 Streaming Calls

  
// Streaming response
Flux<String> stream = chatClient.prompt()
    .user("Write a poem about Spring")
    .stream()
    .content();

// Expose it as SSE from a controller
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> chatStream(@RequestParam String message) {
    return chatClient.prompt()
        .user(message)
        .stream()
        .content();
}

4.4 System Message

  
// Set a default system message when building the client
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultSystem("""
        你是一个电商客服助手，请用礼貌专业的语气回答问题。
        回答要简洁，不超过200字。
        如果不知道答案，请说'我需要转接人工客服'。
        """)
    .build();

// Or set it per call
String response = chatClient.prompt()
    .system("请用JSON格式回答")
    .user("列出中国四大名著")
    .call()
    .content();

4.5 Switching Between Models

Enterprise applications often need different models for different scenarios:

  
@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel chatModel) {
        return ChatClient.create(chatModel);
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}

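// Inject whichever clients you need
@Service
public class AiService {

    private final ChatClient openAiClient;
    private final ChatClient anthropicClient;

    // @Qualifier on the constructor parameters selects between the two ChatClient beans;
    // final fields must be assigned here, so field-level @Qualifier alone would not compile
    public AiService(@Qualifier("openAiChatClient") ChatClient openAiClient,
                     @Qualifier("anthropicChatClient") ChatClient anthropicClient) {
        this.openAiClient = openAiClient;
        this.anthropicClient = anthropicClient;
    }

    public String ask(String question, String modelType) {
        ChatClient client = "openai".equals(modelType) ? openAiClient : anthropicClient;
        return client.prompt().user(question).call().content();
    }
}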

Multiple endpoints for OpenAI-compatible APIs (using the mutate method):

  
@Service
public class MultiModelService {

    @Autowired
    private OpenAiChatModel baseChatModel;

    @Autowired
    private OpenAiApi baseOpenAiApi;

    public void multiClientFlow() {
        // Groq endpoint (Llama 3)
        OpenAiApi groqApi = baseOpenAiApi.mutate()
            .baseUrl("https://api.groq.com/openai")
            .apiKey(System.getenv("GROQ_API_KEY"))
            .build();

        OpenAiChatModel groqModel = baseChatModel.mutate()
            .openAiApi(groqApi)
            .defaultOptions(OpenAiChatOptions.builder()
                .model("llama3-70b-8192")
                .temperature(0.5)
                .build())
            .build();

        String response = ChatClient.builder(groqModel).build()
            .prompt("What is the capital of France?")
            .call()
            .content();
    }
}

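Enterprise scenario: use GPT-4o for complex reasoning and a lightweight model for simple Q&A to cut cost; use different compliant models in different regions.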

5. Model Provider Integration

5.1 OpenAI

Maven dependency:

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Configuration:

  
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
      image:
        options:
          model: dall-e-3

Supported capabilities: Chat, Embedding, Image Generation, Audio Transcription, TTS, Moderation

Chinese providers with OpenAI-compatible APIs (such as Qwen, DeepSeek, and Zhipu) can be reached by changing base-url:

  
spring:
  ai:
    openai:
      chat:
        base-url: https://api.deepseek.com
        api-key: ${DEEPSEEK_API_KEY}
        options:
          model: deepseek-chat

5.2 Ollama (Local Deployment)

Ollama is a tool for running large models locally. It suits privacy-sensitive scenarios and offline development.

Install Ollama: https://ollama.ai/

# Pull and run a model
ollama run llama3
ollama run qwen2.5
ollama run deepseek-r1

Maven dependency:

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Configuration:

  
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
          temperature: 0.7

Controller：

  
@RestController
@RequestMapping("/ollama")
public class OllamaController {

    private final ChatClient chatClient;

    public OllamaController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .call()
            .content();
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .content();
    }
}

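Enterprise scenario: in privacy-sensitive industries such as finance and healthcare, where data may not leave the premises, local deployment with Ollama helps meet compliance requirements.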

5.3 Anthropic Claude

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>

  
spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-sonnet-4-20250514

5.4 Integrating Chinese LLM Providers

5.4.1 Spring AI Alibaba (Qwen)

Spring AI Alibaba is the Spring AI extension published by Alibaba:

  
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
</dependency>

  
spring:
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY}
      chat:
        options:
          model: qwen-max

5.4.2 DeepSeek

DeepSeek's API is OpenAI-compatible, so the OpenAI starter can be used directly:

  
spring:
  ai:
    openai:
      chat:
        base-url: https://api.deepseek.com
        api-key: ${DEEPSEEK_API_KEY}
        options:
          model: deepseek-chat

5.4.3 Zhipu AI (GLM)

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-zhipuai</artifactId>
</dependency>

  
spring:
  ai:
    zhipuai:
      api-key: ${ZHIPU_API_KEY}
      chat:
        options:
          model: glm-4

5.4.4 Baidu ERNIE Bot

Connect through an OpenAI-compatible API endpoint:

  
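spring:
  ai:
    openai:
      chat:
        # The OpenAI starter appends /v1/chat/completions to base-url by default, so a base-url
        # ending in /v1 would double the version segment. This assumes Qianfan's OpenAI-compatible
        # v2 endpoint; check Baidu's documentation for the current URL and model IDs.
        base-url: https://qianfan.baidubce.com/v2
        completions-path: /chat/completions
        api-key: ${QIANFAN_API_KEY}
        options:
          model: ernie-4.0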

5.5 Provider Summary

Provider	Chat	Embedding	Image	Audio	TTS
OpenAI	✅	✅	✅	✅	✅
Anthropic	✅	-	-	-	-
Ollama	✅	✅	-	-	-
Alibaba Qwen	✅	✅	✅	-	✅
Zhipu AI	✅	✅	-	-	-
Google GenAI	✅	✅	✅	✅	✅
Azure OpenAI	✅	✅	✅	✅	✅
Mistral	✅	✅	-	-	-
DeepSeek	✅	-	-	-	-
Minimax	✅	✅	✅	-	✅

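6. Prompt Engineering

6.1 Prompt Engineering Basics

Prompt engineering is the art and science of communicating effectively with AI models. A good prompt can markedly improve output quality.

Key techniques:

Role: state the AI's role and area of expertise
Task: define the specific task clearly
Output format: specify the expected format and structure
Constraints: set the scope and limits of the answer
Examples: provide one or two samples of the expected output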

6.2 System Prompt Best Practices

  
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultSystem("""
        你是一个{name}，专注于{domain}领域的专业顾问。
        
        回答规则：
        1. 回答必须基于事实，不确定时明确说明
        2. 使用专业但易懂的语言
        3. 如果问题超出{domain}范围，建议用户咨询相关专家
        4. 回答长度控制在{maxLength}字以内
        """)
    .build();

// Supply the template parameters at call time
String response = chatClient.prompt()
    .user("解释微服务架构")
    .system(s -> s
        .param("name", "架构师")
        .param("domain", "软件架构")
        .param("maxLength", "500"))
    .call()
    .content();

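6.3 Few-Shot Prompting

  
// Literal JSON braces clash with the {placeholder} syntax of the template engine,
// so the examples are passed in as a parameter value rather than as template text.
String examples = """
    示例1：
    用户输入：这个产品太棒了！
    输出：{"sentiment": "positive", "confidence": 0.95}

    示例2：
    用户输入：等了两天还没发货，太差了
    输出：{"sentiment": "negative", "confidence": 0.90}
    """;

String response = chatClient.prompt()
    .user(u -> u.text("""
        根据以下示例，分析用户情绪：

        {examples}

        请分析：{userInput}
        """)
        .param("examples", examples)
        .param("userInput", "还行吧，一般般"))
    .call()
    .content();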

6.4 Chain of Thought

  
String response = chatClient.prompt()
    .user("""
        请一步步推理以下问题：
        一个商店打8折促销，一件原价250元的商品，
        促销价是多少？如果再使用一张满200减30的优惠券，
        最终价格是多少？
        """)
    .call()
    .content();
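
When you ask a model to reason step by step, it helps to know the expected answer so you can check its chain of thought. The sketch below works through the arithmetic of the question above, assuming the coupon is applied to the discounted price and that the threshold is inclusive ("满200" means 200 or more). The class name is hypothetical.

```java
// Hypothetical checker for the discount question: percentage discount first, then a threshold coupon.
final class DiscountCheck {

    /**
     * @param original          list price in yuan
     * @param percentOfOriginal percentage of the price still charged (80 for "打8折")
     * @param couponThreshold   minimum discounted price at which the coupon applies
     * @param couponValue       amount the coupon takes off
     */
    static int finalPrice(int original, int percentOfOriginal, int couponThreshold, int couponValue) {
        int discounted = original * percentOfOriginal / 100; // 250 * 80 / 100 = 200
        // The coupon only applies when the discounted price reaches the threshold
        return discounted >= couponThreshold ? discounted - couponValue : discounted;
    }

    public static void main(String[] args) {
        System.out.println(finalPrice(250, 80, 200, 30)); // prints 170
    }
}
```

So a correct chain of thought should arrive at a promotional price of 200 yuan and a final price of 170 yuan.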

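Research note: one study on using LLMs as prompt optimizers reported that the instruction "Take a deep breath and work on this step-by-step" scored highest among the instructions it tested on a grade-school math benchmark. The effect depends on the model, so verify it against the model you actually use.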

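6.5 Enterprise Prompt Management

In enterprise applications, prompts usually need version control, A/B testing, and dynamic adjustment:

  
@Service
public class PromptManager {

    // Prompt template loaded from configuration
    @Value("${app.prompts.customer-service}")
    private String customerServicePrompt;

    // PromptRepository is a hypothetical application-defined repository, not a Spring AI type
    private final PromptRepository promptRepository;

    public PromptManager(PromptRepository promptRepository) {
        this.promptRepository = promptRepository;
    }

    // Look up a specific prompt version, e.g. for A/B testing
    public String getPrompt(String scene, String version) {
        return promptRepository.findBySceneAndVersion(scene, version)
            .getTemplate();
    }
}

Practical advice: store prompt templates externally (configuration center, database) rather than hard-coding them, so that operations staff can adjust them without a redeployment.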

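7. Structured Output

7.1 Basic Usage

Mapping the AI's text output to Java objects is a key capability for integrating with business systems:

  
// Define the output structure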
record BookRecommendation(
    String title,
    String author,
    String reason,
    List<String> tags
) {}

// Map the response automatically
BookRecommendation book = chatClient.prompt()
    .user("推荐一本关于微服务的书")
    .call()
    .entity(BookRecommendation.class);

7.2 Complex Nested Structures

  
record OrderAnalysis(
    String orderId,
    double totalAmount,
    CustomerInfo customer,
    List<OrderItem> items,
    String riskLevel
) {}

record CustomerInfo(String name, String phone, String level) {}

record OrderItem(String product, int quantity, double price) {}

OrderAnalysis analysis = chatClient.prompt()
    .user("分析订单 ORD-20240101-001 的风险等级")
    .call()
    .entity(OrderAnalysis.class);

7.3 Lists and Generics

  
List<BookRecommendation> books = chatClient.prompt()
    .user("推荐3本Java书籍")
    .call()
    .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});

7.4 Enum Mapping

  
enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE }

record SentimentResult(Sentiment sentiment, double confidence, String reason) {}

SentimentResult result = chatClient.prompt()
    .user("分析这段评论的情感倾向：'这个产品真的很好用，推荐！'")
    .call()
    .entity(SentimentResult.class);

7.5 Enterprise Example: Intelligent Form Extraction

  
record InvoiceInfo(
    String invoiceNumber,
    String supplierName,
    LocalDate invoiceDate,
    BigDecimal totalAmount,
    BigDecimal taxAmount,
    List<InvoiceItem> items
) {}

record InvoiceItem(String description, int quantity, BigDecimal unitPrice, BigDecimal amount) {}

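@Service
public class InvoiceExtractService {

    private final ChatClient chatClient;

    // The final field needs a constructor; the builder is provided by auto-configuration
    public InvoiceExtractService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public InvoiceInfo extract(String invoiceText) {
        return chatClient.prompt()
            .system("你是一个发票信息提取专家，请从文本中提取发票信息。日期格式为yyyy-MM-dd，金额保留两位小数。")
            .user("提取以下发票信息：\n" + invoiceText)
            .call()
            .entity(InvoiceInfo.class);
    }
}

Practical tip: structured output quality depends on the model. Models with native JSON or structured-output support (such as GPT-4o) tend to map onto POJOs more reliably.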

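8. Streaming Responses

8.1 How Streaming Works

With streaming, the model's output is returned incrementally instead of after the full response is ready. This gives a better user experience, like the typewriter effect in ChatGPT.

8.2 Typewriter Effect with SSE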

  
@RestController
public class StreamController {

    private final ChatClient chatClient;

    public StreamController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .content();
    }
}

8.3 Frontend Integration (Vue Example)

  
// Receive the SSE stream with EventSource
const eventSource = new EventSource(`/chat/stream?message=${encodeURIComponent(msg)}`);

eventSource.onmessage = (event) => {
    this.reply += event.data;
};

eventSource.onerror = () => {
    eventSource.close();
};

8.4 Streaming the Full ChatResponse with Flux

  
Flux<ChatResponse> stream = chatClient.prompt()
    .user("Write a poem")
    .stream()
    .chatResponse();

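Enterprise tip: streaming in production has to handle timeouts, reconnects after dropped connections, and backpressure. Reverse proxies such as Nginx must have buffering turned off for the streaming route: proxy_buffering off;.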

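9. Tool/Function Calling

9.1 What Is Tool Calling

Tool calling lets an AI model call external tools or functions to fetch real-time information or perform actions. The model never executes a tool itself; it asks the client application to execute it and return the result.

Two main uses:

Information retrieval: query databases or call APIs for real-time data
Taking action: send emails, create orders, trigger workflows

9.2 Getting Started with the @Tool Annotation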

  
class DateTimeTools {

    @Tool(description = "获取用户时区的当前日期时间")
    String getCurrentDateTime() {
        return LocalDateTime.now()
            .atZone(LocaleContextHolder.getTimeZone().toZoneId())
            .toString();
    }

    @Tool(description = "设置闹钟，时间格式为ISO-8601")
    void setAlarm(String time) {
        LocalDateTime alarmTime = LocalDateTime.parse(time, DateTimeFormatter.ISO_DATE_TIME);
        System.out.println("Alarm set for " + alarmTime);
    }
}

// Usage
String response = ChatClient.create(chatModel)
    .prompt("明天是星期几？")
    .tools(new DateTimeTools())
    .call()
    .content();

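9.3 Tools with Parameters

  
class WeatherTools {

    // WeatherService is a hypothetical application service that wraps a weather API
    private final WeatherService weatherService;

    WeatherTools(WeatherService weatherService) {
        this.weatherService = weatherService;
    }

    @Tool(description = "获取指定城市的当前天气信息")
    String getWeather(
        @ToolParam(description = "城市名称") String city,
        @ToolParam(description = "温度单位，celsius或fahrenheit") String unit
    ) {
        // Call the weather API
        return weatherService.getCurrentWeather(city, unit);
    }
}

// Usage
String response = chatClient.prompt()
    .user("北京今天天气怎么样？")
    .tools(new WeatherTools(weatherService))
    .call()
    .content();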

9.4 Spring Beans as Tools

  
@Component
public class OrderTools {

    @Autowired
    private OrderService orderService;

    @Tool(description = "根据订单号查询订单详情")
    public OrderDTO getOrder(
        @ToolParam(description = "订单编号") String orderId
    ) {
        return orderService.getById(orderId);
    }

    @Tool(description = "创建新订单")
    public String createOrder(
        @ToolParam(description = "商品ID") String productId,
        @ToolParam(description = "购买数量") int quantity
    ) {
        return orderService.create(productId, quantity);
    }
}

// Register the bean as a tool
String response = chatClient.prompt()
    .user("帮我查一下订单 ORD-001 的详情")
    .tools(orderTools)  // pass in the injected Spring bean
    .call()
    .content();

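9.5 Tool Calling Flow

Client sends the request → it includes the tool definitions (name, description, parameter schema)
Model decides to call a tool → it returns the tool name and arguments
Application executes the tool → it resolves and invokes the matching method
Tool result is returned → the result is sent back to the model
Model generates the final response → it answers the user using the tool result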
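
The application-side half of this flow (resolving the requested tool and executing it) can be sketched in plain Java. This is an illustration of the idea only, not how Spring AI implements it: the `ToolCall` record, the tool names, and the canned results are all hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the application side of tool calling (not Spring AI's implementation).
final class ToolDispatchSketch {

    /** What the model sends back when it decides to call a tool: a name plus arguments. */
    record ToolCall(String name, Map<String, String> arguments) {}

    // The tools the application chooses to expose; the model can only request these by name.
    static final Map<String, Function<Map<String, String>, String>> TOOLS = Map.of(
        "getOrderStatus", args -> "Order " + args.get("orderId") + " is SHIPPED",
        "getWeather", args -> "Sunny in " + args.get("city"));

    /** Resolves the requested tool, runs it, and returns the text to send back to the model. */
    static String execute(ToolCall call) {
        Function<Map<String, String>, String> tool = TOOLS.get(call.name());
        if (tool == null) {
            // Unknown tools are rejected instead of executed
            return "Error: unknown tool '" + call.name() + "'";
        }
        return tool.apply(call.arguments());
    }

    public static void main(String[] args) {
        ToolCall call = new ToolCall("getOrderStatus", Map.of("orderId", "ORD-001"));
        System.out.println(execute(call)); // prints "Order ORD-001 is SHIPPED"
    }
}
```

Because execution happens in the application, this is also the natural place for permission checks and argument validation.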

9.6 Enterprise Example: Intelligent Customer Service

  
@Component
public class CustomerServiceTools {

    @Autowired
    private ProductService productService;
    @Autowired
    private LogisticsService logisticsService;
    @Autowired
    private RefundService refundService;

    @Tool(description = "查询商品信息")
    public ProductInfo getProduct(@ToolParam(description = "商品名称或ID") String keyword) {
        return productService.search(keyword);
    }

    @Tool(description = "查询物流信息")
    public LogisticsInfo getLogistics(@ToolParam(description = "快递单号") String trackingNo) {
        return logisticsService.track(trackingNo);
    }

    @Tool(description = "提交退款申请")
    public RefundResult applyRefund(
        @ToolParam(description = "订单号") String orderId,
        @ToolParam(description = "退款原因") String reason
    ) {
        return refundService.apply(orderId, reason);
    }
}

  
@Service
public class CustomerServiceAgent {

    private final ChatClient chatClient;

    public CustomerServiceAgent(ChatClient.Builder builder, CustomerServiceTools tools) {
        this.chatClient = builder
            .defaultSystem("你是电商客服，可以查询商品、物流和退款。用中文回答。")
            .defaultTools(tools)
            .build();
    }

    public String serve(String userMessage) {
        return chatClient.prompt()
            .user(userMessage)
            .call()
            .content();
    }
}

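Security note: the model never accesses your APIs directly; it can only request a call. The application is responsible for permission checks and argument validation.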

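10. RAG (Retrieval Augmented Generation)

10.1 What Is RAG

RAG (Retrieval Augmented Generation) is a core technique for mitigating stale knowledge and hallucination in large models. The flow:

User asks a question → relevant documents are retrieved from the knowledge base
Retrieved results become context → they are sent to the model together with the question
Model uses the retrieved context → it generates the answer

User question → [retrieve] → vector database → [relevant documents] → 
combined with the question → [AI model] → answer grounded in the documents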
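
The retrieve-then-augment steps can be sketched without any framework. The example below is a minimal in-memory illustration under stated assumptions: the `Chunk` record, the two-dimensional toy vectors, and the prompt wording are all made up, and a real system would obtain vectors from an embedding model and store them in a vector database.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory RAG sketch: retrieve the closest chunks, then build the augmented prompt.
final class MiniRagSketch {

    /** A piece of document text together with its (toy) embedding vector. */
    record Chunk(String text, double[] vector) {}

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Treat zero vectors as dissimilar to everything
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Retrieval step: the k chunks whose vectors are most similar to the query vector. */
    static List<Chunk> retrieve(List<Chunk> store, double[] queryVector, int k) {
        return store.stream()
            .sorted(Comparator.comparingDouble((Chunk c) -> cosine(c.vector(), queryVector)).reversed())
            .limit(k)
            .toList();
    }

    /** Augmentation step: place the retrieved text in front of the user's question. */
    static String buildPrompt(String question, List<Chunk> context) {
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (Chunk c : context) {
            sb.append("- ").append(c.text()).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<Chunk> store = List.of(
            new Chunk("Spring AI supports OpenAI and Ollama.", new double[]{1.0, 0.0}),
            new Chunk("Invoices are due within 30 days.", new double[]{0.0, 1.0}));
        List<Chunk> top = retrieve(store, new double[]{0.9, 0.1}, 1);
        System.out.println(buildPrompt("Which models does Spring AI support?", top));
    }
}
```

The generation step is then an ordinary model call with the augmented prompt as input.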

10.2 QuestionAnswerAdvisor (Simple RAG)

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

  
// Assumes a VectorStore that already contains data
ChatResponse response = ChatClient.builder(chatModel)
    .build().prompt()
    .advisors(QuestionAnswerAdvisor.builder(vectorStore).build())
    .user("Spring AI支持哪些模型？")
    .call()
    .chatResponse();

Custom search parameters:

  
var qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
    .searchRequest(SearchRequest.builder()
        .similarityThreshold(0.8)  // similarity threshold
        .topK(6)                   // return the top 6 results
        .build())
    .build();

Dynamic filter expression:

  
String content = chatClient.prompt()
    .user("Please answer my question XYZ")
    .advisors(a -> a.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'Spring'"))
    .call()
    .content();

10.3 RetrievalAugmentationAdvisor (Advanced RAG)

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-rag</artifactId>
</dependency>

Naive RAG (basic version):

  
Advisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .similarityThreshold(0.50)
        .vectorStore(vectorStore)
        .build())
    .build();

String answer = chatClient.prompt()
    .advisors(ragAdvisor)
    .user("Spring AI如何使用？")
    .call()
    .content();

Advanced RAG (query rewriting + retrieval):

  
Advisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
    .queryTransformers(RewriteQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder.build().mutate())
        .build())
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .similarityThreshold(0.50)
        .vectorStore(vectorStore)
        .build())
    .build();

10.4 Query Transformation

Compression: condense the conversation history and a follow-up question into a standalone query

  
Query query = Query.builder()
    .text("那它的第二大城市呢？")
    .history(new UserMessage("丹麦的首都是什么？"),
             new AssistantMessage("哥本哈根是丹麦的首都。"))
    .build();

QueryTransformer transformer = CompressionQueryTransformer.builder()
    .chatClientBuilder(chatClientBuilder)
    .build();

Query transformedQuery = transformer.transform(query);
// Result: "丹麦的第二大城市是什么？"

Rewriting: rephrase the query so that it retrieves better

  
QueryTransformer rewriteTransformer = RewriteQueryTransformer.builder()
    .chatClientBuilder(chatClientBuilder)
    .build();

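10.5 Document Post-Processing

  
Advisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .vectorStore(vectorStore)
        .build())
    // Re-ranking step. RankingDocumentPostProcessor is assumed here to be your own
    // implementation of the DocumentPostProcessor interface, not a class shipped with Spring AI.
    .documentPostProcessors(new RankingDocumentPostProcessor())
    .build();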

10.6 Custom RAG Template

  
PromptTemplate customTemplate = PromptTemplate.builder()
    .renderer(StTemplateRenderer.builder()
        .startDelimiterToken('<').endDelimiterToken('>')
        .build())
    .template("""
        <query>

        Context information is below.
        ---------------------
        <question_answer_context>
        ---------------------

        Given the context information and no prior knowledge, answer the query.
        Follow these rules:
        1. If the answer is not in the context, just say that you don't know.
        2. Avoid statements like "Based on the context...".
        """)
    .build();

QuestionAnswerAdvisor qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
    .promptTemplate(customTemplate)
    .build();

10.7 Enterprise Example: Knowledge Base Q&A

  
@Service
public class KnowledgeBaseService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public KnowledgeBaseService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.chatClient = builder
            .defaultSystem("你是企业知识库助手，仅根据提供的文档内容回答问题。如果文档中没有相关信息，请明确告知用户。")
            .defaultAdvisors(
                QuestionAnswerAdvisor.builder(vectorStore)
                    .searchRequest(SearchRequest.builder()
                        .similarityThreshold(0.7)
                        .topK(5)
                        .build())
                    .build())
            .build();
    }

    public String ask(String question) {
        return chatClient.prompt()
            .user(question)
            .call()
            .content();
    }
}

11. ETL Pipeline

11.1 ETL Overview

ETL (Extract, Transform, Load) is the data-processing foundation of RAG: it converts raw documents into a form that can be stored in a vector store.

Raw documents → [DocumentReader reads] → [DocumentTransformer transforms/splits] → [DocumentWriter writes to the vector store]

11.2 The Three Core Interfaces

  
// Read: a Supplier<List<Document>> (simplified signature)
public interface DocumentReader {
    List<Document> read();
}

// Transform: a Function<List<Document>, List<Document>> (simplified signature)
public interface DocumentTransformer {
    List<Document> transform(List<Document> documents);
}

// Write: a Consumer<List<Document>> (simplified signature)
public interface DocumentWriter {
    void write(List<Document> documents);
}
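
Because the three stages correspond to `Supplier`, `Function`, and `Consumer`, a pipeline is just their composition. The sketch below shows that shape with plain strings standing in for `Document` objects; the class name, the sample text, and the sentence-based splitting rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: an ETL pipeline as Supplier -> Function -> Consumer, with Strings as documents.
final class EtlPipelineSketch {

    /** Runs read -> transform -> write once and returns how many items were written. */
    static <T> int run(Supplier<List<T>> reader,
                       Function<List<T>, List<T>> transformer,
                       Consumer<List<T>> writer) {
        List<T> transformed = transformer.apply(reader.get());
        writer.accept(transformed);
        return transformed.size();
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>(); // stands in for the vector store

        Supplier<List<String>> reader = () -> List.of("First sentence. Second sentence.");
        // Split each document into sentences; a real splitter would work on tokens
        Function<List<String>, List<String>> splitter = docs -> docs.stream()
            .flatMap(d -> Arrays.stream(d.split("\\. ?")))
            .toList();
        Consumer<List<String>> writer = sink::addAll;

        int written = run(reader, splitter, writer);
        System.out.println(written + " chunks: " + sink);
    }
}
```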

11.3 DocumentReader (Reading Documents)

Reading PDF

  
// Requires the PDF reader dependency:
// spring-ai-pdf-document-reader

PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(
    new ClassPathResource("company-policy.pdf"),
    PdfDocumentReaderConfig.builder()
        .withPageExtractedTextFormatter(
            new ExtractedTextFormatter.Builder()
                .withNumberOfPagesAtEnd(true)
                .build())
        .withPagesPerDocument(1)
        .build()
);

Reading JSON

  
JsonReader jsonReader = new JsonReader(
    new ClassPathResource("products.json"),
    "description", "content"  // JSON keys whose values are used as the document text
);
List<Document> documents = jsonReader.read();

Reading Plain Text

  
TextReader textReader = new TextReader(new ClassPathResource("knowledge.txt"));
textReader.getCustomMetadata().put("filename", "knowledge.txt");
List<Document> documents = textReader.read();

Reading HTML

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-jsoup-document-reader</artifactId>
</dependency>

  
JsoupDocumentReader htmlReader = new JsoupDocumentReader(
    new ClassPathResource("page.html")
);

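11.4 DocumentTransformer (Transforming and Splitting Documents)

TokenTextSplitter: splits by token count

  
TokenTextSplitter splitter = new TokenTextSplitter(
    800,   // chunkSize: target size of each chunk, in tokens
    200,   // minChunkSizeChars: minimum chunk size, in characters
    5,     // minChunkLengthToEmbed: chunks shorter than this are discarded
    10000, // maxNumChunks: maximum number of chunks produced from one text
    true   // keepSeparator: whether to keep separators such as newlines
);
List<Document> chunks = splitter.split(documents);

Tip: this five-argument constructor has no overlap parameter; the second argument is a minimum chunk size in characters. If you use a splitter that does support overlap, tune it carefully: too little loses context across chunk boundaries, too much adds redundancy and token cost. A common starting point is 10-25% of the chunk size.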
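
To illustrate what overlap means for chunking, here is a minimal sliding-window splitter. It is a sketch under simplifying assumptions: it counts whitespace-separated words rather than model tokens, and the class is hypothetical, not a Spring AI splitter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical word-based chunker with overlap (real splitters count tokens, not words).
final class OverlapChunker {

    /** Splits text into chunks of at most chunkSize words; consecutive chunks share `overlap` words. */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // With chunkSize 4 and overlap 1, the word "d" appears in both chunks
        System.out.println(split("a b c d e f g", 4, 1)); // prints [a b c d, d e f g]
    }
}
```

The shared words at each boundary are what preserve context between neighboring chunks, at the price of storing and embedding them twice.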

11.5 Complete ETL Example

  
@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void ingestPdf(String filePath) {
        // 1. Read the PDF
        PagePdfDocumentReader reader = new PagePdfDocumentReader(
            new FileSystemResource(filePath)
        );

        // 2. Split the documents
        TokenTextSplitter splitter = new TokenTextSplitter(800, 200, 5, 10000, true);

        // 3. Write to the vector store
        vectorStore.accept(splitter.apply(reader.get()));
        // Or use the more descriptive method names
        // vectorStore.write(splitter.split(reader.read()));
    }

    public void ingestDirectory(String dirPath) {
        // Import every PDF in the directory
        File[] pdfs = new File(dirPath).listFiles((d, name) -> name.endsWith(".pdf"));
        if (pdfs == null) {
            return; // not a directory, or it could not be read
        }
        for (File file : pdfs) {
            ingestPdf(file.getAbsolutePath());
        }
    }
}

11.6 Enterprise Example: Importing from Multiple Sources

  
@Service
public class EnterpriseDataIngestion {

    private final VectorStore vectorStore;

    public EnterpriseDataIngestion(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    /**
     * Imports database rows into the knowledge base.
     * The query is expected to return "content" and "table_name" columns.
     */
    public void ingestFromDatabase(DataSource dataSource, String sql) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);

        // Query the database and convert each row into a Document
        List<Document> documents = jdbcTemplate.query(sql, (rs, rowNum) -> {
            Document doc = new Document(rs.getString("content"));
            doc.getMetadata().put("source", "database");
            doc.getMetadata().put("table", rs.getString("table_name"));
            return doc;
        });

        // Split and write to the vector store
        TokenTextSplitter splitter = new TokenTextSplitter(800, 200, 5, 10000, true);
        vectorStore.accept(splitter.apply(documents));
    }
}

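12. Vector Databases

12.1 Vector Database Overview

The vector database is a core component of RAG: it stores the embedding vectors of documents and supports similarity search.

Vector databases supported by Spring AI:

Vector database	Characteristics	Best suited for
PGVector	Built on PostgreSQL, simple to operate	Teams that already run PostgreSQL
Redis	High performance, low latency	Real-time retrieval
Milvus	Distributed, high performance	Large-scale data
Chroma	Open source, lightweight	Development and testing
Pinecone	Fully managed	No operations overhead
Elasticsearch	Hybrid full-text + vector search	Teams that already run Elasticsearch
MongoDB Atlas	Documents + vectors	Teams that already run MongoDB
Qdrant	Written in Rust, high performance	Performance-sensitive workloads
Weaviate	Rich modular features	Complex search requirements
Oracle	Enterprise database	Teams that already run Oracle
MariaDB	MySQL compatible	Teams that already run MariaDB
Neo4j	Graph + vectors	Knowledge graph scenarios
Apache Cassandra	Highly available, distributed	Large-scale distributed deployments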

12.2 The VectorStore Interface

  
public interface VectorStore extends DocumentWriter {

    // Add documents
    void add(List<Document> documents);

    // Delete documents by ID
    void delete(List<String> idList);

    // Similarity search
    List<Document> similaritySearch(String query);

    // Similarity search with conditions
    List<Document> similaritySearch(SearchRequest request);
}

12.3 PGVector Example

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>

  
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: postgres
    password: postgres
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536

  
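@Service
public class VectorSearchService {

    private final VectorStore vectorStore;

    public VectorSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Search for similar documents
    public List<Document> search(String query) {
        return vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query)
                .topK(5)
                .similarityThreshold(0.7)
                .build()
        );
    }

    // Search with a metadata filter
    public List<Document> searchByType(String query, String type) {
        // Build the filter with FilterExpressionBuilder instead of concatenating strings,
        // so that a quote inside `type` cannot change the expression
        var filter = new FilterExpressionBuilder().eq("type", type).build();
        return vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query)
                .topK(5)
                .similarityThreshold(0.7)
                .filterExpression(filter)
                .build()
        );
    }
}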

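12.4 Redis Vector Store Example

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-redis</artifactId>
</dependency>

  
# Connection settings come from Spring Boot's spring.data.redis.* properties;
# the vector store itself is configured under spring.ai.vectorstore.redis.*
spring:
  data:
    redis:
      url: redis://localhost:6379
  ai:
    vectorstore:
      redis:
        index-name: spring-ai-index
        prefix: "doc:"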

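12.5 SQL-like Metadata Filtering

Spring AI provides a portable, SQL-like filter expression language for metadata. The same expression is translated into each vector store's native filter syntax, so it works across VectorStore implementations:

  
// Equality
SearchRequest.builder().filterExpression("type == 'Spring'").build();

// Comparison
SearchRequest.builder().filterExpression("year > 2023").build();

// Logical operators
SearchRequest.builder().filterExpression("type == 'Spring' AND year > 2023").build();

// IN
SearchRequest.builder().filterExpression("type IN ['Spring', 'Java']").build();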
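
When a filter has to be assembled as text from caller-supplied values, the value should be validated first so that it cannot change the structure of the expression. The following is a minimal sketch of one way to do that with an allow-list; FilterValues and its character rules are assumptions for illustration, not part of Spring AI.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that builds an equality filter only from allow-listed values. */
final class FilterValues {

    // Letters, digits, underscore, space, and hyphen; 1 to 64 characters (an assumed policy)
    private static final Pattern SAFE_VALUE = Pattern.compile("[\\p{L}\\p{N}_ -]{1,64}");

    /** Returns a filter such as: type == 'Spring'. Rejects values containing quotes or operators. */
    static String eq(String key, String value) {
        if (value == null || !SAFE_VALUE.matcher(value).matches()) {
            throw new IllegalArgumentException("unsafe filter value");
        }
        return key + " == '" + value + "'";
    }
}
```

For example, FilterValues.eq("type", "Spring") yields type == 'Spring', while a value containing a quote character is rejected instead of being spliced into the expression.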

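Selection advice: use Chroma or PGVector during development; in production, choose according to your team's existing stack. Pick PGVector if you already run PostgreSQL, Redis if you already run Redis, and Milvus for large-scale data.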

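13. Chat Memory

13.1 Why Chat Memory Is Needed

Large language models are stateless: every request is independent. Chat memory resends the relevant earlier messages with each request so that the model can follow the context of a multi-turn conversation.

Note the distinction: Chat Memory is the subset of messages kept to maintain context, while Chat History is the complete record of the conversation, kept for auditing and traceability.

13.2 MessageChatMemoryAdvisor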

  
ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .maxMessages(20)  // keep the most recent 20 messages
    .build();

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(
        MessageChatMemoryAdvisor.builder(chatMemory).build()
    )
    .build();

// Pass the conversation ID with each call
String response = chatClient.prompt()
    .user("My name is Zhang San")
    .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "user-001"))
    .call()
    .content();

// Later turns in the same conversation see the earlier context
String response2 = chatClient.prompt()
    .user("What is my name?")
    .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "user-001"))
    .call()
    .content();
// The model answers: your name is Zhang San
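
The windowing idea behind maxMessages can be sketched in plain Java as follows. This is a simplified illustration, not the MessageWindowChatMemory implementation: ToyWindowMemory is a hypothetical class, messages are plain strings, and any special handling of system messages is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sliding-window memory: keeps the last N messages per conversation ID. */
final class ToyWindowMemory {

    private final int maxMessages;
    private final Map<String, Deque<String>> conversations = new ConcurrentHashMap<>();

    ToyWindowMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones once the window is full. */
    void add(String conversationId, String message) {
        Deque<String> window = conversations.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (window) {
            window.addLast(message);
            while (window.size() > maxMessages) {
                window.removeFirst();
            }
        }
    }

    /** Returns a snapshot of the current window, oldest message first. */
    List<String> get(String conversationId) {
        Deque<String> window = conversations.get(conversationId);
        if (window == null) {
            return List.of();
        }
        synchronized (window) {
            return List.copyOf(window);
        }
    }
}
```

Each conversation ID has its own window, which is why two users with different IDs never see each other's context.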

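13.3 Memory Storage Backends

In-Memory Storage (Default)

  
ChatMemoryRepository repository = new InMemoryChatMemoryRepository();
ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .chatMemoryRepository(repository)
    .maxMessages(20)
    .build();

Note: in-memory storage is lost when the application restarts, so it is only suitable for development and testing.

JDBC Storage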

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-jdbc</artifactId>
</dependency>

  
spring:
  ai:
    chat:
      memory:
        repository:
          jdbc:
            initialize-schema: always  # embedded/always/never

  
@Autowired
JdbcChatMemoryRepository chatMemoryRepository;

ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .chatMemoryRepository(chatMemoryRepository)
    .maxMessages(20)
    .build();

Supported databases: PostgreSQL, MySQL/MariaDB, SQL Server, HSQLDB, Oracle

Cassandra Storage

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-cassandra</artifactId>
</dependency>

Neo4j Storage

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-neo4j</artifactId>
</dependency>

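13.4 VectorStoreChatMemoryAdvisor (Vector-Retrieval Memory)

Store the conversation history in a vector store and retrieve the relevant memories by semantic similarity:

  
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(
        VectorStoreChatMemoryAdvisor.builder(vectorStore)
            .defaultTopK(10)  // number of past messages to retrieve per request
            .build()
    )
    .build();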

13.5 Enterprise Example: Multi-Session Customer Service System

  
@Service
public class ChatSessionService {

    private final ChatClient chatClient;
    private final ChatMemory chatMemory;

    public ChatSessionService(ChatClient.Builder builder, JdbcChatMemoryRepository repository) {
        this.chatMemory = MessageWindowChatMemory.builder()
            .chatMemoryRepository(repository)
            .maxMessages(50)  // keep the most recent 50 messages
            .build();

        this.chatClient = builder
            .defaultSystemPrompt("You are a professional customer service agent. Answer user questions politely.")
            .defaultAdvisors(
                MessageChatMemoryAdvisor.builder(chatMemory).build()
            )
            .build();
    }

    public String chat(String sessionId, String userMessage) {
        return chatClient.prompt()
            .user(userMessage)
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, sessionId))
            .call()
            .content();
    }
}

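14. Advisors API

14.1 Advisor Overview

The Advisors API is Spring AI's core middleware pattern. Much like a Servlet Filter or a Spring interceptor, an advisor can intercept, modify, and enhance interactions with the model.

Core capabilities:

Encapsulate recurring AI patterns
Transform data sent to and received from the LLM
Provide portability across models and use cases

14.2 Advisor Chain Execution

User request  → [Advisor1(request)]  → [Advisor2(request)]  → ... → [LLM]
                                                                      ↓
User response ← [Advisor1(response)] ← [Advisor2(response)] ← ... ← [LLM Response]

Advisors are sorted by their getOrder() value; a lower value means higher precedence
The chain behaves like a stack: the first advisor to process the request is the last to process the response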
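
The ordering and stack behavior described above can be sketched with a toy chain in plain Java. ToyAdvisorChain and its Advisor record are hypothetical stand-ins rather than Spring AI types; the sketch only records the order in which each advisor sees the request and the response.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy advisor chain that records the order of request and response handling. */
final class ToyAdvisorChain {

    /** A named advisor with an order value; lower values run first on the request path. */
    record Advisor(String name, int order) {}

    /** Sorts advisors by order, runs the chain, and appends each step to the trace. */
    static String call(List<Advisor> advisors, String request, List<String> trace) {
        List<Advisor> sorted = new ArrayList<>(advisors);
        sorted.sort(Comparator.comparingInt(Advisor::order));
        return next(sorted, 0, request, trace);
    }

    private static String next(List<Advisor> sorted, int index, String request, List<String> trace) {
        if (index == sorted.size()) {
            // End of the chain: this is where the model would be called
            trace.add("model");
            return "response to: " + request;
        }
        Advisor current = sorted.get(index);
        trace.add(current.name() + ":request");
        String response = next(sorted, index + 1, request, trace);
        trace.add(current.name() + ":response");
        return response;
    }
}
```

With a "logger" advisor at order 0 and a "memory" advisor at order 10, the trace reads logger:request, memory:request, model, memory:response, logger:response, which is the stack behavior described above.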

14.3 Built-in Advisors

Advisor	Purpose
MessageChatMemoryAdvisor	Chat memory management
QuestionAnswerAdvisor	Simple RAG
RetrievalAugmentationAdvisor	Advanced modular RAG
VectorStoreChatMemoryAdvisor	Vector-retrieval memory

14.4 Custom Advisors

Logging Advisor

  
// Spring AI ships its own SimpleLoggerAdvisor; this version shows how one is structured
public class SimpleLoggerAdvisor implements CallAdvisor, StreamAdvisor {

    private static final Logger logger = LoggerFactory.getLogger(SimpleLoggerAdvisor.class);

    @Override
    public String getName() {
        return this.getClass().getSimpleName();
    }

    @Override
    public int getOrder() {
        return 0;  // lower values run earlier; Ordered.HIGHEST_PRECEDENCE is the actual maximum precedence
    }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        logger.debug("Request: {}", request);
        ChatClientResponse response = chain.nextCall(request);
        logger.debug("Response: {}", response);
        return response;
    }

    @Override
    public Flux<ChatClientResponse> adviseStream(ChatClientRequest request, StreamAdvisorChain chain) {
        logger.debug("Request: {}", request);
        Flux<ChatClientResponse> responses = chain.nextStream(request);
        // Aggregate the streamed chunks and log the full response once the stream completes
        return new ChatClientMessageAggregator()
            .aggregateChatClientResponse(responses, this::logResponse);
    }

    // Callback referenced by adviseStream
    private void logResponse(ChatClientResponse response) {
        logger.debug("Response: {}", response);
    }
}

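Content Moderation Advisor

  
public class ContentModerationAdvisor implements CallAdvisor {

    private final ChatModel moderationModel;

    public ContentModerationAdvisor(ChatModel moderationModel) {
        this.moderationModel = moderationModel;
    }

    @Override
    public String getName() {
        return "contentModeration";
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;  // runs first
    }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        // Check whether the user input contains sensitive content
        String userInput = request.prompt().getUserMessage().getText();
        if (containsSensitiveContent(userInput)) {
            // Short-circuit: answer directly without calling the main model
            return new ChatClientResponse(
                new ChatResponse(List.of(new Generation(new AssistantMessage("Sorry, your question contains sensitive content.")))),
                request.context()
            );
        }
        return chain.nextCall(request);
    }

    // Illustrative check: asks the moderation model for a YES/NO verdict
    private boolean containsSensitiveContent(String text) {
        String verdict = moderationModel.call(
            "Answer YES or NO: does the following text contain sensitive content?\n" + text);
        return verdict != null && verdict.trim().toUpperCase().startsWith("YES");
    }
}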

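14.5 Recursive Advisors (New in 1.1)

Spring AI 1.1 introduces recursive advisors: an advisor can re-enter the downstream advisor chain several times within one request, which enables self-correcting reasoning:

// A recursive advisor calls the rest of the chain again until a condition is met
// Suitable for self-correction, multi-step reasoning, and similar scenarios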
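
The control flow of such a loop can be sketched without any framework types. In the sketch below, ToyRecursiveAdvisor is a hypothetical class, the downstream chain is modeled as a plain Function, and the validity check and retry wording are assumptions; Spring AI's own recursive advisors work on ChatClientRequest and ChatClientResponse instead.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of a recursive advisor: re-invokes the downstream chain until the answer is valid. */
final class ToyRecursiveAdvisor {

    /**
     * Calls downstream once, then repeats with feedback appended to the request
     * while the response is invalid and the attempt budget is not exhausted.
     */
    static String adviseCall(String request,
                             Function<String, String> downstream,
                             Predicate<String> isValid,
                             int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String response = downstream.apply(request);
        for (int attempt = 1; attempt < maxAttempts && !isValid.test(response); attempt++) {
            String retryRequest = request + "\nPrevious answer was rejected: " + response;
            response = downstream.apply(retryRequest);
        }
        return response;
    }
}
```

A hard attempt limit matters here: without it, a model that never produces a valid answer would loop forever and keep consuming tokens.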

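Enterprise scenario: use the advisor pattern to implement cross-cutting concerns such as unified logging, permission checks, content moderation, and token counting.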

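15. Multimodal Processing

15.1 Image Understanding

Spring AI supports image understanding with multimodal models such as GPT-4o:

  
// Analyze an image by URL. The URL is created outside the lambda because
// toURL() throws the checked MalformedURLException
URL imageUrl = URI.create("https://example.com/photo.png").toURL();
String urlResponse = chatClient.prompt()
    .user(u -> u.text("Describe the contents of this image")
                  .media(MimeTypeUtils.IMAGE_PNG, imageUrl))
    .call()
    .content();

// Analyze a local image
String fileResponse = chatClient.prompt()
    .user(u -> u.text("Analyze this image")
                  .media(MimeTypeUtils.IMAGE_JPEG,
                         new FileSystemResource("/path/to/image.jpg")))
    .call()
    .content();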

15.2 Text to Image

OpenAI DALL-E

  
@RestController
@RequestMapping("/image")
public class ImageController {

    @Resource
    private OpenAiImageModel imageModel;

    @GetMapping("/generate")
    public ResponseEntity<String> generate(@RequestParam String prompt) {
        ImageResponse response = imageModel.call(new ImagePrompt(prompt));
        String imageUrl = response.getResult().getOutput().getUrl();
        return ResponseEntity.ok(imageUrl);
    }
}

Stability AI

  
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-stability-ai</artifactId>
</dependency>

  
spring:
  ai:
    stabilityai:
      api-key: ${STABILITY_API_KEY}

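15.3 Speech to Text

  
// Using OpenAI Whisper; transcriptionModel is an OpenAiAudioTranscriptionModel
String transcription = transcriptionModel.call(
    new AudioTranscriptionPrompt(
        new FileSystemResource("audio.mp3")
    )
).getResult().getOutput();

15.4 Text to Speech

  
// Using OpenAI TTS; speechModel is an OpenAiAudioSpeechModel.
// SpeechPrompt is the 1.0 request type (1.1 adds the provider-neutral TextToSpeechPrompt)
byte[] audio = speechModel.call(
    new SpeechPrompt("Hello, welcome to Spring AI")
).getResult().getOutput();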

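15.5 Enterprise Example: Intelligent Image Moderation

  
@Service
public class ImageModerationService {

    private final ChatClient chatClient;

    public ImageModerationService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public ImageModerationResult moderate(byte[] imageData) {
        return chatClient.prompt()
            .system("You are an image moderator. Check whether the image contains prohibited content.")
            .user(u -> u.text("Review this image and return the moderation result")
                          .media(MimeTypeUtils.IMAGE_JPEG,
                                 new ByteArrayResource(imageData)))
            .call()
            .entity(ImageModerationResult.class);
    }
}

// Structured result that the model output is mapped to
record ImageModerationResult(
    boolean approved,
    String reason,
    String riskLevel
) {}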

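16. MCP (Model Context Protocol)

16.1 MCP Overview

The Model Context Protocol (MCP) is a standard protocol proposed by Anthropic that lets AI models interact with external tools and resources in a structured way. Spring AI 1.1 provides full MCP support.

Core architecture:

┌─────────────────┐     ┌─────────────────┐
│   MCP Client    │ ←→  │   MCP Server    │
│  (AI app side)  │     │(tools/resources)│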
└────────┬────────┘     └────────┬────────┘
         │                       │
    ┌────┴────┐            ┌────┴────┐
    │Session  │            │Session  │
    └────┬────┘            └────┬────┘
         │                       │
    ┌────┴────┐            ┌────┴────┐
    │Transport│            │Transport│
    │(STDIO/  │            │(STDIO/  │
    │HTTP/SSE)│            │HTTP/SSE)│
    └─────────┘            └─────────┘

16.2 MCP Client Configuration

  
<!-- STDIO + HTTP/SSE support -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>

<!-- WebFlux support -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client-webflux</artifactId>
</dependency>

  
spring:
  ai:
    mcp:
      client:
        stdio:
          servers-configuration: classpath:mcp-servers.json

mcp-servers.json:

  
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]
    }
  }
}

16.3 MCP Server Configuration

  
<!-- WebMVC Server -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
</dependency>

  
spring:
  ai:
    mcp:
      server:
        protocol: STREAMABLE  # SSE / STREAMABLE / STATELESS

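16.4 MCP Annotations

Spring AI 1.1 provides declarative MCP annotations:

  
@Component
public class MyMcpTools {

    private final UserService userService;  // application service, assumed to exist

    public MyMcpTools(UserService userService) {
        this.userService = userService;
    }

    @McpTool(description = "Get user information")
    public UserInfo getUserInfo(
        @McpToolParam(description = "User ID") String userId
    ) {
        return userService.getById(userId);
    }

    @McpResource(uri = "config://app-settings")
    public String getAppSettings() {
        return "App Settings...";
    }

    @McpPrompt(description = "Generate a greeting")
    public String greeting(@McpArg(name = "name", description = "Name") String name) {
        return "Please write a greeting for " + name;
    }
}

Enterprise value: MCP decouples AI applications from tool providers. An organization can expose one shared MCP tool service, and multiple AI applications can reuse the same set of tools.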

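17. Agentic Patterns

17.1 What Is an AI Agent

An AI agent makes decisions autonomously, calls tools, and reasons over multiple steps. Spring AI documents five basic workflow patterns, following Anthropic's "Building Effective Agents". They are composed from ordinary ChatClient calls rather than dedicated framework classes:

17.2 Chain Workflow

Break a complex task into steps that run in sequence, with each step's output feeding the next: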

  
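@Service
public class ChainWorkflow {

    private final ChatClient chatClient;

    public ChainWorkflow(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String execute(String input) {
        // Step 1: extract the key information
        String extracted = chatClient.prompt()
            .system("Extract the key entities and facts from the user input and return them as JSON")
            .user(input)
            .call()
            .content();

        // Step 2: generate a plan from the extracted information
        String plan = chatClient.prompt()
            .system("Generate a solution based on the extracted information")
            .user(extracted)
            .call()
            .content();

        // Step 3: refine the plan
        String refined = chatClient.prompt()
            .system("Refine the solution so that it is more specific and actionable")
            .user(plan)
            .call()
            .content();

        return refined;
    }
}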

17.3 Routing Workflow

Route the input to different handling logic based on its content:

  
@Service
public class RoutingWorkflow {

    private final ChatClient chatClient;

    public RoutingWorkflow(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    enum Route { TECHNICAL, BILLING, GENERAL }

    public String handle(String userQuery) {
        // Step 1: classify
        Route route = chatClient.prompt()
            .system("Classify the user question as TECHNICAL, BILLING, or GENERAL")
            .user(userQuery)
            .call()
            .entity(Route.class);

        // Step 2: dispatch by category
        // technicalHandler, billingHandler, and generalHandler are application-specific methods (not shown)
        return switch (route) {
            case TECHNICAL -> technicalHandler(userQuery);
            case BILLING -> billingHandler(userQuery);
            case GENERAL -> generalHandler(userQuery);
        };
    }
}
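
A classifier model does not always answer with exactly one of the expected labels, so routing code usually benefits from a defensive parsing step with a default route. The following is a minimal sketch under that assumption; RouteParser is a hypothetical helper, and falling back to GENERAL is a design choice made here for illustration.

```java
import java.util.Locale;

/** Hypothetical helper that maps raw classifier output to a route, with a safe default. */
final class RouteParser {

    enum Route { TECHNICAL, BILLING, GENERAL }

    /** Strips quotes, punctuation, and whitespace, then matches the label case-insensitively. */
    static Route parse(String raw) {
        if (raw == null) {
            return Route.GENERAL;
        }
        String cleaned = raw.replaceAll("[^A-Za-z]", "").toUpperCase(Locale.ROOT);
        for (Route route : Route.values()) {
            if (route.name().equals(cleaned)) {
                return route;
            }
        }
        // Anything unrecognized goes to the general handler instead of failing the request
        return Route.GENERAL;
    }
}
```

Matching only an exact label (after cleanup) is deliberate: a free-form sentence that merely mentions a category is treated as unrecognized rather than guessed at.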

17.4 Parallelization Workflow

Run several subtasks in parallel and merge the results at the end:

  
@Service
public class ParallelWorkflow {

    private final ChatClient chatClient;

    public ParallelWorkflow(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String analyze(String text) {
        // Run the analysis tasks in parallel
        // (supplyAsync without an executor uses the shared ForkJoinPool.commonPool())
        CompletableFuture<String> sentimentFuture =
            CompletableFuture.supplyAsync(() ->
                chatClient.prompt()
                    .system("Analyze the sentiment")
                    .user(text)
                    .call().content());

        CompletableFuture<String> keywordFuture =
            CompletableFuture.supplyAsync(() ->
                chatClient.prompt()
                    .system("Extract the keywords")
                    .user(text)
                    .call().content());

        CompletableFuture<String> summaryFuture =
            CompletableFuture.supplyAsync(() ->
                chatClient.prompt()
                    .system("Write a summary")
                    .user(text)
                    .call().content());

        // Merge the results
        String combined = String.join("\n",
            sentimentFuture.join(),
            keywordFuture.join(),
            summaryFuture.join());

        return chatClient.prompt()
            .system("Combine the following analysis results into one report")
            .user(combined)
            .call().content();
    }
}
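
Blocking model calls are usually better run on a dedicated executor with a timeout than on the shared common pool, so that one slow or failing call cannot stall the whole request. The sketch below shows that idea in plain Java; ParallelTasks, the fallback value, and the pool sizing are assumptions for illustration, and the suppliers stand in for ChatClient calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: runs tasks in parallel with a per-task timeout and a fallback value. */
final class ParallelTasks {

    /** Returns one result per task, in task order; failed or slow tasks yield the fallback. */
    static List<String> runAll(List<Supplier<String>> tasks, long timeoutMillis, String fallback) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<CompletableFuture<String>> futures = tasks.stream()
                .map(task -> CompletableFuture.supplyAsync(task, pool)
                    // Complete with the fallback if the task takes too long
                    .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                    // Complete with the fallback if the task throws
                    .exceptionally(ex -> fallback))
                .toList();
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            // Interrupt tasks that are still running after their timeout
            pool.shutdownNow();
        }
    }
}
```

Creating a pool per call keeps the sketch self-contained; a real service would more likely inject one shared, bounded executor.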

17.5 Orchestrator-Worker

The orchestrator decomposes the task, workers execute the subtasks in parallel, and the orchestrator merges the results:

  
@Service
public class OrchestratorWorker {

    private final ChatClient chatClient;

    public OrchestratorWorker(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    record TaskPlan(String task, List<String> subtasks) {}

    public String execute(String task) {
        // Orchestrator: decompose the task
        TaskPlan plan = chatClient.prompt()
            .system("Break the task down into a list of subtasks that can run in parallel")
            .user(task)
            .call()
            .entity(TaskPlan.class);

        // Workers: execute the subtasks in parallel
        List<CompletableFuture<String>> futures = plan.subtasks().stream()
            .map(subtask -> CompletableFuture.supplyAsync(() ->
                chatClient.prompt()
                    .system("Carry out the subtask")
                    .user(subtask)
                    .call().content()))
            .toList();

        List<String> results = futures.stream()
            .map(CompletableFuture::join)
            .toList();

        // Orchestrator: merge the results
        return chatClient.prompt()
            .system("Combine the results of all subtasks")
            .user(String.join("\n", results))
            .call().content();
    }
}

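17.6 Evaluator-Optimizer

Evaluate and improve in a loop until the output meets the quality bar or the iteration limit is reached:

  
@Service
public class EvaluatorOptimizer {

    private final ChatClient chatClient;
    private static final int MAX_ITERATIONS = 3;

    public EvaluatorOptimizer(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    record Evaluation(boolean pass, String feedback) {}

    public String execute(String task) {
        String current = chatClient.prompt()
            .user(task)
            .call().content();

        for (int i = 0; i < MAX_ITERATIONS; i++) {
            // Evaluate
            Evaluation eval = chatClient.prompt()
                .system("Evaluate the output quality; return pass (acceptable or not) and feedback (suggested improvements)")
                .user("Task: " + task + "\nOutput: " + current)
                .call()
                .entity(Evaluation.class);

            if (eval.pass()) break;

            // Optimize
            current = chatClient.prompt()
                .system("Improve the output based on the feedback")
                .user("Previous output: " + current + "\nFeedback: " + eval.feedback())
                .call().content();
        }
        return current;
    }
}

Enterprise scenarios: code generation with an automated test-and-evaluate loop, copywriting with a quality-review loop, and data analysis with a result-verification loop.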

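18. Observability

18.1 Overview

Spring AI provides observability for its core components, covering metrics, traces, and logs. It is built on the Micrometer Observation API, and traces can be exported through OpenTelemetry.

18.2 Core Metrics

Component	Metric / observation name	Description
ChatModel	gen_ai.client.token.usage	Token usage (input, output, total)
ChatModel	gen_ai.client.operation	Model call duration
EmbeddingModel	gen_ai.client.operation	Embedding call duration
VectorStore	db.vector.client.operation	Vector store operation duration
Advisor	spring.ai.advisor	Advisor execution duration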

18.3 Integrating Prometheus + Grafana

  
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>

  
management:
  endpoints:
    web:
      exposure:
        include: prometheus
  tracing:
    sampling:
      probability: 1.0

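18.4 OpenTelemetry Integration

Spring Boot 4.0 supports OpenTelemetry natively, and Spring AI's spans and metrics follow the OpenTelemetry Semantic Conventions. Prompt and completion content is not recorded by default and has to be enabled explicitly:

  
# application.yml
spring:
  ai:
    chat:
      observations:
        log-completion: true   # log the full completion (may contain sensitive data)
        log-prompt: true       # log the full prompt (may contain sensitive data)

Production advice: monitor token consumption (cost control), P99 latency (user experience), and error rate (stability), and set alert thresholds for each.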

19. Evaluation and Testing

19.1 RelevancyEvaluator (Relevance Evaluation)

Evaluates whether the AI response is relevant to the question, given the retrieved context:

  
@Test
void evaluateRelevancy() {
    String question = "Which models does Spring AI support?";

    // Get the RAG response
    ChatResponse chatResponse = chatClient.prompt()
        .advisors(ragAdvisor)
        .user(question)
        .call()
        .chatResponse();

    // Build the evaluation request
    EvaluationRequest evalRequest = new EvaluationRequest(
        question,                                                          // user question
        chatResponse.getMetadata().get(RetrievalAugmentationAdvisor.DOCUMENT_CONTEXT),  // retrieved context
        chatResponse.getResult().getOutput().getText()                    // AI response
    );

    // Evaluate
    RelevancyEvaluator evaluator = new RelevancyEvaluator(
        ChatClient.builder(chatModel)
    );
    EvaluationResponse evalResponse = evaluator.evaluate(evalRequest);

    assertThat(evalResponse.isPass()).isTrue();
}

19.2 FactCheckingEvaluator (Fact Checking)

Detects hallucinations in AI output by checking a claim against a reference context:

  
@Test
void testFactChecking() {
    // Use a small, dedicated model for fact checking to keep costs down
    OllamaApi ollamaApi = OllamaApi.builder()
        .baseUrl("http://localhost:11434")
        .build();
    // OllamaChatOptions is the 1.1 options class; on 1.0.x it is OllamaOptions
    ChatModel factCheckModel = OllamaChatModel.builder()
        .ollamaApi(ollamaApi)
        .defaultOptions(OllamaChatOptions.builder()
            .model("bespoke-minicheck")
            .temperature(0.0)
            .build())
        .build();

    FactCheckingEvaluator evaluator = new FactCheckingEvaluator(
        ChatClient.builder(factCheckModel)
    );

    String context = "Spring AI 1.0 was released in May 2025.";
    String claim = "Spring AI 1.0 was released in 2024.";  // incorrect claim

    EvaluationRequest request = new EvaluationRequest(
        context, Collections.emptyList(), claim
    );
    EvaluationResponse response = evaluator.evaluate(request);

    assertFalse(response.isPass());  // the inconsistency should be detected
}

19.3 Automated Testing Strategy

  
@SpringBootTest
class AiApplicationTests {

    @Autowired
    private ChatClient chatClient;

    @Test
    void testStructuredOutput() {
        BookInfo book = chatClient.prompt()
            .user("Recommend a book about Spring")
            .call()
            .entity(BookInfo.class);

        assertNotNull(book.title());
        assertNotNull(book.author());
    }

    @Test
    void testToolCalling() {
        String response = chatClient.prompt()
            .user("What time is it now?")
            .tools(new DateTimeTools())
            .call()
            .content();

        // Check that the response carries time information
        assertNotNull(response);
        assertTrue(response.contains("202"));  // loose check: contains a year in the 2020s
    }
}

Best practices: generate with a strong model and evaluate with a small one to reduce cost; treat evaluation as a continuous process, not a one-off task; establish an evaluation baseline.

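20. New Features in Spring AI 1.1

20.1 Prompt Caching

Prompt caching lets the provider reuse an already-processed prompt prefix (for example a long system prompt or tool definitions) instead of reprocessing it on every request. It caches prompt input, not responses. Anthropic prices cache reads at roughly 10% of the normal input rate, so the saving of up to 90% applies to the cached input tokens only. With Anthropic, the caching strategy is selected through the chat options:

  
AnthropicChatOptions options = AnthropicChatOptions.builder()
    .cacheOptions(AnthropicCacheOptions.builder()
        .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)  // cache the system prompt
        .build())
    .build();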

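20.2 Reasoning and Thinking Modes

Reasoning models such as DeepSeek-R1 return their thinking process alongside the final answer:

  
// A reasoning model returns its thinking content in addition to the answer
ChatResponse response = chatClient.prompt()
    .user("Solve: 25 * 37 + 148")
    .call()
    .chatResponse();
// The reasoning text (reasoning_content in the DeepSeek API) is carried in the response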

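20.3 Recursive Advisors

An advisor can invoke the downstream chain repeatedly, forming a self-correcting loop:

// Recursive advisors, new in 1.1
// Use cases: self-correction, multi-step reasoning, quality-improvement loops

20.4 New Model Providers

Google GenAI (Gemini family)
ElevenLabs (TTS)
Additional vector database support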

21. Enterprise Best Practices

21.1 Architecture Design

┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│ Frontend UI │ ←→  │ API Gateway  │ ←→  │ AI Service   │
│ (Vue/React) │     │  (Nginx)     │     │  (Spring AI) │
└─────────────┘     └──────────────┘     └───────┬──────┘
                                                 │
                                 ┌───────────────┼───────────────┐
                                 │               │               │
                           ┌─────┴────┐   ┌──────┴─────┐  ┌──────┴──────┐
                           │ChatModel │   │VectorStore │  │ ToolService │
                           │(GPT-4o)  │   │(PGVector)  │  │(biz systems)│
                           └──────────┘   └────────────┘  └─────────────┘

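21.2 Security Practices

  
// 1. API key management: use Vault or a configuration center
@Value("${spring.ai.openai.api-key}")
private String apiKey;  // resolved from an environment variable or the configuration center

// 2. Input validation: reduce the risk of prompt injection
public String chat(String userInput) {
    // Limit the input length
    if (userInput.length() > 2000) {
        throw new IllegalArgumentException("Input too long");
    }
    // Filter sensitive words (sanitize is an application-specific helper)
    String sanitized = sanitize(userInput);
    return chatClient.prompt().user(sanitized).call().content();
}

// 3. Output review: check that the AI response is compliant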
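
The sanitize helper is not defined in the notes. One possible shape for it is sketched below, assuming that the goal is to enforce a length limit and strip non-printing control characters; InputSanitizer and these rules are hypothetical, and this kind of cleanup reduces but does not eliminate prompt-injection risk.

```java
/** Hypothetical input sanitizer: length limit plus removal of control characters. */
final class InputSanitizer {

    /** Removes control characters (keeping newlines and tabs), trims, and enforces the limit. */
    static String sanitize(String input, int maxLength) {
        if (input == null) {
            throw new IllegalArgumentException("input must not be null");
        }
        // \p{Cntrl} matches ASCII control characters; the intersection keeps \n and \t
        String cleaned = input.replaceAll("[\\p{Cntrl}&&[^\\n\\t]]", "").strip();
        if (cleaned.isEmpty()) {
            throw new IllegalArgumentException("input is empty");
        }
        if (cleaned.length() > maxLength) {
            throw new IllegalArgumentException("input too long");
        }
        return cleaned;
    }
}
```

Rejecting over-long input instead of silently truncating it is a deliberate choice in this sketch, so that the caller can tell the user why the request failed.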

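21.3 Cost Optimization

  
// 1. Use models in tiers
@Service
public class CostOptimizedService {

    // Lightweight model for simple tasks
    private final ChatClient lightModel;  // gpt-4o-mini
    // Stronger model for complex tasks
    private final ChatClient heavyModel;  // gpt-4o

    public CostOptimizedService(ChatClient lightModel, ChatClient heavyModel) {
        this.lightModel = lightModel;
        this.heavyModel = heavyModel;
    }

    public String ask(String question, Complexity complexity) {
        ChatClient client = complexity == Complexity.HIGH ? heavyModel : lightModel;
        return client.prompt().user(question).call().content();
    }
}

// 2. Cache repeated queries
// 3. Limit maxTokens
// 4. Use prompt caching (new in 1.1)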
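
Item 2, caching repeated queries, can be sketched as a small LRU cache placed in front of the model call. ResponseCache is a hypothetical class, the model is represented by a plain Function, and normalizing the key by trimming and lower-casing is an assumption that only suits questions whose answers do not depend on case or on per-user context.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache in front of a model call. */
final class ResponseCache {

    private final Map<String, String> cache;
    private final Function<String, String> model;

    ResponseCache(int capacity, Function<String, String> model) {
        this.model = model;
        // Access-ordered LinkedHashMap that evicts the least recently used entry
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached answer for an equivalent question, or calls the model once. */
    synchronized String ask(String question) {
        String key = question.trim().toLowerCase(Locale.ROOT);
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String answer = model.apply(question);
        cache.put(key, answer);
        return answer;
    }
}
```

In a Spring application the same effect is often achieved with the framework's cache abstraction; the sketch only shows the mechanism.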

21.4 High-Availability Deployment

  
# Retry transient failures with exponential backoff
spring:
  ai:
    retry:
      max-attempts: 3
      backoff:
        initial-interval: 1s
        multiplier: 2
  
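@Configuration
public class FailoverConfig {

    @Bean
    @Primary
    public ChatModel failoverChatModel(
            OpenAiChatModel openAi,
            OllamaChatModel ollama) {
        // Switch to the backup model when the primary is unavailable.
        // FailoverChatModel is not a Spring AI class: it stands for a custom
        // ChatModel wrapper that you implement yourself
        return new FailoverChatModel(openAi, ollama);
    }
}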
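
The core of such a failover wrapper is a loop that tries each target in order and moves on when one fails. The sketch below shows only that logic in plain Java: FailoverCaller is a hypothetical class and each model is represented by a Function from prompt to answer, whereas a real wrapper would implement the ChatModel interface and delegate to the injected models.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical failover logic: try each target in order until one succeeds. */
final class FailoverCaller {

    private final List<Function<String, String>> targets;

    FailoverCaller(List<Function<String, String>> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("at least one target is required");
        }
        this.targets = List.copyOf(targets);
    }

    /** Returns the first successful answer; throws if every target fails. */
    String call(String prompt) {
        RuntimeException lastFailure = null;
        for (Function<String, String> target : targets) {
            try {
                return target.apply(prompt);
            } catch (RuntimeException ex) {
                // Remember the failure and fall through to the next target
                lastFailure = ex;
            }
        }
        throw new IllegalStateException("all targets failed", lastFailure);
    }
}
```

Failover and retry complement each other: retry handles brief glitches on one provider, while failover covers the case where that provider stays down.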

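21.5 Production Checklist

Item	Requirement
API key security	No hard-coding; use a secrets management service
Input validation	Limit length, filter sensitive words, guard against injection
Output review	Content compliance checks
Token monitoring	Set consumption alerts
Model fallback	Have a backup when the primary model is unavailable
Chat memory	Use persistent storage (JDBC/Cassandra)
Log redaction	Do not log sensitive user information
Response caching	Reduce repeated calls
Rate limiting	Prevent abuse
Evaluation tests	Run automated evaluations regularly

21.6 Enterprise Use Cases at a Glance

Scenario	Technical approach	Core capabilities
Intelligent customer service	ChatClient + Tool Calling + Memory	Multi-turn dialogue, order lookup, refund requests
Knowledge-base Q&A	RAG + VectorStore + ETL	Document ingestion, semantic search, precise answers
Contract review	Multimodal + Structured Output	Image/text understanding, structured extraction
Code assistant	Tool Calling + Agent Pattern	Code generation, execution checks, self-correction
Data analysis	Structured Output + Chart	Natural-language queries, chart generation
Email assistant	Agent + Tool Calling	Drafting and sending email, calendar management
Intelligent review	Evaluator-Optimizer	Multi-round evaluation and optimization, quality assurance
Call center	TTS + STT + RAG	Voice interaction, knowledge retrieval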

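22. FAQ and Pitfalls

22.1 Common Errors

Invalid API key:

401 Unauthorized - Check your API key

Fix: check that the key is correct, has not expired, and matches the environment variable the application reads.

Token limit exceeded:

400 Bad Request - This model's maximum context length is 128000 tokens

Fix: shorten the input, split documents with TokenTextSplitter, or choose a model with a larger context window.

Rate limit exceeded:

429 Too Many Requests - Rate limit exceeded

Fix: retry with backoff, add client-side rate limiting, or rotate across multiple keys.

22.2 Performance Optimization

Streaming responses: use stream() instead of call() so that users see output sooner
Batched embeddings: embed texts in batches to reduce the number of API calls
Caching: cache responses to identical questions
Concurrency: use WebFlux or virtual threads to raise throughput
Chunking strategy: choose chunk size and overlap deliberately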
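
The interaction between chunk size and overlap is easiest to see on a small example. The sketch below splits by characters for simplicity, whereas TokenTextSplitter works on tokens; OverlapChunker is a hypothetical class used only to illustrate how the overlap repeats the tail of one chunk at the head of the next.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical character-based chunker with a fixed overlap between neighboring chunks. */
final class OverlapChunker {

    /** Splits text into chunks of at most chunkSize characters; consecutive chunks share overlap characters. */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break;  // the last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

Splitting "abcdefghij" with a chunk size of 4 and an overlap of 1 gives "abcd", "defg", and "ghij". A larger overlap keeps more context across chunk boundaries, at the cost of more chunks and more tokens to embed.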

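22.3 Development Tips

Use a local Ollama model during development to save API costs
Manage versions consistently with the Spring AI BOM
Use in-memory chat memory during development and switch to JDBC for production
Start RAG development with the simple mode, then move to the advanced mode
Test tool calling at the ChatClient level first, then wrap it in an Advisor
Pair structured output with record classes to keep the code concise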

23. Learning Resources

Official documentation: https://docs.spring.io/spring-ai/reference/
GitHub repository: https://github.com/spring-projects/spring-ai
Spring Blog: https://spring.io/blog/category/spring-ai
Awesome Spring AI: https://github.com/spring-ai-community/awesome-spring-ai
Spring AI Alibaba: https://java2ai.com/
Baeldung tutorials: https://www.baeldung.com/spring-ai
Spring Academy: https://spring.academy/

Appendix: Quick Reference

A. Spring Boot Starters and Modules

Artifact	Description
spring-ai-starter-model-openai	OpenAI
spring-ai-starter-model-ollama	Ollama
spring-ai-starter-model-anthropic	Anthropic Claude
spring-ai-starter-model-zhipuai	Zhipu AI
spring-ai-starter-model-stability-ai	Stability AI
spring-ai-starter-vector-store-pgvector	PGVector
spring-ai-starter-vector-store-redis	Redis Vector
spring-ai-advisors-vector-store	RAG Advisor
spring-ai-rag	Advanced RAG
spring-ai-starter-mcp-client	MCP client
spring-ai-starter-mcp-server-webmvc	MCP server (WebMVC)
spring-ai-starter-model-chat-memory-repository-jdbc	JDBC chat memory

B. Common Configuration Properties

  
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        base-url: https://api.openai.com
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 2048
          top-p: 1.0
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
    chat:
      memory:
        repository:
          jdbc:
            initialize-schema: always
    mcp:
      client:
        stdio:
          servers-configuration: classpath:mcp-servers.json


This article is licensed by the author under CC BY 4.0.