If you’ve been keeping an eye on the Java ecosystem lately, you’ve probably noticed that AI integration is no longer just a Python story. Spring AI is Spring’s answer to that shift.
The goal of this blog is to share what Spring AI is, how it’s architected, and what it actually looks like to build something real with it. Whether you’re evaluating it for an internal project or just curious about how the Spring team approached AI, I hope this gives you a solid foundation.
What Is Spring AI?
At its heart, Spring AI is an application framework for AI engineering. It draws inspiration from Python projects like LangChain and LlamaIndex, but it isn’t a port of them. The Spring team built something that feels native to the Java and Spring Boot ecosystem — with all the familiar patterns like dependency injection, auto-configuration, and portable abstractions.
The core problem it solves is this: how do you connect your enterprise data and APIs with AI models without rebuilding plumbing from scratch every time?
Spring AI gives you:
- A unified, portable API across major AI providers (OpenAI, Anthropic, Azure, Google, Ollama, and more)
- Spring Boot auto-configuration so setup is mostly just adding a dependency and an API key
- Vector database integrations out of the box (PGVector, MongoDB Atlas, Redis, Pinecone, Weaviate, and many more)
- RAG (Retrieval Augmented Generation) support built right into the framework
- Tool/Function Calling so your AI model can trigger real application logic
- Structured Outputs mapped directly to Java POJOs
- Observability via Micrometer for monitoring AI interactions in production
Architecture Overview
Before diving into code, it’s worth understanding how the pieces fit together.
```
User Request
     │
     ▼
ChatClient (Fluent API)
     │
     ├── Advisors (pre/post processing)
     │     ├── QuestionAnswerAdvisor (RAG)
     │     ├── MessageChatMemoryAdvisor (Memory)
     │     └── SimpleLoggerAdvisor (Logging)
     │
     ├── PromptTemplate (variable substitution)
     │
     ▼
ChatModel (portable abstraction)
     │
     ├── OpenAI
     ├── Anthropic
     ├── Azure OpenAI
     ├── Google Gemini
     └── Ollama (local)
```
The key insight here is that ChatModel is just an interface. You write your application code against that interface, and swapping from OpenAI to Anthropic (or to a locally hosted Ollama model) is mostly a config change. That kind of portability is something Python developers often have to build themselves.
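To make that concrete, here’s a minimal sketch (SummaryService is my own illustrative name, not part of Spring AI): the service depends only on the ChatModel interface, and which implementation gets injected is decided by the starter on your classpath and your configuration, not by this code.

```java
@Service
public class SummaryService {

    // Portable abstraction: could be OpenAI, Anthropic, Ollama, ...
    private final ChatModel chatModel;

    public SummaryService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    public String summarize(String text) {
        // Prompt and ChatResponse are part of the portable API,
        // so this method works unchanged across providers
        return chatModel.call(new Prompt("Summarize in one paragraph: " + text))
                .getResult()
                .getOutput()
                .getText();
    }
}
```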
Setting Up Your First Spring AI Project
Step 1: Add Dependencies
Head over to start.spring.io and add:
- Spring Web
- Spring AI OpenAI (or whichever provider you’re using)
Or manually add to your pom.xml:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
```
Also add the Spring AI BOM for version management:
```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.1.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```
Step 2: Configure Your API Key
In application.properties:
```properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o
```
Security tip: never hardcode API keys. Use environment variables or Spring Cloud Config / Vault in production.
Step 3: Your First AI-Powered Endpoint
```java
@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```
That’s it. No boilerplate HTTP client setup, no JSON wrangling. The ChatClient.Builder is auto-configured by Spring Boot, and the fluent API feels familiar if you’ve used WebClient or RestClient.
Key Concepts You Need to Know
1. ChatClient — The Entry Point
ChatClient is the primary API you’ll interact with. It has a fluent builder-style interface that lets you compose your prompt, add advisors, configure options, and choose how you want to receive the response.
```java
// Simple string response
String response = chatClient.prompt()
        .user("Explain microservices in 2 sentences")
        .call()
        .content();

// Rich response with metadata
ChatResponse chatResponse = chatClient.prompt()
        .user("Tell me a joke")
        .call()
        .chatResponse();

// Stream tokens as they arrive
Flux<String> stream = chatClient.prompt()
        .user("Write a short story")
        .stream()
        .content();
```
One thing that tripped me up initially: call() doesn’t actually execute the request. It just tells Spring AI whether to use synchronous or streaming mode. The actual model call happens when you chain .content(), .chatResponse(), or .entity().
2. Prompt Templates
Hard-coding prompt strings is fine for demos but quickly gets messy. Spring AI has first-class support for prompt templates with variable substitution:
```java
String answer = ChatClient.create(chatModel).prompt()
        .user(u -> u
                .text("Summarize the following topic in {language}: {topic}")
                .param("language", "Hindi")
                .param("topic", "Kubernetes"))
        .call()
        .content();
```
By default it uses the StringTemplate engine. If your prompts contain JSON, you can switch delimiters to avoid conflicts:
```java
.templateRenderer(
        StTemplateRenderer.builder()
                .startDelimiterToken('<')
                .endDelimiterToken('>')
                .build()
)
```
3. Structured Outputs — AI Response as a Java Object
One of the more practical features. Instead of parsing a raw string, you can map the AI’s response directly to a Java record or class:
```java
record ProductRecommendation(String name, String reason, double priceEstimate) {}

ProductRecommendation recommendation = chatClient.prompt()
        .user("Recommend a laptop for a Java developer who values performance")
        .call()
        .entity(ProductRecommendation.class);

System.out.println(recommendation.name()); // e.g., "MacBook Pro M3"
```
Spring AI handles the format instructions and parsing under the hood. For a list of objects, use ParameterizedTypeReference:
```java
List<ProductRecommendation> recommendations = chatClient.prompt()
        .user("Give me 3 laptop recommendations for a Java developer")
        .call()
        .entity(new ParameterizedTypeReference<List<ProductRecommendation>>() {});
```
4. Advisors — Middleware for AI Calls
Advisors are one of the most powerful concepts in Spring AI. Think of them as a middleware chain that wraps your AI interactions. They can inspect, modify, or enrich the request before it hits the model, and post-process the response on the way back.
Spring AI ships with several built-in advisors:
Chat Memory Advisor — gives your bot a conversation history:
```java
ChatMemory memory = MessageWindowChatMemory.builder().build();

String response = chatClient.prompt()
        .advisors(
                MessageChatMemoryAdvisor.builder(memory)
                        .conversationId("user-session-123")
                        .build()
        )
        .user("My name is Rohan")
        .call()
        .content();

// In a follow-up call, it will remember "Rohan"
String followUp = chatClient.prompt()
        .advisors(
                MessageChatMemoryAdvisor.builder(memory)
                        .conversationId("user-session-123")
                        .build()
        )
        .user("What's my name?")
        .call()
        .content();
```
Question Answer Advisor (RAG) — augments the prompt with relevant documents from your vector store:
```java
String answer = chatClient.prompt()
        .advisors(
                QuestionAnswerAdvisor.builder(vectorStore).build()
        )
        .user("What is our refund policy?")
        .call()
        .content();
```
SimpleLoggerAdvisor — logs request/response for debugging:
```java
chatClient.prompt()
        .advisors(new SimpleLoggerAdvisor())
        .user("Hello!")
        .call()
        .content();
```
Add this to application.properties to see the logs:
```properties
logging.level.org.springframework.ai.chat.client.advisor=DEBUG
```
The order in which advisors are added matters. They execute in the order added, so put memory advisors before RAG advisors if you want memory context to influence the retrieval.
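For instance, a memory-then-RAG chain looks like this (a sketch reusing the memory and vectorStore variables from the examples above):

```java
// Advisors run in the order they are added: memory first, then retrieval
String answer = chatClient.prompt()
        .advisors(
                MessageChatMemoryAdvisor.builder(memory)
                        .conversationId("user-session-123")
                        .build(),
                QuestionAnswerAdvisor.builder(vectorStore).build()
        )
        .user("What did I ask about earlier?")
        .call()
        .content();
```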
5. RAG — Retrieval Augmented Generation
RAG is the pattern where you take a user’s question, search your own knowledge base for relevant content, and include that content in the prompt before sending it to the model. This lets you build a chatbot that answers questions about your data — not just what the model was trained on.
The Spring AI RAG pipeline has three stages:
Ingest (ETL). Load your documents, split them into chunks, generate embeddings, and store them in a vector database:
```java
@Bean
public ApplicationRunner ingestDocuments(
        VectorStore vectorStore,
        ResourcePatternResolver resolver) {
    return args -> {
        Resource[] resources = resolver.getResources("classpath:/docs/*.pdf");
        // Read and chunk every matched document, not just the first one
        List<Document> docs = new ArrayList<>();
        for (Resource resource : resources) {
            docs.addAll(new TokenTextSplitter().apply(
                    new TikaDocumentReader(resource).get()));
        }
        vectorStore.add(docs);
    };
}
```
Retrieve. At query time, the QuestionAnswerAdvisor automatically retrieves the top-k most relevant chunks from the vector store.
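If the defaults don’t fit, the advisor’s builder accepts a custom SearchRequest. A sketch, with illustrative numbers:

```java
// Tune how many chunks are retrieved and how similar they must be
QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder()
                .topK(5)
                .similarityThreshold(0.7)
                .build())
        .build();
```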
Generate. The retrieved chunks are injected into the prompt as context, and the LLM generates a grounded answer.
```java
@RestController
public class DocumentQAController {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public DocumentQAController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}
```
6. Tool/Function Calling
Tool calling lets the AI model invoke your application’s methods to fetch real-time data or trigger actions. The model decides when to call a tool based on the conversation context — you just declare the tools.
```java
@Bean
@Description("Get current stock price for a given ticker symbol")
public Function<StockRequest, StockResponse> getStockPrice() {
    return request -> {
        // Call your actual stock price service here
        double price = stockService.getPrice(request.ticker());
        return new StockResponse(request.ticker(), price);
    };
}

record StockRequest(String ticker) {}
record StockResponse(String ticker, double price) {}
```
Then reference the function by name in your ChatClient call:
```java
String response = chatClient.prompt()
        .toolNames("getStockPrice")
        .user("What is the current price of INFY?")
        .call()
        .content();
```
The model will call your function automatically if it determines it needs that information to answer.
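Recent Spring AI versions also support declaring tools as annotated methods, which avoids the function-bean indirection. A sketch (StockTools is my own class name, wrapping the same hypothetical stockService):

```java
public class StockTools {

    @Tool(description = "Get current stock price for a given ticker symbol")
    public StockResponse getStockPrice(String ticker) {
        return new StockResponse(ticker, stockService.getPrice(ticker));
    }
}

// Pass an instance instead of referencing a bean by name
String response = chatClient.prompt()
        .tools(new StockTools())
        .user("What is the current price of INFY?")
        .call()
        .content();
```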
Setting Up Defaults — A Clean Pattern for Production
Rather than configuring system prompts and advisors in every controller, a cleaner approach is to set defaults at the ChatClient bean level:
```java
@Configuration
public class AiConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, VectorStore vectorStore) {
        return builder
                .defaultSystem("""
                        You are a helpful assistant for our internal developer portal.
                        Always respond in a professional tone.
                        If you don't know something, say so clearly.
                        """)
                .defaultAdvisors(
                        QuestionAnswerAdvisor.builder(vectorStore).build(),
                        new SimpleLoggerAdvisor()
                )
                .build();
    }
}
```
Controllers can then inject this pre-configured bean and just call .user() without worrying about the rest.
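For example, a controller using the bean above stays completely free of AI wiring (PortalController is an illustrative name):

```java
@RestController
public class PortalController {

    // The pre-configured ChatClient bean from AiConfig
    private final ChatClient chatClient;

    public PortalController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/portal/chat")
    public String chat(@RequestParam String message) {
        // defaultSystem and defaultAdvisors apply automatically
        return chatClient.prompt().user(message).call().content();
    }
}
```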
Working with Multiple AI Providers
There are cases where you might want to use different models for different tasks — maybe GPT-4o for complex reasoning and a faster/cheaper model for classification. Spring AI handles this well.
First, disable auto-configuration of the default ChatClient.Builder:
```properties
spring.ai.chat.client.enabled=false
```
Then define your beans explicitly:
```java
@Configuration
public class MultiModelConfig {

    @Bean("complexTaskClient")
    public ChatClient complexTaskClient(OpenAiChatModel model) {
        return ChatClient.builder(model)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o")
                        .temperature(0.2)
                        .build())
                .build();
    }

    @Bean("fastTaskClient")
    public ChatClient fastTaskClient(OpenAiChatModel model) {
        return ChatClient.builder(model)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.7)
                        .build())
                .build();
    }
}
```
Inject using @Qualifier:
```java
@Service
public class AiService {

    private final ChatClient complexClient;
    private final ChatClient fastClient;

    public AiService(
            @Qualifier("complexTaskClient") ChatClient complexClient,
            @Qualifier("fastTaskClient") ChatClient fastClient) {
        this.complexClient = complexClient;
        this.fastClient = fastClient;
    }
}
```
Observability
Spring AI integrates with Micrometer for observability. You get traces, metrics, and spans for AI interactions out of the box. In a Spring Boot app with Actuator:
```properties
management.tracing.sampling.probability=1.0
spring.ai.chat.observations.include-prompt=true
spring.ai.chat.observations.include-completion=true
```
Pair this with a Zipkin or Grafana Tempo backend and you can trace exactly how much time your application is spending waiting on AI models — which is invaluable for performance tuning.
Best Practices Summary
- Externalize AI configuration (API keys, model names, temperatures) to application.properties and never hardcode them.
- Use @Configuration beans for ChatClient setup to centralize defaults and keep controllers clean.
- Keep system prompts versioned alongside your application code — they are as important as any business logic.
- Monitor token usage in production via Micrometer metrics. Uncontrolled prompt growth is a hidden cost driver.
- Prefer streaming for user-facing chat UI — it dramatically improves perceived performance.
- Test your prompts like code — since ChatModel is an interface, you can stub or mock it to unit-test AI-driven flows without hitting real endpoints (see the sketch after this list).
- Use Ollama locally during development to avoid API costs and work offline.
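On the testing point: a hand-rolled stub is usually enough. A sketch (FixedReplyChatModel is my own helper, not a Spring AI class):

```java
// Stub that always returns a fixed reply, so tests never hit a real model
class FixedReplyChatModel implements ChatModel {

    private final String reply;

    FixedReplyChatModel(String reply) {
        this.reply = reply;
    }

    @Override
    public ChatResponse call(Prompt prompt) {
        return new ChatResponse(List.of(new Generation(new AssistantMessage(reply))));
    }
}

@Test
void chatFlowReturnsModelContent() {
    ChatClient client = ChatClient.create(new FixedReplyChatModel("stubbed answer"));

    String content = client.prompt().user("anything").call().content();

    assertEquals("stubbed answer", content);
}
```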

Conclusion
Spring AI genuinely delivers on its promise. As a Java/Spring Boot developer, you don’t have to abandon the ecosystem you know or learn Python to build serious AI-powered applications. The portability model, fluent API, and first-class RAG and memory support cover the most common patterns you’ll need.
The framework is still maturing (version 1.1.2 is stable, and 2.0 is in preview), so it’s worth keeping an eye on the release notes. But for internal tooling, developer productivity applications, and document Q&A systems, it’s absolutely ready to use today.
Start simple: one model, one endpoint, one system prompt. Get that working, then layer in RAG, memory, and tool calling. The framework scales well as your use case grows.