If you’ve been keeping an eye on the Java ecosystem lately, you’ve probably noticed that AI integration is no longer just a Python story. Spring AI is Spring’s answer to that shift.
The goal of this blog is to share what Spring AI is, how it’s architected, and what it actually looks like to build something real with it. Whether you’re evaluating it for an internal project or just curious about how the Spring team approached AI, I hope this gives you a solid foundation.
What Is Spring AI?
At its heart, Spring AI is an application framework for AI engineering. It draws inspiration from Python projects like LangChain and LlamaIndex, but it isn’t a port of them. The Spring team built something that feels native to the Java and Spring Boot ecosystem — with all the familiar patterns like dependency injection, auto-configuration, and portable abstractions.
The core problem it solves is this: how do you connect your enterprise data and APIs with AI models without rebuilding plumbing from scratch every time?
Spring AI gives you:
- A unified, portable API across major AI providers (OpenAI, Anthropic, Azure, Google, Ollama, and more)
- Spring Boot auto-configuration so setup is mostly just adding a dependency and an API key
- Vector database integrations out of the box (PGVector, MongoDB Atlas, Redis, Pinecone, Weaviate, and many more)
- RAG (Retrieval Augmented Generation) support built right into the framework
- Tool/Function Calling so your AI model can trigger real application logic
- Structured Outputs mapped directly to Java POJOs
- Observability via Micrometer for monitoring AI interactions in production
Architecture Overview
Before diving into code, it’s worth understanding how the pieces fit together.
```
User Request
     │
     ▼
ChatClient (Fluent API)
     │
     ├── Advisors (pre/post processing)
     │     ├── QuestionAnswerAdvisor (RAG)
     │     ├── MessageChatMemoryAdvisor (Memory)
     │     └── SimpleLoggerAdvisor (Logging)
     │
     ├── PromptTemplate (variable substitution)
     │
     ▼
ChatModel (portable abstraction)
     │
     ├── OpenAI
     ├── Anthropic
     ├── Azure OpenAI
     ├── Google Gemini
     └── Ollama (local)
```
The key insight here is that ChatModel is just an interface. You write your application code against that interface, and swapping from OpenAI to Anthropic (or to a locally hosted Ollama model) is mostly a config change. That kind of portability is something Python developers often have to build themselves.
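To make that concrete, here’s a minimal sketch (SummaryService is my own illustrative name, not part of Spring AI): the service depends only on the ChatModel interface, and which implementation gets injected is decided by the starter on your classpath and your configuration, not by this code.

```java
@Service
public class SummaryService {

    // Portable abstraction: could be OpenAI, Anthropic, Ollama, ...
    private final ChatModel chatModel;

    public SummaryService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    public String summarize(String text) {
        // Prompt and ChatResponse are part of the portable API,
        // so this method works unchanged across providers
        return chatModel.call(new Prompt("Summarize in one paragraph: " + text))
                .getResult()
                .getOutput()
                .getText();
    }
}
```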
Setting Up Your First Spring AI Project
Step 1: Add Dependencies
Head over to start.spring.io and add:
- Spring Web
- Spring AI OpenAI (or whichever provider you’re using)
Or manually add to your pom.xml:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
```
Also add the Spring AI BOM for version management:
```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.1.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```
Step 2: Configure Your API Key
In application.properties:
```properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o
```
Security tip: never hardcode API keys. Use environment variables or Spring Cloud Config / Vault in production.
Step 3: Your First AI-Powered Endpoint
```java
@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```
That’s it. No boilerplate HTTP client setup, no JSON wrangling. The ChatClient.Builder is auto-configured by Spring Boot, and the fluent API feels familiar if you’ve used WebClient or RestClient.
Key Concepts You Need to Know
1. ChatClient — The Entry Point
ChatClient is the primary API you’ll interact with. It has a fluent builder-style interface that lets you compose your prompt, add advisors, configure options, and choose how you want to receive the response.
```java
// Simple string response
String response = chatClient.prompt()
        .user("Explain microservices in 2 sentences")
        .call()
        .content();

// Rich response with metadata
ChatResponse chatResponse = chatClient.prompt()
        .user("Tell me a joke")
        .call()
        .chatResponse();

// Stream tokens as they arrive
Flux<String> stream = chatClient.prompt()
        .user("Write a short story")
        .stream()
        .content();
```
One thing that tripped me up initially: call() doesn’t actually execute the request. It just tells Spring AI whether to use synchronous or streaming mode. The actual model call happens when you chain .content(), .chatResponse(), or .entity().
2. Prompt Templates
Hard-coding prompt strings is fine for demos but quickly gets messy. Spring AI has first-class support for prompt templates with variable substitution:
```java
String answer = ChatClient.create(chatModel).prompt()
        .user(u -> u
                .text("Summarize the following topic in {language}: {topic}")
                .param("language", "Hindi")
                .param("topic", "Kubernetes"))
        .call()
        .content();
```
By default it uses the StringTemplate engine. If your prompts contain JSON, you can switch delimiters to avoid conflicts:
```java
.templateRenderer(
        StTemplateRenderer.builder()
                .startDelimiterToken('<')
                .endDelimiterToken('>')
                .build()
)
```
3. Structured Outputs — AI Response as a Java Object
One of the more practical features. Instead of parsing a raw string, you can map the AI’s response directly to a Java record or class:
```java
record ProductRecommendation(String name, String reason, double priceEstimate) {}

ProductRecommendation recommendation = chatClient.prompt()
        .user("Recommend a laptop for a Java developer who values performance")
        .call()
        .entity(ProductRecommendation.class);

System.out.println(recommendation.name()); // e.g., "MacBook Pro M3"
```
Spring AI handles the format instructions and parsing under the hood. For a list of objects, use ParameterizedTypeReference:
```java
List<ProductRecommendation> recommendations = chatClient.prompt()
        .user("Give me 3 laptop recommendations for a Java developer")
        .call()
        .entity(new ParameterizedTypeReference<List<ProductRecommendation>>() {});
```
4. Advisors — Middleware for AI Calls
Advisors are one of the most powerful concepts in Spring AI. Think of them as a middleware chain that wraps your AI interactions. They can inspect, modify, or enrich the request before it hits the model, and post-process the response on the way back.
Spring AI ships with several built-in advisors:
Chat Memory Advisor — gives your bot a conversation history:
```java
ChatMemory memory = MessageWindowChatMemory.builder().build();

String response = chatClient.prompt()
        .advisors(
                MessageChatMemoryAdvisor.builder(memory)
                        .conversationId("user-session-123")
                        .build()
        )
        .user("My name is Rohan")
        .call()
        .content();

// In a follow-up call, it will remember "Rohan"
String followUp = chatClient.prompt()
        .advisors(
                MessageChatMemoryAdvisor.builder(memory)
                        .conversationId("user-session-123")
                        .build()
        )
        .user("What's my name?")
        .call()
        .content();
```
Question Answer Advisor (RAG) — augments the prompt with relevant documents from your vector store:
```java
String answer = chatClient.prompt()
        .advisors(
                QuestionAnswerAdvisor.builder(vectorStore).build()
        )
        .user("What is our refund policy?")
        .call()
        .content();
```
SimpleLoggerAdvisor — logs request/response for debugging:
```java
chatClient.prompt()
        .advisors(new SimpleLoggerAdvisor())
        .user("Hello!")
        .call()
        .content();
```
Add this to application.properties to see the logs:
```properties
logging.level.org.springframework.ai.chat.client.advisor=DEBUG
```
The order in which advisors are added matters. They execute in the order added, so put memory advisors before RAG advisors if you want memory context to influence the retrieval.
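For instance, a memory-then-RAG chain looks like this (a sketch reusing the memory and vectorStore variables from the examples above):

```java
// Advisors run in the order they are added: memory first, then retrieval
String answer = chatClient.prompt()
        .advisors(
                MessageChatMemoryAdvisor.builder(memory)
                        .conversationId("user-session-123")
                        .build(),
                QuestionAnswerAdvisor.builder(vectorStore).build()
        )
        .user("What did I ask about earlier?")
        .call()
        .content();
```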
5. RAG — Retrieval Augmented Generation
RAG is the pattern where you take a user’s question, search your own knowledge base for relevant content, and include that content in the prompt before sending it to the model. This lets you build a chatbot that answers questions about your data — not just what the model was trained on.
The Spring AI RAG pipeline has three stages:
Ingest (ETL). Load your documents, split them into chunks, generate embeddings, and store them in a vector database:
```java
@Bean
public ApplicationRunner ingestDocuments(
        VectorStore vectorStore,
        ResourcePatternResolver resolver) {
    return args -> {
        Resource[] resources = resolver.getResources("classpath:/docs/*.pdf");
        // Read and chunk every matched document, not just the first one
        List<Document> docs = new ArrayList<>();
        for (Resource resource : resources) {
            docs.addAll(new TokenTextSplitter().apply(
                    new TikaDocumentReader(resource).get()));
        }
        vectorStore.add(docs);
    };
}
```
Retrieve. At query time, the QuestionAnswerAdvisor automatically retrieves the top-k most relevant chunks from the vector store.
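If the defaults don’t fit, the advisor’s builder accepts a custom SearchRequest. A sketch, with illustrative numbers:

```java
// Tune how many chunks are retrieved and how similar they must be
QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder()
                .topK(5)
                .similarityThreshold(0.7)
                .build())
        .build();
```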
Generate. The retrieved chunks are injected into the prompt as context, and the LLM generates a grounded answer.
```java
@RestController
public class DocumentQAController {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public DocumentQAController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}
```
6. Tool/Function Calling
Tool calling lets the AI model invoke your application’s methods to fetch real-time data or trigger actions. The model decides when to call a tool based on the conversation context — you just declare the tools.
```java
@Bean
@Description("Get current stock price for a given ticker symbol")
public Function<StockRequest, StockResponse> getStockPrice() {
    return request -> {
        // Call your actual stock price service here
        double price = stockService.getPrice(request.ticker());
        return new StockResponse(request.ticker(), price);
    };
}

record StockRequest(String ticker) {}
record StockResponse(String ticker, double price) {}
```
Then reference the function by name in your ChatClient call:
```java
String response = chatClient.prompt()
        .toolNames("getStockPrice")
        .user("What is the current price of INFY?")
        .call()
        .content();
```
The model will call your function automatically if it determines it needs that information to answer.
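Recent Spring AI versions also support declaring tools as annotated methods, which avoids the function-bean indirection. A sketch (StockTools is my own class name, wrapping the same hypothetical stockService):

```java
public class StockTools {

    @Tool(description = "Get current stock price for a given ticker symbol")
    public StockResponse getStockPrice(String ticker) {
        return new StockResponse(ticker, stockService.getPrice(ticker));
    }
}

// Pass an instance instead of referencing a bean by name
String response = chatClient.prompt()
        .tools(new StockTools())
        .user("What is the current price of INFY?")
        .call()
        .content();
```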
Setting Up Defaults — A Clean Pattern for Production
Rather than configuring system prompts and advisors in every controller, a cleaner approach is to set defaults at the ChatClient bean level:
```java
@Configuration
public class AiConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, VectorStore vectorStore) {
        return builder
                .defaultSystem("""
                        You are a helpful assistant for our internal developer portal.
                        Always respond in a professional tone.
                        If you don't know something, say so clearly.
                        """)
                .defaultAdvisors(
                        QuestionAnswerAdvisor.builder(vectorStore).build(),
                        new SimpleLoggerAdvisor()
                )
                .build();
    }
}
```
Controllers can then inject this pre-configured bean and just call .user() without worrying about the rest.
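For example, a controller using the bean above stays completely free of AI wiring (PortalController is an illustrative name):

```java
@RestController
public class PortalController {

    // The pre-configured ChatClient bean from AiConfig
    private final ChatClient chatClient;

    public PortalController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/portal/chat")
    public String chat(@RequestParam String message) {
        // defaultSystem and defaultAdvisors apply automatically
        return chatClient.prompt().user(message).call().content();
    }
}
```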
Working with Multiple AI Providers
There are cases where you might want to use different models for different tasks — maybe GPT-4o for complex reasoning and a faster/cheaper model for classification. Spring AI handles this well.
First, disable auto-configuration of the default ChatClient.Builder:
```properties
spring.ai.chat.client.enabled=false
```
Then define your beans explicitly:
```java
@Configuration
public class MultiModelConfig {

    @Bean("complexTaskClient")
    public ChatClient complexTaskClient(OpenAiChatModel model) {
        return ChatClient.builder(model)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o")
                        .temperature(0.2)
                        .build())
                .build();
    }

    @Bean("fastTaskClient")
    public ChatClient fastTaskClient(OpenAiChatModel model) {
        return ChatClient.builder(model)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.7)
                        .build())
                .build();
    }
}
```
Inject using @Qualifier:
```java
@Service
public class AiService {

    private final ChatClient complexClient;
    private final ChatClient fastClient;

    public AiService(
            @Qualifier("complexTaskClient") ChatClient complexClient,
            @Qualifier("fastTaskClient") ChatClient fastClient) {
        this.complexClient = complexClient;
        this.fastClient = fastClient;
    }
}
```
Observability
Spring AI integrates with Micrometer for observability. You get traces, metrics, and spans for AI interactions out of the box. In a Spring Boot app with Actuator:
```properties
management.tracing.sampling.probability=1.0
spring.ai.chat.observations.include-prompt=true
spring.ai.chat.observations.include-completion=true
```
Pair this with a Zipkin or Grafana Tempo backend and you can trace exactly how much time your application is spending waiting on AI models — which is invaluable for performance tuning.
Best Practices Summary
- Externalize AI configuration (API keys, model names, temperatures) to application.properties and never hardcode them.
- Use @Configuration beans for ChatClient setup to centralize defaults and keep controllers clean.
- Keep system prompts versioned alongside your application code — they are as important as any business logic.
- Monitor token usage in production via Micrometer metrics. Uncontrolled prompt growth is a hidden cost driver.
- Prefer streaming for user-facing chat UI — it dramatically improves perceived performance.
- Test your prompts like code — since ChatModel is an interface, you can stub or mock it to unit-test AI-driven flows without hitting real endpoints (see the sketch after this list).
- Use Ollama locally during development to avoid API costs and work offline.
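On the testing point: a hand-rolled stub is usually enough. A sketch (FixedReplyChatModel is my own helper, not a Spring AI class):

```java
// Stub that always returns a fixed reply, so tests never hit a real model
class FixedReplyChatModel implements ChatModel {

    private final String reply;

    FixedReplyChatModel(String reply) {
        this.reply = reply;
    }

    @Override
    public ChatResponse call(Prompt prompt) {
        return new ChatResponse(List.of(new Generation(new AssistantMessage(reply))));
    }
}

@Test
void chatFlowReturnsModelContent() {
    ChatClient client = ChatClient.create(new FixedReplyChatModel("stubbed answer"));

    String content = client.prompt().user("anything").call().content();

    assertEquals("stubbed answer", content);
}
```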

Conclusion
Spring AI genuinely delivers on its promise. As a Java/Spring Boot developer, you don’t have to abandon the ecosystem you know or learn Python to build serious AI-powered applications. The portability model, fluent API, and first-class RAG and memory support cover the most common patterns you’ll need.
The framework is still maturing (version 1.1.2 is stable, and 2.0 is in preview), so it’s worth keeping an eye on the release notes. But for internal tooling, developer productivity applications, and document Q&A systems, it’s absolutely ready to use today.
Start simple: one model, one endpoint, one system prompt. Get that working, then layer in RAG, memory, and tool calling. The framework scales well as your use case grows.