In today’s world, building an intelligent and context-aware search system has become a priority for businesses across various industries. Traditional search solutions based on keywords often fail to capture the intent behind user queries. However, by combining OpenSearch with OpenAI embeddings, we can create a search system that understands the meaning of the query, enabling more relevant results. This guide will show you how to integrate OpenSearch with OpenAI embeddings in Java to build an efficient, semantic search system.
OpenSearch is an open-source, distributed search and analytics engine designed for large-scale, high-performance searches. It is a fork of Elasticsearch, with enhanced features and improved security, maintained by the community and Amazon Web Services (AWS). OpenSearch allows you to store, search, and analyze large volumes of data quickly and in real-time. It’s widely used for log and event data analysis, full-text search, and application monitoring.
Key Features of OpenSearch:
OpenAI is an artificial intelligence research and deployment company that provides advanced machine learning models, including language models like GPT (Generative Pretrained Transformer). OpenAI’s API offers access to powerful models capable of understanding and generating human-like text, as well as embeddings that represent text as high-dimensional vectors.
OpenAI Embeddings are vector representations of text that capture semantic meaning, making it easier to compare, search, and analyze text data. These embeddings allow for more relevant and accurate search results based on the meaning of words or sentences, not just keyword matching.
Key Features of OpenAI:
OpenAI embeddings combined with OpenSearch provide a powerful solution for semantic search, enabling systems to understand and return results based on meaning rather than keywords alone. Here’s why combining OpenAI with OpenSearch makes sense:
Traditional search engines rely on keyword matching, but OpenAI embeddings understand the context and meaning of words, allowing for semantically relevant search results. For example, a query about “summer jackets” could return results for “lightweight coats” or “rain jackets” if they are contextually relevant.
OpenSearch’s k-NN (k-nearest neighbor) plugin is optimized for vector searches, which is ideal for the high-dimensional data generated by OpenAI embeddings. This allows for fast and scalable similarity searches in large datasets.
By storing OpenAI embeddings in OpenSearch, you can perform real-time semantic searches. As users enter queries, their embeddings are compared to the indexed embeddings in OpenSearch, enabling quick retrieval of the most relevant documents.
OpenSearch provides horizontal scalability, meaning it can efficiently handle an ever-growing number of documents with embeddings, while OpenAI offers a deep semantic understanding of text data. Together, they provide an ideal setup for scalable and intelligent search.
When combining OpenAI’s understanding of natural language with OpenSearch’s powerful search engine, you can build systems that provide users with highly relevant, accurate, and meaningful search results, leading to a better overall user experience.
Example 1: E-commerce Product Search
Imagine an e-commerce store where users search for products not by name but by intent. With OpenAI embeddings, a user query such as “I need a warm winter jacket” can be semantically matched with products like “Down jacket” or “Winter coat”, even if those exact words don’t appear in the product descriptions.
Example 2: Document Retrieval
For knowledge management or customer support systems, OpenAI embeddings can help find the most relevant documents or responses based on the content’s meaning, not just keyword matches.
Example 3: Customer Feedback Analysis
OpenAI embeddings can be used to analyze customer feedback by matching similar reviews or comments, helping businesses identify recurring issues or popular products.
OpenAI provides APIs to generate embeddings, which are high-dimensional vector representations of text. These embeddings capture the semantic meaning of text, enabling advanced tasks like semantic search. For example:
OpenSearch’s k-NN (k-nearest neighbor) plugin allows the storage and retrieval of these vectors. When a query is received, its embedding is calculated, and OpenSearch performs a similarity search to return the closest matches.
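To make "closest matches" more concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The class name and the idea of computing it in application code are assumptions for illustration only; in the setup described here, OpenSearch performs the comparison internally.

```java
final class CosineSimilarityExample {
    /** Returns the cosine of the angle between two equal-length vectors (1.0 means same direction). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        // Note: a zero vector makes the denominator 0 and the result NaN.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Two texts with similar meaning should produce vectors with a cosine close to 1, while unrelated texts score closer to 0.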
We will call the OpenAI REST API directly with an HTTP client. Here’s how to generate embeddings for text using OpenAI’s API in Java.
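// Imports assume Apache HttpClient 4.x, org.json, and Spring.
import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.json.JSONArray;
import org.json.JSONObject;
import org.springframework.stereotype.Service;

@Service
public class OpenAIEmbeddingService {
    // Read the key from the environment instead of hardcoding it in source.
    private static final String OPENAI_API_KEY = System.getenv("OPENAI_API_KEY");
    private static final String OPENAI_API_URL = "https://api.openai.com/v1/embeddings";

    public JSONArray getEmbedding(String text) throws IOException {
        JSONObject payload = new JSONObject();
        payload.put("input", text);
        payload.put("model", "text-embedding-ada-002");

        HttpPost postRequest = new HttpPost(OPENAI_API_URL);
        // ContentType.APPLICATION_JSON sets the Content-Type header and encodes the body as UTF-8.
        postRequest.setEntity(new StringEntity(payload.toString(), ContentType.APPLICATION_JSON));
        postRequest.setHeader("Authorization", "Bearer " + OPENAI_API_KEY);

        // try-with-resources closes both the client and the response.
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(postRequest)) {
            String responseStr = EntityUtils.toString(response.getEntity());
            if (response.getStatusLine().getStatusCode() != 200) {
                throw new IOException("OpenAI API error: " + responseStr);
            }
            // The vector is located at data[0].embedding in the response body.
            return new JSONObject(responseStr)
                    .getJSONArray("data")
                    .getJSONObject(0)
                    .getJSONArray("embedding");
        }
    }
}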
While the above example uses the Apache HTTP client, there are multiple ways to call OpenAI APIs, depending on your requirements and preferred tools.
For more details on embedding generation, payload specifications, and other features of OpenAI APIs, refer to the official documentation.
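As one alternative, the same request can be built with the `java.net.http` client that ships with Java 11 and later. The sketch below only constructs the request; the class name, the hand-written JSON escaping, and the choice to pass the API key as a parameter are assumptions made to keep the example dependency-free.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class EmbeddingRequestFactory {
    static final String EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings";

    /** Escapes a string so it can be placed inside a JSON string literal. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a POST request carrying the model name and the input text. */
    static HttpRequest build(String apiKey, String model, String text) {
        String body = "{\"model\":\"" + escapeJson(model)
                + "\",\"input\":\"" + escapeJson(text) + "\"}";
        return HttpRequest.newBuilder(URI.create(EMBEDDINGS_URL))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the response parsed the same way as in the Apache HTTP client version.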
Now let’s implement the service layer for interacting with OpenSearch. We will manage document creation, deletion, updating, and searching.
To integrate with OpenSearch, add the following dependency to your build file:
For Maven
<dependency>
    <groupId>org.opensearch.client</groupId>
    <artifactId>opensearch-java</artifactId>
    <version>2.10.4</version>
</dependency>
For Gradle
dependencies {
    implementation 'org.opensearch.client:opensearch-java:2.10.4'
}
Below are some example methods that demonstrate common operations for handling indexing in OpenSearch. These methods can be implemented based on your specific requirements:
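This method indexes (adds or replaces) a document in the given index.
public void createIndex(Long entityId, String indexName, IndexData document) throws IOException {
    IndexRequest<IndexData> indexRequest = new IndexRequest.Builder<IndexData>()
            .index(indexName)
            // Assumption: the entity ID doubles as the document ID so the document can be updated or deleted later.
            .id(String.valueOf(entityId))
            .document(document)
            .build();
    openSearchClient.index(indexRequest);
}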
This method deletes a document from an index by its ID.
public void deleteIndex(String indexId, String indexName) throws IOException {
    // Skip the call when no document ID is given.
    if (indexId != null) {
        openSearchClient.delete(d -> d.index(indexName).id(indexId));
    }
}
This method updates an existing document in an index by its ID.
public void updateIndex(String indexId, String indexName, IndexData document) throws IOException {
    UpdateRequest<IndexData, IndexData> updateRequest = new UpdateRequest.Builder<IndexData, IndexData>()
            .index(indexName)
            .id(indexId)
            // The fields of this object are merged into the stored document.
            .doc(document)
            .build();
    openSearchClient.update(updateRequest, IndexData.class);
}
This method searches for documents in an OpenSearch index using a query and returns the matching hits.
public <T> List<Hit<T>> searchIndexDocumentByIndexQueryAndIndexName(Query query, String indexName, Class<T> clazz) throws IOException {
    SearchRequest searchRequest = new SearchRequest.Builder()
            .index(indexName)
            .query(query)
            .build();
    // clazz tells the client which type to deserialize each hit's source into.
    SearchResponse<T> searchResponse = openSearchClient.search(searchRequest, clazz);
    HitsMetadata<T> hits = searchResponse.hits();
    return hits.hits();
}
The above methods are provided as examples and can be modified or expanded to suit your specific use cases. OpenSearch offers a wide range of functionalities beyond these examples. For more detailed information and additional methods available in the OpenSearch Java client, refer to the official documentation.
If your OpenSearch instance is hosted on AWS, you need to authenticate requests using AWS credentials. Here’s how to set up the connection:
@Bean
public OpenSearchClient openSearchClient() {
    // accessKeyId, secretKey, and openSearchEndpoint are assumed to be configuration values of the enclosing class.
    AwsBasicCredentials awsCreds = AwsBasicCredentials.create(accessKeyId, secretKey);
    StaticCredentialsProvider credentialsProvider = StaticCredentialsProvider.create(awsCreds);
    SdkHttpClient httpClient = ApacheHttpClient.builder().build();
    return new OpenSearchClient(
            new AwsSdk2Transport(
                    httpClient,
                    openSearchEndpoint, // host name only, without the https:// prefix
                    Region.US_EAST_1,   // must match the region of your OpenSearch domain
                    AwsSdk2TransportOptions.builder()
                            .setCredentials(credentialsProvider)
                            .build()
            )
    );
}
Using embeddings in OpenSearch makes search results more meaningful by leveraging vector representations of text for semantic understanding. This example demonstrates a flow where we index and search for company data using OpenAI embeddings and OpenSearch.
Imagine you have a database of companies, each with a name and description. The goal is to allow users to search for companies based on a query, not just by matching keywords but by understanding the semantic meaning of their query. For example, if the user searches for “software startups,” the system can return companies that match this intent, even if the exact words “software startups” are not in their descriptions.
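To achieve this, we will:
1. Generate an embedding for each company description with OpenAI.
2. Index each company, together with its embedding, in OpenSearch.
3. Convert the user’s query into an embedding.
4. Run a k-NN search to find the companies whose embeddings are closest to the query.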
In this step, we take the descriptions of each company and convert them into vector embeddings using OpenAI’s API. These embeddings capture the semantic meaning of the text.
JSONArray embedding = openAIEmbeddingService.getEmbedding(companyDescription);
IndexData indexData = new IndexData(companyName, embedding.toList());
Here, getEmbedding is a method that sends the text to OpenAI’s API and retrieves the vector embedding. The IndexData object contains the company name and its embedding, ready for indexing.
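The article does not show the IndexData class itself, so the following is a hypothetical sketch of what it could look like. The field names `name` and `embedding`, the accessors, and the conversion logic are all assumptions; the only constraints taken from the text are the two-argument constructor and the fact that it receives the untyped list returned by `JSONArray.toList()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical document class holding a company name and its embedding. */
final class IndexData {
    private String name;
    private List<Float> embedding;

    // No-argument constructor, typically needed by JSON mappers when reading search hits.
    IndexData() {
        this.embedding = Collections.emptyList();
    }

    // Accepts an untyped list and converts every numeric element to a float.
    IndexData(String name, List<?> rawEmbedding) {
        List<Float> values = new ArrayList<>(rawEmbedding.size());
        for (Object value : rawEmbedding) {
            if (!(value instanceof Number)) {
                throw new IllegalArgumentException("Embedding contains a non-numeric value: " + value);
            }
            values.add(((Number) value).floatValue());
        }
        this.name = name;
        this.embedding = values;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public List<Float> getEmbedding() { return embedding; }
    public void setEmbedding(List<Float> embedding) { this.embedding = embedding; }
}
```

In a real project this would usually be a public class in its own file. Whatever name the vector field ends up with in the stored JSON has to match the field name used later in the k-NN query.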
The next step is to store the company data, including the generated embedding, into an OpenSearch index. This prepares the data for efficient vector-based searches.
openSearchService.createIndex(companyId, "company-index", indexData);
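One prerequisite worth spelling out: for k-NN search, the index needs k-NN enabled and the vector field mapped as `knn_vector`, with a dimension equal to the embedding length (1536 for text-embedding-ada-002). The sketch below builds the JSON body for such a create-index request. The class name and the `name` text field are assumptions, and how the body is sent (REST call, Dashboards console, or the Java client) is left open.

```java
final class KnnIndexDefinition {
    /** Builds a create-index body with k-NN enabled and one knn_vector field. */
    static String createIndexBody(String vectorField, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        return "{"
                + "\"settings\":{\"index\":{\"knn\":true}},"
                + "\"mappings\":{\"properties\":{"
                + "\"name\":{\"type\":\"text\"},"
                + "\"" + vectorField + "\":{\"type\":\"knn_vector\",\"dimension\":" + dimension + "}"
                + "}}}";
    }
}
```

For example, `KnnIndexDefinition.createIndexBody("embedding", 1536)` yields a body that could be sent with `PUT /company-index` before the first document is indexed.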
When the user enters a search query, we convert it into an embedding using OpenAI. This embedding represents the meaning of the query and will be compared to the stored embeddings.
JSONArray queryEmbedding = openAIEmbeddingService.getEmbedding(userQuery);
Using the query embedding, we perform a k-NN (k-nearest neighbors) search in OpenSearch. This finds the stored embeddings closest to the query embedding, retrieving the most semantically relevant companies.
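float[] queryVector = new float[queryEmbedding.length()]; // the 2.x client's knn builder takes a float[]
for (int i = 0; i < queryVector.length; i++) queryVector[i] = queryEmbedding.getFloat(i);
List<Hit<IndexData>> results = openSearchService.searchIndexDocumentByIndexQueryAndIndexName(
        new Query.Builder().knn(q -> q.field("embedding").vector(queryVector).k(5)).build(), "company-index", IndexData.class);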
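For debugging, it can help to see the query DSL that a k-NN search corresponds to. The following sketch assembles that JSON body by hand so it can be pasted into a REST client. The class name is hypothetical, and the `size` value is an addition of this sketch (the search method shown earlier does not set it).

```java
final class KnnQueryBody {
    /** Builds a search body that asks for the k nearest neighbors of the given vector. */
    static String build(String field, float[] vector, int k) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"size\":").append(k)
          .append(",\"query\":{\"knn\":{\"").append(field).append("\":{\"vector\":[");
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(vector[i]);
        }
        sb.append("],\"k\":").append(k).append("}}}}");
        return sb.toString();
    }
}
```

The resulting string can be sent as the body of `POST /company-index/_search`. The vector must have the same number of dimensions as the indexed embeddings.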
Below is the complete flow demonstrating how embeddings and OpenSearch work together:
1. Indexing Company Data
2. Processing User Query
3. Searching for Results
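The three steps can also be tried out without any external service. The sketch below is an in-memory stand-in, not the OpenSearch implementation: it stores precomputed vectors in a map and ranks them by dot product, which matches cosine similarity only under the assumption that all vectors have unit length. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InMemorySemanticSearch {
    private final Map<String, float[]> vectors = new LinkedHashMap<>();

    // Step 1: store a company name with its precomputed embedding.
    void add(String companyName, float[] embedding) {
        vectors.put(companyName, embedding.clone());
    }

    // Steps 2 and 3: score every stored vector against the query embedding and keep the k best.
    List<String> search(float[] queryEmbedding, int k) {
        List<String> names = new ArrayList<>(vectors.keySet());
        names.sort(Comparator.comparingDouble((String n) -> -dot(vectors.get(n), queryEmbedding)));
        return new ArrayList<>(names.subList(0, Math.min(k, names.size())));
    }

    private static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }
}
```

This brute-force scan compares the query with every stored vector, which is only practical for small datasets; the k-NN plugin exists to avoid exactly that cost at scale.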
This example demonstrates one way of using embeddings with OpenSearch for semantic search. However, there are many other approaches you could follow, such as:
Using OpenSearch with embeddings unlocks the potential for advanced, semantic search capabilities, offering intent-based, real-time, and scalable search solutions. While it excels in handling large datasets and diverse use cases, it requires careful consideration of storage, computational costs, and setup complexity. Proper integration and optimization make it a powerful choice for modern search applications.