Hibernate Search 6 With Spring Boot

This tutorial introduces Hibernate Search 6, shows how to set it up, and walks through some sample queries.

For better understanding, let’s take a look at the basics first.

For any web application or business to provide superior service, your users need to be able to search quickly for their preferred service or product. Delay in retrieving information leads to poor user experience.

Hibernate Search can be used to create search experiences like what one would expect from Google or e-commerce platforms like Flipkart, Amazon, etc.

Why Hibernate Search?

When the dataset is huge and the data is scattered across multiple tables, relational databases are comparatively slow, so search results take longer to fetch through database queries. An RDBMS can be optimized, but only up to a point.

Hibernate Search provides many features for full-text search, such as matching similar words and keywords.

Hibernate Search is used to implement “full-text search”, such as matching free text input provided by the users from the search box.

We just need to tell Hibernate Search which entities to index by using some annotations.

Hibernate Search provides both Lucene and Elasticsearch implementations, which are highly optimized for full-text search.

We will discuss Hibernate Search implementation with Lucene.

The diagram below shows how a Hibernate Search query works with Lucene and its indexes.

Fig: Lucene flow diagram

Configurations

  • Maven Dependencies

Before getting started, we need to add the necessary dependencies to our pom.xml:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.0.2.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.0.2.Final</version>
</dependency>

  • Application Properties file

If you want to store the indexes in a specific directory, add the property below with the path where the indexes should be stored.

spring.jpa.properties.hibernate.search.backend.directory.root=/home/indexes/

Important Terms Related to Hibernate Search

Before going ahead, let’s see some important terms.

  • Text and Keyword

The primary difference between text and keyword is text can be tokenized while keyword cannot.

We can use the keyword type to perform some sorting and filtering operations on an entity.

Suppose we have a String field called message with the value “Welcome to Hibernate Search”.

In that case, if we choose text as the type of message, it is tokenized into [‘Welcome’, ‘to’, ‘Hibernate’, ‘Search’] and we can perform a search using any of these words.

However, if we make it a keyword type, we can only find a match if we pass the entire text.
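The difference can be imitated in a few lines of plain Java. This is only a sketch of the idea, not how Hibernate Search or Lucene is implemented: the class and method names are made up for illustration, and tokenization is reduced to lowercasing and splitting on whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; it does not use Hibernate Search or Lucene.
final class TextVsKeywordSketch {

    // "Text" behaviour: split the value into lowercase tokens on whitespace.
    static List<String> tokenize(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // A text field matches when any of its tokens equals the search term.
    static boolean matchesAsText(String fieldValue, String term) {
        return tokenize(fieldValue).contains(term.toLowerCase(Locale.ROOT));
    }

    // A keyword field is kept as one single token, so only the whole value matches.
    static boolean matchesAsKeyword(String fieldValue, String term) {
        return fieldValue.equals(term);
    }

    public static void main(String[] args) {
        String message = "Welcome to Hibernate Search";
        System.out.println(tokenize(message));                   // [welcome, to, hibernate, search]
        System.out.println(matchesAsText(message, "hibernate")); // true
        System.out.println(matchesAsKeyword(message, "Hibernate")); // false
        System.out.println(matchesAsKeyword(message, message));  // true
    }
}
```

A single word is enough to hit the text version of the field, while the keyword version only matches the complete value.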

  • Analyzers and Tokenizers

The analyzer defines how text is processed before it is indexed and searched. The default analyzer is a good fit for most languages, but it is not very advanced. To get the most out of analysis, you will need to define a custom analyzer by using a tokenizer factory and filter factories.

For example, let’s say one of your entities has the title “Refactoring: Improving the Design of Existing Code” and you want to get a hit for any of the following search terms: “refactors”, “refactored” and “refactoring”. Using an analyzer with the following components is one approach to accomplish this:

  • A “standard” tokenizer, which splits words at whitespaces, punctuation characters, and hyphens. It is a good general-purpose tokenizer.
  • A “lowercase” filter, which converts every character to lowercase.
  • A “snowball” filter, which applies language-specific stemming.
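To make the three steps concrete, here is a toy version of such a chain in plain Java. It is a sketch under loud assumptions: the class name is invented, the tokenizer simply splits on non-alphanumeric characters, and the "stemmer" just strips a few English suffixes. It is not the Snowball algorithm and not what Lucene does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analyzer chain; the stemmer below is a crude stand-in, NOT Snowball.
final class ToyAnalyzerSketch {

    // Step 1: split on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Step 3 (toy version): strip the first matching suffix from a short list.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Full chain: tokenize, lowercase (step 2), then stem.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(text)) {
            result.add(stem(token.toLowerCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Indexed title and all three search terms end up sharing the token "refactor".
        System.out.println(analyze("Refactoring: Improving the Design of Existing Code"));
        System.out.println(analyze("Refactors"));
        System.out.println(analyze("refactored"));
        System.out.println(analyze("refactoring"));
    }
}
```

Because the same chain is applied to the indexed title and to the search term, both sides produce the token "refactor", which is why all three search terms would hit the title.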

Normalizers are similar to analyzers, except that they do not use a tokenizer, so the whole value is kept as a single token. The diagram below shows an example of how a string is tokenized and then indexed.

Fig: How does the analyzer work?

We can use either an analyzer or a normalizer in a particular field.

Preparing Entities For Indexing

As mentioned above, we just need to annotate the entities and their fields with a couple of annotations.

Let’s have a look at those annotations.

@Indexed annotation

@Entity
@Indexed(index = "post_index")
class Post {
  ....
}

This makes the entity eligible for indexing. The index name is optional; if it is omitted, Hibernate Search 6 uses the entity name as the index name.

@FullTextField and analyzers

@FullTextField(analyzer = "custom_analyzer")
private String message;

@FullTextField maps the property to a full-text index field with the same name and type. Full-text fields are broken down into tokens by an analyzer; here we reference a custom analyzer by its name.

In Hibernate Search 6, analyzers are defined in a class that implements the LuceneAnalysisConfigurer interface. Once defined, the same analyzer can be reused across different entities by referring to its name.

MyLuceneAnalysisConfigurer.java

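import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        context.analyzer("custom_analyzer").custom()
                .tokenizer(StandardTokenizerFactory.class)      // split text into words
                .tokenFilter(LowerCaseFilterFactory.class)      // lowercase every token
                .tokenFilter(SnowballPorterFilterFactory.class) // language-specific stemming
                        .param("language", "English")
                .tokenFilter(ASCIIFoldingFilterFactory.class);  // fold accented characters to ASCII
    }
}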

We have defined only one custom analyzer, named “custom_analyzer”, but any number of analyzers can be defined under different names. As mentioned in the analyzer section, different filter factory classes can be added to the chain as required.
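One step is easy to miss: Hibernate Search has to be told which configurer to use. A minimal sketch, assuming the class lives in a hypothetical package com.example.search (adjust it to your project), is to reference it from the application properties file through the analysis configurer property of the backend:

```properties
# Hypothetical package name; point this at your own configurer class.
spring.jpa.properties.hibernate.search.backend.analysis.configurer=class:com.example.search.MyLuceneAnalysisConfigurer
```

If the analyzer name used in @FullTextField cannot be resolved at startup, a missing or mistyped value here is a likely cause.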

@IndexedEmbedded

@Entity
@Indexed(index = "idx_post")
class Post {
  ...
  @ManyToOne
  @IndexedEmbedded
  private User user;
  ...
}

We use @IndexedEmbedded when we want to search over the fields of a nested object. For instance, let’s say we want to search for all posts made by a user whose name is ‘Jack’. You have to use “user.name” as the field name to search by the user’s name.
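One way to picture the dotted field name is to think of the nested object being flattened into the document of the parent entity. The plain-Java sketch below models that idea with maps; the class, the sample data, and the flatten method are hypothetical and only illustrate where a path such as “user.name” comes from. Note that in a real mapping the fields of the embedded entity (for example name on User) must themselves be annotated as index fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual model only: nested values become fields with dotted paths.
final class EmbeddedFieldSketch {

    // Hypothetical sample: a post with a title and an embedded user.
    static Map<String, Object> samplePost() {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "Jack");
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("title", "Hello world");
        post.put("user", user);
        return post;
    }

    // Recursively flatten nested maps into "parent.child" field names.
    static Map<String, Object> flatten(String prefix, Map<String, Object> source) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) entry.getValue();
                flat.putAll(flatten(path, nested));
            } else {
                flat.put(path, entry.getValue());
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        System.out.println(flatten("", samplePost())); // {title=Hello world, user.name=Jack}
    }
}
```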

These are the basic annotations that are mostly used.

Loading Existing Data Into Hibernate Search

Suppose you have a web application with a huge database and you want to start using Hibernate Search in it. The question is then how to get the existing data into the Hibernate Search index. MassIndexer does the job for us. We need to run the MassIndexer at the start of our application.

Let’s create a class that will load all database records of indexed entities into Hibernate search indexes.

HibernateSearchIndexBuild.java

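import javax.persistence.EntityManager;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.massindexing.MassIndexer;
import org.hibernate.search.mapper.orm.session.SearchSession;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.annotation.Transactional;

@Configuration
public class HibernateSearchIndexBuild implements ApplicationListener<ApplicationReadyEvent> {

    private Logger logger = LogManager.getLogger();

    @Autowired
    private EntityManager entityManager;

    @Override
    @Transactional
    public void onApplicationEvent(ApplicationReadyEvent event) {
        logger.info("Started Initializing Indexes");
        SearchSession searchSession = Search.session(entityManager);
        // Tune how ids are fetched and how many entities each loading thread handles per batch.
        MassIndexer indexer = searchSession.massIndexer()
                .idFetchSize(150)
                .batchSizeToLoadObjects(25)
                .threadsToLoadObjects(12);
        try {
            // Blocks until every indexed entity has been loaded and indexed.
            indexer.startAndWait();
        } catch (InterruptedException e) {
            logger.warn("Failed to load data from database");
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
        }
        logger.info("Completed Indexing");
    }
}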

This is a one-time operation. After that, Hibernate Search keeps the index in sync with the entities, unless our database falls out of sync with the index for some reason.

Performing Queries

  • Basic Search Query

Now let’s say we want to write a query to fetch all records from post_index where the title contains the word “hello”.

SearchSession searchSession = Search.session( entityManager ); // (1)

SearchResult<Post> result = searchSession.search( Post.class ) // (2)
        .where( f -> f.match() // (3)
                .field( "title" )
                .matching( "hello" ) )
        .fetchAll(); // (4)

long totalHitCount = result.total().hitCount(); // (5)
List<Post> hits = result.hits(); // (6)

Let’s go through this code example:

  1. Get a Hibernate Search session, called SearchSession, from the EntityManager.
  2. Initiate a search query on the index mapped to the Post entity.
  3. Define that only documents matching the given predicate should be returned.
  4. Build the query and fetch all the results.
  5. Retrieve the total number of matching entities. See the “Fetching the total (hit count, …)” section of the official documentation for ways to optimize the computation of the total hit count.
  6. Retrieve matching entities.

One thing to note here is that although we are performing a query on Hibernate Search, Hibernate will still fire a query on the database to fetch the full entity, because we didn’t store all the fields of the Post entity in the index and those fields still need to be retrieved.

  • Pagination And Sorting

When we don’t want to retrieve millions of records simultaneously, we will use pagination.

To perform pagination, we need two things: page offset and page size.

  1. Offset = zero-based-page-number * page-size
  2. Page size

SearchResult<Post> result = searchSession.search( Post.class )
        .where( f -> f.matchAll() )
        .sort( f -> f.field( "pageCount" ).desc() )
        .fetch( 40, 20 );

In the query above, 40 is the offset and 20 is the page size, so it returns the third page of results. The results are sorted in descending order of the “pageCount” field.
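The offset arithmetic is simple but easy to get off by one, so here is a small plain-Java helper for it. The class and method names are illustrative only and are not part of Hibernate Search.

```java
// Plain-Java helper; the names are illustrative and not part of any library.
final class PageOffsetSketch {

    // Offset of the first hit on a zero-based page: page number * page size.
    static int offset(int zeroBasedPageNumber, int pageSize) {
        if (zeroBasedPageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page number must be >= 0 and page size must be > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(zeroBasedPageNumber, pageSize);
    }

    public static void main(String[] args) {
        System.out.println(offset(0, 20)); // 0  (first page)
        System.out.println(offset(2, 20)); // 40 (third page)
    }
}
```

The value returned for page 2 with a page size of 20 is the 40 that is passed as the first argument of fetch in the query above.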

  • Range Queries

The range predicate finds documents where a given field’s value falls inside a specified range.

List<Post> hits = searchSession.search( Post.class )
        .where( f -> f.range().field( "tags" )
                .between( 210, 250 ) )
        .fetchHits( 20 );

bool: combine predicates (and/or/..)

Writing a query that combines several and/or conditions can get complex. The bool predicate matches documents that match one or more inner predicates, called “clauses”. A bool predicate that contains only must clauses acts like an AND operator.

  • Should Clause

If there are just should clauses in a bool predicate, it will behave as an OR operator.

List<Post> hits = searchSession.search( Post.class )
        .where( f -> f.bool()
                .should( f.match().field( "title" )
                        .matching( "robot" ) )
                .should( f.match().field( "description" )
                        .matching( "investigation" ) )
        )
        .fetchHits( 20 );

All returned hits will match at least one of the clauses above: they will have a title matching with “robot” or they will have a description matching with “investigation”.

  • Must Clause

A bool predicate with only must clauses will behave as an AND operator.

List<Post> hits = searchSession.search( Post.class )
        .where( f -> f.bool()
                .must( f.match().field( "title" )
                        .matching( "robot" ) )
                .must( f.match().field( "description" )
                        .matching( "investigation" ) )
        )
        .fetchHits( 20 );

All returned hits will match all of the clauses above: they will have a title matching with “robot” and they will have a description matching with “investigation”.
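The AND/OR behaviour of the two clause types can be mimicked with java.util.function.Predicate. The sketch below is plain Java with invented names and sample data; it only models which documents match and ignores relevance scoring, which in a real full-text query is also influenced by should clauses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Models only WHICH documents match; scoring is deliberately ignored.
final class BoolSemanticsSketch {

    static final class Doc {
        final String title;
        final String description;

        Doc(String title, String description) {
            this.title = title;
            this.description = description;
        }
    }

    static Predicate<Doc> titleMatches(String word) {
        return d -> d.title.toLowerCase(Locale.ROOT).contains(word);
    }

    static Predicate<Doc> descriptionMatches(String word) {
        return d -> d.description.toLowerCase(Locale.ROOT).contains(word);
    }

    // Only must clauses: every clause has to match (AND).
    static Predicate<Doc> must(List<Predicate<Doc>> clauses) {
        return d -> clauses.stream().allMatch(c -> c.test(d));
    }

    // Only should clauses: at least one clause has to match (OR).
    static Predicate<Doc> should(List<Predicate<Doc>> clauses) {
        return d -> clauses.stream().anyMatch(c -> c.test(d));
    }

    // Hypothetical sample documents.
    static final List<Doc> SAMPLE = Arrays.asList(
            new Doc("Robot uprising", "A short story"),
            new Doc("Cold case", "A murder investigation"),
            new Doc("Robot detective", "An investigation on Mars"),
            new Doc("Gardening basics", "Plants and soil"));

    static final List<Predicate<Doc>> CLAUSES = Arrays.asList(
            titleMatches("robot"), descriptionMatches("investigation"));

    static List<String> titlesMatching(Predicate<Doc> predicate) {
        return SAMPLE.stream().filter(predicate).map(d -> d.title).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(titlesMatching(should(CLAUSES))); // [Robot uprising, Cold case, Robot detective]
        System.out.println(titlesMatching(must(CLAUSES)));   // [Robot detective]
    }
}
```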

Further Reading

That’s it. This is not everything, but I believe it is enough to get you started. For further reading, you can explore the following:

  1. Phrase Queries – For a matching sequence of words
  2. Wildcard Queries – Match a simple pattern
  3. Simple Query String – You can allow your platform to take queries directly from users.
  4. Within Query – Match points within a circle, box, polygon. When you want to work with geo points (latitude and longitude).

Key Points to Remember

  1. When you use the @IndexedEmbedded annotation on nested entities, make sure the relationship mapping is bidirectional.
  2. Use MassIndexer to load objects from the database into the Hibernate Search index.
  3. Since indexed entities are kept in sync with Hibernate Search, the stored index is also updated when you update an entity. Use the save() method to update or add the entity.
  4. Use sharding to improve performance when dealing with large amounts of data. Sharding splits index data into multiple “smaller indexes”.

GitHub Example

You can refer to a working project on the GitHub repository by using the link below.
GitHub Repository


Conclusion

In this article, we discussed the basics of Hibernate Search and important query types.

The more advanced topics can be found in the official documentation.

Thank you for reading! In case you have some questions feel free to comment below.

Rohit

Full Stack Developer (Java | Angular)

Rohit has 2.8 years of experience as a Full Stack Developer. He has good knowledge of creating APIs, well-designed microservices, payment gateway integrations and web technologies like Angular 4+. He is a passionate coder who writes clean, optimized code. In his free time, he solves technical problems on HackerRank, Stack Overflow, GeeksforGeeks, etc.
