Hibernate Search 6 With Spring Boot

This tutorial introduces Hibernate Search 6, shows how to set it up with Spring Boot, and walks through some sample queries.

For better understanding, let’s take a look at the basics first.

For any web application or business to provide superior service, its users need to be able to quickly find the service or product they are looking for. Delays in retrieving information lead to a poor user experience.

Hibernate Search can be used to create search experiences like what one would expect from Google or e-commerce platforms like Flipkart, Amazon, etc.

Why Hibernate Search?

When the dataset is huge and the data is scattered across multiple tables, relational databases are comparatively slow at answering search queries. An RDBMS can be optimized, but only up to a point.

Hibernate Search provides many features for full-text search, such as matching on similar words and keywords.

Hibernate Search is used to implement “full-text search”, such as matching free text input provided by the users from the search box.

We just need to tell Hibernate Search which entities to index by using some annotations.

Hibernate Search provides both Lucene and Elasticsearch backends, which are highly optimized for full-text search.

We will discuss Hibernate Search implementation with Lucene.

The diagram below shows how a Hibernate Search query works with Lucene and its indexes.

Fig: Lucene flow diagram

Configurations

  • Maven Dependencies

Before getting started, we need to add the necessary dependencies to our pom.xml:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.0.2.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.0.2.Final</version>
</dependency>

  • Application Properties file

If you want to store the indexes in a specific directory, add the property below with the path to use.

spring.jpa.properties.hibernate.search.backend.directory.root=/home/indexes/

Important Terms Related to Hibernate Search

Before going ahead, let’s see some important terms.

  • Text and Keyword

The primary difference between text and keyword is that text is tokenized while a keyword is not.

The keyword type is useful for sorting and filtering operations on an entity.

Suppose we have a String field called message with the value “Welcome to Hibernate Search”.

If we map message as a text type, it is tokenized into [‘Welcome’, ‘to’, ‘Hibernate’, ‘Search’] and we can search using any of those words.

However, if we map it as a keyword type, we only get a match when we pass the entire text.
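The difference can be illustrated with a small plain-Java sketch. It does not use Hibernate Search or Lucene at all: the class TextVsKeywordSketch and its methods are hypothetical names, and the whitespace split plus lowercasing is only a rough stand-in for what a real analyzer does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TextVsKeywordSketch {

    // Toy stand-in for a text field: split on whitespace and lowercase each token.
    static List<String> tokenize(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Text-style matching: the query matches if it equals any single token.
    static boolean matchesAsText(String fieldValue, String query) {
        return tokenize(fieldValue).contains(query.toLowerCase(Locale.ROOT));
    }

    // Keyword-style matching: the whole value is one term, so only the full text matches.
    static boolean matchesAsKeyword(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    public static void main(String[] args) {
        String message = "Welcome to Hibernate Search";
        System.out.println(tokenize(message));                      // [welcome, to, hibernate, search]
        System.out.println(matchesAsText(message, "hibernate"));    // true
        System.out.println(matchesAsKeyword(message, "Hibernate")); // false
        System.out.println(matchesAsKeyword(message, message));     // true
    }
}
```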

  • Analyzers and Tokenizers

An analyzer defines how text is processed before it is indexed and searched. The default analyzer is a good fit for most languages, but it is not very advanced. To get the most out of analysis, you will need to define a custom analyzer by combining a tokenizer with token filters.

For example, let’s say one of your entities has the title “Refactoring: Improving the Design of Existing Code” and you want to get a hit for any of the following search terms: “Refactors”, “refactored” and “refactoring”. Using an analyzer with the following components is one approach to accomplish this:

  • A “standard” tokenizer, which splits words at whitespaces, punctuation characters, and hyphens. It is a good general-purpose tokenizer.
  • A “lowercase” filter, which converts every character to lowercase.
  • A “snowball” filter, which applies language-specific stemming.
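The three steps above can be mimicked in plain Java to see why all three search terms end up hitting the same title. This is only a toy sketch: ToyAnalyzerSketch is a hypothetical class, the regular-expression split is a rough approximation of the standard tokenizer, and the suffix stripping in stem() is a deliberately naive substitute for Snowball stemming, not the real algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ToyAnalyzerSketch {

    // Step 1: split on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Step 3 (assumption): naive suffix stripping, NOT the Snowball algorithm.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Full pipeline: tokenize, lowercase (step 2), then stem.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(text)) {
            result.add(stem(token.toLowerCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Refactoring: Improving the Design of Existing Code");
        System.out.println(indexed);
        // The same analysis is applied to the query, so all three terms reduce to "refactor".
        for (String query : new String[] {"Refactors", "refactored", "refactoring"}) {
            System.out.println(query + " hits: " + indexed.containsAll(analyze(query)));
        }
    }
}
```

The key point is that the same analysis runs at indexing time and at query time, so the indexed tokens and the query tokens meet in the same reduced form.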

Normalizers are similar to analyzers, except that they do not use a tokenizer, so the whole value is kept as a single token. The diagram below shows an example of how a string is tokenized and then indexed.

Fig: How does the analyzer work?

We can use either an analyzer or a normalizer on a particular field.

Preparing Entities For Indexing

As mentioned above, we just need to annotate the entities and their fields with a couple of annotations.

Let’s have a look at those annotations.

@Indexed annotation

@Entity
@Indexed(index = "post_index")
class Post {
  ....
}

This annotation makes the entity eligible for indexing. The index name is optional; by default, Hibernate Search uses the entity name as the index name.

@FullTextField annotation

@FullTextField(analyzer = "custom_analyzer")
private String message;

@FullTextField maps the property to a full-text index field with the same name. Full-text fields are broken down into tokens by an analyzer. Here we have specified a custom analyzer that controls how the string is tokenized.

In Hibernate Search 6, custom analyzers are defined in a class that implements the LuceneAnalysisConfigurer interface. Hibernate Search has to be told about this class, typically through the hibernate.search.backend.analysis.configurer property. After that, the same analyzer can be referenced by name from different entities.

MyLuceneAnalysisConfigurer.java

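import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        context.analyzer("custom_analyzer").custom()
                // Split the text into words
                .tokenizer(StandardTokenizerFactory.class)
                // Lowercase every token
                .tokenFilter(LowerCaseFilterFactory.class)
                // Apply English stemming
                .tokenFilter(SnowballPorterFilterFactory.class)
                .param("language", "English")
                // Replace accented characters with their ASCII equivalents
                .tokenFilter(ASCIIFoldingFilterFactory.class);
    }
}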

We have defined only one custom analyzer here, named “custom_analyzer”. We can define any number of analyzers under different names. As mentioned in the analyzers section, we can add different filter factory classes to process the tokens as required.

@IndexedEmbedded

@Entity
@Indexed(index = "idx_post")
class Post {
  ...
  @ManyToOne
  @IndexedEmbedded
  private User user;
  ...
}

We use @IndexedEmbedded when we want to search over the fields of a nested object. For instance, let’s say we want to find all posts made by a user whose name is ‘Jack’. To search by the user’s name, use “user.name” as the field name.

These are the basic annotations that are mostly used.

Loading Existing Data Into Hibernate Search

Suppose you have a web application with a huge existing database and you want to start using Hibernate Search in it. The question is then how to get the existing data into the Hibernate Search index. MassIndexer does this job for us. We need to run the MassIndexer when the application starts.

Let’s create a class that loads all database records of the indexed entities into the Hibernate Search indexes.

HibernateSearchIndexBuild.java

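import javax.persistence.EntityManager;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.massindexing.MassIndexer;
import org.hibernate.search.mapper.orm.session.SearchSession;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.annotation.Transactional;

@Configuration
public class HibernateSearchIndexBuild implements ApplicationListener<ApplicationReadyEvent> {

    private Logger logger = LogManager.getLogger();

    @Autowired
    private EntityManager entityManager;

    @Override
    @Transactional
    public void onApplicationEvent(ApplicationReadyEvent event) {
        logger.info("Started Initializing Indexes");
        SearchSession searchSession = Search.session(entityManager);
        // Tune the ID fetch size, the loading batch size, and the number of loading threads
        MassIndexer indexer = searchSession.massIndexer().idFetchSize(150).batchSizeToLoadObjects(25)
                .threadsToLoadObjects(12);
        try {
            // Blocks until all indexed entities have been reindexed
            indexer.startAndWait();
            logger.info("Completed Indexing");
        } catch (InterruptedException e) {
            logger.warn("Mass indexing was interrupted", e);
            // Restore the interrupt flag so callers can react to it
            Thread.currentThread().interrupt();
        }
    }
}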

This is a one-time operation. After that, Hibernate Search keeps the index in sync with the entities, unless, of course, the database gets out of sync with the index for some reason.

Performing Queries

  • Basic Search Query

Now let’s say we want to write a query to fetch all records from post_index where the title contains the word “hello”.

SearchSession searchSession = Search.session( entityManager );

SearchResult<Post> result = searchSession.search( Post.class )
        .where( f -> f.match()
                .field( "title" )
                .matching( "hello" ) )
        .fetchAll();

long totalHitCount = result.total().hitCount();
List<Post> hits = result.hits();

Let’s go through this code example:

  1. The EntityManager can provide you with a Hibernate Search session called SearchSession.
  2. Initiate a search query on the index mapped to the Post entity.
  3. Define that only documents matching the given predicate should be returned.
  4. Build the query and fetch all the results.
  5. Retrieve the total number of matching entities. The official documentation section on fetching the total hit count describes ways to optimize this computation.
  6. Retrieve matching entities.

One thing to note here is that although we are performing the query through Hibernate Search, Hibernate will still fire a query on the database to fetch the full entities, because not all fields of the Post entity are stored in the index and those fields still need to be retrieved.

  • Pagination And Sorting

When we don’t want to retrieve millions of records simultaneously, we will use pagination.

To perform pagination, we need two things: page offset and page size.

  1. Offset = zero-based page number * page size
  2. Page size

SearchResult<Post> result = searchSession.search( Post.class )
        .where( f -> f.matchAll() )
        .sort( f -> f.field( "pageCount" ).desc() )
        .fetch( 40, 20 );

In the query above, 40 is the offset and 20 is the page size. The results are sorted in descending order on the “pageCount” field.
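The offset arithmetic can be captured in a small helper. This is a minimal sketch in plain Java; PageOffsetSketch and offsetFor are hypothetical names that are not part of Hibernate Search, and the result is simply the value one would pass as the first argument to fetch.

```java
public class PageOffsetSketch {

    // Offset = zero-based page number * page size.
    static int offsetFor(int zeroBasedPage, int pageSize) {
        if (zeroBasedPage < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and page size must be > 0");
        }
        // multiplyExact fails loudly instead of silently overflowing on very deep pages
        return Math.multiplyExact(zeroBasedPage, pageSize);
    }

    public static void main(String[] args) {
        int pageSize = 20;
        System.out.println(offsetFor(0, pageSize)); // 0  (first page)
        System.out.println(offsetFor(2, pageSize)); // 40 (third page, as in the query above)
    }
}
```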

  • Range Queries

The range predicate finds documents where a given field’s value falls inside a specified range.

List<Post> hits = searchSession.search( Post.class )
        .where( f -> f.range().field( "tags" )
                .between( 210, 250 ) )
        .fetchHits( 20 );

bool: combine predicates (and/or/..)

Writing a query that combines several and/or conditions can get complex. The bool predicate matches documents that match one or more inner predicates, called “clauses”. A bool predicate that contains only must clauses behaves like an AND operator.

  • Should Clause

If there are just should clauses in a bool predicate, it will behave as an OR operator.

List<Post> hits = searchSession.search( Post.class )
        .where( f -> f.bool()
                .should( f.match().field( "title" )
                        .matching( "robot" ) )
                .should( f.match().field( "description" )
                        .matching( "investigation" ) )
        )
        .fetchHits( 20 );

All returned hits will match at least one of the clauses above: they will have a title matching “robot” or a description matching “investigation”.

  • Must Clause

A bool predicate with only must clauses will behave as an AND operator.

List<Post> hits = searchSession.search( Post.class )
        .where( f -> f.bool()
                .must( f.match().field( "title" )
                        .matching( "robot" ) )
                .must( f.match().field( "description" )
                        .matching( "investigation" ) )
        )
        .fetchHits( 20 );

All returned hits will match all of the clauses above: they will have a title matching “robot” and a description matching “investigation”.
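The OR/AND behaviour of should and must clauses can be mimicked with plain java.util.function.Predicate objects. This is only an in-memory analogy under simplifying assumptions: BoolClauseSketch and its nested Post class are hypothetical stand-ins, matching is a simple case-insensitive word lookup, and no scoring is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BoolClauseSketch {

    // Hypothetical, simplified stand-in for the Post entity.
    static final class Post {
        final String title;
        final String description;

        Post(String title, String description) {
            this.title = title;
            this.description = description;
        }
    }

    // Toy "match": true if the text contains the word, ignoring case.
    static boolean containsWord(String text, String word) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .contains(word.toLowerCase(Locale.ROOT));
    }

    static Predicate<Post> titleMatches(String word) {
        return post -> containsWord(post.title, word);
    }

    static Predicate<Post> descriptionMatches(String word) {
        return post -> containsWord(post.description, word);
    }

    static List<String> titlesMatching(List<Post> posts, Predicate<Post> predicate) {
        return posts.stream().filter(predicate).map(post -> post.title).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Post> posts = Arrays.asList(
                new Post("Robot uprising", "A police investigation"),
                new Post("Robot cooking", "Recipes for everyone"),
                new Post("Gardening basics", "An investigation into soil"));

        // Only should clauses: behaves like OR
        Predicate<Post> should = titleMatches("robot").or(descriptionMatches("investigation"));
        // Only must clauses: behaves like AND
        Predicate<Post> must = titleMatches("robot").and(descriptionMatches("investigation"));

        System.out.println(titlesMatching(posts, should)); // [Robot uprising, Robot cooking, Gardening basics]
        System.out.println(titlesMatching(posts, must));   // [Robot uprising]
    }
}
```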

Further Reading

That’s it. This is not everything, but I believe it is enough to get you started. For further reading, you can explore the following:

  1. Phrase Queries – For a matching sequence of words
  2. Wildcard Queries – Match a simple pattern
  3. Simple Query String – You can allow your platform to take queries directly from users.
  4. Within Query – Match points within a circle, box, or polygon. Useful when you want to work with geo points (latitude and longitude).

Key Points to Remember

  1. When you use the @IndexedEmbedded annotation for nested entities, make sure the relationship is mapped bidirectionally.
  2. Use MassIndexer to load objects from the database into the Hibernate Search index.
  3. Since Hibernate Search keeps indexed entities in sync, the stored index is also updated when you update an entity. Use the save() method to update or add the entity.
  4. Use sharding to improve performance when dealing with large amounts of data. Sharding is the process of splitting index data into multiple smaller indexes.
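To give an intuition for the sharding point above, here is a toy sketch of hash-based routing: each document is assigned to one of several smaller indexes based on the hash of its ID. ShardRoutingSketch and shardFor are hypothetical names, and the routing rule is an assumption chosen for illustration; it is not the hash function Hibernate Search uses internally.

```java
public class ShardRoutingSketch {

    // Hypothetical routing rule: derive the shard number from the document ID's hash.
    static int shardFor(String documentId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result in [0, shardCount) even for negative hash codes
        return Math.floorMod(documentId.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        int shardCount = 4;
        for (String id : new String[] {"post-1", "post-2", "post-3", "post-4", "post-5"}) {
            // The same ID always maps to the same shard, so lookups know where to go
            System.out.println(id + " -> shard " + shardFor(id, shardCount));
        }
    }
}
```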

GitHub Example

You can refer to a working project on the GitHub repository by using the link below.
GitHub Repository

Conclusion

In this article, we discussed the basics of Hibernate Search and important query types.

The more advanced topics can be found in the official documentation.

Thank you for reading! In case you have some questions feel free to comment below.
