Boosting Application Resilience: Your Guide to Resilience4j

Imagine you have a cool app on your phone that lets you buy things online. When you make a purchase, the app needs to communicate with a payment service to process your payment and generate an invoice. However, sometimes things don’t happen as expected. The payment service might be down or taking too long to respond due to some technical issues.

To make sure your shopping experience doesn’t get ruined by these hiccups, Resilience4j comes to the rescue. It’s like having a superhero that protects the app and keeps things running smoothly even when the payment service is acting up. Resilience4j is a fault tolerance library inspired by Hystrix. It is lightweight and easy to use and integrate into the application. It is designed to help developers write resilient and fault-tolerant applications.

In a microservice architecture, we must have fault tolerance because services are loosely coupled so if one service is down we must handle that scenario. Also sometimes we have to limit the number of calls to some APIs and retry calling another service after some time. Resilience4j provides these features. Let’s learn how to implement them in our project.

To start, let's add some dependencies to our Spring Boot project (resilience4j-spring-boot2 is not managed by the Spring Boot BOM, so specify the latest release version):

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version><!-- use the latest release --></version>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

We will also add spring-boot-starter-actuator to monitor the application's state through endpoints:

<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>


Key Features of Resilience4j

1.  Circuit Breaker

A circuit breaker lets developers build fault-tolerant applications by preventing cascading failures. It breaks the circuit when it detects that a downstream system is failing, which stops calls to a microservice that is down or timing out. We can also track the circuit breaker's status through actuator endpoints such as /actuator/circuitbreakers. The circuit breaker has three states:

  • Closed state: The normal and initial state of the circuit breaker; calls pass through and their outcomes are recorded.
  • Open state: The circuit opens when the failure rate or the slow-call rate exceeds the configured thresholds. In this state, all calls are rejected.
  • Half-open state: After a configured time in the open state, the breaker allows a limited number of trial calls to pass through. If their failure/slow-call rate is below the threshold, the breaker moves back to the closed state; if it is above the threshold, the breaker returns to the open state.
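To make the three-state transition logic above concrete, here is a minimal plain-JDK sketch. This is not Resilience4j's implementation; the class, its thresholds, and the simplified half-open handling (one trial call decides) are all illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;          // the initial state is CLOSED
    private final int windowSize;                // last N calls considered
    private final double failureRateThreshold;   // e.g. 0.5 == 50%
    private final Deque<Boolean> window = new ArrayDeque<>();

    TinyCircuitBreaker(int windowSize, double failureRateThreshold) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
    }

    State state() { return state; }

    boolean allowCall() {
        // OPEN rejects everything until the wait period elapses;
        // a real breaker would also cap calls in HALF_OPEN.
        return state != State.OPEN;
    }

    void recordResult(boolean success) {
        window.addLast(success);
        if (window.size() > windowSize) window.removeFirst();
        long failures = window.stream().filter(ok -> !ok).count();
        double failureRate = (double) failures / window.size();
        if (state == State.CLOSED && window.size() == windowSize
                && failureRate >= failureRateThreshold) {
            state = State.OPEN;                  // too many failures: trip the breaker
        } else if (state == State.HALF_OPEN) {
            // one trial result decides: success closes, failure reopens
            state = success ? State.CLOSED : State.OPEN;
        }
    }

    void waitElapsed() { state = State.HALF_OPEN; } // simulates waitDurationInOpenState
}
```

Resilience4j tracks much more (sliding-window types, slow-call rates, metrics), but the state machine at its core follows this closed/open/half-open cycle.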

We add the corresponding properties in the application.yml file:

resilience4j.circuitbreaker:
  instances:
    generateInvoice:
      registerHealthIndicator: true
      slidingWindowSize: 60
      slidingWindowType: TIME_BASED
      permittedNumberOfCallsInHalfOpenState: 3
      minimumNumberOfCalls: 10
      waitDurationInOpenState: 5s
      slowCallRateThreshold: 100
      slowCallDurationThreshold: 2s
      failureRateThreshold: 50

@GetMapping("/generate-invoice")
@CircuitBreaker(name = "generateInvoice", fallbackMethod = "onGenerateInvoiceFailed")
public ResponseEntity<Map<String, Object>> generateInvoice() {
    ResponseEntity<Map> paymentDetails = new RestTemplate()
            .getForEntity("http://localhost:8080/get-payment-details", Map.class);
    return invoiceService.generateInvoice(paymentDetails.getBody());
}

// The fallback must match the protected method's return type and
// accept the thrown exception as a parameter.
private ResponseEntity<Map<String, Object>> onGenerateInvoiceFailed(Throwable t) {
    Map<String, Object> res = new HashMap<>();
    res.put("message", "Payment Service Down");
    res.put("status", "ServiceDown");
    return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body(res);
}

2. Rate Limiter

The rate limiter is used to control the rate of incoming requests to a service. When the rate limit is reached, further requests are rejected or delayed until the rate drops below the limit. It is configured by setting the maximum number of requests allowed in a given time period.

resilience4j.ratelimiter:
  instances:
    sendMessages:
      limitRefreshPeriod: 1s
      limitForPeriod: 5
      timeoutDuration: 3s
      registerHealthIndicator: true

@PostMapping("/send-messages")
@RateLimiter(name = "sendMessages")
public String sendMessage(@RequestParam String message) {
    return message;
}

We can also add a fallback method so that we can return an appropriate response once the limit is exceeded.
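To show the mechanism behind limitForPeriod and limitRefreshPeriod, here is a minimal fixed-window limiter in plain Java. It is an illustrative sketch, not Resilience4j's implementation (which can also block callers up to timeoutDuration); all names are hypothetical.

```java
class TinyRateLimiter {
    private final int limitForPeriod;        // max calls per window
    private final long refreshPeriodMillis;  // window length
    private long windowStart;
    private int used;

    TinyRateLimiter(int limitForPeriod, long refreshPeriodMillis, long nowMillis) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodMillis = refreshPeriodMillis;
        this.windowStart = nowMillis;
    }

    /** Returns true if a call is permitted at the given time. */
    synchronized boolean acquirePermission(long nowMillis) {
        if (nowMillis - windowStart >= refreshPeriodMillis) {
            windowStart = nowMillis;   // new window: reset the counter
            used = 0;
        }
        if (used < limitForPeriod) {
            used++;
            return true;
        }
        return false;                  // over the limit: reject the call
    }
}
```

With limitForPeriod: 5 and limitRefreshPeriod: 1s, the sixth call inside the same second is rejected, and permission becomes available again once the window refreshes.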

3. Bulkhead

Bulkhead allows you to limit the number of concurrent calls to a service, preventing the system from being overloaded. It offers two bulkhead implementations: a semaphore bulkhead and a thread-pool bulkhead, to control resource allocation.

The semaphore bulkhead is the default. It executes the call on the caller's own request thread and does not create a new one, whereas the thread-pool bulkhead runs the call on a separate thread from a dedicated pool.

resilience4j.bulkhead:
  instances:
    countryNames:
      maxConcurrentCalls: 2
      maxWaitDuration: 5s

@GetMapping("/country-names")
@Bulkhead(name = "countryNames")
public ResponseEntity<List<String>> getCountryNames() {
    return countryService.getNames();
}
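The semaphore bulkhead idea can be expressed with a plain java.util.concurrent.Semaphore: a fixed number of permits caps concurrency, and the call runs on the caller's own thread. This is an illustrative sketch under that assumption, not Resilience4j's actual code.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class TinyBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    TinyBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls); // mirrors maxConcurrentCalls
        this.maxWaitMillis = maxWaitMillis;               // mirrors maxWaitDuration
    }

    /** Runs the call if a permit is available within maxWaitMillis, else fails fast. */
    <T> T execute(Supplier<T> call) throws InterruptedException {
        if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Bulkhead is full");
        }
        try {
            return call.get();         // runs on the current thread; no new thread
        } finally {
            permits.release();         // always free the permit for waiting callers
        }
    }
}
```

A thread-pool bulkhead would instead submit the call to a bounded executor, which is why it creates new threads while the semaphore variant does not.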

4. Retry

The retry pattern re-invokes a failed service call after some time. We can configure how many times the call is attempted (maxAttempts) along with the wait time between attempts (waitDuration).

resilience4j.retry:
  instances:
    retryTest:
      maxAttempts: 3
      waitDuration: 1s

@GetMapping("/country-names")
@Retry(name = "retryTest")
public ResponseEntity<List<String>> getCountryNames() {
    return countryService.getNames();
}
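Stripped of Resilience4j's extras (backoff strategies, retry-on predicates, metrics), the retry pattern reduces to a loop. The following plain-Java sketch mirrors maxAttempts and waitDuration; the names are illustrative, not the library's API.

```java
import java.util.concurrent.Callable;

class TinyRetry {
    /** Calls `call` up to maxAttempts times, sleeping waitMillis between failures. */
    static <T> T executeWithRetry(Callable<T> call, int maxAttempts, long waitMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();            // success: return immediately
            } catch (Exception e) {
                last = e;                      // remember the failure
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis);  // waitDuration between attempts
                }
            }
        }
        throw last;                            // all attempts failed: rethrow
    }
}
```

With maxAttempts: 3 and waitDuration: 1s, a call that fails twice and then succeeds returns normally on the third attempt; only after three consecutive failures does the exception propagate.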

You can also consider alternatives such as Netflix Hystrix (now in maintenance mode, and part of the wider Netflix OSS stack), Spring Cloud Circuit Breaker, Sentinel, and Akka.

These are just a few alternatives to Resilience4j, and the choice depends on your specific requirements, technology stack, and preferences. It’s essential to evaluate each alternative based on factors such as community support, maturity, active development, compatibility, and integration with your existing infrastructure and frameworks.


Conclusion

In this blog, we have learned how to use a circuit breaker when another service is down or slow to respond, and went through the states a circuit breaker moves between. We also covered the rate limiter, which serves as our traffic controller, ensuring we don't overwhelm services with a barrage of requests.

Additionally, we’ve delved into the concept of bulkheads, acting as sturdy partitions that prevent system overload by managing concurrent requests. Lastly, we’ve grasped the power of retrying requests, a resilient technique that gives our applications a second chance, with well-defined intervals, when an initial attempt fails.
