Unveiling Spring Batch: Understanding Its Concepts and Advantages

Imagine a web application that collects and stores large amounts of customer data. You want to import that data from various resources, validate and transform it, and then export or store it in other resources, all in batches.

In this scenario, Spring Batch is a good fit: it provides features for automating batch processing, reading and validating data, handling errors, and exporting the results to other resources.

Spring Batch is a lightweight batch processing framework useful for processing large numbers of records. It also provides features that support extremely high volume and high-performance batch jobs through its optimization and partitioning techniques.

Advantages of Spring Batch

1. Robust and Scalable: It is designed to process large volumes of data in chunks, which reduces memory usage and improves performance. It also supports parallel processing, so tasks can execute concurrently.

2. Fault-Tolerant: It offers error-handling features such as restart and skip/retry to keep processing reliable. After a failure, it recovers the saved execution state and resumes processing from the last committed point instead of starting over.

3. Monitoring: It provides built-in monitoring features that let developers track job progress, such as the number of records read, processed, and written in each step.
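
The skip and restart behaviour described under Fault-Tolerant can be pictured with a small, framework-free sketch. This is plain Java, not Spring Batch's API: the class `RestartableRun`, its `run` method, the `checkpoint` field, and the sample data are all hypothetical names chosen for illustration. The idea it shows is that progress is committed chunk by chunk, bad items are skipped up to a limit, and a rerun continues from the last committed position.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics chunk commits, a skip limit and restart in plain Java.
class RestartableRun {
    int checkpoint = 0;                              // index of the next item to read
    int skipped = 0;                                 // bad items skipped in committed chunks
    final List<Integer> written = new ArrayList<>(); // stands in for the target store

    /**
     * Processes items from the checkpoint onward in chunks. Unparseable items
     * are skipped until skipLimit would be exceeded; then the run fails and the
     * checkpoint stays at the last committed chunk boundary.
     */
    void run(List<String> items, int chunkSize, int skipLimit) {
        while (checkpoint < items.size()) {
            int end = Math.min(checkpoint + chunkSize, items.size());
            List<Integer> chunk = new ArrayList<>();
            int skippedInChunk = 0;
            for (String raw : items.subList(checkpoint, end)) {
                try {
                    chunk.add(Integer.parseInt(raw.trim()));
                } catch (NumberFormatException e) {
                    if (skipped + skippedInChunk + 1 > skipLimit) {
                        throw new IllegalStateException("Skip limit exceeded at: " + raw, e);
                    }
                    skippedInChunk++;
                }
            }
            written.addAll(chunk);     // "commit" the chunk
            skipped += skippedInChunk;
            checkpoint = end;          // advance only after a successful commit
        }
    }

    public static void main(String[] args) {
        RestartableRun job = new RestartableRun();
        try {
            job.run(List.of("1", "2", "x", "4", "y", "6"), 2, 1);
        } catch (IllegalStateException e) {
            System.out.println("Failed; checkpoint = " + job.checkpoint); // Failed; checkpoint = 4
        }
        // Rerun on corrected input: processing continues from index 4, not from 0.
        job.run(List.of("1", "2", "x", "4", "5", "6"), 2, 1);
        System.out.println(job.written); // [1, 2, 4, 5, 6]
    }
}
```

In a real job the checkpoint would live in persistent storage rather than in a field, which is the role the execution state mentioned above plays.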


Key Concepts in Spring Batch

1. Job: A job is a batch process that can be executed manually or scheduled. The job can contain one or more steps.

2. Step: A step is one phase of a job. A chunk-oriented step consists of a reader, an optional processor, and a writer, and can have listeners attached.

3. Reader: The reader reads data from sources such as CSV files, text files, and databases. Spring Batch provides multiple built-in readers, and we can also create our own custom reader.

4. Processor: The processor is used for filtering, validating, and transforming the input data into the desired format required for writing.

5. Writer: The writer takes the data from the processor and stores it in a destination such as a database or a CSV file.

6. Listeners: The listeners allow developers to hook into the batch processing lifecycle and perform actions at specific events. Examples include before/after job execution, step execution, or chunk completion.
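
How the reader, processor, and writer cooperate inside a step can be modelled in a few lines of plain Java. The sketch below is a simplified illustration and not the framework's implementation: `ChunkStepSketch`, `runStep`, and the sample names are hypothetical, and standard `Iterator`, `Function`, and `Consumer` types stand in for the reader, processor, and writer. It reads up to a chunk's worth of items, processes each one (a null result filters the item out), and passes the surviving items to the writer once per chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative only: a framework-free model of a chunk-oriented step.
class ChunkStepSketch {

    /**
     * Reads up to chunkSize items at a time, processes them, and writes the
     * non-null results in one call per chunk. Returns the number of chunks read.
     */
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int chunks = 0;
        while (reader.hasNext()) {
            // Read phase: collect one chunk of input items.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process phase: transform each item; null means "filter this item out".
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // Write phase: hand the whole chunk to the writer (skipped here if nothing survived).
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
            }
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Function<String, String> processor = name -> name.isEmpty() ? null : name.toUpperCase();
        Consumer<List<String>> writer = sink::addAll;

        int chunks = runStep(List.of("ann", "", "bob", "cy", "dee").iterator(), processor, writer, 2);
        System.out.println(chunks + " chunks, written: " + sink); // 3 chunks, written: [ANN, BOB, CY, DEE]
    }
}
```

Writing once per chunk rather than once per item is what keeps memory use bounded while still batching the output.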


Spring Batch stores information about batch jobs in a set of metadata tables, including the job status, the step status, and the number of items read and written.

Fig. Spring Batch Job Execution
  • BATCH_JOB_INSTANCE: This table stores one row for each job instance. It includes the job instance ID, the job name, and a job key derived from the identifying job parameters (the parameters themselves are stored in BATCH_JOB_EXECUTION_PARAMS).
  • BATCH_JOB_EXECUTION: It stores information related to job execution which contains information like job ID, start time, end time, and job status.
  • BATCH_STEP_EXECUTION: It stores information about steps in a job like step ID, name, start time, end time and step status.
  • BATCH_JOB_EXECUTION_CONTEXT: It stores the serialized context data associated with each job execution. It can hold any custom data that needs to be persisted and shared across steps within a job.
  • BATCH_STEP_EXECUTION_CONTEXT: Similar to the job execution context, the BATCH_STEP_EXECUTION_CONTEXT table stores the serialized context data associated with each step execution. It can hold custom data specific to a particular step.
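  • BATCH_JOB_EXECUTION_SEQ: A sequence (or sequence table, depending on the database) used to generate the IDs for rows in BATCH_JOB_EXECUTION.
  • BATCH_STEP_EXECUTION_SEQ: A sequence (or sequence table, depending on the database) used to generate the IDs for rows in BATCH_STEP_EXECUTION.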

Implementing Spring Batch into Our Application

Let’s start implementing Spring Batch in our application. First, we add the Spring Batch dependency to pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

Now we will create a configuration class for Spring Batch:

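import java.io.IOException;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.io.support.ResourcePatternResolver;

@Configuration
// Provides JobBuilderFactory and StepBuilderFactory in Spring Batch 4.x;
// can be omitted here if it is already declared on the application class.
@EnableBatchProcessing
public class EmployeeBatchConfig {

    @Value("${employee-batch.step-name}")
    private String stepName;

    @Value("${employee-batch.chunk-size}")
    private int chunkSize;

    // Location pattern of the input files, for example a directory with a wildcard.
    @Value("${employee-batch.files-path}")
    private String filesPath;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private EmployeeProcessor employeeProcessor;

    @Autowired
    private EmployeeWriter employeeWriter;

    // The job must be a bean so that Spring can discover and launch it.
    @Bean
    public Job runJob() throws IOException {
        return jobBuilderFactory.get("employee")
                .incrementer(new RunIdIncrementer())
                .flow(employeeStep())
                .end()
                .build();
    }

    // A chunk-oriented step: read Employee items, process them, write EmployeeOutput items.
    @Bean
    Step employeeStep() throws IOException {
        return stepBuilderFactory
                .get(stepName)
                .<Employee, EmployeeOutput>chunk(chunkSize)
                .reader(multiResourceItemreader())
                .processor(employeeProcessor)
                .writer(employeeWriter)
                .build();
    }

    // Iterates over all matching files and delegates the reading of each one.
    @Bean
    public MultiResourceItemReader<Employee> multiResourceItemreader() throws IOException {
        MultiResourceItemReader<Employee> reader = new MultiResourceItemReader<>();
        reader.setDelegate(employeeReader());
        reader.setResources(getResources());
        return reader;
    }

    private Resource[] getResources() throws IOException {
        ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
        return patternResolver.getResources("file:" + filesPath);
    }

    // Reads a single delimited file; the resource is set by the MultiResourceItemReader.
    @Bean
    @StepScope
    FlatFileItemReader<Employee> employeeReader() {
        FlatFileItemReader<Employee> reader = new FlatFileItemReader<>();
        reader.setLinesToSkip(1); // skip the header row
        reader.setLineMapper(lineMapper());
        return reader;
    }

    // Splits each line on commas and maps the named fields to an Employee.
    public LineMapper<Employee> lineMapper() {
        DefaultLineMapper<Employee> lineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setStrict(false); // tolerate lines with missing or extra columns
        lineTokenizer.setNames(getFileHeader());
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper());
        return lineMapper;
    }

    public FieldSetMapper<Employee> fieldSetMapper() {
        return new EmployeeInputMapper();
    }

    public String[] getFileHeader() {
        return new String[] { "id", "name", "email", "gender", "age" };
    }
}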

In the above config, we have created one job using JobBuilderFactory. The job contains one step, and in that step we have specified our reader, processor, and writer. MultiResourceItemReader handles multiple resources and passes each one to FlatFileItemReader for reading. In the line mapper, we have specified the delimiter, the file headers, and an input mapper that maps each line to the required object.
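
To make the line mapper's role concrete, here is a framework-free sketch of what tokenizing and field mapping amount to for a single CSV line. It is not how DelimitedLineTokenizer is implemented; `LineMappingSketch`, `tokenize`, `EmployeeRow`, and the sample line are hypothetical names and data used only for illustration, with the column names taken from the config above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: pairs the tokens of one CSV line with column names, then builds an object.
class LineMappingSketch {
    static final String[] NAMES = { "id", "name", "email", "gender", "age" };

    /** Splits a comma-delimited line and pairs each token with its column name. */
    static Map<String, String> tokenize(String line) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty fields
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            // Lenient, in the spirit of setStrict(false): missing columns become
            // empty strings and any extra columns are ignored.
            fields.put(NAMES[i], i < tokens.length ? tokens[i].trim() : "");
        }
        return fields;
    }

    /** Hypothetical stand-in for the article's Employee class. */
    static final class EmployeeRow {
        final int id;
        final String name;
        final String email;
        final String gender;
        final int age;

        EmployeeRow(Map<String, String> fields) {
            this.id = Integer.parseInt(fields.get("id"));
            this.name = fields.get("name");
            this.email = fields.get("email");
            this.gender = fields.get("gender");
            this.age = Integer.parseInt(fields.get("age"));
        }
    }

    public static void main(String[] args) {
        EmployeeRow row = new EmployeeRow(tokenize("1,Asha,asha@example.com,female,30"));
        System.out.println(row.name + " is " + row.age); // Asha is 30
    }
}
```

The numeric conversions are where a malformed line would fail, which is why the field-mapping step is a natural place to validate input.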

Now we will create a processor that processes the data after it has been read from the given resource:

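import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

// Registered as a bean so that it can be injected into EmployeeBatchConfig.
@Component
public class EmployeeProcessor implements ItemProcessor<Employee, EmployeeOutput> {

    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    @Override
    public EmployeeOutput process(Employee item) throws Exception {
        EmployeeOutput out = new EmployeeOutput();
        out.setAge(item.getAge());
        out.setDateOfBirth(getDateOfBirth(item.getAge()));
        out.setEmail(item.getEmail());
        out.setGender(item.getGender().toUpperCase());
        out.setId(item.getId());
        out.setName(item.getName());
        return out;
    }

    // Approximates the date of birth as today's date minus the age in years.
    private String getDateOfBirth(int age) {
        LocalDate dob = LocalDate.now().minusYears(age);
        return dob.format(FORMATTER);
    }
}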

Here, the processor converts each Employee into an EmployeeOutput and estimates the date of birth from the age.
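
Because the date of birth is derived from the current date, the result changes from day to day and is only an estimate. One way to make that calculation predictable, sketched below under the assumption that the reference date can be passed in, is to take "today" as a parameter. `DateOfBirthSketch` and `estimateDateOfBirth` are hypothetical names, not part of the processor above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustrative only: the same arithmetic as the processor, with the reference date made explicit.
class DateOfBirthSketch {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    /** Subtracts the age in whole years from the given date and formats the result. */
    static String estimateDateOfBirth(int age, LocalDate today) {
        return today.minusYears(age).format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(estimateDateOfBirth(30, LocalDate.of(2024, 3, 15))); // 15/03/1994
        // A leap day is adjusted to the last valid day of February.
        System.out.println(estimateDateOfBirth(1, LocalDate.of(2024, 2, 29)));  // 28/02/2023
    }
}
```

Passing the date in also makes the logic easy to unit test without depending on the system clock.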

Now we will create a writer that stores the employees in the database:

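import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class EmployeeWriter implements ItemWriter<EmployeeOutput> {

    // Assumed: a Spring Data repository for EmployeeOutput, defined elsewhere in the application.
    @Autowired
    private EmployeeRepository employeeRepository;

    @Override
    public void write(List<? extends EmployeeOutput> items) throws Exception {
        // Persist the whole chunk in one call.
        employeeRepository.saveAll(items);

        // Print each estimated date of birth for a quick visual check.
        items.forEach(e -> System.out.println(e.getDateOfBirth()));
    }
}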

Conclusion

In this blog, we learned about Spring Batch and its components (job, step, reader, processor, and writer), the metadata tables it creates, and how to read data from files using FlatFileItemReader, process it using ItemProcessor, and write it to a database or another resource using ItemWriter.

With this knowledge, you are ready to use Spring Batch to help your applications handle large volumes of data. It gives your organization a solid foundation for building fast, reliable data processing solutions.
