Spring Batch: massive batch processing in Java


Welcome, dear reader, to another post on my technology blog. In this post, we discuss a framework that may not be familiar to everyone, but which is a very powerful tool for building batch applications in Java: Spring Batch.

Batch Application: what it is

A batch application, in general, is nothing more than a program whose goal is to process large amounts of data on a scheduled basis, usually through programmed trigger mechanisms (scheduling).

Typically, in companies, we see many such programs built directly in the database layer, using languages such as PL/SQL, for example. This method has its advantages, but there are also several advantages in building a batch program in a technology like Java. One is the ease of scaling the application: a batch built in Java will typically run as a standalone program or inside an application server, so its memory, CPU, and other resources can be scaled more easily than those of a batch written in PL/SQL. Moreover, writing a batch in Java offers more opportunities for reuse, as the same logic can be applied to batch, web, REST, and other interfaces.

So, having made our introduction to the subject, let’s proceed and start talking about the framework.

Framework architecture

In the figure below, taken from the framework documentation, we can see the main components that make up the architecture of a Spring Batch job. Let's look at them in more detail.

[Figure: architecture of a Spring Batch job, from the framework documentation]

As we can see above, when we build a job (a term commonly used to describe a batch program, which we will use from now on) in the framework, we must implement three types of artifacts: a job script, which consists of an implementation plan, made of steps, that makes up the job's execution; connection settings for the data sources the job will process, such as databases, JMS queues, etc.; and, of course, the classes that implement the processing logic.
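As a rough illustration of what such an implementation plan can look like (a sketch only: the class, bean, and step names below are hypothetical, and it assumes Spring Batch's Java-configuration style is available on the classpath), a job with a single chunk-oriented step could be declared like this:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class ExampleJobConfig {

    @Bean
    public Job exampleJob(JobBuilderFactory jobs, Step step1) {
        // The "implementation plan": a job made up of a single step
        return jobs.get("exampleJob").start(step1).build();
    }

    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<String> reader,
                      ItemProcessor<String, String> processor,
                      ItemWriter<String> writer) {
        // A chunk-oriented step: items are read, processed, and written in blocks of 100
        return steps.get("step1")
                .<String, String>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```

The reader, processor, and writer beans would be the classes with the processing logic mentioned above.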

When we use the framework for the first time, one setup step is to create a set of database tables whose function is to serve as a repository of jobs. Through these tables, the framework controls the status of the different jobs across their executions, enabling a restartability mechanism: in case of failure, a job can be restarted from the point at which it stopped in the last run. To achieve this control, Spring Batch provides a control structure represented by the following set of classes:

JobRunner: class responsible for executing a job upon an external request. It has several implementations to support different invocation modes, such as from a shell script, for example. It instantiates a JobLauncher;

JobLocator: class responsible for obtaining the configuration information, such as the implementation plan (job script), for a given job passed as a parameter. It works in conjunction with the JobRunner;

JobLauncher: class responsible for starting and managing the actual execution of the job; it is instantiated by the JobRunner;

JobRepository: facade class that mediates the access of the framework classes to the repository tables. It is through this class that jobs report the progress of their executions, making a restart possible;
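To give an idea of how these classes fit together (a sketch only: the parameter names are hypothetical, and `jobLauncher` and `job` are assumed to come from the Spring application context), launching a job programmatically through the JobLauncher looks roughly like this:

```java
// Sketch: launching a job through the JobLauncher
JobParameters params = new JobParametersBuilder()
        .addString("inputFile", "clientes.csv")       // hypothetical job parameter
        .addLong("runId", System.currentTimeMillis()) // makes each execution unique
        .toJobParameters();

JobExecution execution = jobLauncher.run(job, params);
System.out.println("Status: " + execution.getStatus()); // e.g. COMPLETED or FAILED
```

The returned JobExecution is persisted through the JobRepository, which is what makes the restart control described above possible.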

Thanks to this control mechanism, Spring provides a web application, developed in Java, called Spring Batch Admin, which allows actions such as viewing the execution logs of batches and starting/stopping/restarting jobs through its interface. More information about the application can be found at the end of the post.

Now that we have covered the framework's architecture, let's talk about the main components (classes/interfaces that the developer must implement) available for building the processing logic itself.

Components

Tasklet: the basic unit of a step. It can be created for specific batch actions, such as calling a web service whose data will be used by all steps of the execution, for example.

ItemReader: component used in a structure known as a chunk, where a data source is read, processed, and written iteratively, in blocks (chunks), until all the data has been processed. This component holds the reading logic, reading from sources such as databases. The framework comes with a set of pre-built readers, but developers can also write their own, if necessary.

ItemProcessor: also part of the chunk structure, this component holds the processing logic, which typically consists of executing business rules and calling external resources, such as web services, to enrich the data, among other tasks.

ItemWriter: also part of the chunk structure, this component holds the writing logic for the processed data. As with ItemReaders, the framework comes with a set of pre-built ItemWriters for sources such as databases, but developers can also write their own writer, if necessary.

Decider: component responsible for flow-control logic, performing tasks like "go to step 1 if the value equals X, go to step 2 if it equals Y, and end the execution if it equals Z".

Classifier: component that can be used in conjunction with other components, such as an ItemWriter, to perform classification logic, such as "run ItemWriterA for the item if its property X = true; otherwise, run ItemWriterB". Important: in this scenario, the order of execution of the items within the chunk is modified, because the framework classifies all the items first and then executes one ItemWriter at a time!

Split: component used when you want a set of steps to run in parallel, via multithreading, at a certain point of the execution.
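To make the chunk structure described above concrete, here is a plain-Java sketch (not the framework's actual code; the class and method names are illustrative) of the cycle a chunk-oriented step performs: read a block of items, process each one, and only then write the whole block at once.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the read/process/write cycle of a chunk-oriented step
public class ChunkLoopSketch {

    // Splits the "read" items into blocks of chunkSize, "processes" each item
    // (here, a trivial business rule: upper-casing), and "writes" each block at once.
    static List<String> run(List<String> items, int chunkSize) {
        List<String> writtenBlocks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            List<String> chunk = items.subList(start, end);   // read one chunk
            List<String> processed = new ArrayList<>();
            for (String item : chunk) {                       // process item by item
                processed.add(item.toUpperCase());
            }
            writtenBlocks.add(String.join(",", processed));   // write the whole block
        }
        return writtenBlocks;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("ana", "bruno", "carla"), 2));
        // prints [ANA,BRUNO, CARLA]
    }
}
```

Writing in blocks is what allows the framework to commit (and, on failure, restart) one chunk at a time instead of one item at a time.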

About the Java EE 7 Batch specification

Some readers may be familiar with the new API called "Batch", JSR-352, which introduces a batch processing API to the Java EE 7 platform. It has concepts very similar to Spring Batch's, and it fills an important gap in the reference implementation of the Java technology. More than a philosophical question, some points of attention should be considered before choosing one framework or the other, such as the requirement of a Java EE container (server) to run JSR-352, its lack of support for plain JDBC access to databases, and its lack of support for reading externalized properties from files, which Spring Batch offers through components called PropertyPlaceholders. In the links at the end of the post, you can read an article detailing the differences between the two in more depth.

Conclusion

Unfortunately, it is not possible to detail, in a single post, all the power of the framework. Several things were left out, such as support for event listeners on job execution and error-handling policies that allow certain exceptions to be retried or "ignored" (retry, skip), among other features. I hope, however, to have given the reader a good initial view of the framework, sharpening their curiosity. Massive data processing has always been, and always will be, a major challenge for companies, and our mission, as IT professionals, is the constant learning of the best resources we have available. Thank you for your attention, and until next time.


Hands-on: JSON Java API


JSON (JavaScript Object Notation) is a notation for data interchange, like XML, for example. Its popularity has grown with the rise of REST web services, and today it is widely used in the development of APIs.

In this hands-on, we will learn how to use the JSON Java API present in Java EE 7. With it, you can parse JSON structures to read their data, and also generate structures of your own.

Creating the project

In this hands-on we will use Eclipse. Create a Maven project in New > Other > Maven Project. If you do not have this option, open the Eclipse Marketplace in the IDE itself (Help menu) and look for the plugin "Maven Integration for Eclipse" for your version. At the end of this post, you can find a link to the source code of the hands-on.

With the project created, we will add the following dependencies to the pom:

<dependencies>
    <dependency>
        <groupId>javax.json</groupId>
        <artifactId>javax.json-api</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>org.glassfish</groupId>
        <artifactId>javax.json</artifactId>
        <version>1.0.4</version>
    </dependency>
</dependencies>

With the dependencies created, we will begin to explore the API.

JsonParser

The first class we will talk about is JsonParser. With this class, we can parse a JSON structure from an input. The code below demonstrates its use:

.....
FileInputStream file = new FileInputStream("dados.json");
JsonParser parser = Json.createParser(file);
while (parser.hasNext()) {
    Event evento = parser.next();
    switch (evento) {
    case KEY_NAME:
        System.out.print(parser.getString() + "=");
        break;
    case VALUE_STRING:
        System.out.println(parser.getString());
        break;
    case VALUE_NUMBER:
        System.out.println(parser.getString());
        break;
    case VALUE_NULL:
        System.out.println("null");
        break;
    case START_ARRAY:
        System.out.println("Start of array");
        break;
    case END_ARRAY:
        System.out.println("End of array");
        break;
    case END_OBJECT:
        System.out.println("End of JSON object");
        break;
    default:
        break;
    }
}
.....

As we can see in the code above, through this class we walk through the whole JSON structure contained in the file "dados.json". For example, with a file that has the following structure:

{
    "id": 123,
    "descricao": "Produto 1",
    "Classificacao": {
        "nivel": 1,
        "subnivel": 2,
        "secao": "eletrodomesticos"
    },
    "fornecedores": [
        {
            "id": 1,
            "descricao": "brastemp"
        },
        {
            "id": 2,
            "descricao": "consul"
        },
        {
            "id": 3,
            "descricao": "eletrolux"
        }
    ]
}

We have the following print on the console:

id=123
descricao=Produto 1
Classificacao=nivel=1
subnivel=2
secao=eletrodomesticos
End of JSON object
fornecedores=Start of array
id=1
descricao=brastemp
End of JSON object
id=2
descricao=consul
End of JSON object
id=3
descricao=eletrolux
End of JSON object
End of array
End of JSON object

JsonGenerator

With the JsonGenerator class, you can generate JSON structures. It is used by manually writing the openings and closings of the structure through the API methods, generating the JSON sequentially:

.....
JsonGeneratorFactory factory = Json.createGeneratorFactory(properties);
JsonGenerator jsonGen = factory.createGenerator(System.out);
jsonGen.writeStartObject()
    .write("id", 123)
    .write("descricao", "Produto 1")
    .writeStartObject("Classificacao")
        .write("nivel", 1)
        .write("subnivel", 2)
        .write("secao", "eletrodomesticos")
    .writeEnd()
    .writeStartArray("fornecedores")
        .writeStartObject()
            .write("id", 1)
            .write("descricao", "brastemp")
        .writeEnd()
        .writeStartObject()
            .write("id", 2)
            .write("descricao", "consul")
        .writeEnd()
        .writeStartObject()
            .write("id", 3)
            .write("descricao", "eletrolux")
        .writeEnd()
    .writeEnd()
.writeEnd()
.close();
.....

The code above will generate a JSON identical to the one shown earlier.

JsonObjectBuilder

In the example above, although the API facilitates the creation of the JSON, we have some problems. Since we have to manually write the openings and closings of the structure, the result is somewhat laborious code, which requires care from the developer not to generate invalid results. A better alternative is to generate JSON with the JsonObjectBuilder class, which uses an API closer to the OO style and is therefore easier to program in the language:

.....
JsonBuilderFactory jBuilderFactory = Json.createBuilderFactory(null);
JsonObjectBuilder jObjectBuilder = jBuilderFactory.createObjectBuilder();
jObjectBuilder
    .add("id", 123)
    .add("descricao", "Produto 1")
    .add("Classificacao", jBuilderFactory.createObjectBuilder()
        .add("nivel", 1)
        .add("subnivel", 2)
        .add("secao", "eletrodomesticos"))
    .add("fornecedores", jBuilderFactory.createArrayBuilder()
        .add(jBuilderFactory.createObjectBuilder()
            .add("id", 1)
            .add("descricao", "brastemp"))
        .add(jBuilderFactory.createObjectBuilder()
            .add("id", 2)
            .add("descricao", "consul"))
        .add(jBuilderFactory.createObjectBuilder()
            .add("id", 3)
            .add("descricao", "eletrolux")));
JsonObject jObject = jObjectBuilder.build();
JsonWriter jWriterOut = Json.createWriter(System.out);
jWriterOut.writeObject(jObject);
jWriterOut.close();
.....

As in the other example, this code will generate the same JSON shown at the beginning of the post.

Conclusion

In this hands-on, we saw a sample of a JSON manipulation API for the Java language. With it, we can create JSONs more simply, besides reading them. The reader may be wondering: "but isn't it easier to use JAX-RS 2.0 to produce/consume JSON?" It is true that JAX-RS 2.0 brought a simpler interface than the one presented here, where, basically, you simply create a POJO to have a ready JSON structure. The reader should remember, however, that JSON is not a structure exclusive to REST services, and therefore, for scenarios where JAX-RS 2.0 is not appropriate, this API can be a good option. Out of curiosity, JAX-RS 2.0 uses this API "under the hood".

And so we end our hands-on. Thanks to everyone who read this post, and until next time!
