Apache Camel: integrating systems with Java

Standard

Hi, dear readers! Welcome to my blog. On this post, we will talk about Apache Camel, a robust solution for deploying system integrations across various technologies, such as REST, WS, JMS, JDBC, AWS Products, LDAP, SMTP, Elasticsearch etc.

So, let’s get start!

Origin

Apache Camel was created on Apache Service Mix. Apache Service Mix was a project powered by the Spring Framework and implemented following the JBI specification. The Java Business Integration specification specifies a plug and play platform for systems integrations, following the EIP (Enterprise Integration Patterns) patterns.

Terminology

Exchange

Exchanges – or MEPs(Message Exchange Patterns) – are like frames where we transport our data across the integrations on Camel. A Exchange can have 2 messages inside, one representing the input and another one representing the output of a integration.

The output message on Camel is optional, since we could have a integration that doesn’t have a response. Also, a Exchange can have properties, represented as key-value entries, that can be used as data that will be used across the whole route (we will see more about routes very soon)

Message

Messages are the data itself that is transferred inside a Camel route. A Message can have a body, which is the data itself and headers, which are, like properties on a Exchange, key-value entries that can be used along the processing.

One important aspect to keep in mind, however, is that along a Camel route our Messages are changed – when we convert the body with a Type converter, for instance – and when this happens, we lose all our headers. So, Message headers must be seen as ephemeral data, that will not be used through the whole route. For that type of data, it is better to use Exchange properties.

The Message body can de made of several types of data, such as binaries, JSON, etc.

Camel context

The Camel context is the runtime container where Camel runs it. It initializes type converters, routes, endpoints, EIPs etc.

A Camel context has 3 possible status: started, suspended and stopped. When started, the context will serve the routes processing as normal.

When on suspended status, the Camel context will stop the processing – after the Exchanges already on processing are completed – , but keep all the caches, resources etc still loaded. A suspended context can be restarted.

Finally, there’s the stop status. When stopped, the context will stop the processing like the suspended status, but also will release all the resources caches etc, making a complete shutdown. As with the suspended status, Camel will also guarantee that all the Exchanges being processing will be finished before the shutdown.

Route

Routes on Camel are the heart of the processing. It consists of a flow, that start on a endpoint, pass through a stream of processors/convertors and finishes on another endpoint. it is possible to chain routes by calling another route as the final endpoint of a previous route.

A route can also use other features, such as EIPs, asynchronous and parallel processing.

Channel

When Camel executes a route, the controller in which it executes the route is called Channel.

A Channel is responsible for chaining the processors execution, passing the Exchange from one to another, alongside monitoring the route execution. It also allow us to implement interceptors to run any logic on some route’s events, such as when a Exchange is going to a specific Endpoint.

Processor

Processors are the primary extension points on Camel. By creating classes that extend the org.apache.camel.Processor interface, we create programming units that we can use to include our own code on a Camel route, inside a convenient execute method.

Component

A Component act like a factory to instantiate Endpoints for our use. We don’t directly use a Component, we reference instead by defining a Endpoint URI, that makes Camel infer about the Component that it needs to be using in order to create the Endpoint.

Camel provides dozens of Components, from file to JMS, AWS Connections to their products and so on.

Registry

In order to utilize beans from IoC systems, such as OSGi, Spring and JNDI, Camel supplies us with a Bean Registry. The Registry’s mission is to supply the beans referred on Camel routes with the ones create on his associated context, such as a OSGi container, a Spring context etc

Type converter

Type converters, as the name implies, are used in order to convert the body of a message from one type to another. The uses for a converter are varied, ranging from converting a binary format to a String to converting XML to JSON.

We can create our own Type Converter by extending the org.apache.camel.TypeConverter  interface. After creating our own Converter by extending the interface, we need to register it on the Type’s Converter Registry.

Endpoint

A Endpoint is the entity responsible for communicating a Camel Route in or out of his execution process. It comprises several types of sources and destinations as mentioned before, such as SQS, files, Relational Databases and so on. A Endpoint is instantiated and configured by providing a URI to a Camel Route, following the pattern below:

component:option?option1=value1&option2=value2

We can create our own Components by extending the org.apache.camel.Endpoint interface. When extending the interface, we need to override 3 methods, where we supply the logic to create a polling consumer Endpoint, a passive consumer Endpoint and a producer Endpoint.

Lab

So, without further delay, let’s start our lab! On this lab, we will create a route that polls access files from a access log style file, sends the logs to a SQS and backups the file on a S3.

Setup

The setup for our lab is pretty simple: It is a Spring Boot application, configured to work with Camel. Our Gradle.build file is as follows:

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'org.springframework.boot'
apply plugin: 'maven'
apply plugin: 'idea'

jar {
    baseName = 'apache-camel-handson'
    version = '1.0'
}

project.ext {
    springBootVersion = '1.5.4.RELEASE'
    camelVersion = '2.18.3'

}

sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
    mavenLocal()
    mavenCentral()
}

bootRun {
    systemProperties = System.properties
}


dependencies {

    compile group: 'org.apache.camel', name: 'camel-spring-boot-starter', version: camelVersion
    compile group: 'org.apache.camel', name: 'camel-commands-spring-boot', version: camelVersion
    compile group: 'org.apache.camel',name: 'camel-aws', version: camelVersion
    compile group: 'org.apache.camel',name: 'camel-mail', version: camelVersion
    compile group: 'org.springframework.boot', name: 'spring-boot-autoconfigure', version: springBootVersion
    

}
group 'com.alexandreesl.handson'
version '1.0'

buildscript {
    repositories {
        mavenLocal()
        maven {
            url "https://plugins.gradle.org/m2/"
        }
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.5.4.RELEASE")
    }
}

And the Java main file is a simple Java Spring Boot Application file, as follows:

package com.alexandreesl.handson;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;

/**
 * Created by alexandrelourenco on 28/06/17.
 */

@ComponentScan(basePackages = {"com.alexandreesl.handson"})
@SpringBootApplication
@EnableAutoConfiguration
public class ApacheCamelHandsonApp {

    public static void main(String[] args) {
        SpringApplication.run(ApacheCamelHandsonApp.class, args);
    }

}

We also configure a configuration class, where we will register a type converter that we will create on the next section:

package com.alexandreesl.handson.configuration;

import org.apache.camel.CamelContext;
import org.apache.camel.spring.boot.CamelContextConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CamelConfiguration {


    @Bean
    public CamelContextConfiguration camelContextConfiguration() {

        return new CamelContextConfiguration() {

            @Override
            public void beforeApplicationStart(CamelContext camelContext) {
               

            }

            @Override
            public void afterApplicationStart(CamelContext camelContext) {

            }

        };

    }

}

We also create a configuration which will create a AmazonS3Client and AmazonSQSClient, that will be used by the AWS-S3 and AWS-SQS Camel endpoints:

package com.alexandreesl.handson.configuration;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.internal.StaticCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.sqs.AmazonSQSClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;

@Configuration
public class AWSConfiguration {


    @Autowired
    private Environment environment;

    @Bean(name = "s3Client")
    public AmazonS3Client s3Client() {
        return new AmazonS3Client(staticCredentialsProvider()).withRegion(Regions.fromName("us-east-1"));
    }

    @Bean(name = "sqsClient")
    public AmazonSQSClient sqsClient() {
        return new AmazonSQSClient(staticCredentialsProvider()).withRegion(Regions.fromName("us-east-1"));
    }

    @Bean
    public StaticCredentialsProvider staticCredentialsProvider() {
        return new StaticCredentialsProvider(new BasicAWSCredentials("<access key>", "<secret access key>"));
    }

}

PS: this lab assumes that the reader is familiar with AWS and already has a account. For the lab, a bucket called “apache-camel-handson” and a SQS called “MyInputQueue” were created.

Configuring the route

Now that we have our Camel environment set up, let’s begin creating our route. First, we create a type converter called “StringToAccessLogDTOConverter” with the following code:

package com.alexandreesl.handson.converters;

import com.alexandreesl.handson.dto.AccessLogDTO;
import org.apache.camel.Converter;
import org.apache.camel.TypeConverters;

import java.util.StringTokenizer;

/**
 * Created by alexandrelourenco on 30/06/17.
 */

public class StringToAccessLogDTOConverter implements TypeConverters {

    @Converter
    public AccessLogDTO convert(String row) {

        AccessLogDTO dto = new AccessLogDTO();

        StringTokenizer tokens = new StringTokenizer(row);

        dto.setIp(tokens.nextToken());
        dto.setUrl(tokens.nextToken());
        dto.setHttpMethod(tokens.nextToken());
        dto.setDuration(Long.parseLong(tokens.nextToken()));

        return dto;

    }

}

Next, we change our Camel configuration, registering the converter:

package com.alexandreesl.handson.configuration;

import com.alexandreesl.handson.converters.StringToAccessLogDTOConverter;
import org.apache.camel.CamelContext;
import org.apache.camel.spring.boot.CamelContextConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CamelConfiguration {


    @Bean
    public CamelContextConfiguration camelContextConfiguration() {

        return new CamelContextConfiguration() {

            @Override
            public void beforeApplicationStart(CamelContext camelContext) {

                camelContext.getTypeConverterRegistry().addTypeConverters(new StringToAccessLogDTOConverter());

            }

            @Override
            public void afterApplicationStart(CamelContext camelContext) {

            }

        };

    }

}

Our converter reads a String and converts to a DTO, with the following attributes:

package com.alexandreesl.handson.dto;

/**
 * Created by alexandrelourenco on 30/06/17.
 */
public class AccessLogDTO {

    private String ip;

    private String url;

    private String httpMethod;

    private long duration;

    public String getIp() {
        return ip;
    }

    public void setIp(String ip) {
        this.ip = ip;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public String getHttpMethod() {
        return httpMethod;
    }

    public void setHttpMethod(String httpMethod) {
        this.httpMethod = httpMethod;
    }

    public long getDuration() {
        return duration;
    }

    public void setDuration(long duration) {
        this.duration = duration;
    }

    @Override
    public String toString() {

        StringBuffer buffer = new StringBuffer();
        buffer.append("[");
        buffer.append(ip);
        buffer.append(",");
        buffer.append(url);
        buffer.append(",");
        buffer.append(httpMethod);
        buffer.append(",");
        buffer.append(duration);
        buffer.append("]");

        return buffer.toString();

    }
}

Finally, we have our route, defined on our RouteBuilder:

package com.alexandreesl.handson.routes;

import com.alexandreesl.handson.dto.AccessLogDTO;
import org.apache.camel.LoggingLevel;
import org.apache.camel.spring.SpringRouteBuilder;
import org.springframework.context.annotation.Configuration;

/**
 * Created by alexandrelourenco on 30/06/17.
 */

@Configuration
public class MyFirstCamelRoute extends SpringRouteBuilder {


    @Override
    public void configure() throws Exception {

        from("file:/Users/alexandrelourenco/Documents/apachecamelhandson?delay=1000&charset=utf-8&delete=true")
                .setHeader("CamelAwsS3Key", header("CamelFileName"))
                .to("aws-s3:arn:aws:s3:::apache-camel-handson?amazonS3Client=#s3Client")
                .convertBodyTo(String.class)
                .split().tokenize("\n")
                    .convertBodyTo(AccessLogDTO.class)
                    .log(LoggingLevel.INFO, "${body}")
                    .to("aws-sqs://MyInputQueue?amazonSQSClient=#sqsClient");

    }
}

On the route above, we define a file endpoint that will poll for files on a folder, each 1 second and remove the file if the processing is completed successfully. Then we send the file to Amazon using S3 as a backup storage.

Next, we split the file using a splitter, that generates a string for each line of the file. For each line we convert the line to a DTO, log the data and finally we send the data to a SQS.

Now that we have our code done, let’s run it!

Running

First, we start our Camel route. To do this, we simply run the main Spring Boot class, as we would do with any common Java program.

After firing up Spring Boot, we would receive on our console the output that the route was successful started:

. ____ _ __ _ _
 /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/ ___)| |_)| | | | | || (_| | ) ) ) )
 ' |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot :: (v1.5.4.RELEASE)

2017-07-01 12:52:02.224 INFO 3042 --- [ main] c.a.handson.ApacheCamelHandsonApp : Starting ApacheCamelHandsonApp on Alexandres-MacBook-Pro.local with PID 3042 (/Users/alexandrelourenco/Applications/git/apache-camel-handson/build/classes/main started by alexandrelourenco in /Users/alexandrelourenco/Applications/git/apache-camel-handson)
2017-07-01 12:52:02.228 INFO 3042 --- [ main] c.a.handson.ApacheCamelHandsonApp : No active profile set, falling back to default profiles: default
2017-07-01 12:52:02.415 INFO 3042 --- [ main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@475e586c: startup date [Sat Jul 01 12:52:02 BRT 2017]; root of context hierarchy
2017-07-01 12:52:03.325 INFO 3042 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.apache.camel.spring.boot.CamelAutoConfiguration' of type [org.apache.camel.spring.boot.CamelAutoConfiguration$$EnhancerBySpringCGLIB$$72a2a9b] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2017-07-01 12:52:09.520 INFO 3042 --- [ main] o.a.c.i.converter.DefaultTypeConverter : Loaded 192 type converters
2017-07-01 12:52:09.612 INFO 3042 --- [ main] roperties$SimpleAuthenticationProperties :

Using default password for shell access: b738eab1-6577-4f9b-9a98-2f12eae59828




2017-07-01 12:52:15.463 WARN 3042 --- [ main] tarterDeprecatedWarningAutoConfiguration : spring-boot-starter-remote-shell is deprecated as of Spring Boot 1.5 and will be removed in Spring Boot 2.0
2017-07-01 12:52:15.511 INFO 3042 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup
2017-07-01 12:52:15.519 INFO 3042 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Starting beans in phase 0
2017-07-01 12:52:15.615 INFO 3042 --- [ main] o.a.camel.spring.boot.RoutesCollector : Loading additional Camel XML routes from: classpath:camel/*.xml
2017-07-01 12:52:15.615 INFO 3042 --- [ main] o.a.camel.spring.boot.RoutesCollector : Loading additional Camel XML rests from: classpath:camel-rest/*.xml
2017-07-01 12:52:15.616 INFO 3042 --- [ main] o.a.camel.spring.SpringCamelContext : Apache Camel 2.18.3 (CamelContext: camel-1) is starting
2017-07-01 12:52:15.618 INFO 3042 --- [ main] o.a.c.m.ManagedManagementStrategy : JMX is enabled
2017-07-01 12:52:25.695 INFO 3042 --- [ main] o.a.c.i.DefaultRuntimeEndpointRegistry : Runtime endpoint registry is in extended mode gathering usage statistics of all incoming and outgoing endpoints (cache limit: 1000)
2017-07-01 12:52:25.810 INFO 3042 --- [ main] o.a.camel.spring.SpringCamelContext : StreamCaching is not in use. If using streams then its recommended to enable stream caching. See more details at http://camel.apache.org/stream-caching.html
2017-07-01 12:52:27.853 INFO 3042 --- [ main] o.a.camel.spring.SpringCamelContext : Route: route1 started and consuming from: file:///Users/alexandrelourenco/Documents/apachecamelhandson?charset=utf-8&delay=1000&delete=true
2017-07-01 12:52:27.854 INFO 3042 --- [ main] o.a.camel.spring.SpringCamelContext : Total 1 routes, of which 1 are started.
2017-07-01 12:52:27.854 INFO 3042 --- [ main] o.a.camel.spring.SpringCamelContext : Apache Camel 2.18.3 (CamelContext: camel-1) started in 12.238 seconds
2017-07-01 12:52:27.858 INFO 3042 --- [ main] c.a.handson.ApacheCamelHandsonApp : Started ApacheCamelHandsonApp in 36.001 seconds (JVM running for 36.523)

PS: Don’t forget it to replace the access key and secret with your own!

Now, to test it, we place a file on the polling folder. For testing, we create a file like the following:

10.12.64.3 /api/v1/test1 POST 123
10.12.67.3 /api/v1/test2 PATCH 125
10.15.64.3 /api/v1/test3 GET 166
10.120.64.23 /api/v1/test1 POST 100

We put a file with the content on the folder and after 1 second, the file is gone! Where did it go?

If we check the Amazon S3 bucket interface, we will see that the file was created on the storage:

 

Screen Shot 2017-07-01 at 13.12.39

And if we check the Amazon SQS interface, we will see 4 messages on the queue, proving that our integration is a success:

Screen Shot 2017-07-01 at 13.38.12

If we check the messages, we will see that Camel correctly parsed the information from the file, as we can see on the example bellow:

[10.12.64.3,/api/v1/test1,POST,123]

Implementing Error Handling

On Camel, we can implement logic designed for handling errors. These are done by defining routes as well, which inputs are the exceptions fired by the routes.

On our lab, let’s implement a error handling. First, we add a option on the file endpoint that makes the file to be moved to a .error folder when a error occurs, and then we send a email to ourselves to alert of the failure. we can do this by changing the route as follows:

package com.alexandreesl.handson.routes;

import com.alexandreesl.handson.dto.AccessLogDTO;
import org.apache.camel.LoggingLevel;
import org.apache.camel.spring.SpringRouteBuilder;
import org.springframework.context.annotation.Configuration;

/**
 * Created by alexandrelourenco on 30/06/17.
 */

@Configuration
public class MyFirstCamelRoute extends SpringRouteBuilder {


    @Override
    public void configure() throws Exception {

        onException(Exception.class)
                .handled(false)
                .log(LoggingLevel.ERROR, "An Error processing the file!")
                .to("smtps://smtp.gmail.com:465?password=xxxxxxxxxxxxxxxx&username=alexandreesl@gmail.com&subject=A error has occurred!");

        from("file:/Users/alexandrelourenco/Documents/apachecamelhandson?delay=1000&charset=utf-8&delete=true&moveFailed=.error")
                .setHeader("CamelAwsS3Key", header("CamelFileName"))
                .to("aws-s3:arn:aws:s3:::apache-camel-handson?amazonS3Client=#s3Client")
                .convertBodyTo(String.class)
                .split().tokenize("\n")
                    .convertBodyTo(AccessLogDTO.class)
                    .log(LoggingLevel.INFO, "${body}")
                    .to("aws-sqs://MyInputQueue?amazonSQSClient=#sqsClient");

    }
}

Then, we restart the route and feed up a file like the following, that will cause a parse exception:

10.12.64.3 /api/v1/test1 POST 123
10.12.67.3 /api/v1/test2 PATCH 125
10.15.64.3 /api/v1/test3 GET 166
10.120.64.23 /api/v1/test1 POST 10a

After the processing, we can see the console and watch how the error was handled:

2017-07-01 14:18:48.695  INFO 3230 --- [           main] o.a.camel.spring.SpringCamelContext      : Apache Camel 2.18.3 (CamelContext: camel-1) started in 11.899 seconds2017-07-01 14:18:48.695  INFO 3230 --- [           main] o.a.camel.spring.SpringCamelContext      : Apache Camel 2.18.3 (CamelContext: camel-1) started in 11.899 seconds2017-07-01 14:18:48.699  INFO 3230 --- [           main] c.a.handson.ApacheCamelHandsonApp        : Started ApacheCamelHandsonApp in 35.612 seconds (JVM running for 36.052)2017-07-01 14:18:52.737  WARN 3230 --- [checamelhandson] c.amazonaws.services.s3.AmazonS3Client   : No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.2017-07-01 14:18:53.105  INFO 3230 --- [checamelhandson] route1                                   : [10.12.64.3,/api/v1/test1,POST,123]2017-07-01 14:18:53.294  INFO 3230 --- [checamelhandson] route1                                   : [10.12.67.3,/api/v1/test2,PATCH,125]2017-07-01 14:18:53.504  INFO 3230 --- [checamelhandson] route1                                   : [10.15.64.3,/api/v1/test3,GET,166]2017-07-01 14:18:53.682 ERROR 3230 --- [checamelhandson] route1                                   : An Error processing the file!2017-07-01 14:19:02.058 ERROR 3230 --- [checamelhandson] o.a.camel.processor.DefaultErrorHandler  : Failed delivery for (MessageId: ID-Alexandres-MacBook-Pro-local-52251-1498929510223-0-9 on ExchangeId: ID-Alexandres-MacBook-Pro-local-52251-1498929510223-0-10). Exhausted after delivery attempt: 1 caught: org.apache.camel.InvalidPayloadException: No body available of type: com.alexandreesl.handson.dto.AccessLogDTO but has value: 10.120.64.23 /api/v1/test1 POST 10a of type: java.lang.String on: Message[ID-Alexandres-MacBook-Pro-local-52251-1498929510223-0-9]. Caused by: Error during type conversion from type: java.lang.String to the required type: com.alexandreesl.handson.dto.AccessLogDTO with value 10.120.64.23 /api/v1/test1 POST 10a due java.lang.NumberFormatException: For input string: "10a". Exchange[ID-Alexandres-MacBook-Pro-local-52251-1498929510223-0-10]. Caused by: [org.apache.camel.TypeConversionException - Error during type conversion from type: java.lang.String to the required type: com.alexandreesl.handson.dto.AccessLogDTO with value 10.120.64.23 /api/v1/test1 POST 10a due java.lang.NumberFormatException: For input string: "10a"]. Processed by failure processor: FatalFallbackErrorHandler[Pipeline[[Channel[Log(route1)[An Error processing the file!]], Channel[sendTo(smtps://smtp.gmail.com:465?password=xxxxxx&subject=A+error+has+occurred%21&username=alexandreesl%40gmail.com)]]]]
Message History---------------------------------------------------------------------------------------------------------------------------------------RouteId              ProcessorId          Processor                                                                        Elapsed (ms)[route1            ] [route1            ] [file:///Users/alexandrelourenco/Documents/apachecamelhandson?charset=utf-8&del] [      9323][route1            ] [convertBodyTo2    ] [convertBodyTo[com.alexandreesl.handson.dto.AccessLogDTO]                      ] [      8370][route1            ] [log1              ] [log                                                                           ] [         1][route1            ] [to1               ] [smtps://smtp.gmail.com:xxxxxx@gmail.com&subject=A error ha                    ] [      8366]
Stacktrace---------------------------------------------------------------------------------------------------------------------------------------
org.apache.camel.InvalidPayloadException: No body available of type: com.alexandreesl.handson.dto.AccessLogDTO but has value: 10.120.64.23 /api/v1/test1 POST 10a of type: java.lang.String on: Message[ID-Alexandres-MacBook-Pro-local-52251-1498929510223-0-9]. Caused by: Error during type conversion from type: java.lang.String to the required type: com.alexandreesl.handson.dto.AccessLogDTO with value 10.120.64.23 /api/v1/test1 POST 10a due java.lang.NumberFormatException: For input string: "10a". Exchange[ID-Alexandres-MacBook-Pro-local-52251-1498929510223-0-10]. Caused by: [org.apache.camel.TypeConversionException - Error during type conversion from type: java.lang.String to the required type: com.alexandreesl.handson.dto.AccessLogDTO with value 10.120.64.23 /api/v1/test1 POST 10a due java.lang.NumberFormatException: For input string: "10a"] at org.apache.camel.impl.MessageSupport.getMandatoryBody(MessageSupport.java:107) ~[camel-core-2.18.3.jar:2.18.3] at org.apache.camel.processor.ConvertBodyProcessor.process(ConvertBodyProcessor.java:91) ~[camel-core-2.18.3.jar:2.18.3] at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77) [camel-core-2.18.3.jar:2.18.3] at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:542) [camel-core-2.18.3.jar:2.18.3] at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197) [camel-core-2.18.3.jar:2.18.3]

If we look to the folder, we will see that a .error folder was created and the file was moved to the folder:

Screen Shot 2017-07-01 at 14.25.18

And if we check the mailbox, we will see that we received the failure alert:

Screen Shot 2017-07-01 at 14.27.40

Conclusion

And so we conclude our tour through Apache Camel. With a easy-to-use architecture and dozens of components, it is a highly pluggable and robust option on integration developing. Thank you for following me on this post, until next time.

Scala: using functional programming on the JVM – part 3

Standard

Hi, dear readers! Welcome to my blog. On this post, the last on this series, we will continue to see more features from the Scala language. If you haven’t read the previous post, please go to the “programming languages” menu option to find all of the series. So, without further delay, let’s begin!

Collections

Collections, as the name implies, are data structures where we can store and organize data. There is various types of Collections that can be used on the Scala language, all common from any programming language and with all the standard behavior from their types, such as lists, sets and maps.

On the next sections, we will see the major methods that Scala offers us to work with their collections.

So, let’s fire up the Scala REPL and begin!

Filter

As the name, implies, filter can be used to filter data from a collection, generating a subset. Let’s begin by creating a List:

val mylist = List[Integer](1,2,3,4,5)

Next, we create a function that returns if a number is even:

def isEven(n:Integer) = n % 2 == 0

And finally, we used the filter function, printing on console the even numbers:

scala> mylist.filter(n => isEven(n)).foreach(println(_))

2

4

As we can see, it printed only 2 and 4 from our list, proving that our filtering was successful.

One important thing to note on this and the other methods is that none of the methods changes the original collection, they always create and returns a new one, since they are designed to work with immutables. We can check this by printing the list:

scala> print(mylist)

List(1, 2, 3, 4, 5)

Find

The find method is similar to the filter one, but instead of returning a subset, it returns only a element from the collection. The return is a optional, typed from the same type of the element type from the collection.

For this example, we will use the same collection from our previous example. If we wanted to return only the number 2 element and print on console, all we have to do is this:

scala> println(mylist.find(n => n == 2).getOrElse(0))

2

Map

The map is another common method for collections on programming languages. His objective is to take a collection and transform his elements on new elements, that could be from a different type, generating a new collection. Let’s see a example.

On our example, we will take the numbers from our previous list and create a new list, where the numbers are transformed on strings on the format “the number is x”. If we wanted to do this transformation and print the result on console, we can do the following:

scala> mylist.map(n => "the number is " + n).foreach(println(_))

the number is 1

the number is 2

the number is 3

the number is 4

the number is 5

Flatmap

Another interesting method is the flatmap. The flatmap is similar to a map, but with one difference: when used against complex objects of nested collections, this method denormalize the results, generating a flat collection. Let’s see a example.

First we create a class:

case class classA(val a: String, val b : List[String])

Then, we create a list with objects from our class:

scala> val myobjectlist = List(classA("A",List("A","B","C")),classA("B",List("A","C")),classA("C",List("C")),classA("D",List("A","B")))

myobjectlist: List[classA] = List(classA(A,List(A, B, C)), classA(B,List(A, C)), classA(C,List(C)), classA(D,List(A, B)))

Now, let’s see the result on the REPL, if we try to map our list, using only the b attribute:

scala> val mapobjectlist = myobjectlist.map(n => n.b)

mapobjectlist: List[List[String]] = List(List(A, B, C), List(A, C), List(C), List(A, B))

As we can see above, the result is a list of lists. This gives us a extra complexity to iterate over our results, since we will need to access each internal list individually in order to obtain all the elements.

Now let’s see the same result, using flatmap this time:

scala> val mapobjectflatlist = myobjectlist.flatMap(n => n.b)

mapobjectflatlist: List[String] = List(A, B, C, A, C, C, A, B)

Now, the list is flatten to a single List, allowing us to iterate over the elements much easier.

Reduce

Another useful feature when working with collections is the reduce method. With result, as on map’s case, we make a transformation on a list, but on this case, instead of generating a new collection, we aggregate the collection, generating a new value.

The simplest and easier example we can demonstrate is simply summing up the values. If we wanted to sum up all the values from our numeric list, all we need to do is this:

scala> println(mylist.reduce((sum,n) => sum+n))

15

A important thing to take note is that, on this case, the order from which the numbers will be iterate is from left to right. If we would like to explicit this ordering or reverse it, we could do this by using the reduceLeft or reduceRight methods instead.

Fold

Fold is pretty similar to the reduce method, but with a fundamental difference: while reduce obligates us that the result must be from the same type of the source elements, fold doesn’t. Let’s see a example to better understand it.

Let’s suppose that, different from our previous example, we wanted to generate a string from the numbers of our numeric collection, separated by parentheses. We can do this using the following:

scala>  val foldlistStr = mylist.fold("")((sum,n) => sum+"("+n+")")

foldlistStr: Comparable[_ >: Integer with String <: Comparable[_ >: Integer with String <: java.io.Serializable] with java.io.Serializable] with java.io.Serializable = (1)(2)(3)(4)(5)

scala> println(foldlistStr)

(1)(2)(3)(4)(5)

scala>

As we can see, on this case, we not only had to declare the folding method, but also a empty string at beginning. That it was the aggregator variable, which is then used at each iteration to form the aggregation. This is necessary in order to allow Scala to infer what it will be the type of the result of our folding operation.

Conclusion

And so we conclude our trip on the Scala language. I hope I could bring for the reader a glimpse of the language and all his power. While is not as popular as languages such as Java or C#, it is definitely a good language worthy to be considered, specially on distributed systems where it could be used with distributed tools, such as Akka.

Thank you for following me on my series, see you next time!

gRPC: Transporting massive data with Google’s serialization

Standard

Hello, dear readers! Welcome to my blog. On this post, we will delve on Google’s RPC style solution, gRPC and his serialization technique, protocol buffers. But what is gRPC and why it is so useful? Let’s find out!

The escalation of data transfer

When developing APIs, one of the most common ways of implementing the interface is with REST, using HTTP and JSON as transportation protocol and data schema, respectively. At first, there is no problem with this approach, and most of the time we won’t need to change from this kind of stack.

However, the problems begin when we get a API that has a big continuous volume of requests. On a situation like this, the amount of memory used on tasks like data transportation could begin to be a burden on the API, making calls more and more slow.

It is on this scenario that gRPC comes in handy. With a server-client model and a serialization technique from Google’s that allows us to shrink the amount of resources used on remote calls, we can scale more capacity per API instance, making our APIs more powerful.

This comes with a price, however: operations like debugging get more difficult on this scenario, because gRPC encapsulates the transportation logic on Google’s solution that utilizes binary serialization to do the transportation, so it is not as easily debuggable as a common plain REST/JSON application.

RPC

RPC is a acronym for Remote Procedure Call, a old model that was much used on the past. On that model, a client-server solution is developed, where the details of transport are abstracted from the developer, been responsible only for implementing the server and client inner logic. Famous RPC models were CORBA, RMI and DCOM.

Despite the name, gRPC only has the concept of a client-server application in common with the old solutions, that suffered from problems such as incompatible protocols between each other, alongside with not offering more advanced techniques from today, such as streams. Their solutions reminds more of the classical request-response model, from the first days of the web.

gRPC

So what is gRPC? This is Google’s approach to a client-server application that takes principles from the original RPC. However, gRPC allows us to use more sophisticated technologies such as HTTP2 and streams. gRPC is also designed as technology-agnostic, which means that can be used and interacted with server and clients from different programming languages.

With gRPC we develop gRPC services, which are generated based on a proto file we provide. Using the proto file, gRPC generates for us a server and a stub (some languages just call it a client) where we develop our server and client logic. The following diagram, taken from gRPC documentation, represents the client-server schematics:

grpc-1

As we can see on the diagram, gRPC supplies us with different types of languages to choose from to develop /expose our gRPC services. At the time of this post, gRPC supports Java, C++, Python, Objective-C, C#, a lite-runtime (Android Java), Ruby, JavaScript and Go – using the golang/protobuf library.

Protocol Buffers

Protocol buffer is gRPC’s serialization mechanism, which allows us to send compressed messages between our services, allowing us in turn to process more data with less network roundtrips between the parts.

According to Protocol Buffers documentation, Protocol Buffers messages offers the following advantages, if we compare to a traditional data schema such as XML:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

The Protocol Buffers framework supplies us with several code generators for different programming languages. If we want to develop on Java, for instance, we download Protocol Buffers for Java, then we model a proto file where we design the schema for the messages we will transport and then we generate code using the protoc compiler.

The compiler will generate code for us in order to use for serialize/deserialize data on the format we provided on the proto file.

Types of gRPC applications

gRPC applications can be written using 3 types of processing, as follows:

  • Unary RPCs: The simplest type and more close to classical RPC, consists of a client sending one message to a server, that makes some processing and returns one message as response;
  • Server streams: On this type, the client sends one message for the server, but receives a stream of messages from the server. The client keeps reading the messages from the server until there is no more messages to read;
  • Client streams: This type is the opposite of the server streams one, where on this case is the client who sends a stream of messages to make a request for the server and them waits for the server to produce a single response for the series of request messages provided;
  • Bidirecional stream RPC: This is the more complex but also more dynamic of all the types. On this model, we have both client and server reading and writing on streams, which are stablished between the server and client. This streams are independent from each other, which means that could be possible for a client to send a message to a server by one stream and vice-versa at the same time. This allows us to make multiple processing scenarios, such as clients sending all the messages before the responses, clients and servers “ping-poinging” messages between each other and so on.

Synch vs. Asynch

As the name implies, synchronous processing occurs when we have a communication where the client thread is blocked when a message is sent and is been processed.

Asynchronous processing occurs when we have this communication with the processing been done by other threads, making the whole process been non-blocking.

On gRPC we have both styles of processing supported, so it is up to the developer to use the better approach for his solutions.

Deadlines & timeouts

A deadline stipulates how much time a gRPC client will wait on a RPC call to return before assuming a problem has happened. On the server’s side, the gRPC services can query this time, verifying how much time it has left.

If a response couldn’t be received before the deadline is met, a DEADLINE_EXCEEDED error is thrown and the RPC call is terminated.

RPC termination

On gRPC, both clients and servers decide if a RPC call is finished or not locally and independently. This means that a server can decide to end a call before a client has transmitted all their messages and a client can decide to end a call before a server has transmitted one or all of their responses.

This point is important to remember specially when working with streams, in a sense that logic must pay attention to possible RPC terminations when treating sequences of messages.

Channels

Channels are the way a client stub can connect with gRPC services on a given host and port. Channels can be configured specific by client, such as turning message compression on and off for a specific client.

Tutorial

On this lab we will implement a gRPC service, test by making some calls and experimenting a little with streams. It will get us a feel and a head start on how to develop solutions using gRPC.

Set up

For this lab we will use Python 3.4. For the coding, I like to use Pycharm, is a really good IDE for Python that the reader can find it here.

For containerization I used Docker 17.03.1-ce. I also recommend you create a virtual environment for the lab, using virtualenv.

Let’s create a new virtual environment for our lab. On a terminal, let’s type the following:

virtualenv --python=python3.4 -v grpc-handson

After running the command, we will see that a folder was created. That is our virtual environment.

To install gRPC, first we enable the virtual environment on our terminal session. we do this by typing the following, assuming we are inside the virtual environment’s folder:

source ./bin/activate

This will change our terminal, that will now show a prefix with our env’s name, showing it is enabled.

Now on to the installation. First, let’s install gRPC with pip by typing:

python -m pip install grpcio

We also need to install gRPC tools. This tools are responsible for installing the protoc compiler which we will use later on the lab. We install it by typing:

python -m pip install grpcio-tools

That’s it! Now that we have our environment, let’s start with the development.

Creating the gRPC Service definition

First, we create a gRPC Service definition. We create this by coding a proto file, which will have the following:

syntax = "proto3";

option java_multiple_files = true;
option java_package = "com.alexandreesl.handson";
option java_outer_classname = "MyServiceProto";
option objc_class_prefix = "HLW";

package handson;


service MyService {

    rpc MyMethod1 (MyRequest) returns (MyResponse) {
    }

    rpc MyMethod2 (MyRequest) returns (MyResponse) {
    }

}

message MyRequest {
    string name = 1;
    int32 code = 2;
}

message MyResponse {
    string name = 1;
    string sex = 2;
    int32 code = 3;
}

This file is based on examples from the official gRPC repo – you can find the link at the end of this post. Alongside setting some properties such as the service’s package, we defined a service with 2 methods and 2 schemas for the protocol buffers, used by the methods.

With the proto file created (let’s save it as my_service.proto), it is time for us to use gRPC to create the code.

Generating gRPC code

To generate the code, let’s run the following command, with our virtual environment enabled and on the same folder of the proto file:

 python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. my_service.proto

After running the command, we will see 2 files created, as we can see bellow:

grpc-2

PS: The source code for our lab can be found at the end of the post.

Building the server

Now that we have our code generated, let’s begin our work. Let’s begin by creating the server side.

The code from our command is auto-generated, so is not a good practice to code on them. Instead, for the server we will extend the servicer class, so we implement our own gRPC service without editing generated code.

Let’s create a file called gRPC_server.py and add the following code:

import time
from concurrent import futures

import grpc

import my_service_pb2 as my_service_pb2
import my_service_pb2_grpc as my_service_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24


class gRPCServer(my_service_pb2_grpc.MyServiceServicer):
    def __init__(self):
        print('initialization')

    def MyMethod1(self, request, context):
        print(request.name)
        print(request.code)
        return my_service_pb2.MyResponse(name=request.name, sex='M', code=123)

    def MyMethod2(self, request, context):
        print(request.name)
        print(request.code * 12)
        return my_service_pb2.MyResponse(name=request.name, sex='F', code=1234)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    my_service_pb2_grpc.add_MyServiceServicer_to_server(
        gRPCServer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    try:
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == '__main__':
    serve()

On the code above we created a class that it is a subclass of the class generated by the protoc compiler. We can see that in order to get the request’s parameters, all we have to do is navigate from the request object. To generate a response, all we have to do is instantiate the appropriate class.

We also coded the serve method, where we created a server with 10 worker threads to serve requests and initialized with the class we created to implement the server-side.

To start the server, all we have to do is run:

python gRPC_server.py

And from now on, unless we kill the process, it will be answering on the 50051 port.

Now that we created the server, let’s begin the work on the client.

Constructing the client

In order to consume the server, we need to create a client for our stub. Let’s do this.

Let’s create a file called gRPC_client.py and code the following:

import grpc

import my_service_pb2 as my_service_pb2
import my_service_pb2_grpc as my_service_pb2_grpc


class gRPCClient():
    def __init__(self):
        channel = grpc.insecure_channel('localhost:50051')
        self.stub = my_service_pb2_grpc.MyServiceStub(channel)

    def method1(self, name, code):
        print('method 1')
        return self.stub.MyMethod1(my_service_pb2.MyRequest(name=name, code=code))

    def method2(self, name, code):
        print('method 2')
        return self.stub.MyMethod2(my_service_pb2.MyRequest(name=name, code=code))


def main():
    print('main')

    client = gRPCClient()

    print(client.method1('Alexandre', 123))
    print(client.method2('Maria', 123))


if __name__ == '__main__':
    main()

As we can see, is really simple to create a client: we just needed to establish a channel and instantiate a stub with it. Once instantiated, we just need to call the methods on the stub as we normally would do with any Pythonic object.

Now that we have the coding done, let’s test some calls!

Making the call

To test a call, let’s first start the server. With a terminal session opened and our virtual environment enabled, let’s start the server, by entering the command we talked about earlier:

python gRPC_server.py

And on another terminal, started like the previous one, we call the client by typing:

python gRPC_client.py

After firing up the client, we will see that the client produced the following output:

main
method 1
name: "Alexandre"
sex: "M"
code: 123

method 2
name: "Maria"
sex: "F"
code: 1234

And on the server terminal, we can see the following output:

initialization
Alexandre
123
Maria
1476

Success! We have implemented our first gRPC service. Now, let’s wrap it up our lab by seeing the last topics: using streams and containerization.

Using streams

To learn about streams, let’s add a new method, that it will be a bidirectional streaming.

First, let’s change the proto file, creating a new method, like the following:

syntax = "proto3";

option java_multiple_files = true;
option java_package = "com.alexandreesl.handson";
option java_outer_classname = "MyServiceProto";
option objc_class_prefix = "HLW";

package handson;


service MyService {

    rpc MyMethod1 (MyRequest) returns (MyResponse) {
    }

    rpc MyMethod2 (MyRequest) returns (MyResponse) {
    }

    rpc MyMethod3 (stream MyRequest) returns (stream MyResponse) {
    }

}

message MyRequest {
    string name = 1;
    int32 code = 2;
}

message MyResponse {
    string name = 1;
    string sex = 2;
    int32 code = 3;
}

That’s right, this is everything we need to do. After generating the files from the proto again, we need to modify our server to reflect the changes. We do this modifying the file as follows:

import time
from concurrent import futures

import grpc

import my_service_pb2 as my_service_pb2
import my_service_pb2_grpc as my_service_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24


class gRPCServer(my_service_pb2_grpc.MyServiceServicer):
    def __init__(self):
        print('initialization')

    def MyMethod1(self, request, context):
        print(request.name)
        print(request.code)
        return my_service_pb2.MyResponse(name=request.name, sex='M', code=123)

    def MyMethod2(self, request, context):
        print(request.name)
        print(request.code * 12)
        return my_service_pb2.MyResponse(name=request.name, sex='F', code=1234)

    def MyMethod3(self, request_iterator, context):
        for req in request_iterator:
            print(req.name)
            print(req.code)

            yield my_service_pb2.MyResponse(name=req.name, sex='M', code=123)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    my_service_pb2_grpc.add_MyServiceServicer_to_server(
        gRPCServer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    try:
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == '__main__':
    serve()

As we can see, the new MyMethod3 receives a iterator of request messages and also send a series of responses as well, by using the yield keyword, which teaches Python to create a generator from our function. We can read more about the yield keyword on this link.

Now we modify the client. We also use a generator with the yield keyword, but on this case, we make the generator create each item at random time intervals, to simulate a real life application with data entering at intervals. The response stream is read with a simple for loop, that it will continue to run until there is no data to output:

import random
import time

import grpc

import my_service_pb2 as my_service_pb2
import my_service_pb2_grpc as my_service_pb2_grpc


class gRPCClient():
    def __init__(self):
        channel = grpc.insecure_channel('localhost:50051')
        self.stub = my_service_pb2_grpc.MyServiceStub(channel)

    def method1(self, name, code):
        print('method 1')
        return self.stub.MyMethod1(my_service_pb2.MyRequest(name=name, code=code))

    def method2(self, name, code):
        print('method 2')
        return self.stub.MyMethod2(my_service_pb2.MyRequest(name=name, code=code))

    def method3(self, req):
        print('method 3')
        return self.stub.MyMethod3(req)


def generateRequests():
    reqs = [my_service_pb2.MyRequest(name='Alexandre', code=123), my_service_pb2.MyRequest(name='Maria', code=123),
            my_service_pb2.MyRequest(name='Eleuterio', code=123), my_service_pb2.MyRequest(name='Lucebiane', code=123),
            my_service_pb2.MyRequest(name='Ana Carolina', code=123)]

    for req in reqs:
        yield req
        time.sleep(random.uniform(2, 4))


def main():
    print('main')

    client = gRPCClient()

    print(client.method1('Alexandre', 123))
    print(client.method2('Maria', 123))

    res = client.method3(generateRequests())

    for re in res:
        print(re)


if __name__ == '__main__':
    main()

If we restart the server and run the new client, we will receive the following output:

main
method 1
name: "Alexandre"
sex: "M"
code: 123

method 2
name: "Maria"
sex: "F"
code: 1234

method 3
name: "Alexandre"
sex: "M"
code: 123

name: "Maria"
sex: "M"
code: 123

name: "Eleuterio"
sex: "M"
code: 123

name: "Lucebiane"
sex: "M"
code: 123

name: "Ana Carolina"
sex: "M"
code: 123


Process finished with exit code 0

Please note that all this calls were made using the synchronous approach, so the client thread is locked each time a call is made. If we wanted to call the server asynchronously, we would use Python’s futures to do so. I suggest the reader to explore this option as a post-lab exercise.

Running gRPC on Docker

Finally, we would want to run our gRPC server on a Docker container. That is a very simple task to do.

First, let’s create a Dockerfile like the following:

FROM grpc/python:1.0-onbuild
CMD [ "python", "./gRPC_server.py" ]

Yup, that’s all! This is a official image from gRPC which makes a pip install on a requirements file and adds the current directory on the container, so we don’t need to add the files ourselves. Let’s build the container:

docker build -t alexandreesl/grpc-lab .

And finally, launch it with our image:

docker run --rm -p 50051:50051 alexandreesl/grpc-lab

If we run again the client, we will see that the server was successfully started. If the reader want, you can also start the container directly by a image I created on Docker Hub for this lab, already built with the source code from our lab. To pull the image, just type the following:

docker pull alexandreesl/grpc-lab

Conclusion

And so we concludes another journey on our great world of technology. I hope I could help the reader to understand what is gRPC and how it can be used to improve our capacity on high demanding API scenarios. Thank you for following me on this post, see you next time.

Continue reading

Scala: using functional programming on the JVM – part 2

Standard

Hi, dear readers! Welcome to my blog. On this post, we will continue to see more features from the Scala language, such as abstract classes, traits and optionals. If you haven’t read the previous post, please go to the “programming languages” menu option to find all of the series. So, without further delay, let’s begin!

Abstract classes

Abstract classes on Scala are just like in any other OO language, that is, they are classes that have methods without implementation, that must be implemented by other classes in order to be used.

On Scala, we can create a abstract class like this, for example:

abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

On this code, we are creating a abstract class MyAbstractClass and declaring a method called methodA which has a string as parameter and returns a Set of strings.

In order to implement the class, we could have a class as follows:

class MyAbstractClassImpl extends MyAbstractClass {
 
 def methodA(str: String): Set[String] = ???

}

On this code, we are extending the abstract class – on Scala, like Java, we can’t have multiple inheritance, so we can just extend one class – and provide a empty implementation for the method, with the keyword ???. This keyword produces the equivalent on Java as when we create a method that throws a NotImplementedError. We can see this if we try to instantiate and call the method, which will give us the following output:

scala.NotImplementedError: an implementation is missing

at scala.Predef$.$qmark$qmark$qmark(Predef.scala:284)

at MyAbstractClassImpl.methodA(MyAbstractClassImpl.scala:3)

at Main$.delayedEndpoint$Main$1(Myscript.scala:17)

at Main$delayedInit$body.apply(Myscript.scala:1)

at scala.Function0.apply$mcV$sp(Function0.scala:34)

at scala.Function0.apply$mcV$sp$(Function0.scala:34)

at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)

at scala.App.$anonfun$main$1$adapted(App.scala:76)

at scala.collection.immutable.List.foreach(List.scala:378)

at scala.App.main(App.scala:76)

at scala.App.main$(App.scala:74)

at Main$.main(Myscript.scala:1)

at Main.main(Myscript.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

........omitted........

On the next post on the series, we will see how Scala’s inheritance mechanisms work on more detail. For now, let’s move on to our next topic, Traits.

Traits

Traits can be thought out like interfaces. With traits, we can create several different contracts to standardize our classes, while also providing default implementations for any method that requires it – just like default methods from Java 8 onwards.

To create a trait with 2 methods, one with a implementation and one without it, we can code like this:

trait MyLogger {
 
 def logPrintln(msg: String): Unit = println(msg)

 def log(msg: String): Unit

}

On this code, we declared 2 methods that receive a string as parameter and have void returns, one with a implementation and one without it. To test multiple traits inheritance, let’s create another trait as follows:

trait MyMathLibrary {
 
 def add(a: Double, b: Double): Double = a + b

}

If we wanted our previous class to implement our traits as well, we could just change the code as follows:

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {
 
def methodA(str: String): Set[String] = Set[String]("a","b","c")

def log(msg: String): Unit = { 

 println("this log is the same as the other method")
 println(msg)

 }

}

On the code we see that we chained the traits with the with keyword. We also provided a implementation for the abstract class’s method so we don’t receive a not implemented exception anymore.

 

Sealed traits & classes

Another cool feature from Scala are sealed classes and traits. If we want a class or trait to be prohibited of been extended outside of their own source file, we use the keyword sealed. This is particularly useful when implementing libraries, in order to prevent users from the library from changing the behavior of the library.

To seal a class or trait, we just change like this:

sealed abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

Now, if we try to compile our code, we will receive the following error:

MyAbstractClassImpl.scala:1: error: illegal inheritance from sealed class MyAbstractClass

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {

                                  ^

one error found

Showing that our seal was successful. To allow our class to compile again without removing the seal, the only way is moving the abstract class to the same file of the implementation, like the following:

sealed abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {
 
def methodA(str: String): Set[String] = Set[String]("a","b","c")

def log(msg: String): Unit = {

println("this log is the same as the other method")
 println(msg)

}

}

If we try to compile again, we will see that now our class can compile again as normal.

Optionals

Optionals on Scala are called options. With options, we can create code that it is resilient, since we won’t need to worry about shielding our code from null values.

When working with options, we can instantiate the Option type using 2 alternatives:

  • Some(value): the Some keyword allows us to return a value on optionals;
  • None: the None keyword allow us to represent the null value, that is, the absence of value;

Also, with options, we have two ways to get a value:

  • get: using this method, we receive the value inside the option, or a NoSuchElementException if the value is null;
  • getOrElse(value): using this method, we receive the value inside the option, or the value passed by parameter if the value is null. This way, we can guarantee a default value in case the data doesn’t exist;

Let’s see a example. On our REPL, let’s create a Map:

val mymap = Map(
 ("1", "value 1"),
 ("2", "value 2")
 )

Next we get values from the map. If we try to get values that exist and don’t exist on the map with the getOrElse method, we receive this output on console:

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73).
Type in expressions for evaluation. Or try :help.

scala> val mymap = Map(
 | ("1", "value 1"),
 | ("2", "value 2")
 | )
mymap: scala.collection.immutable.Map[String,String] = Map(1 -> value 1, 2 -> value 2)

scala> val value1 = mymap.get("1")
value1: Option[String] = Some(value 1)

scala> val value2 = mymap.get("2")
value2: Option[String] = Some(value 2)

scala> val value3 = mymap.get("3")
value3: Option[String] = None

scala> val value4 = mymap.get("4")
value4: Option[String] = None

scala> println(value1.getOrElse("X"))
value 1

scala> println(value2.getOrElse("X"))
value 2

scala> println(value3.getOrElse("X"))
X

scala> println(value4.getOrElse("X"))
X

scala>

This shows that optionals are a viable option on dealing with optional values on the Scala language.

Error handling

As any other language, Scala also have a error handling system. Like Java, Scala also use exceptions as forms to encapsulate errors. Previously we have seen the ??? keyword and how we receive a NotImplementedError if we try to use a method with that keyword. If we wanted to explicit do what the keyword encapsulates, we could do this:

def methodA(str: String): Set[String] = throw new NotImplementedError()

We can see that it is pretty much very straightforward from anyone who has a background on Java. The catching of exceptions are very similar to Java also, like on the following code, supposing that our method throws several types of exceptions:

try {
 methodA("test")
} catch {
 case e: IOException => println("IO exception")
 case e: Exception => println("general exception")
 case _ => println("general error")
}

Of course, we also have the finally block, that could be used as follows:

try {
 methodA("test")
} catch {
 case e: IOException => println("IO exception")
 case e: Exception => println("general exception")
 case _ => println("general error")
} finally {
 println("this executes no matter what")
}

Did you notice the “_”? That keyword was used to catch not only exceptions, but also error. On Scala we have a exception hierarchy that it is pretty much very similar to his Java counterpart, with two classes, Error and Exception, that extends from a root class called Throwable.

However, there is a key difference: Scala doesn’t have checked exceptions. That means we don’t have exceptions marked on method’s signatures as throwable neither we have the obligation to catch any exceptions that are thrown by a method. This can be considered a bad thing specially when we don’t known all the details from a code we are consuming, but it gives us flexibility to catch the exceptions wherever we want to.

Inheritance on Scala

On Scala, we have 3 types of inheritance, as follows:

  • Invariant: invariant inheritance means that only the exact type is allowed;
  • Covariant: covariant inheritance means that only the exact type and their subclasses are allowed;
  • Contravariant: contravariant inheritance means that only the exact type and their superclasses are allowed;

When using generics on Scala we use square brackets ([]).  When declaring the generic type, we could indicate if it is covariant or contravariant using the “+” and “-” symbols respectively. So, if we wanted to create a generic class to be used for a class and their subclasses, we could declare as:

class mygenericclass[+T](val id: T)

And on the opposite side, if we wanted the class to be using a class and their superclasses, we could declare as:

class mygenericclass[-T](val id: T)

On functions, however, there is a role that must be always remembered: On functions, all the parameters are contravariant, that is, they accept values from the declared type or supertypes, and the return is always covariant, in other words, it accepts values from the declared type or their subtypes.

Implicits

One last feature we will visit on this lab are implicits. With implicits, we can wrap it up classes that already exists with new features, without needing to extend or overload the original class. Even classes from the standard libraries can be wrapped this way!

Let’s see a example. On the REPL, we create a class like this:

case class myclass(val a:String, val b:String)

Now, let’s try to instantiate and use a print method on the class:

scala> val instance = new myclass("a","b")

instance: myclass = myclass(a,b)

scala> instance.print

<console>:13: error: value print is not a member of myclass

       instance.print

                ^

scala>

Of course, we got a error, since this method doesn’t exist. Now, we create a wrapper class:

implicit class myclasswrapper(mycl:myclass) { def print = println(mycl.a+mycl.b) }

 

Notice the implicit keyword? That means our class was created as a implicit, meaning that if we try to invoke the print method again:

scala> instance.print

ab

It will now work, as Scala is implicit converting our class to a myclasswrapper. Please note that, before Scala 2.10, we would need to create a method with the implicit keyword and make the wrapping by hand, instead of the useful declaration on the class level.

It is important to take caution, however, of not abusing of implicits, since we can change the behavior of basically everything on the language, making a application very unpredictable if the feature is overused!

Conclusion

And that concludes our second part on the Scala series. Next, on our last part, we will learn about collections and all that we can benefit from it. Thank you for your attention, until next time!

Scala: using functional programming on the JVM – part 1

Standard

Hello, dear readers! Welcome to my blog. On this post, we will talk about Scala, a powerful language that combines the object paradigm with the functional paradigm. Scala is used on several modern solutions, such as Akka.

Scala is a JVM-based language, which means that Scala programs are transformed in Java bytecode and them are run with the JVM. This guarantees that the robust JVM is used on the background, leaving us to use the rich Scala language for programming.

This is a 3-part series focused on learning the basis of the language. On this first part we will set up our environment and learn about the Scala type system, vars, vals, classes, case classes, objects, companion objects and pattern matching. On the other parts, we will learn other features such as traits, optionals, error handling, inheritance on Scala, collection-related operations such as map, folder, reduce and more. Please don’t miss out!

So, without further delay, let’s begin our journey on the Scala language!

Setting up

In order to prepare our lab environment, first we need to install Scala. You can download the last version of Scala – this lab is using Scala 2.12.1 – on this link. If you are using Mac and homebrew, the installation is as simple as running the following command:

brew install scala

In order to test the installation, run the command:

scala -version

This will print something like the following:

Scala code runner version 2.12.1 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

REPL

The REPL is a interactive shell for running Scala programs. The name stands for the sequence of operations it realizes: Read-Eval-Print-Loop. It reads information inputed by the user, evaluates the instruction, prints the result and start over (loops). In order to use the Scala REPL, all we have to do is type scala on a terminal. This will open the REPL shell, like the following snippet:

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73).

Type in expressions for evaluation. Or try :help.

scala>

When we are done with the REPL, all we have to do is press Crtl+C. Another way of running Scala programs is by creating Scala scripts (.scala files). When using Scala scripts, we first compile the script using the scalac command.

This hints a important thing to notice about Scala: Scala is not dynamic typed. It has some similarities in syntax with languages like Python, but we have to remember that it is static typed, as we will see on the next section.

Scala type system

As we talked before, Scala is compiled, opposed to other languages such as Python, Clojure etc. This means that when we write programs on Scala, the interpreter infers the type of a variable (immutable or not) by the type of value that it is attributed to. Let’s see this in action.

Let’s open the Scala REPL. We type var number=0 and hit enter. The following will be printed on our console:

scala> var number=0

number: Int = 0

As we can notice, the variable was defined as a integer, since we attributed a number to it. The reader could be thinking “but this is exactly like a dynamic typed language!”. It appears so at first, but here is a catch: if we try to change the variable to another type of value, this happens:

scala> number="a string"

:12: error: type mismatch;

 found   : String("a string")

 required: Int

       number="a string"

              ^
scala>

The interpreter throws a error, saying that the variable we defined previously is a integer, so we can’t change to a string, for instance. This is fundamentally different from dynamic typed languages, where we can change the type of a variable as much as we like.

This could be seen as a weak point depending on the point of view, but must be more seeing as a design choice: using a strong typed scheme, we have more security about knowing what exactly to expect from each variable in use on the system.

This is particularly important on the functional paradigm, where we normally use more immutable variables them mutable ones, as we will talk about on the next section. One last thing before we go: although we can use the interpreter inference to create the variables, we can also explicitly define the type during the creation, like with the following variable:

scala> var number2: Int = 1

number2: Int = 1

scala>

Var vs. Val

On Scala, we can declare variables using 2 keywords: var and val. The creation code on the 2 options is essentially the same, but there’s a primary difference between the 2: vars can have theirs values changed during their lifecycles, while vals can’t.

That means vals are immutable. The closest equivalent example we can have on Java code is a constant, which means that once declared, his value will never be changed again.

When working with the functional programming paradigm, essentially we use immutables most of the time. With immutables, we have the security that our functions will always behave as intended, since a function won’t change the data, making new runs with the same parameters always returns the same results.

Let’s test if vals can’t really be changed. Let’s create a string typed val, with the following code:

val mystring = "this is a string"

Then, we try to change the string. When we do this, we will receive the following:

scala> mystring = "this is a new string"

:12: error: reassignment to val

       mystring = "this is a new string"

                ^

scala>

The interpreter has complained that we are trying to change a val, proving that vals are indeed immutable.

Classes

On Scala, everything runs on a object. That’s why despite the fact that Scala allows us to develop using the functional paradigm, we can’t say that Scala is a pure functional programming language, like Haskell, for example.

On Scala’s object hierarchy, the root class for all classes is called Any. This class has 2 subclasses: AnyValue and AnyRef. AnyValue is the root class for primitive values such as integers, floats etc – all primitives on Scala are internally wrappers. AnyRef is for classes that are not primitives, like the classes we will develop on the lab, for example.

So, let’s create our first class! to do this, let’s create a file called Myclass.scala and enter the following code:

class Myclass(val myvalue1: Int, val myvalue2: String)

That’s right. All we have to do is this one line of code, and we have a complete class at our disposal! On this line, we created a class called Myclass, with 2 attributes: myvalue1 and myvalue2. Not only that, with this line we created a constructor that receives the 2 attributes as parameters and getter accessors. All of this with just one line!

The reason because Scala created the attributes to be set at object creation is because we declared the attributes as immutables. If we had declared them as vars, then Scala would have created setter accessors as well.

Since we are talking about constructors, it is important to know that we can also overload the constructor, by defining the constructor with the keyword this. For example, if we would like to have the option of a constructor that don’t need to pass the attributes, instead using default values, we could change the class like this:

class Myclass(val myvalue1: Int, val myvalue2: String) {

  def this() = this("", "")

}

Case classes

Another interesting thing about classes are case classes. With case classes, we have a class that has already coded the hashCode, equals and toString methods. How do we do this? Simple, by modifying our class as follows:

case class Myclass(val myvalue1: Int, val myvalue2: String) {

  def this() = this("", "")

}

That’s all we have to do, we just have to include the keyword case and the methods are implemented with a default implementation. That is another good example of how Scala can simplify the developer’s life.

Objects

We talked earlier about how everything on Scala are classes. However, there are cases when we want a class to have only one instance on the entire system. We commonly call this type of class Singletons. To achieve this on Scala, we declare objects.

Objects are like classes on their body, just that they can’t be instantiated, since they already are instances. Let’s create a simple Hello World script in order to learn how to create objects.

Let’s create a file called Myscript.scala. On the file, we code this:

object Myscript extends App {

print("Hello World!")

}

And then we compile with scalac Myscript.scala. When running with scala Myscript, we get the following on the console:

Hello World!%

The App that we extended with is the hint for Scala that this object is the main script for our Scala application to run. We will see more about inheritance on future parts of this series.

Companion objects

Companion objects are like the ones we just saw previously, with just one big difference: this objects must have the same name of a class, be declared on the same file of that class and they have access to attributes and methods from that class, even the private ones.

The use of companion classes could be to create factory methods. One example of this use is the case classes we saw before, that create methods such as toString for us. Internally, when we declare case classes, Scala creates a companion object for that class.

Pattern matchers

The last feature we will talk about are pattern matchers. With pattern matchers, we can run pieces of codes by case statements, similar with switch clauses on Java. Let’s see a example.

We will use the Myclass class we created earlier. Let’s suppose we have a scenario where we want to perform a different print depending on the value of the myvalue1 attribute and print the value itself if it doesn’t fit on any of the clauses. We can do this by coding the following:

object Myscript extends App {

case class Myclass(val myvalue1: Int, val myvalue2: String)

val myclass = new Myclass(1,"Myvalue2")

val result = myclass match {
 case Myclass(1, _) => "this is value 1"
 case Myclass(2, _) => "this is value 2"
 case m => s"$m"
 }

print(result)

}

On the code above, we stated that if we have a class with the value 1 as first attribute – the second one is defined with the “_” keyword, which means that we are accepting any value for that attribute – we output the string “this is value 1”, the string “this is value 2” for the 2 value and we will output the values from the class itself for any other value. If we run the code above, we will receive this message on the terminal:

this is value 1%

Showing that our code is correct. One important thing to notice, due to good practices recommended for Scala, is that when using pattern matchers, when you get the content from the variable been matched – the case of our last clause – always use lower-case only names. That is because when declaring the name starting with a upper-case letter, the Scala interpreter will try to find a variable with that name, instead of creating a new one. So, always remember to use lower-case variables on this cases.

Conclusion

And that concludes our first trip to the Scala language. On our next parts, we will see more interesting features of the language, such as traits, inheritance and optionals. Stay tuned!

Thanks you for your attention, until next time.

Refactoring:improving the design of existing code (book review)

Standard

Hi, Dear readers! Welcome to my blog. On this post, I will review a famous book of Martin Fowler, which focus on a refactoring techniques. But after all, why refactoring matters?

Definition

According to Fowler, a refactoring consists of modifying code in order to improve his readability and capacity to change, without changing his behavior. When refactoring, our objectives is to make the code easier to be read by humans and also improving his structure and design, making changes motivated by business rules easier to implement. Other benefits are that a cleaner code makes it easier to spot bugs, alongside fastening the development of new code on top of a well organized production code.

When refactor?

Fowler defends that refactoring should be done on 3 situations:

  • When you add a new functionality;
  • When you find a bug;
  • When you do a code review;

On this situations, you are forced to make changes on the code structure, making ideal situations for refactoring.

Pitfalls

When refactoring, there is some common pitfalls that could hinder the refactoring. The most common ones are the databases and the interfaces from the code.

Database schemas could be hard to change, specially if the database is old with millions of rows. This produces a splash effect on the code that manipulates the database, making more difficult to make changes on the code. Also the interfaces (APIs, libraries or even a single class inside a component) could be a challenge to refactoring, since a change on a interface could cascade to a change on lots of client’s code.

In order to solve this problem, the better approach for databases is to isolate the database’s logic on his own layer, allowing the “dirtier” code to be evolved in a more controlled manner. As for the interfaces issue, the better approach is to allow the old and new interfaces to coexist, while a migration work is conducted.

When not refactor?

According to Fowler, there is one situation when you shouldn’t refactor: when the code is so bad, that it is better to be written from scratch. This is a difficult rule to be measured as to when the code is bad enough to be rewritten. Some good hints could be if the code is infested by bugs or if it is identified that it has so much refactoring points, that fixing it up could end up rewritten most of the code.

Refactoring and performance

Sometimes, when refactoring, we could incur on refactorings that cause some performance degradation. Of course that it is up to the business to measure up how much this degradation is unbearable to meet the requirements, but as a general rule, we can assume that a more organized code is a easier code to fine tune. So, if we refactor first and improve his readability and design, it will be easier to make a performance tune later.

Unit testings

Another key point defended by Fowler is the need to develop unit tests for the code. With unit tests, we can develop refactorings in small steps (“baby steps”), receiving rapid feedback from the tests, so if anything breaks we can easily and fast make fixes during the refactoring process.

Refactoring catalog

Here there are some brief descriptions of some of the refactoring patterns that I found more interesting. Complete descriptions with examples can be found on Fowler’s book, that you can find on the links at the end of this post.

Extract Method

This refactoring consists of taking some code that can be grouped together and extract the code as his own method, this way improving readability.

Introduce Explaining Variable

This refactoring consists of taking a big and complex conditional and simplifying by turning his operators onto variables, this way making the conditional more self explanatory.

Replace Method with Method Object

This refactoring consists of a situation when you have a method that is better to have some code extracted to his own method, but it refers to a lot of variables that hinders the operation. On this case, this refactoring applies, consisting of taking all the variables and the method and moving to a new object, making a easier environment to make the extract method refactoring.

Move Method

This refactoring consists of moving a method from one class to another. This makes sense when the old class has less uses for his own method then the class he is moved to.

Extract Class

This refactoring consists of when you have a class that it is doing work that could be better organized if divided in two. On this case, we move the common behavior and data (methods and fields) that could form a new class and move it, making a delegation from the old one.

Remove Middle Man

This refactoring consists of when a class has lots of delegating methods to another class, which introduces unnecessary code. On this case, we create a accessor to the instance of the object itself, making it so the callers can call the methods from the class themselves, so after creating the accessor we remove all delegating methods.

Consolidate Conditional Expression

This refactoring consists of when you have several conditionals that returns the same value. On this case, we refactor the conditionals by creating a single one, commonly by creating a method, making the code more clear and simple.

Remove Control Flag

This refactoring consists of when you have a conditional flag that controls the behavior inside a loop. By using control commands like break and continue, we can remove the control flag, simplifying the code.

Replace Conditional with Polymorphism

This refactoring consists of when you have a method on a class that has conditional behavior depending on the type of the object. On this case, we extract each leg of the conditional and create a subclass around the different behaviors, until the method turns out to be empty, in which case we turn the method to abstract on the now superclass of the hierarchy. We may have to change the constructor of the class on a factory method.

Introduce Nul Object

This refactoring consists of when we have various null checks for data on the callers of a object. On this case, we create a object to represent null, that returns all the default data that should be used when the data was null. This way, we don’t have to make checks for null on the callers anymore, since the behavior on the null object will cover the check’s circumstances.

Preserve Whole Object

This refactoring consists of when you have a method call that is preceded by calling several data accessors to get lots of data from other object to pass as parameters for the call. On this case, we change the method to pass the object itself, removing the calls from the data accessors by moving them to inside the method itself. This refactoring not only simplifies the code on the caller’s side, but also simplifies changes if the method needs more data from the object passed by parameter on the future.

Replace Constructor with Factory Method

This refactoring consists of when we want to include more behavior on a constructor then it normally has it. This is specially true on class hierarchies, where the object construction must reflect the corresponding subclass depending on the type of the object. On this case, we change the constructors to a more restrictive access (private or protected)  and create a static factory method at the top of the hierarchy, allowing the dynamic creation of the objects.

Replace Error Code with Exception

This refactoring comes in handy when we have code that returns error codes when something breaks. Error codes are common on languages such as Unix and C, but on Java, we have a much more powerful tool: exceptions. With exceptions, we can easily separate the code that fix the errors from the normal code. So in this case, our best approach is to change the error code’s return to exception throws, which make a much more readable and organized structure.

Replace Exception with Test

This refactoring occurs when we have a code on a try-catch block, that has on the catch block some code that could be moved to be performed before the error occurs. This is typically found when we have predicted errors that can occur on some cases, but we use the catch block as part of our program’s logic. By changing the logic on the catch block to a test (if) before the code that breaks, we can remove the try -catch altogether, making a better readable and consistent code, that doesn’t rely on errors to work.

Extract Subclass

This refactoring consists of when we have some methods and fields that are used only on some instances of the class. On this case, we move the methods and fields to a new subclass, where they could be better organized and maintained.

Extract Superclass

This refactoring is opposed to the previous one, since we create a superclass instead of a subclass. If we have identified common behavior from two different classes, we create a superclass with the common behavior from the two and make both of them subclasses from the created superclass.

Form Template Method

This refactoring consists of when we have two classes that has methods with equal or very similar logic, that needs to be called on a certain order. On this case, we equalize the interfaces of the methods of both classes to be equal and creates a superclass where we move the common methods from both classes and create a orchestration method for the order of the calls. This improves the code on reusability and hierarchy organization.

Conclusion

And so we conclude our introduction to Martin Fowler’s Refactoring book. With good didactic and good examples, the book is a must read that I highly recommend!

Thank you for following me on this post, until next time.

Buy the book now!

Curator: Implementing purge routines on your Elasticsearch cluster

Standard

Hi, dear readers! Welcome to my blog. On this post, we will learn how to use the Curator project to create purge routines on a Elasticsearch cluster.

When we have a cluster crunching logs and other data types from our systems, it is necessary to configure process that manages this data, doing actions like purges and backups. For this purpose, the Curator project comes in handy.

Curator is a Python tool, that allows several types of actions. On this post, we will focus on 2 actions, purge and backup. To install Curator, we can use pip, like the command bellow:

sudo pip install elasticsearch-curator

Once installed, let’s begin preparing our cluster to make the backups, by a backup repository. A backup repository is a Elasticsearch feature, that process backups and save them on a persistent store. On this case, we will configure the backups to be stored on a Amazon S3 bucket. First, let’s install AWS Cloud plugin for Elasticsearch, by running the following command on each of the cluster’s nodes:

bin/plugin install cloud-aws

And before we restart our nodes, we configure the AWS credentials for the cluster to connect to AWS, by configuring them on the elasticsearch.yml file:

cloud:
  aws:
    access_key: <access key>
    secret_key: <secret key>

Finally, let’s configure our backup repository, using Elasticsearch REST API:

PUT /_snapshot/elasticsearch_backups
{
 “type”: “s3”,
 “settings”: {
 “bucket”: “elastic-bckup”,
 “region”: “us-east-1”
 }
}

On the command above, we created a new backup repository, called “elasticsearch-backups”, also defining the bucket where the backups will be created. With our repository created, let’s create our YAMLs to configure Curator.

The first YAML is “curator-config.yml”, where we configure details such as the cluster address. A configuration example could be as follows:

client:
  hosts:
    — localhost
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 240
  master_only: False
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: [‘elasticsearch’, ‘urllib3’]

The other YAML is “curator-action.yml”, where we configure a action list to be executed by Curator. On the example, we have indexes of data from Twitter, with the prefix “twitter”, where we first create a backup from indexes that are more then 2 days old and after the backup, we purge the data:

actions:
 1:
   action: snapshot
   description: >-
     Make backups of indices older then 2 days.
   options:
     repository: elasticsearch_backups
     name: twitter-%Y.%m.%d
     ignore_unavailable: False
     include_global_state: True
     partial: False
     wait_for_completion: True
     skip_repo_fs_check: False
     timeout_override:
     continue_if_exception: False
     disable_action: False
   filters:
   — filtertype: age
     source: creation_date
     direction: older
     unit: days
     unit_count: 2
     exclude:
  2:
    action: delete_indices
    description: >-
      Delete indices older than 2 days (based on index name).
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    — filtertype: pattern
      kind: prefix
      value: twitter-
      exclude:
    — filtertype: age
      source: name
      direction: older
      timestring: ‘%Y.%m.%d’
      unit: days
      unit_count: 2
      exclude:

With the YAMLs configured, we can execute Curator, with the following command:

curator — config curator-config.yml curator-action.yml

The command will generate a log from the actions performed, showing that our configurations were a success:

2016–08–27 16:14:36,576 INFO Action #1: snapshot
2016–08–27 16:14:40,814 INFO Creating snapshot “twitter-2016.08.27” from indices: [u’twitter-2016.08.14', u’twitter-2016.08.25']
2016–08–27 16:15:34,725 INFO Snapshot twitter-2016.08.27 successfully completed.
2016–08–27 16:15:34,725 INFO Action #1: completed
2016–08–27 16:15:34,725 INFO Action #2: delete_indices
2016–08–27 16:15:34,769 INFO Deleting selected indices: [u’twitter-2016.08.14', u’twitter-2016.08.25']
2016–08–27 16:15:34,769 INFO — -deleting index twitter-2016.08.14
2016–08–27 16:15:34,769 INFO — -deleting index twitter-2016.08.25
2016–08–27 16:15:34,860 INFO Action #2: completed
2016–08–27 16:15:34,861 INFO Job completed.

That’s it! Now it is just schedule this script to execute from time to time – once per day, for example – and we will have automated backups and purges.

Thank you for following me on this post, until next time.

Docker: using containers to implement a Microservices architecture

Standard

Hello, dear readers! Welcome to another post from my blog. On this post, we will talk about the virtualization technology called Docker, which gained a ton of popularity this days, specially when we talk about Microservices architectures. But after all, what it is virtualization?

Virtualization

Virtualization, as the name implies, consists in the use of software to emulate any part of a environment, ranging from hardware components to a entire OS. In this world, there is several traditional technologies such as VMWare, Virtualbox, etc. This technologies, however, use hypervisors, which creates a intermediate layer responsible for isolating the virtual machine from the physical machine.

On containers, however, we don’t have this hypervisor layer. Instead, containers run directly on top of the kernel itself of the host machine they are running. This leads to a much more lightweight virtualization, allowing us to run a lot more VMs on a single host then we could do with a hypervisor.

Of course that using virtualization with hypervisors over containers has his benefits too, such as more flexibility – since you can’t for example run a windows container from a linux machine – and security, since this shared kernel approach leads to a scenario where if a attacker invades the kernel, he automatically get access to all the containers provided by that kernel, although we can say the same if the attacker invades the hypervisor of a machine running some VMs. This is a very heated debate, which we can find a lot of discussion by the Internet.

The fact is that containers are here to stay and his lightweight implementation provides a very good foundation for a lot of applications, such as a Microservices architecture. So, without further delay, let’s begin do dive in on Docker, one of the most popular container engines on the market today.

Docker Architecture

Docker was developed by Docker inc (formerly DotCloud inc) as a open-source engine for deployment of applications on containers. By providing a logical layer that manages the lifecycle of the containers, we can focus our development on the applications themselves, leaving the implementation of the container management to Docker.

The Docker architecture consists of a client-server model, where we have a Docker client, that could be the command line one provided by Docker, or a consumer of the RESTFul API also provided on the toolset, and the Docker server, also known as Docker daemon, which receives requests to create/start/stop containers, been responsible for the container management. The diagram bellow illustrates this architecture:

docker-architecture

On the image above we can see the Docker clients connecting to the daemon, making requests to start/stop/create containers etc. We can also see that the daemon is communicating with a image repository and using images while processing the requests. What are those images for? That’s what we will find out on the next section.

Containers & Images

When we talk about containers on Docker, we talk about processes running from instructions that were set on images. We can understand images as the building blocks from which containers are build, that way organizing the building of containers on a Docker environment.

This images are distributed on repositories, also known as a docker registry, which in turn are versioned using git. The main repository for the distribution of Docker images is, of course, the Docker Hub, managed by Docker itself.

A interesting thing is that the code behind the Docker Hub is open, so it is perfectly possible to host your own image repository. To know more about this, please visit:

github.com/docker/distribution

Docker Union File System

If the reader is familiar with traditional virtualization with hypervisors, he could be thinking: “How Docker manages the file system so the images didn’t get ‘dirt’ from the files generated on the executions of all the created containers?”. This could be even worse when we think that images can be inherited from other images, making a whole image tree.

The answer for this is how Docker utilizes the file system, with a file system called Union File System. On this system, each image is layered as a read-only layer. The layers are then overlapped on top of each other and finally on top of the chain a read-write layer is created, for the use of the container. This way, none of the image’s file systems are altered, making the images clean for use across multiple containers.

Using Docker in other OS

Docker is designed for use on Linux distributions. However, for use as a development environment, Docker inc released a Docker Toolbox that allows it to run Docker both on OSX and Windows environments as well.

In order to do this, the toolbox install virtualbox inside his distribution, as well as a micro linux VM, that only has the minimum kernel packages necessary for the launch of containers. This way, when we launch the terminal of the toolbox, the launcher starts the VM on virtualbox, starting Docker inside the VM.

One important thing to notice is that, when we launch Docker this way, we can’t use localhost anymore when we want to access Docker from the same host it is running, since it is binded to a specific ip. We can see the ip Docker is bind to by the initialization messages, as we can see on the image bellow, from a Docker installation on OSX:

Another thing to notice is that, when we use docker on OSX or Windows, we don’t need to use sudo to execute the commands, while on Linux we will need to execute the commands as root. Another way to use Docker that removes the need to use sudo is by having a group called ‘docker’ on the machine, which will make docker apply the permissions necessary to run docker without sudo to the group. Notice, however, that users from this group will have root permissions, so caution is required.

Docker commands

Well, now that we get the concepts out of the way, let’s start using Docker!

As said before, Docker uses image repositories to resolve the images it needs to run the containers. Let’s begin by running a simple container, to make a famous ‘hello world!’, Docker style.

To do this, let’s open a terminal – or the Docker QuickStart Terminal on other OS – and type:

docker run ubuntu echo 'hello, Docker!'

This is the run command, which creates a new container each time is called. On the command above,   we simply asked Docker to create a new container, using the image ubuntu – if it doesn’t have already, Docker will search Docker Hub for the latest version of the image and download it. If we wanted to use a specific version, we could specify like this: ‘ubuntu:14.04’ – from Docker Hub. At the end, we specify a command for the container to run, on this case, a simple “echo ‘hello world!’ “.

The image bellow show the command in action, downloading the image and executing:

To list the containers created, we run the command ‘docker ps’. If we run the command this way now, however, we won’t see anything, because the command by default only shows the active containers and on our case, our container simply stopped after running the command. If we run the command with the flag -a, we can then see the stopped containers, like the image bellow:

However, there’s a problem: our hello world container was just a test, so we didn’t want to keep with the container. Let’s correct this by removing the container, with the command:

docker rm 2f134296daa6

PS: The hash you can see on the command is part of the ID of the container, that you can see when you list the containers with docker ps.

Now, what about if we wanted to run again our hello world container, or any other container that we need just once, without the need to manually remove the container afterwards? We can do this using the –rm flag, running like this:

docker run --rm ubuntu echo 'hello, Docker!'

If we run docker ps -a again, after running the command above, we can see that there’s no containers  from our previous execution, proving that the flag worked correctly.

Let’s now do a little more complex example, starting a container with a Tomcat server. First, let’s search for the name of the image we want on Docker Hub. To do this, open a browser and type it:

https://hub.docker.com

Once inside the Docker Hub, let’s search for the image typing ‘tomcat’ on the search box on the top right corner of the site. We will be send to a screen like the one bellow. You can see the images with some classifiers like ‘official’ and ‘automated’.

Official means that the image is maintained by Docker itself, while automated means that the image is maintained by a CI workload.On the section ‘Publishing on Docker Hub’ we will understand more about the options to publish our own images to the Docker Hub.

From the Docker Hub we get that the name of the official image is ‘tomcat’, so let’s use this image. To use it, we simply run the following command, which will start a new container with the tomcat image:

docker run --name mytomcat -p 8888:8080 tomcat:8.0

You can see that we used some new flags on this command. First, we used the flag –name, which makes a name for our container, so we can refer to this name when running commands afterwards.

The -p flag is used to bind a port from the host machine to a port on the container. On the next sections, we will talk about creating our own images and we will see that we can expose ports from the container to be accessed by clients, like in this case that we exposed the port tomcat will serve on the container (8080) to the port 8888 of the host.

Lastly, when declaring the image we will use, we specified the version, 8.0, meaning that we want to use Tomcat 8.0. Note that we didn’t passed any command to the container, since the image is already configured to start the Tomcat server after the building. After running the command, we can see that tomcat is running inside our container:

The problem is that now our terminal is occupied by the tomcat process so we can’t issue more commands. Let’s press Crtl+C and type the docker ps command. The container is not active! the reason for this is that, by default, Docker don’t put the containers to run in background when we create a new container. To do this, we use the -d flagso on the previous command, all we had to do is include this flag to make the container start on background.

Let’s start the container again with the command docker start:

docker start mytomcat

After the start, we can see that the container is running, if we run docker ps:

Before we open the server on the browser, let’s just check if the container is exposed on the port we defined. To do this, we use the command:

docker port mytomcat

This command will return the following result:

8080/tcp -> 0.0.0.0:8888

Which means that the container is exposed on the port 8888, as defined. If we open the browser on localhost:8888 – or the ip binded by Docker on a non-linux environment – we will receive the following screen:

Excellent! Now we have our own dockerized Tomcat! However, by starting the container in background, we couldn’t see the logging of the server, to see if there wasn’t any problem on the startup of the web server. let’s see the last lines of the log of our container by entering the command:

docker logs mytomcat

This will show the last inputs of our container on the stdout. We could also use the command:

docker attach mytomcat

This command has the same effect of the tail command on a file, making it possible to watch the logs of the container. Take notice that after attaching to a container, pressing Ctrl+C will kill him!

Before ending our container, let’s see more detailed information about our container, such as the Java version, network mappings etc. To do this, we run the following command:

docker inspect mytomcat

This will produce a result like the following, in JSON format:

[

{

“Id”: “14598a44a3b4fe2a5b987ea365b2c83a0399f4fda476ad1754acee23c96fcc22”,

“Created”: “2016-01-03T17:59:49.632050403Z”,

“Path”: “catalina.sh”,

“Args”: [

“run”

],

“State”: {

“Status”: “running”,

“Running”: true,

“Paused”: false,

“Restarting”: false,

“OOMKilled”: false,

“Dead”: false,

“Pid”: 4811,

“ExitCode”: 0,

“Error”: “”,

“StartedAt”: “2016-01-03T17:59:49.728287604Z”,

“FinishedAt”: “0001-01-01T00:00:00Z”

},

“Image”: “af28fa31b54b2e45d53e80c5a7cbfd2693f198fdb8ba53d44d8a432832ad1012”,

“ResolvConfPath”: “/mnt/sda1/var/lib/docker/containers/14598a44a3b4fe2a5b987ea365b2c83a0399f4fda476ad1754acee23c96fcc22/resolv.conf”,

“HostnamePath”: “/mnt/sda1/var/lib/docker/containers/14598a44a3b4fe2a5b987ea365b2c83a0399f4fda476ad1754acee23c96fcc22/hostname”,

“HostsPath”: “/mnt/sda1/var/lib/docker/containers/14598a44a3b4fe2a5b987ea365b2c83a0399f4fda476ad1754acee23c96fcc22/hosts”,

“LogPath”: “/mnt/sda1/var/lib/docker/containers/14598a44a3b4fe2a5b987ea365b2c83a0399f4fda476ad1754acee23c96fcc22/14598a44a3b4fe2a5b987ea365b2c83a0399f4fda476ad1754acee23c96fcc22-json.log”,

“Name”: “/mytomcat”,

“RestartCount”: 0,

“Driver”: “aufs”,

“ExecDriver”: “native-0.2”,

“MountLabel”: “”,

“ProcessLabel”: “”,

“AppArmorProfile”: “”,

“ExecIDs”: null,

“HostConfig”: {

“Binds”: null,

“ContainerIDFile”: “”,

“LxcConf”: [],

“Memory”: 0,

“MemoryReservation”: 0,

“MemorySwap”: 0,

“KernelMemory”: 0,

“CpuShares”: 0,

“CpuPeriod”: 0,

“CpusetCpus”: “”,

“CpusetMems”: “”,

“CpuQuota”: 0,

“BlkioWeight”: 0,

“OomKillDisable”: false,

“MemorySwappiness”: -1,

“Privileged”: false,

“PortBindings”: {

“8080/tcp”: [

{

“HostIp”: “”,

“HostPort”: “8888”

}

]

},

“Links”: null,

“PublishAllPorts”: false,

“Dns”: [],

“DnsOptions”: [],

“DnsSearch”: [],

“ExtraHosts”: null,

“VolumesFrom”: null,

“Devices”: [],

“NetworkMode”: “default”,

“IpcMode”: “”,

“PidMode”: “”,

“UTSMode”: “”,

“CapAdd”: null,

“CapDrop”: null,

“GroupAdd”: null,

“RestartPolicy”: {

“Name”: “no”,

“MaximumRetryCount”: 0

},

“SecurityOpt”: null,

“ReadonlyRootfs”: false,

“Ulimits”: null,

“LogConfig”: {

“Type”: “json-file”,

“Config”: {}

},

“CgroupParent”: “”,

“ConsoleSize”: [

0,

0

],

“VolumeDriver”: “”

},

“GraphDriver”: {

“Name”: “aufs”,

“Data”: null

},

“Mounts”: [],

“Config”: {

“Hostname”: “14598a44a3b4”,

“Domainname”: “”,

“User”: “”,

“AttachStdin”: false,

“AttachStdout”: false,

“AttachStderr”: false,

“ExposedPorts”: {

“8080/tcp”: {}

},

“Tty”: false,

“OpenStdin”: false,

“StdinOnce”: false,

“Env”: [

“PATH=/usr/local/tomcat/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin”,

“LANG=C.UTF-8”,

“JAVA_VERSION=7u91”,

“JAVA_DEBIAN_VERSION=7u91-2.6.3-1~deb8u1”,

“CATALINA_HOME=/usr/local/tomcat”,

“TOMCAT_MAJOR=8”,

“TOMCAT_VERSION=8.0.30”,

“TOMCAT_TGZ_URL=https://www.apache.org/dist/tomcat/tomcat-8/v8.0.30/bin/apache-tomcat-8.0.30.tar.gz”

],

“Cmd”: [

“catalina.sh”,

“run”

],

“Image”: “tomcat:8.0”,

“Volumes”: null,

“WorkingDir”: “/usr/local/tomcat”,

“Entrypoint”: null,

“OnBuild”: null,

“Labels”: {},

“StopSignal”: “SIGTERM”

},

“NetworkSettings”: {

“Bridge”: “”,

“SandboxID”: “37fe977f6a45f4495eae95260a67f0e99a8168c73930151926ad521d211b4ac6”,

“HairpinMode”: false,

“LinkLocalIPv6Address”: “”,

“LinkLocalIPv6PrefixLen”: 0,

“Ports”: {

“8080/tcp”: [

{

“HostIp”: “0.0.0.0”,

“HostPort”: “8888”

}

]

},

“SandboxKey”: “/var/run/docker/netns/37fe977f6a45”,

“SecondaryIPAddresses”: null,

“SecondaryIPv6Addresses”: null,

“EndpointID”: “09477abf22e8019e2f8d3e0deeb18d9362136954a09e336f4a93286cb3b8b027”,

“Gateway”: “172.17.0.2”,

“GlobalIPv6Address”: “”,

“GlobalIPv6PrefixLen”: 0,

“IPAddress”: “172.17.0.3”,

“IPPrefixLen”: 16,

“IPv6Gateway”: “”,

“MacAddress”: “02:42:ac:11:00:03”,

“Networks”: {

“bridge”: {

“EndpointID”: “09477abf22e8019e2f8d3e0deeb18d9362136954a09e336f4a93286cb3b8b027”,

“Gateway”: “172.17.0.2”,

“IPAddress”: “172.17.0.3”,

“IPPrefixLen”: 16,

“IPv6Gateway”: “”,

“GlobalIPv6Address”: “”,

“GlobalIPv6PrefixLen”: 0,

“MacAddress”: “02:42:ac:11:00:03”

}

}

}

}

]

Now, let’s stop our container, by entering:

docker stop mytomcat

Finally, let’s remove the container, since we won’t use him anymore on this practice:

docker rm mytomcat

Let’s also remove the image, since we won’t use it either on our next steps:

docker rmi tomcat:8.0

And that concludes our breaking course on Docker’s commands! There are also other useful commands as well, of course, such as:

  • docker build: Used to build a image from a Dockerfile (see next sections);
  • docker commit: Creates a image from a container;
  • docker push: Pushes the image for a registry (Docker Hub by default);
  • docker exec: Submit a command for a running container;
  • docker export: Export the file system of a container as a tar file;
  • docker images: list the images inside Docker;
  • docker kill: force kill a running container;
  • docker restart: restart a container;
  • docker network: manages docker networks (see next sections);
  • docker volume: manages docker volumes (see next sections);

Creating your own images

On the previous section, we used a image from the Docker Hub, that creates a container with a Tomcat Web Server. This image is implemented by a script with building instructions called Dockerfile. On our lab we will create a Dockerfile, but for now, let’s just examine the Dockerfile from the tomcat’s image in order to learn some of the instructions available:

FROM java:7-jre
ENV CATALINA_HOME /usr/local/tomcat

ENV PATH $CATALINA_HOME/bin:$PATH

RUN mkdir -p “$CATALINA_HOME”

WORKDIR $CATALINA_HOME
# see https://www.apache.org/dist/tomcat/tomcat-8/KEYS

RUN gpg –keyserver pool.sks-keyservers.net –recv-keys \

05AB33110949707C93A279E3D3EFE6B686867BA6 \

07E48665A34DCAFAE522E5E6266191C37C037D42 \

47309207D818FFD8DCD3F83F1931D684307A10A5 \

541FBE7D8F78B25E055DDEE13C370389288584E7 \

61B832AC2F1C5A90F0F9B00A1C506407564C17A3 \

79F7026C690BAA50B92CD8B66A3AD3F4F22C4FED \

9BA44C2621385CB966EBA586F72C284D731FABEE \

A27677289986DB50844682F8ACB77FC2E86E29AC \

A9C5DF4D22E99998D9875A5110C01C5A2F6059E7 \

DCFD35E0BF8CA7344752DE8B6FB21E8933C60243 \

F3A04C595DB5B6A5F1ECA43E3B7BBB100D811BBE \

F7DA48BB64BCB84ECBA7EE6935CD23C10D498E23
ENV TOMCAT_MAJOR 8

ENV TOMCAT_VERSION 8.0.30

ENV TOMCAT_TGZ_URL https://www.apache.org/dist/tomcat/tomcat-$TOMCAT_MAJOR/v$TOMCAT_VERSION/bin/apache-tomcat-$TOMCAT_VERSION.tar.gz
RUN set -x \

&& curl -fSL “$TOMCAT_TGZ_URL” -o tomcat.tar.gz \

&& curl -fSL “$TOMCAT_TGZ_URL.asc” -o tomcat.tar.gz.asc \

&& gpg –verify tomcat.tar.gz.asc \

&& tar -xvf tomcat.tar.gz –strip-components=1 \

&& rm bin/*.bat \

&& rm tomcat.tar.gz*
EXPOSE 8080

CMD [“catalina.sh”, “run”]

As we can see, the first instruction is called FROM. This instruction delimits the base image upon the image will be constructed. On this case, the java:7-jre image will create a basic Linux environment, with Java 7 configured.

Then we see some ENV instructions. This commands set environment variables on the OS, as part of Tomcat’s configuration. We can also see a WORKDIR instruction, which defines the directory that, from that point, Docker will use to run the commands.

We also see some RUN instructions. This instructions, as the name implies, run commands on the container.On the case of our image, this instructions make tomcat’s installation process.

Finally we see the EXPOSE instruction, which exposes the 8080 port for the host. Finally, we see the CMD instruction, which defines the start of the server. One important distinction of the CMD command from the RUN ones is that, while the RUN instructions only executes on the build phase, the CMD instruction is executed on the container’s startup, making it the command to be executed on the creation and start of a container. For this reason, it is only allowed to have one CMD command per Dockerfile.

One important side note about the CMD command is that, when starting a container, it allows to override the command on the Dockerfile, so it could lead to security holes when used. For this reason, it is recommended to use the ENTRYPOINT instruction instead, that like the CMD instruction, it defines the command the container will run at startup, but on this case, it will not permit the command to be overridden, making it more secure.

We could also use ENTRYPOINT and CMD combined, making it possible to restrict the user to only pass flags and/or arguments to the running command.

Of course, that is not the only instructions we can use on a Dockerfile. Other instructions are:

  • MAINTAINER: Defines the author of the image;
  • LABEL: Adds metadata to a image, in a key=value format;
  • ADD: Adds a file, folder or remote file URL to a folder inside the container. The current directory from which this instruction point it out is the same directory of the Dockerfile;
  •  COPY: Analogous to ADD, this instruction also copies files and folders from the host to the container. The two major differences are that COPY doesn’t allow the use of URLs and COPY doesn’t uncompress known files, like ADD can;
  • VOLUME: Creates a volume. On the section “creating the base image” we will learn in more detail about what are volumes on Docker;
  • USER: Defines the name of the user used to run the RUN, CMD and ENTRYPOINT instructions;
  • ARG: Defines build-time variables, that could be used to pass parameters for the image’s buildings, using the build-arg flag;
  • ONBUILD: Defines a command to be triggered by other images, when the image is used as a base image for other images;
  • STOPSIGNAL: Defines the code for the container to stop, as a number or in the SIGNAME format, for instance SIGKILL.

Publishing on Docker Hub

As we saw on the previous section, in order to create our own images, all we have to do is create a “Dockerfile” file, with the instructions necessary for the building. However, in order to distribute our images, we need to register them on a image registry, like the Docker Hub, for example.

The simplest way to publish images are with the commit and push commands. For example, if we had  made changes to our Tomcat container and wanted to save those changes as a new image, we could do this with this command:

docker commit mytomcat alexandreesl/mytomcat

Where the second argument is the <repository ID>/<image name>. In order to push the images to Docker Hub, the repository ID must be equals to the username of our Docker Hub account.

After committing the changes, all we have to do is push the changes to the Hub, using the following command:

docker push alexandreesl/mytomcat

And that’s it! After the push, if we see our Docker Hub account, we can see that our image was successfully created:

NOTE: before pushing, you could have to run the command docker login to register your credentials from the Docker Hub

Another way of publishing is by linking our images with a git repository, this way the Docker Hub will rebuild the image each time a new version of the Dockerfile is pushed to the repository. Is this method of publishing that generates automated build images, since this way Docker can establish a CI Workload with the git repository. We will see this method in action on the next section, ‘Creating the base image’.

Practice: Using Docker to implement a microservices architecture

On our lab, we will take a step forward from my previous post about microservices with Spring Boot (haven’t see it? you can find it here, I am really grateful if you read that post as well!), by using a Service Registry called Eureka, designed by Netflix.

On the Service Registry pattern, we have a registry where microservices can register/unregister and also find the addresses of a service dynamically, this way decoupling the bounds between them. We will dockerize (run inside a container) the services of that lab and make them use Eureka, which will also run inside a container. In order to integrate with Eureka, we will modify our applications to use the Spring Cloud project, which according to the project’s description:

Spring Cloud provides tools for developers to quickly build some of the common patterns in distributed systems (e.g. configuration management, service discovery, circuit breakers, intelligent routing, micro-proxy, control bus, one-time tokens, global locks, leadership election, distributed sessions, cluster state). Coordination of distributed systems leads to boiler plate patterns, and using Spring Cloud developers can quickly stand up services and applications that implement those patterns. They will work well in any distributed environment, including the developer’s own laptop, bare metal data centres, and managed platforms such as Cloud Foundry.

Also, to “Springfy” even more our example, we will exchange our implementation from the previous post that uses pure jax-rs to the RestController implementation, present on the Spring Web project.

Creating the base image

Well, so let’s begin our practice! First, we will create a Docker Network to accommodate the containers.

If we see the network adapters from the host machine when the Docker Daemon is up, we will see that he creates a bridge adapter on the host, subsequently appending the containers on network interfaces inside the bridge adapter. This forms a subnet where the containers can see each other and also Docker facilitates the work for us, by mapping the container’s IPs with their names, on the /etc/hosts files inside.

On our scenario, we will use this feature so the microservices can easily find our Eureka registry, by mapping the address to the container’s name. Of course that on a real scenario we would have a cluster of Eurekas with a load balancer on separate hosts, but for simplicity’s sake of our lab, we will just use one Eureka instance.

So, in order to create a network to accommodate our architecture, first we create a network, by running the following command:

docker network create microservicesnet

After running the command we will see a hash’s ID indicating our network was created. We can see the details of our network by running the following command:

docker network inspect microservicesnet

Which will produce a result like the following:

[

{

“Name”: “microservicesnet”,

“Id”: “e8d401a00de26b74f4f2461c13dbf848c14f37127b3337a3e40465eff5897910”,

“Scope”: “local”,

“Driver”: “bridge”,

“IPAM”: {

“Driver”: “default”,

“Config”: [

{}

]

},

“Containers”: {},

“Options”: {}

}

]

Notice that, for now, the containers object is empty. This is why we didn’t add any containers to the network yet, but this will soon change.

Now we will create the Dockerfile. Create a folder in the directory of your choice and create a file called ‘Dockerfile’ (without any extension). On my case, I will push my Dockerfile to a Github repository, in order to create the image on Docker Hub as a automated build image. You can find the repositories with the source for this lab at the end of the post.

So, without further delay, let’s code the Dockerfile. The code for our image will be the following:

FROM java:8-jre

MAINTAINER Alexandre Lourenco <alexandreesl@example.com>

VOLUME [ "/data" ]

WORKDIR /data

EXPOSE 8080
ENTRYPOINT [ "java" ]
CMD ["-?"]

 

As we can see, it is a simple Dockerfile. We use Java’s 8 official image as the base,  define a volume and set his folder as our Workdir, expose the 8080 port which is the default for Spring Boot and finally we combine the entrypoint and cmd commands, meaning that anything we pass when we get up a container will be treated as parameters for the Java command. But what is this volume we have speaked of?

The reader must remember what we saw about the Union File System and how each image is layered as a read-only layer. The volumes on Docker is a technique that bypasses the Union File System, by defining a mount point on the Containers that points for a shared space on the host machine. The uses of this technique are in order to provide a place where the container can read/write information that needs to be persisted, alongside the need to share data between containers.

On our scenario, we will use the volume to point to a place where the jars of our microservices will be generated, so when a container runs a microservice, it runs the last version of the software. On a real-world scenario, this place could be a result of a CI workload, for example from a Jenkins job.

Now that we have our image, let’s build the image to test his correctness:

docker build -t alexandreesl/microservices-spring-boot-base .

At the end, we will see a message that our building was successful, validating the Dockerfile. Out of curiosity, if you see the building process, you will notice that Docker created several temporary containers for each instruction of our script, maintaining a status of the last successful instruction executed, so if we have to rebuild the image after a failure, we can restart from the failed point. We can disable this feature with the flag -no-cache if we want.

Now, let’s register the image on Docker Hub. I have created a Github repository here, so I will register the image directly on the site. To do this, we log in on our account and click on the “create automated build” menu item, as shown on the screen bellow:

After clicking, if we haven’t linked a github account yet, we will be prompted to do so. after making the linking, we will see a page with the list of our git repositories. We select the one with the Dockerfile and finally create the image, entering the name and a description for the image, as the screen bellow:

After clicking on the create button, we have completed our step, successfully registering our image on the Docker Hub! You can see my image on this link.

Lastly, just to test if our Docker Hub upload was really successful, we will remove the image from the local cache – that we added with our docker build command – and download using the docker pull command. First, let’s remove the image with the command:

docker rmi alexandreesl/microservices-spring-boot-base

And after, let’s pull the image with the command:

docker pull alexandreesl/microservices-spring-boot-base

After the download, we will have the image from the Docker Hub repository.

Preparing the service registry image

For the service registry image, we actually don’t have to do any coding, since we will use a already made image. We will start the image on the “Launching the containers” section, but the reader can see the image’s page to satisfy his curiosity here.

Preparing the services to use the registry

Now, Let’s prepare the services for the registering/deregistering of microservices, alongside service discovery when a service has dependencies.

To focus on the explanation, I will omit some details like the pom configuration, since the reader can easily get the configuration on the Github repository from the lab. On this lab, we have 3 microservices: a Customer service, a Product service and a Order service. The Order service has dependencies on the other 2 services, in order to mount a Order.

On the 3 projects we will have the same configuration for Spring Boot’s main class, called Application,  as follows:

@SpringBootApplication
@Configuration
@EnableAutoConfiguration
@EnableEurekaClient
public class Application {

public static void main(String[] args) {
 SpringApplication.run(Application.class, args);
 }
}

The key line here is the annotation @EnableEurekaClient, which configures the Spring Boot’s application to work with Eureka. Just with these annotation alone, we configure Spring Boot to connect to Eureka at startup and register itself with, send heartbeats during his lifecycle to keep the registration alive and finally make a unregistration when the process is terminated. Also, it instantiate a RestTemplate with a Ribbon load balancer, also made by Netflix, that can easily lookup service addresses from the registry. All of this with just one annotation!

In order to connect to Eureka, we have to provide the registry’s address. this is made by configuring a YAML file, called application.yml, that we put it on the resources source folder. We also configure a application name on this file, in order to inform Eureka about what Application’s ID we would like to use for the microservice.

So, in order to make this configuration, we create the files on our projects. seeing a example, for the Order’s service,  we have the following YAML configuration:

spring:
 application:
 name: OrderService
eureka:
 client:
 serviceUrl:
 defaultZone: http://eureka:8761/eureka/

Notice that when we configure Eureka’s address, we used “eureka” as the host name. This is the name of the container that we will use in order to deploy the Eureka server on our Docker Network, so on the /etc/hosts files of our containers, this name will be mapped to Eureka’s container IP, making it possible to point it out dynamically.

Now let’s see how our Microservices were implemented. For the Customer service, we have the following code:

@RestController
@RequestMapping("/")
public class CustomerRest {

private static List<Customer> clients = new ArrayList<Customer>();

static {
Customer customer1 = new Customer();
 customer1.setId(1);
 customer1.setName("Cliente 1");
 customer1.setEmail("customer1@gmail.com");

Customer customer2 = new Customer();
 customer2.setId(2);
 customer2.setName("Cliente 2");
 customer2.setEmail("customer2@gmail.com");

Customer customer3 = new Customer();
 customer3.setId(3);
 customer3.setName("Cliente 3");
 customer3.setEmail("customer3@gmail.com");

Customer customer4 = new Customer();
 customer4.setId(4);
 customer4.setName("Cliente 4");
 customer4.setEmail("customer4@gmail.com");

Customer customer5 = new Customer();
 customer5.setId(5);
 customer5.setName("Cliente 5");
 customer5.setEmail("customer5@gmail.com");

clients.add(customer1);
 clients.add(customer2);
 clients.add(customer3);
 clients.add(customer4);
 clients.add(customer5);
}

@RequestMapping(method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
 public List<Customer> getClientes() {
 return clients;
 }

@RequestMapping(value = "customer/{id}", method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
 public Customer getCliente(@PathVariable long id) {

Customer cli = null;

for (Customer c : clients) {

if (c.getId() == id)
 cli = c;

}

return cli;
 }
}

As we can see, nothing out of normal, very simple usage of Spring’s REST Controllers. Now, on to the Product service:

@RestController
@RequestMapping("/")
public class ProductRest {

private static List<Product> products = new ArrayList<Product>();

static {

Product product1 = new Product();
 product1.setId(1);
 product1.setSku("abcd1");
 product1.setDescription("Produto1");

Product product2 = new Product();
 product2.setId(2);
 product2.setSku("abcd2");
 product2.setDescription("Produto2");

Product product3 = new Product();
 product3.setId(3);
 product3.setSku("abcd3");
 product3.setDescription("Produto3");

Product product4 = new Product();
 product4.setId(4);
 product4.setSku("abcd4");
 product4.setDescription("Produto4");

products.add(product1);
 products.add(product2);
 products.add(product3);
 products.add(product4);

}

@RequestMapping(method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
 public List<Product> getProdutos() {
 return products;
 }

@RequestMapping(value = "product/{id}", method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
 public Product getProduto(@PathVariable long id) {

Product prod = null;

for (Product p : products) {

if (p.getId() == id)
 prod = p;

}

return prod;
 }

}

Again, nothing unusual on the code. Now, let’s see the Order service, where we will see the registry been used for consuming microservices:

@RestController
@RequestMapping("/")
public class OrderRest {

private static long id = 1;

// Created automatically by Spring Cloud
 @Autowired
 @LoadBalanced
 private RestTemplate restTemplate;

private Logger logger = Logger.getLogger(OrderRest.class);

@RequestMapping(value = "order/{idCustomer}/{idProduct}/{amount}", method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
 public Order submitOrder(@PathVariable long idCustomer, @PathVariable long idProduct, @PathVariable long amount) {

Order order = new Order();

 Map map = new HashMap();

 map.put("id", idCustomer);

ResponseEntity<Customer> customer = restTemplate.exchange("http://CUSTOMERSERVICE/customer/{id}",
 HttpMethod.GET, null, Customer.class, map);

 map = new HashMap();

 map.put("id", idProduct);

ResponseEntity<Product> product = restTemplate.exchange("http://PRODUCTSERVICE/product/{id}", HttpMethod.GET,
 null, Product.class, map);

order.setCustomer(customer.getBody());
order.setProduct(product.getBody());
order.setId(id);
order.setAmount(amount);
order.setOrderDate(new Date());

logger.warn("The order " + id + " for the client " + customer.getBody().getName() + " with the product "
 + product.getBody().getSku() + " with the amount " + amount + " was created!");

id++;

return order;
 }
}

The first thing we notice is the autowired RestTemplate, that also has a@LoadBalanced annotation. This RestTemplate is automatically instantiated by Spring Cloud, creating a interface which we could use to communicate with Eureka. The annotation informs that we want to use Ribbon to make the load balance, in case we have a cluster of instances of the same service.

The exchange method we use inside of the submitOrder method is where the magic happens. When we make our calls using this method, internally a lookup is made with the Eureka Server, the addresses of the service are passed to Ribbon in order to load balance the calls, and them the call is made. Notice that we used names like “CUSTOMERSERVICE” as the host address on our URIs. This pattern tells the framework the ID of the application we really want to call, been replaced at runtime with one of the IP addresses of the service we want to call.

And that concludes our quick explanation of the Java code involved on our lab. Again, if the reader want to get the full code, just head for the Github repositories at the end of this post. In order to run the lab, I recommend that you clone the whole repository and execute a mvn package command on each of the 3 projects. If you don’t have Maven, you can get it from here.

Launching the containers

Now that we are almost finished with the lab, it is time for the fun part: run our containers! To do this, we will use docker-compose. With docker-compose, we can start/stop/kill etc full stacks of containers, without having to instantiate everything by hand. In order to make our docker-compose stack, let’s create a YAML file called docker-compose.yml and include the following configuration:

eureka:
 image: springcloud/eureka
 container_name: eureka
 ports:
 - "8761:8761"
 net: "microservicesnet"
customer1:
 image: alexandreesl/microservices-spring-boot-base
 container_name: customer1
 hostname: customer1
 net: "microservicesnet"
 ports:
 - "8080:8080"
 volumes:
 - /Users/alexandrelourenco/Applications/git/docker-handson:/data
 command: -jar /data/Customer-backend/target/Customer-backend-1.0.jar
product1:
 image: alexandreesl/microservices-spring-boot-base
 container_name: product1
 hostname: product1
 net: "microservicesnet"
 ports:
 - "8081:8080"
 volumes:
 - /Users/alexandrelourenco/Applications/git/docker-handson:/data
 command: -jar /data/Product-backend/target/Product-backend-1.0.jar
order1:
 image: alexandreesl/microservices-spring-boot-base
 container_name: order1
 hostname: order1
 net: "microservicesnet"
 ports:
 - "8082:8080"
 volumes:
 - /Users/alexandrelourenco/Applications/git/docker-handson:/data
 command: -jar /data/Order-backend/target/Order-backend-1.0.jar

Some things to notice from our configuration:

  • In all the containers we configured our “microservicesnet” network, in order to use the Docker Network feature to resolve our needs;
  • On the MicroService’s containers, we also defined the hostname, to force Spring Cloud’s registration control to properly register the correct alias for the service’s addresses;
  • We have exposed Eureka’s port to the physical host at 8761, so we can see the web interface from outside the Docker environment;
  • On the volumes sections, I have mapped the base folder where my Maven projects from the microservices are placed, binding to the /data folder inside the container, which is defined as the workdir on the image we created previously. This way, on the command section, when we map the location of the microservice’s Spring Boot jar, we map the location from the /data folder;

Finally, after saving our file, we run the script by simply running the following command, at the same folder of the YAML file:

docker-compose up

We will see some information about our containers  getting up and at the end we will see some log information from all the containers mixed, with the name of the container as a prefix for identification:

After waiting some moments we will see our containers are up. Let’s see if the microservices are up? Let’s open a browser and point it to the port 8761, using localhost or the ip used by Docker in case you are in OSX/Windows:

Excellent! Not only has Eureka booted up successfully, but also all of our services have registered with it. Let’s now toy a little with our stack, to test it out.

Let’s begin by making a search for a customer of ID 1 on the Customer’s service:

curl -XGET 'http://<your ip>:8080/customer/1'

This will produce a JSON response like the following:

{"id":1,"name":"Cliente 1","email":"customer1@gmail.com"}

Next, let’s test out the Product service, with a call for the details of the Product of ID 4:

curl -XGET 'http://<your ip>:8081/product/4'

This time it will produce a response like the following:

{"id":4,"sku":"abcd4","description":"Produto4"}

Finally, the moment of true: let’s call the Order service, that utilises our other 2 services, to see if the service’s addresses are resolved with Eureka’s help. To test it, let’s make the following call:

curl -XGET 'http://<your ip>:8082/order/2/1/4'

If everything goes well, we will see a response like:

{"id":1,"amount":4,"orderDate":1452216090392,"customer":{"id":2,"name":"Cliente 2","email":"customer2@gmail.com"},"product":{"id":1,"sku":"abcd1","description":"Produto1"}}

Success! But can we have further proof? If we see the logs from the Order1 container, we can see some logs showing Ribbon in action, searching for the services we need on the call and initializing load balancers, in order to serve our needs:

Let’s make one last test before closing in: let’s add one more instance of the Product’s service, to see if the new instance is registered under the same application ID. To do this, let’s run the following command. Notice that we didn’t specify the port to expose the service, since we just want to add up the instance to load balance the internal calls from the Order service:

docker run --rm --name product2 --hostname=product2 --net=microservicesnet -v /Users/alexandrelourenco/Applications/git/docker-handson:/data alexandreesl/microservices-spring-boot-base -jar /data/Product-backend/target/Product-backend-1.0.jar

If we open the Eureka web interface again, we will see that the second address is mapped:

Finally, to test the balance, let’s make a call of the previous URL of the Order service. After the call, we will see that our service is aware of both addresses, proving that the balance is implemented, as we can see on the log’s fragment bellow:

[2016-01-08 14:14:09.703] boot – 1  INFO [http-nio-8080-exec-1] — ChainedDynamicProperty: Flipping property: PRODUCTSERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647

[2016-01-08 14:14:09.709] boot – 1  INFO [http-nio-8080-exec-1] — BaseLoadBalancer: Client:PRODUCTSERVICE instantiated a LoadBalancer:DynamicServerListLoadBalancer:{NFLoadBalancer:name=PRODUCTSERVICE,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null

[2016-01-08 14:14:09.714] boot – 1  INFO [http-nio-8080-exec-1] — ChainedDynamicProperty: Flipping property: PRODUCTSERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647

[2016-01-08 14:14:09.718] boot – 1  INFO [http-nio-8080-exec-1] — DynamicServerListLoadBalancer: DynamicServerListLoadBalancer for client PRODUCTSERVICE initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=PRODUCTSERVICE,current list of Servers=[product1:8080, product2:8080],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone; Instance count:2; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]

},Server stats: [[Server:product2:8080; Zone:defaultZone; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 00:00:00 UTC 1970; First connection made: Thu Jan 01 00:00:00 UTC 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]

, [Server:product1:8080; Zone:defaultZone; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 00:00:00 UTC 1970; First connection made: Thu Jan 01 00:00:00 UTC 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]

]}ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@2561f098

[2016-01-08 14:14:09.738] boot – 1  INFO [http-nio-8080-exec-1] — ConnectionPoolCleaner: Initializing ConnectionPoolCleaner for NFHttpClient:PRODUCTSERVICE

And that concludes our lab. As we can see, we builded a robust stack to deploy our microservices with their dependencies, with little effort.

Clustering container hosts

In all of our examples, we are always working with a single Docker host, our own machine on the case. On a real production environment, we can have some dozens or even hundreds of Docker hosts. In order to make all the hosts to work together, it is necessary to implement some layer that manages the hosts, making them appear as a single cluster to the consumers.

This is the goal of Docker Swarm. With Docker Swarm, we can connect multiple Docker hosts across a network, using them as a single cluster by using Docker Swarm’s interface.

For further information about Docker Swarm alongside a good example of utilization, I suggest consulting the excellent “The Docker Book”, which I describe on the next section.

References

This post was inspired by my studies on Docker with The Docker Book, written by James Turnbull. It is a excellent source of information, that I highly recommend to buy it! You can find the book available to purchase online on:

www.dockerbook.com

Conclusion

And this concludes our tour on the world of Docker. With a simple interface, Docker allows us to use all the power of the container world, allowing us to quickly deploy and escalate our applications. Never it was so simple to deploy our applications! Thank you for following me on another post, until next time.

Continue reading

TDD: Designing our code, test-first

Standard

Hi, dear readers! Welcome to my blog. On this post, we will talk about TDD, a methodology that preaches about focusing first on tests, instead of our production code. I have already talked about test technologies on my post about mockito, so in this post we will focus more on the theoretical aspects of this subject. So, without further delay, let’s begin talking about TDD!

Test-driven development

TDD, also known as Test-driven development, is a development technique created by Kent Beck, which also created JUnit, a highly known Java test library.

TDD, as the name implies, dictates that the development must be guided by tests, that must be written before even the “real” code that implements the requirements is written! Let’s see in more detail how the steps of TDD work on the next section.

TDD steps

As we can see on the picture above, the following steps must be met to proceed in using the TDD paradigm:

  1. Write a test (that fails): Represented by the red circle, it means that before we implement the code – a method on a class, for example – we implement a test case, like a JUnit method for instance, and invoke our code, in this case a simple empty method. Of course, on this scenario the test will fail, since there is no implementation, which leads to our next step;
  2. Write code just enough to pass: Represented by the green circle, it means that we write code on our method just enough to make our test case pass with success. The principle behind this step is that we write code with simplicity in mind, in other words, keeping it simple, like the famous kiss principle;
  3. Refactor the code, improving his quality: Represented by the blue circle, it means that, after we make our code to pass the test, we analyse our code, looking for chances to improve his quality. Is there duplicate code? Remove the duplications! Is there hard coded constants? Consider changing to a enum or a properties table, for instance.

A important thing to notice is that the focus of the third step and the previous one is to not implement more functionality then the test code is testing. The reason for this is pretty obvious: since our focus is to implement the tests for our scenarios first, any functionality that we implement without the test reflecting the scenario is automatically a code without test coverage;

After we conclude the third step, the construction returns to the first step, reinitiating the cycle. On the next iteration, we implement another test case for our method, representing another scenario. We then code the minimum code to implement the new scenario and conclude the process by refactoring the code. This keeps going until all the scenarios are met, leaving at the end not only our final code, but all the test cases necessary to effective test our method, with all the functionalities he is intended to implement.

The reader may be asking: “but couldn’t we get the same by coding the tests after we write our production code?”. The answer is yes, we could get the same results by adding our tests at the end of the process, but then we wouldn’t have something called feedback from our code.

Feedback

When we talk about feedback, we mean the perception we have about how our code will perform. If we think about, the test code is the first client of our code: he will prepare the data necessary for the invocation, invoke our code and then collect the results. By implementing the tests first, we receive more rapidly this feedbacks from our implementation, not only on the sphere of correctness, but also on design levels.

For instance, imagine that we are building a test case and realize that the test is getting difficult to implement because of the complex class structure we have to populate just to make the invocation of our test’s target. This could mean that our class model is too complex and a refactoring is necessary.

By getting this feedback when we are just beginning the development – remember that we always code just enough to the test to pass! – it makes a lot more cheap and less complex to refactor the model then if we receive this feedback just at the end of the construction, when a lot of time and effort was already made! By this point of view, we can easily see the benefits of the TDD approach.

Types of tests

When we talked about tests on the previous sections, we talked about a type of test called unit test. Unit tests are tests focused on testing a single program unit, like a class, not focusing on testing the access to external resources, for example. Of course, there are other types of tests we can implement, as follows:

Unit tests: Unit tests have their focus to test individually each program unit that composes the system, without external interferences.

Integration tests: Integration tests also have the focus to test program units, but in this case to test the integration of the system with external resources, like a database, for example. A good example of a integration test is test cases for a DAO class, testing the code that implement insertions, updates, etc.

System tests: System tests, as the name implies, are tests that focus on testing the whole system, across all his layers. In a web application, for example, that means a automated test that turns on a server, execute some HTTP requests to the web application and analyse the responses. A good example of technology that could be used to test the web tier is a tool called Selenium.

Acceptance tests: Acceptance tests, commonly, are tests made by the end user of the system. The focus of this kind of test is to measure the quality of the implementation of requirements specified by the user. Other requirements such as usability are also evaluated by this kind of test.

A important thing to notice is that this kind of test is also referred as a automated system test, with the objective of testing the requirement as it is, for example:

  • Requirement: the invoice must be inserted on the system by the web interface;
  • Test: create a system test that executes the insertion by the web interface and verifies if the data is correctly inserted;

This technique, called ATDD (Acceptance Test-Driven Development) preaches that first a acceptance test must be created, and then the TDD technique is applied, until the acceptance test is satisfied. The diagram bellow shows this technique in practice:

Mock objects and unit tests

When we talk about unit tests, as said before, it is important to isolate the program unit we are testing, avoiding the interference from other tiers and dependencies that the program unit uses. On the Java world, a good framework we can use to create mocks is Mockito, which we talked about on my previous post.

With mocks, following the principles of TDD, we can, for example, create just the interfaces our code depends on and mock that interfaces, this way already establishing the communication channel that will be created, without leaving our focus from the program unit we are creating.

Another benefit of this approach is on the creation of the interfaces themselves, since our focus is always to implement the minimum necessary for the tests to pass, the resulting interfaces will be simple and concise, improving their quality.

Considerations

When do I use mocks?

A important note about mocks is that not always is good to use them. Taking for example a DAO class, that essentially is just a class that implement code necessary to interact with a database, for instance, the use of mocks won’t bring any value, since the code of the class itself is too simple to benefit from a unit test. On this cases, typically just a integration test is enough, using for example in-memory databases such as HSQLDB to act as the database.

Should I code more then one test case at a time?

In general, is considered a bad practice to write more then one test case at once before running and getting the fail results. The reason for this is that the point of the technique is to test one functionality at a time, which of course is “ruined” by the practice of coding more then one test at once.

How much of a functionality do I code on the test?

On the TDD terminology, we can also call the “amounts” of functionality we code at each iteration as baby steps. There is no universal convention of how much must be implement in each of this baby steps, but it is a good advice to follow common sense. If the developer is experienced and the functionality is simple, it could be even possible to implement almost the whole code on just one iteration.

If the developer is less experienced and/or the functionality is more complex, it makes more sense to spend more time with more baby steps, since it will create more feedbacks for the developer, making it easier to implement the complex requirements.

Should I test private code?

Private code, as the name implies, are code – like a method, for instance – that are accessible only inside the same program unit, like a class, for example. Since the test cases are normally implemented on another program unit (class), the test code can’t access this code, which in turn begs the question: should I make any code to test that private code?

Generally speaking, a common consensus is that private code is intended to implement little things, like a portion of code that is duplicated across multiple methods, for example. In that scenario, if you have for instance a private method on a Java class that it is enormous with lots of functionality, then maybe it means that this method should be made public, maybe even moved to a different class, like a utility class.

If that is not the case, then it is really necessary to design test cases to efficiently test the method, by invoking him indirectly by his public consumer. Talking specifically on the Java World, one way to test the code without the “barrier” of the public consumer is by using reflection.

My target code has too much test cases, is this normal?

When implementing the test cases for a target production code – a method, for example – the developer could end up on a scenario that lots of test cases are created just to test the whole functionality that single code composes. When this kind of situation happens, it is a good practice that the developer analyse if the code doesn’t have too much functionality implemented on itself, which leads to what in OO we call as low cohesion.

To avoid this situation, a refactoring from the code is needed, maybe splitting the code on more methods, or even classes on Java’s case.

Conclusion

And this concludes our post about TDD. By shifting the focus from implementing the code first to implementing the tests first, we can easily make our code more robust, simple and testable.

In a time that software is more important then ever – been on places like airplanes, cars and even inside human beans -, testing efficiently the code is really important, since the consequences of a bad code are getting more and more devastating. Thank you for following me on another post, until next time.

Java 8: Knowing the new features – the new Date API

Standard

Hi, dear readers! Welcome to my blog. On this post, the last on the series, we talk about the new library for Date & Time manipulation, which was inspired by the Joda Time library.

So, without further delay, let’s begin our journey through this feature!

Manipulating Dates & Time on Java

It is a old complain on the Java community how the Java APIs for manipulating Dates has his issues, like limitations, difficult  to work with, etc. Thinking on this, the Java 8 comes with a new API that brings simplicity and strength to the way we work with datetimes on Java. Let’s start by learning how to create instances of the new classes.

To create a new Date instance (without time), representing the current date, all we have to do is:

LocalDate date = LocalDate.now();

To create a new Time instance, based at the time the instance was created, we do this:

LocalTime time = LocalTime.now();

And finally, to create a datetime, in other words, a date and time representation, we use this:

LocalDateTime dateTime = LocalDateTime.now();

The instance above have not timezone information, using only the local timezone. If it is needed to use a specific timezone, we created a class called ZonedDateTime. For example, if we wanted to create a instance from our timezone and them change to Sidney’s timezone, we could do like this:

ZonedDateTime zonedDateTime = ZonedDateTime.now();

System.out.println("Time at my timezone: " + zonedDateTime);

zonedDateTime = zonedDateTime.withZoneSameInstant(ZoneId
.of("Australia/Sydney"));

System.out.println("Time at Sidney: " + zonedDateTime);

The code above print the following at my location:

Time at my timezone: 2015-06-04T14:42:30.850-03:00[America/Sao_Paulo]
Time at Sidney: 2015-06-05T03:42:30.850+10:00[Australia/Sydney]

Another way of instantiating this classes is for a predefined date and/or time. We can do this like the following:

 date = LocalDate.of(2015, Month.DECEMBER, 25);

 dateTime = LocalDateTime.of(2015, Month.DECEMBER, 25, 10, 30);

With all those classes is really simple to add and/or remove days, months or years to a date, or the same to a time object. the code bellow illustrate this simplicity:

System.out.println("Date before adding days: " + date);

date = date.plusDays(10);

System.out.println("Date after adding days: " + date);

date = date.plusMonths(6);

System.out.println("Date after adding months: " + date);

date = date.plusYears(5);

System.out.println("Date after adding years: " + date);

date = date.minusDays(7);

System.out.println("Date after subtracting days: " + date);

date = date.minusMonths(6);

System.out.println("Date after subtracting months: " + date);

date = date.minusYears(10);

System.out.println("Date after subtracting years: " + date);

time = time.plusHours(12);

System.out.println("Time after adding hours: " + time);

time = time.plusMinutes(30);

System.out.println("Time after adding minutes: " + time);

time = time.plusSeconds(120);

System.out.println("Time after adding seconds: " + time);

time = time.minusHours(12);

System.out.println("Time after subtracting hours: " + time);

time = time.minusMinutes(30);

System.out.println("Time after subtracting minutes: " + time);

time = time.minusSeconds(120);

System.out.println("Time after subtracting seconds: " + time);

Running the above code, it prints:

Date before adding days: 2015-12-25
Date after adding days: 2016-01-04
Date after adding months: 2016-07-04
Date after adding years: 2021-07-04
Date after subtracting days: 2021-06-27
Date after subtracting months: 2020-12-27
Date after subtracting years: 2010-12-27
Time after adding hours: 09:28:24.380
Time after adding minutes: 09:58:24.380
Time after adding seconds: 10:00:24.380
Time after subtracting hours: 22:00:24.380
Time after subtracting minutes: 21:30:24.380
Time after subtracting seconds: 21:28:24.380

One important thing to notice is that in all methods we had to “catch” the return of the operations. The reason for this is that, opposite to the old classes we used like the Calendar one, the instances on the new date API are immutable, so they always return a new value. This is useful for scenarios with concurrent access for example, since the instances wont carry states.

Another simplicity is on the way we get the values from a date or time. On the old days, when we wanted to get a year or month from a Calendar, for example, we would need to use the generic get method, with a indication of the field we would want, like Calendar.YEAR. With the new API, we could use specific methods with ease, like the following:

System.out.println("For the date: " + date);

System.out.println("The year from the date is: " + date.getYear());

System.out.println("The month from the date is: " + date.getMonth());

System.out.println("The day from the date is: " + date.getDayOfMonth());

System.out.println("The era from the date is: " + date.getEra());

System.out.println("The day of the week is: " + date.getDayOfWeek());

System.out.println("The day of the year is: " + date.getDayOfYear());

After we run the code above, the following result will be produced:

For the date: 2010-12-27
The year from the date is: 2010
The month from the date is: DECEMBER
The day from the date is: 27
The era from the date is: CE
The day of the week is: MONDAY
The day of the year is: 361

Another simple thing to do is comparing dates with the API. If we code the following:

  // comparing dates
  LocalDate today = LocalDate.now();
  LocalDate tomorrow = today.plusDays(1);

   System.out.println("Is today before tomorrow? "
			+ today.isBefore(tomorrow));

   System.out.println("Is today after tomorrow? "
			+ today.isAfter(tomorrow));

   System.out.println("Is today equal tomorrow? "
			+ today.isEqual(tomorrow));

On the code above, as expected, only  the first print will print true.

One interesting feature of the new API is the locale support. On the code bellow, for example, we print the month of a date in different languages:

System.out.println("English: "+today.getMonth().getDisplayName(TextStyle.FULL, Locale.ENGLISH));
		
System.out.println("Portuguese: "+today.getMonth().getDisplayName(TextStyle.FULL, Locale.forLanguageTag("pt")));
		
System.out.println("German: "+today.getMonth().getDisplayName(TextStyle.FULL, Locale.GERMAN));
		
System.out.println("Italian: "+today.getMonth().getDisplayName(TextStyle.FULL, Locale.ITALIAN));
		
System.out.println("Japanese: "+today.getMonth().getDisplayName(TextStyle.FULL, Locale.JAPANESE));
		
System.out.println("Chinese: "+today.getMonth().getDisplayName(TextStyle.FULL, Locale.CHINESE));

Running the above code, on my current date, we will get the following result:

English: June
Portuguese: Junho
German: Juni
Italian: giugno
Japanese: 6月
Chinese: 六月

Formatting dates is also a easy task with the new API. If we wanted to format a date to a “dd/MM/yyyy” format, all we have to do is pass a DateTimeFormatter with the desired format:

System.out.println(today.format(DateTimeFormatter
			.ofPattern("dd/MM/yyyy")));

One very common requirement we encounter from time to time is the need to calculate the time between two dates. With the new API, we can calculate this very easily, with the ChronoUnit class:

 LocalDateTime oneDate = LocalDateTime.now();
 LocalDateTime anotherDate = LocalDateTime.of(1982, Month.JUNE, 21, 20,
 00);

 System.out.println("Days between the dates: "
 + ChronoUnit.DAYS.between(anotherDate, oneDate));

 System.out.println("Months between the dates: "
 + ChronoUnit.MONTHS.between(anotherDate, oneDate));

 System.out.println("Years between the dates: "
 + ChronoUnit.YEARS.between(anotherDate, oneDate));

System.out.println("Hours between the dates: "
 + ChronoUnit.HOURS.between(anotherDate, oneDate));

 System.out.println("Minutes between the dates: "
 + ChronoUnit.MINUTES.between(anotherDate, oneDate));

 System.out.println("Seconds between the dates: "
 + ChronoUnit.SECONDS.between(anotherDate, oneDate));

On my current day (08/06/2015), the above code produced:

Days between the dates: 12040
Months between the dates: 395
Years between the dates: 32
Hours between the dates: 288962
Minutes between the dates: 17337771
Seconds between the dates: 1040266275

One thing to note is that, if we use the same methods with the objects exchanged, we will receive negative numbers. If our logic needs the calculations to be always positive, we could use the classes Period and Duration to calculate the time between the dates, which have the methods isNegative() and negated() to produce this desired effect.

One final feature we will visit of the new API is the concept of invalid dates. When we were using a Calendar,  if we tried to input the date of February, 30, on a year the month goes to 28 days, the Calendar will adjust the date to March, 2, in other words, it will go past the date inputted, without throwing any errors. This is not always the desired effect, since sometimes this could lead to unpredictable behaviors. On the new API, if we try for example to do the following:

LocalDate invalidDate = LocalDate.of(2014, Month.FEBRUARY, 30);
		
System.out.println(invalidDate);

We will receive a invalid date exception, ensuring a easier way to treat this kind of bug:

Exception in thread "main" java.time.DateTimeException: Invalid date 'FEBRUARY 30'
	at java.time.LocalDate.create(LocalDate.java:431)
	at java.time.LocalDate.of(LocalDate.java:249)
	at com.alexandreesl.handson.DateAPIShowcase.main(DateAPIShowcase.java:174)

References

This series was inspired by a book from the publisher “Casa do Código”, which was used by me on my studies. Unfortunately the book is on Portuguese, but it is a good source for developers who want to quickly learn about the new features of Java 8:

Java 8 prático

Conclusion

And that concludes our series about the new features  of the Java 8. Of course, there is other subjects we didn’t talked about, like the end of the PermGen, that it was replaced by another memory technique called metaspace. If the reader wants to know more about this, this article is very interesting on the subject. However, with this series, the reader can have a good base to start developing on Java 8.

On a programming language like Java, it is normal to have changes from time to time. For a language with so many years, it is impressive how Java can still evolve, reflecting the new tendencies from the more modern languages. Will it Java continue like this forever? Only time will tell….

Thank you for following me on another post from my blog, until next time.

Continue reading