Akka & Akka Streams: Construa sistemas distribuídos com atores


Hello, dear readers! Welcome to my blog. In this post, I would like to share with you my new personal project: a new book! This book joins my first one, in which I talk about Elasticsearch.

In this book, you will find all the information necessary to learn about concurrency and the Akka frameworks, which allow us to develop applications using the Actor model to handle massive loads of data with techniques such as asynchronous processing and back-pressure.

The book uses real-life examples to demonstrate the concepts, walking you through applications such as integrations using Kafka and relational databases, and an e-commerce API.

The book can be found at:

Thank you for following me and supporting this blog by visiting and sharing the knowledge; it means very much to me!

If you would kindly like to support me in my adventures producing content, feel free to donate a cryptocurrency to me. Blockchain is one of my new fields of interest right now, and I promise I will tackle it in a future article!

If you prefer to donate Neblios (NEBL), please donate to:

Ni239Wqwvp12eHNnbXGbo6BwFbBhwF13L5

If you prefer to donate Ethereum (ETH), please donate to:

0x2D9A0781872aCe96365d2b20b3ce905E0473981D

Please keep in mind that this is entirely optional and it won’t stop me from producing content whatsoever; it is just a possible way to express recognition for my effort, and I will be very grateful for it!

Thank you for following me on another post, until next time!

Gatling: making performance tests with Scala


Hello, dear readers! Welcome to my blog. In this post, we will talk about Gatling, a Scala library designed for developing performance/stress tests. But why do we need such tests? Let’s find out!

Measuring our code

It is no mystery to anyone that performance is key to any architecture, given today’s ever-growing need to crunch more and more massive chunks of data at high speed. So, in order to meet this demand, we need to keep an eye on aspects concerning performance, such as network throughput and I/O costs.

There is some debate about when to start thinking about performance, with some people defending that it should be the first thing to do and others arguing that it should only be considered at later stages, when the need arises.

Whatever your opinion on this subject, the point is that, at some point, you will be prompted to think about your application’s performance. When this happens, a good way to measure it is with performance tests, such as stress and load tests.

The difference between stress and load tests is essentially their objectives. A stress test’s objective is to check how much “punishment” an application can take before it breaks down, or comes very close to doing so. This can help to define scaling thresholds that add more instances before the breaking point is reached, for example.

Load tests, on the other hand, have the objective of testing how the application behaves under different loads of data, up to a massive amount that reflects a peak situation, or even beyond a normal peak. This can help to identify bottlenecks, such as a search endpoint with a more complex query that won’t withstand an access peak and becomes very slow under high traffic, for example.

Well, now that we know why it is good to do performance tests, why do we need a tool for that?

Why use a library for that?

Of course, we could just write a simple script that spins up a bunch of threads and starts firing at will at our application – let’s assume we are talking about a REST API at this point, which is the focus of our lab later on – and just see if it breaks after a wave of HTTP requests. Easy enough, right?

The problem with this approach is that a test suite’s features are not as simple as they seem. Not only do we need more complex testing scenarios, such as ramping up users (to simulate increasing usage across a timeline), calls distributed over a time frame, parallelism, etc., but we also need to think about reading the results themselves, as it is useful to calculate things such as percentiles, average call durations, etc.

When we use a tool such as Gatling for this, we get all these features out of the box, making our life much easier. Besides, since it is already built in a way that is easy to reuse, we can write scripts that are shared across several different applications and can even be run on CI pipelines.

What about monitoring platforms?

You could be thinking about monitoring platforms, such as NewRelic, which already use technologies such as Java profiling to do real-time performance monitoring, pointing out issues at specific layers such as databases, for example. These platforms are excellent and should be used, no doubt about it.

However, for applications that really are performance-critical, it can pay off to also use a tool such as Gatling since, as said before, it can be integrated into the CI pipeline, making it possible to test the performance of critical operations even before the new code is sent to production.

So, now that we have talked about the importance of performance, what Gatling is for and why it is a good tool, let’s start with a simple lab to show its usefulness in practice.

Lab

For this lab, we will create a simple Spring Boot API, that uses Postgres as a database. It will be a simple user CRUD service, since our primary focus is not on showing how to develop an API, but on how to use Gatling to measure it.

So, for our lab we will have an API with the following endpoints:

The whole project is dockerized, creating an API alongside a populated database. Gatling will also run inside a container. For convenience, there is also a Makefile with the necessary commands to execute the stack. To run it, you need to have Docker and Java 11 installed and use the Makefile included on the project, as shown:

make run

If the reader doesn’t have – or doesn’t want – Java installed on their machine, there is also a convenient Docker image provided on Docker Hub. Just use run-docker instead of run with make and it will run everything with just Docker.

Since our focus is on Gatling, we will not go into more detail about Spring Boot itself and other API details. If the reader doesn’t know Spring Boot, I suggest reading this article.

To run Gatling, we will use a Docker container that will run our simulations (Gatling’s terminology for its test suites). All simulations are inside the project, in a folder called src/gatling/simulations.

Inside the gatling folder, we can also find another two folders, the first one being conf. Gatling ships with a good set of configuration defaults, so we just use this folder to set a default description for our simulation runs on the reports. If we don’t do this, Gatling will keep asking for a description every time we run it. All possible settings can be found on this link.

The last folder is the reports one. This folder contains the reports generated for us at every run. We will start analyzing the reports soon enough.

All the code from our lab can be found here. Now, let’s begin our simulations!

Working with Gatling

Let’s begin with our first simulation. The simulations are written in Scala. If the reader is not familiar with Scala, I suggest reading my series of articles here. Our first simulation simply makes a call to each operation of the API, as follows:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class CrudSimulation extends Simulation {

  val httpProtocol = http // 1
    .baseUrl("http://api:8080/user") // 2
    .acceptHeader("application/json") // 3
    .userAgentHeader("Mozilla/5.0 (Windows NT 5.1; rv:31.0) " +
    "Gecko/20100101 Firefox/31.0") // 4

  val scn = scenario("CrudSimulation") // 5
    .exec(http("request_get") // 6
    .get("/1")) // 7
    .pause(5) // 8
    .exec(http("request_post")
    .post("/")
    .body(StringBody(
      """{ "name": "MytestDummy",
        | "phone":11938284334 }""".stripMargin)).asJson)
    .pause(5)
    .exec(http("request_patch")
      .patch("/")
      .body(StringBody(
        """{ "id":1, "name": "MytestDummy2",
          |"phone":11938284123 }""".stripMargin)).asJson)
    .pause(5)
    .exec(http("request_get_name")
      .get("/name/MytestDummy"))
    .pause(5)
    .exec(http("request_delete")
      .delete("/1"))
    .pause(5)


  setUp( // 9
    scn.inject(atOnceUsers(1)) // 10
  ).protocols(httpProtocol) // 11
}

Let’s introduce the simulation structure by analyzing the code:

  1. First, we define an http object, which will be used to set the default values for our simulation;
  2. Here we define the base URL for all our calls on the simulation;
  3. Here we define the Accept header, indicating that all our calls will use JSON;
  4. Here we define the user-agent, this is not so important on an API test, but could be useful for a website that requires testing for different browsers, for example;
  5. Here we create our scenario. Simulations are composed of scenarios, where calls are made;
  6. Here we prepare our first call. By creating an http object, we start defining the call we wish to make;
  7. Here we define we want to make a GET call. On the next lines, we can also see how to make POST and PATCH calls, where a request body is provided;
  8. Here we define a pause of 5 seconds before making the next call;
  9. Here we invoke setUp, which will initialize the scenario to run;
  10. Here we tell how we want to run the scenario. For this simple first simulation, we just want one user to make the calls;
  11. Here we define the protocol used in the scenario. It is here that we pass the first object we created, with all the default values we defined at first.

Gatling also offers a recorder, with which you can record a browser session to simulate user interactions. This is useful when we want to create tests for a website instead of an API. The recorder can be found here.

As we can see, it is very easy to create simulations. To run the simulation, just use the following make command:

make run

As with the run command mentioned before, there is also a run-docker command, if the reader just wants to use Docker.

After running Gatling, in the end, we will see a table like the following:

[Screenshot: Gatling results summary table]

The table summarizes the execution. Some interesting metrics are the average, min and max response times – in milliseconds – and the percentiles. A percentile shows, for a given set of measurements, the value below which a given percentage of them fall. For example, in the table, we can see that, for our 5 requests, 50% of them were answered within 24 milliseconds. You can read more about percentiles here.
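
To make the percentile idea concrete, here is a small Scala sketch (not part of the lab’s code; the sample values are made up) that computes percentiles using the simple nearest-rank method – Gatling’s own calculation may differ slightly:

object PercentileExample extends App {

  // Nearest-rank percentile: sort the samples and pick the value at position ceil(p/100 * n)
  def percentile(samples: Seq[Long], p: Double): Long = {
    val sorted = samples.sorted
    val rank = math.ceil((p / 100.0) * sorted.size).toInt
    sorted(math.max(rank - 1, 0))
  }

  val responseTimesMs = Seq(20L, 22L, 24L, 31L, 1250L)

  println(s"50th percentile: ${percentile(responseTimesMs, 50)} ms") // 24 ms
  println(s"95th percentile: ${percentile(responseTimesMs, 95)} ms") // 1250 ms
}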

After running, Gatling also generated an HTML report inside the aforementioned reports folder. If we open index.html, we can see not only the data we just talked about but also other information, such as active users during the simulation, response time distribution, etc.

[Screenshot: Gatling HTML report]

Now, let’s create another two scenarios. We will start doing real performance tests, on both the writing and the reading API operations.

We start with a refactoring. We create some traits – traits are more or less the equivalent of Java interfaces in Scala, but with some powerful differences – to reuse code and group the scenarios together in a single simulation. First, we create the GatlingProtocol trait:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

trait GatlingProtocol {

  val httpProtocol = http
    .baseUrl("http://api:8080/user")
    .acceptHeader("application/json")
    .userAgentHeader("Mozilla/5.0 (Windows NT 5.1; rv:31.0) " +
      "Gecko/20100101 Firefox/31.0")

}

Next, we create a NumberUtils trait, to reuse code we will use on both new scenarios:

trait NumberUtils {

  val leftLimit = 1L
  val rightLimit = 10L
  def generateLong = leftLimit + (Math.random * (rightLimit - leftLimit)).toLong

}

After this refactoring, this is how our previous simulation is coded:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

trait CrudSimulation {

  val crudScn = scenario("CrudScenario")
    .exec(http("request_get")
      .get("/1"))
    .pause(5)
    .exec(http("request_post")
      .post("/")
      .body(StringBody(
        """{ "name": "MytestDummy",
          | "phone":11938284334 }""".stripMargin)).asJson)
    .pause(5)
    .exec(http("request_patch")
      .patch("/")
      .body(StringBody(
        """{ "id":11, "name": "MytestDummy2",
          |"phone":11938284123 }""".stripMargin)).asJson)
    .pause(5)
    .exec(http("request_get_name")
      .get("/name/MytestDummy"))
    .pause(5)
    .exec(http("request_delete")
      .delete("/11"))
    .pause(5)


}

Then, we create two new scenarios, one with reading operations from the API, and another with the writing ones:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

trait WriteOperationsSimulation extends NumberUtils {

  val writeScn = scenario("WriteScenario")
    .exec(http("request_post")
    .post("/")
    .body(StringBody(
      """{ "name": "MytestDummy",
        | "phone":11938284334 }""".stripMargin)).asJson)
    .pause(5)
    .exec(http("request_patch")
      .patch("/")
      .body(StringBody(
        s"""{ "id":$generateLong, "name": "MytestDummy$generateLong",
          |"phone":11938284123 }""".stripMargin)).asJson)

}
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

trait ReadOperationsSimulation extends NumberUtils {

  val readScn = scenario("ReadScenario")
    .exec(http("request_get")
      .get("/" + generateLong))
    .pause(5)
    .exec(http("request_get_name")
      .get("/name/Alexandre"))
    .pause(5)


}

Finally, we create a runner, which will set all scenarios in a simulation to run:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class SimulationRunner extends Simulation with CrudSimulation with ReadOperationsSimulation with WriteOperationsSimulation with GatlingProtocol {

  setUp(
    crudScn.inject(atOnceUsers(1)), 
    readScn.inject(constantUsersPerSec(50) during (5 minutes)), // 1
    writeScn.inject(rampUsers(200) during (2 minutes)) // 2
  ).protocols(httpProtocol)

}

Another new feature we can see here is the different ways of setting how Gatling will distribute the users that fire our scenarios in the simulation. In our case, we are telling Gatling to:

  1. Inject 50 new users per second, constantly, for 5 minutes;
  2. Ramp up a total of 200 users, evenly distributed across a 2-minute time range.

More examples of configuration scenarios can be found here.
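
For reference, the injection DSL offers other profiles as well. The sketch below shows what a richer injection for the read scenario could look like (the profile names follow the Gatling Scala DSL, but check the documentation linked above for the exact syntax of your version; the numbers are arbitrary):

setUp(
  readScn.inject(
    nothingFor(5 seconds), // idle period before the injection starts
    atOnceUsers(10), // inject 10 users immediately
    rampUsersPerSec(10) to 50 during (2 minutes), // ramp the arrival rate from 10 to 50 users per second
    constantUsersPerSec(50) during (5 minutes) // then hold 50 new users per second
  )
).protocols(httpProtocol)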

After running the tests again, we can see that a lot more requests are made – and, of course, the test takes a lot longer to run – and we get lots of data to analyze:

[Screenshot: Gatling results with the new scenarios]

Analyzing for bottlenecks

Now, let’s see if we can use our tests to check for possible performance bottlenecks.

If we check the get-by-name endpoint report, we will see that the max response time is more than 1 second:

[Screenshot: get-by-name endpoint report]

Now let’s imagine that this is not an acceptable response time for our needs, since our business requirements state that this endpoint will be heavily used by clients. By using Gatling, we can detect the problem before it is shipped to production.

In our case, the most likely culprit is the database, since the API is so simple. Let’s try to improve the performance of the search by creating an index on the name field. After creating the index and re-running the tests, we can see that our performance has improved:

[Screenshot: report after creating the index]

We can see that now the max response time is below 1 second, solving our performance problem.

Stressing the API

Let’s do one last test before wrapping it up. Let’s do a stress test and see how much more punishment our API can take before it starts returning errors under the load.

First, we increase the number of users injected in the scenarios:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class SimulationRunner extends Simulation with CrudSimulation with ReadOperationsSimulation with WriteOperationsSimulation with GatlingProtocol {

  setUp(
    crudScn.inject(atOnceUsers(1)),
    readScn.inject(constantUsersPerSec(150) during (5 minutes)),
    writeScn.inject(rampUsers(500) during (2 minutes))
  ).protocols(httpProtocol)

}

Then we run:

[Screenshot: results of the increased load test]

Wow, still no failures! But of course, the degradation is perceptible. Let’s increase a little more:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class SimulationRunner extends Simulation with CrudSimulation with ReadOperationsSimulation with WriteOperationsSimulation with GatlingProtocol {

  setUp(
    crudScn.inject(atOnceUsers(1)),
    readScn.inject(constantUsersPerSec(250) during (5 minutes)),
    writeScn.inject(rampUsers(600) during (2 minutes))
  ).protocols(httpProtocol)


}

[Screenshot: stress test results with errors]

Now we got some errors! The cause, as we can see from the report, is timeouts caused by server threads being exhausted due to the massive number of requests. In a real scenario, we would consider options such as horizontal scaling, reactive streams and so on. But since our focus in this post is not on API performance, we will stop here. The main point of this little test is to show how Gatling can help us test the capacity of our applications.

Conclusion

And so we conclude our lesson. With simple code, we created a powerful tool that will help us run performance and load tests against our API. Thank you for following me on another article, until next time.


Akka Streams: developing robust applications using Scala


Hi, dear readers! Welcome to my blog. A long time ago, I wrote a post about the actor model and how to use Akka to implement solutions using actors. If the reader hasn’t read that post, it can be found here. Now, more than 4 years later – how fast time goes! – it is time to revisit this world, with a much better understanding and maturity. At the time, I used good old Java for the task.

There’s nothing wrong with using Java, but if you really want to delve into Akka, then Scala is the language of choice, especially if we want to use Akka Streams, a project specially tailored for developing data flows that can handle tasks such as system integrations.

With non-blocking IO and parallelism embedded at its core – and encouraged in our custom code by following its good practices! – Akka Streams allows us to develop really fast applications that can easily scale. From my personal experience, it is an especially good option for integrating with Apache Kafka.

So, without further delay, let’s begin our journey!

Actor Model

The actor model was already explained in my previous post, so we won’t waste much time on it. To sum it up, we have a system where actors work with each other asynchronously: tasks are broken down into multiple steps handled by different actors, each one of them communicating through a personal queue (mailbox) that enqueues the messages to be processed. This way, we have a scalable solution, where tasks are done in parallel.

Akka quick recap

Actors

Actors are the core components of an actor system. An actor consists of a program unit that implements logic based on the messages it receives from its mailbox.

When developing Akka applications in Scala, an actor must implement a receive method, where we define the logic for the different types of messages it can receive. Each time a message arrives at the mailbox, the dispatcher delivers the message to the actor. It is important to notice, however, that it is the actor that asks for the next message as it completes processing the current one – by default, each actor processes just one message at a time – thus avoiding the actor being overloaded. This technique is called back-pressure, which we will talk more about in the next sections.
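
As a quick refresher, a minimal Akka (classic) actor in Scala looks like the sketch below (the names are illustrative, not part of the lab):

import akka.actor.{Actor, ActorLogging, ActorSystem, Props}

class Greeter extends Actor with ActorLogging {

  // receive pattern-matches on the messages taken from the mailbox, one at a time
  override def receive: Receive = {
    case name: String => log.info(s"Hello, $name!")
    case other => log.warning(s"Unexpected message: $other")
  }
}

object GreeterApp extends App {
  val system = ActorSystem("greeter-system")
  val greeter = system.actorOf(Props(new Greeter), "greeter")
  greeter ! "Akka" // fire-and-forget: the message is enqueued in the actor's mailbox
}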

ActorSystem

The actor system comprises the whole actor solution. It is composed of actors, dispatchers, and mailboxes.

Applications can have multiple actor systems inside. Also, it is possible to define actor systems to be linked together remotely, forming a cluster.

Execution Context (Dispatchers)

Execution contexts, also known as dispatchers, are responsible for serving actors with messages by delivering the messages to the mailboxes.

Dispatchers are also responsible for allocating the actors themselves, including details such as parallel actor execution, using strategies such as thread pools, for example. A dispatcher can be defined globally for the whole system or at the actor level.
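
As a sketch of what that looks like (the dispatcher name and tuning values below are purely illustrative and must exist in application.conf for the lookup to work):

import akka.actor.{Actor, ActorSystem, Props}

// application.conf would declare the dispatcher, for example:
//
// my-blocking-dispatcher {
//   type = Dispatcher
//   executor = "thread-pool-executor"
//   thread-pool-executor {
//     fixed-pool-size = 16
//   }
//   throughput = 1
// }

class BlockingWorker extends Actor {
  override def receive: Receive = {
    case work: String => println(s"processing $work") // some blocking call would go here
  }
}

object DispatcherExample extends App {
  val system = ActorSystem("dispatcher-example")

  // The actor is pinned to the custom dispatcher instead of the default one
  val worker = system.actorOf(
    Props(new BlockingWorker).withDispatcher("my-blocking-dispatcher"),
    "blocking-worker")
}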

An important note regarding performance with dispatchers is that they run the actors inside thinner layers that are memory-optimized, so memory consumption inside Akka solutions is lower than in traditional Java applications.

One interesting thing to notice on actor instantiation is how Akka treats actor references. When asking for an actor to be created inside a system, Akka will create an actor reference, which can be used to send messages to it.

These messages are sent using remote calls, even when the actor system is being used entirely locally. This guarantees that when using actor systems remotely, such as in a cluster, for example, there will be no difference in the code.

Mailboxes

Mailboxes are, as the name suggests, repositories for the messages that will be processed by the actors. Mailboxes can have different strategies for handling messages, such as unbounded lists, single and multi-consumer, priority queues and more.

Actor Supervisors & Lifecycle

Actors can be created at the system level or inside another actor. When created inside an actor, we call the parent actor a Supervisor and the actors created inside it Child Actors (actors created at the system level, also known as Top-level Actors, are also child actors, in this case of a reserved actor from Akka itself).

Every actor has a lifecycle: it can be started, in which case it is running; stopped, when an unrecoverable failure occurs; and restarted or resumed, depending on the circumstances of the failure.

Supervisors are actors responsible for deciding, for their child actors, what to do when one of them faces a failure. It is possible to simply stop the actor, restart it, or resume it (the main difference between restart and resume is that resources are freed on a restart, while on a resume the actor simply resumes its execution).

These decisions are called Supervisor Policies. These policies can be set to behave as one-for-one or all-for-one, meaning that when an error occurs on one child actor, the policy is applied either to all the actors below (for example, all actors would restart) or just to the failed actor.
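
A sketch of how this looks in Akka classic code (the exception types and limits are only examples):

import akka.actor.{Actor, ActorLogging, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Resume, Stop}

import scala.concurrent.duration._

class FlakyWorker extends Actor with ActorLogging {
  override def receive: Receive = {
    case "boom" => throw new IllegalStateException("something broke")
    case msg => log.info(s"processed $msg")
  }
}

class WorkerSupervisor extends Actor {

  // One-for-one: the decision applies only to the child that failed
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => Resume // keep the actor and its state
      case _: IllegalStateException => Restart // free resources and start over
      case _: Exception => Stop // give up on this child
    }

  private val worker = context.actorOf(Props(new FlakyWorker), "flaky-worker")

  override def receive: Receive = {
    case msg => worker forward msg
  }
}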

And that concludes our quick recap of Akka. Now, let’s begin our talk about Akka Streams.

Stream concepts

A stream is composed of tasks that must be done – continuously or not – in order to carry out a process. Each stream must have a Source, which is the beginning, a Flow composed of multiple tasks that can run in parallel depending on the needs, and a Sink, which is the stream’s end.
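
In code, a complete (if trivial) stream wiring a Source, a Flow and a Sink together looks like the sketch below, using the same Akka 2.5.x APIs as the lab later on (where an ActorMaterializer is still required; the names are illustrative):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object MinimalStream extends App {
  implicit val system: ActorSystem = ActorSystem("minimal-stream")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val source = Source(1 to 10) // the beginning: a finite source of integers
  val double = Flow[Int].map(_ * 2) // the middle: a transformation step
  val sink = Sink.foreach[Int](println) // the end: print each element

  source.via(double).runWith(sink) // materialize and run the stream
}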

Actor materialization

Akka Streams runs on top of Akka. That means when a stream is started, internally Akka Streams creates an actor system with actors responsible for running the tasks of the stream.

This task is the responsibility of the Actor Materializer, which creates (materializes) the resources needed to run the stream. One interesting thing is that it is possible to explicitly define materialization points on our flow.

These points are used by Akka Streams to define where it will group the tasks from the flow to run on separate actors, so it is a good technique to keep in mind when doing stream tuning.
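
A simple way to mark such a point explicitly is an async boundary, added with the async operator; a minimal sketch (names are illustrative):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object AsyncBoundaryExample extends App {
  implicit val system: ActorSystem = ActorSystem("async-boundary")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  Source(1 to 100)
    .map(_ * 2) // fused with the source by default
    .async // explicit boundary: the stages after this point run on a separate actor
    .map(_ + 1)
    .runWith(Sink.ignore)
}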

Sources

Sources are a flow’s beginning. A source defines an entry point for data, be it a finite data source, such as a file, or an infinite one, such as a Kafka topic, for example. It is possible to combine multiple source definitions into a single combined source for processing but, still, a flow can have only one source.
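
As a small sketch of combining sources (the merge strategy and values here are just an example; zip is another option when we want pairs instead of an interleaved merge):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Merge, Sink, Source}

object CombinedSourceExample extends App {
  implicit val system: ActorSystem = ActorSystem("combined-source")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val evens = Source(List(0, 2, 4, 6))
  val odds = Source(List(1, 3, 5, 7))

  // Source.combine builds one source out of several, here merging elements as they arrive
  Source.combine(evens, odds)(Merge(_))
    .runWith(Sink.foreach(println))
}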

Flows

Flows are the middle of the stream. One flow can have any number of tasks (steps), ranging from data transformation to enrichment by calling external resources.

Sinks

Sinks are a flow’s ending. Analogous to sources, sinks can have multiple types of destinations, such as files, Kafka topics, REST endpoints, etc. Like the source, a flow can also have only one sink.

Graphs

When modeling an Akka stream, as seen previously, we define a source, a sink and several flows in between. All of this generates a graph, where each node represents a task on the stream.

When instantiating a stream, a runnable graph is created, which represents a blueprint for execution. When the stream is run with the run() method, for example (there’s also a runWith(Sink) method that accepts a sink as a parameter), the runnable graph is materialized and executed.

During our lab, we will see Graph Stages. Graph stages are like “boxes” that group tasks together, making them look like a single node in the final graph.

Back-pressure

One very important concept when learning about Akka Streams is back-pressure. Typically, in a producer-consumer architecture, the producer keeps sending data to the consumer without really knowing if the consumer is capable of keeping up with the load or not. This can create a problem where a producer overloads a consumer, generating all kinds of errors and slowness.

With back-pressure, this approach is reversed. Now, it is the consumer that dictates when to receive a new message, by pulling new data at its own pace. While no new message is requested, the producer keeps waiting for a signal, and only then does it start pushing messages again.

The image below illustrates the concept in action:

[Image: back-pressure concept illustration]
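
One place where back-pressure surfaces directly in the Akka Streams API is the buffer operator, where we can choose what happens when downstream demand does not keep up. A minimal sketch (the sizes and the artificial slow stage are just for illustration):

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, OverflowStrategy}
import akka.stream.scaladsl.{Sink, Source}

object BackPressureExample extends App {
  implicit val system: ActorSystem = ActorSystem("back-pressure")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  Source(1 to 1000)
    // Keep at most 16 elements in flight; when the buffer is full, stop pulling
    // from upstream instead of dropping or failing (i.e. propagate back-pressure)
    .buffer(16, OverflowStrategy.backpressure)
    .map { n => Thread.sleep(10); n } // a deliberately slow stage
    .runWith(Sink.ignore)
}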

Stream error handling

Of course, just like an actor system, streams can also fail. Just as with actors, error handling is done with supervisors, which define policies for a stream to resume, stop or restart depending on the error.

Streams also support a recovery configuration that allows us, for example, to chain another stream execution in case of error, after several retries.
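
In code, a stream supervisor is attached as an attribute with a decider function. The sketch below resumes the stream (dropping the offending element) on arithmetic errors and stops it otherwise; the decider logic is only an example:

import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}
import akka.stream.scaladsl.{Sink, Source}

object StreamSupervisionExample extends App {
  implicit val system: ActorSystem = ActorSystem("stream-supervision")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Drop the offending element on arithmetic errors, stop the stream otherwise
  val decider: Supervision.Decider = {
    case _: ArithmeticException => Supervision.Resume
    case _ => Supervision.Stop
  }

  Source(List(1, 2, 0, 4))
    .map(100 / _) // fails on the 0 element
    .withAttributes(ActorAttributes.supervisionStrategy(decider))
    .runWith(Sink.foreach(println)) // prints 100, 50 and 25; the division by zero is skipped
}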

Alpakka project

The Alpakka project is an integration library composed of several components that allow us to quickly deploy integrations between several technologies, such as files, REST endpoints and even AWS technologies such as Amazon Kinesis. During the course of our lab, we will use resources from this project, so stay tuned for more!

The project documentation can be found in:

https://developer.lightbend.com/docs/alpakka/current/

Lab

Pre-requisites

So, without further delay, let’s begin! This lab requires that the reader already has some knowledge of Scala and Akka. On my blog, it is possible to read my previous post on Akka, alongside my series about the Scala language.

Creating the project & infrastructure code

To create the project, let’s begin by just creating the sbt file that will hold our project’s dependencies. We will start by creating a folder that will hold our project (all sources for the lab can be found here) and typing the following in a file called build.sbt:

name := "Akka-stream-lab"
version := "1.0"
scalaVersion := "2.12.5"

enablePlugins(JavaAppPackaging)

mainClass in Compile := Some("Main")

lazy val Versions = new {
  val akkaVersion = "2.5.11"
}

lazy val akkaDependencies = Seq(
  "com.lightbend.akka" %% "akka-stream-alpakka-csv" % "0.8",
  "com.typesafe.akka" %% "akka-stream-kafka" % "0.22",
  "com.typesafe.akka" %% "akka-actor" % Versions.akkaVersion,
  "com.typesafe.akka" %% "akka-stream" % Versions.akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j" % Versions.akkaVersion,
  "com.typesafe.akka" %% "akka-testkit" % Versions.akkaVersion % Test,
  "com.typesafe.akka" %% "akka-stream-testkit" % Versions.akkaVersion % Test,
  "com.typesafe.akka" %% "akka-testkit" % Versions.akkaVersion % Test
)

lazy val testDependencies = Seq(
  "org.scalacheck" %% "scalacheck" % "1.13.4" % Test,
  "org.scalamock" %% "scalamock" % "4.1.0" % Test,
  "org.mockito" % "mockito-core" % "2.19.0" % Test,
  "org.scalatest" %% "scalatest" % "3.0.5" % Test
)

lazy val loggingDependencies = Seq(
  "ch.qos.logback" % "logback-classic" % "1.2.3",
  "com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
  "org.slf4j" % "slf4j-api" % "1.7.25"
)

lazy val otherDependencies = Seq(
  "io.spray" %% "spray-json" % "1.3.5"
)

libraryDependencies ++= (
  akkaDependencies++
  loggingDependencies++
  testDependencies++
  otherDependencies
  )

As can be seen above, not only did we define an sbt project, but we also included dependencies for Akka and logging, alongside Akka Streams itself. We also added a packaging plugin to simplify running the project from the command line.

In order to use the plugin, we need to add it to the sbt project’s definition. To do that, we create a project folder and, inside it, a plugins.sbt file with the following:


logLevel := Level.Warn

addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.2")

Finally, we create our main Scala object that will be the App launcher for our project. We create a Scala source folder and add a Main.scala file, containing the following:


import akka.actor.ActorSystem

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("akka-streams-lab")
}

Our first version is very simple: it just creates a new actor system. Of course, during our lab, it will receive more code as it evolves into our final solution.

Concluding the setup, this is the initial structure of our project (some other files, such as sbt’s build.properties, are created automatically when running the project with sbt):

[Screenshot: initial project structure in IntelliJ]

This image is taken from IntelliJ, which I recommend as the IDE for the lab.

Without spoiling too much, as we will see in the next section, we need some infrastructure code that sets up a Kafka cluster for our use in the lab. For this, we will use Docker Compose.

So, let’s begin by creating a docker-compose.yml file, that will create our Kafka cluster. The file will be as follows:


version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    ports:
      - 2181:2181
  kafka:
    image: wurstmeister/kafka:1.1.0
    ports:
      - 9092:9092
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_CREATE_TOPICS: "accounts:1:1"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181

In this simple docker-compose file, we create an embedded cluster with a broker node and a Zookeeper node, and also create an accounts topic at startup. With Docker up and running, we can test it out by starting the cluster:


docker-compose up -d

Finally, let’s create a logback config file for our logging. To do this, let’s create a file called logback.xml inside the resources folder and enter a standard console-appender configuration like the following (the log pattern is the important part):

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>

That’s it! Now that we have our infrastructure code ready, let’s begin the actual coding.

Lab’s use case

In our lab, we will create 2 streams. One of them will read data from a file and publish it to a Kafka topic. The other one will read from this same topic and save the messages to other files, demonstrating the flow of data from a point A to a point B.

Creating the first stream

Let’s begin by creating the first stream, which is a simple stream that will read data (accounts in our case) from a file and send it to Kafka. At first, let’s code the stream in the main object itself and evolve it as we develop.

First, we create a file called input1.csv and add the following:

"COD","NAME","DOCUMENT","AGE","CIVIL_STATUS","PHONE","BIRTHDAY","HOME_COUNTRY","HOME_STATE","HOME_CITY","HOME_STREET","HOME_STREETNUM","HOME_NEIGHBORHOOD"
1,"alexandre eleuterio santos lourenco 1","43456754356",36,"Single","+5511234433443","21/06/1982","Brazil","São Paulo","São Paulo","false st",3134,"neigh 1"
2,"alexandre eleuterio santos lourenco 2","43456754376",37,"Single","+5511234433444","21/06/1983","Brazil","São Paulo","São Paulo","false st",3135,"neigh 1"

That will be the csv input file for our first stream. Next, we create a case class Account to encapsulate the data from the file:


package com.alexandreesl.model

case class Account(cod: Long, name: String, document: String,
                   age: Int, civilStatus: String,
                   phone: String, birthday: String,
                   country: String, state: String,
                   city: String, street: String,
                   streetNum: Long, neighBorhood: String)

We also create an object to allow JSON marshaling/unmarshalling when sending/receiving messages in Kafka:


package com.alexandreesl.json

import com.alexandreesl.model.Account
import spray.json.DefaultJsonProtocol

object JsonParsing extends DefaultJsonProtocol {

  implicit val accountFormat = jsonFormat13(Account)

}

Finally, we code the stream that reads from the file and publishes to Kafka (don’t worry, we will refactor later):


import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.scaladsl.{FileIO, Flow, Framing}
import akka.stream.{ActorMaterializer, ActorMaterializerSettings}
import akka.util.ByteString
import com.alexandreesl.model.Account
import com.typesafe.scalalogging.Logger
import org.apache.kafka.common.serialization.StringSerializer
import org.slf4j.LoggerFactory
import spray.json._
import com.alexandreesl.json.JsonParsing._
import org.apache.kafka.clients.producer.ProducerRecord
object Main extends App {

  implicit val system: ActorSystem = ActorSystem("akka-streams-lab")
  implicit val materializer: ActorMaterializer = ActorMaterializer(ActorMaterializerSettings(system))
  implicit val ec = system.dispatcher
  private val logger = Logger(LoggerFactory.getLogger("Main"))
  val config = system.settings.config.getConfig("akka.kafka.producer")
  val producerSettings =
    ProducerSettings(config, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")

  logger.info("Starting streams...")

  private def convertToClass(csv: Array[String]): Account = {
    Account(csv(0).toLong,
      csv(1), csv(2),
      csv(3).toInt, csv(4),
      csv(5), csv(6),
      csv(7), csv(8),
      csv(9), csv(10),
      csv(11).toLong, csv(12))
  }

  private val flow = Flow[String].filter(s => !s.contains("COD"))
    .map(line => {
      convertToClass(line.split(","))
    })

  FileIO.fromPath(Paths.get("input1.csv"))
    .via(Framing.delimiter(ByteString("\n"), 4096)
      .map(_.utf8String))
    .via(flow)
    .map(value => new ProducerRecord[String, String]("accounts", value.toJson.compactPrint))
    .runWith(Producer.plainSink(producerSettings))

  logger.info("Stream system is initialized!")

}

As we can see, the code is pretty straightforward. We just added a stream that reads from a csv file and dispatches the lines to Kafka. To see the messages on Kafka, first we start the application – don’t forget to run docker-compose up first! – and then we can use a shell inside the broker’s container, as follows:

docker exec -t -i akka-stream-lab_kafka_1 /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server :9092 --topic accounts --from-beginning

This will produce an output as follows:

{"age":36,"birthday":"\"21/06/1982\"","city":"\"São Paulo\"","civilStatus":"\"Single\"","cod":1,"country":"\"Brazil\"","document":"\"43456754356\"","name":"\"alexandre eleuterio santos lourenco 1\"","neighBorhood":"\"neigh 1\"","phone":"\"+5511234433443\"","state":"\"São Paulo\"","street":"\"false st\"","streetNum":3134}
{"age":37,"birthday":"\"21/06/1983\"","city":"\"São Paulo\"","civilStatus":"\"Single\"","cod":2,"country":"\"Brazil\"","document":"\"43456754376\"","name":"\"alexandre eleuterio santos lourenco 2\"","neighBorhood":"\"neigh 1\"","phone":"\"+5511234433444\"","state":"\"São Paulo\"","street":"\"false st\"","streetNum":3135}

Now that we have coded our first stream, let’s create the second one. Don’t worry about the messy code right now; we will refactor it later, when implementing error handling using actors.

Creating the second Stream

Now, let’s create our second stream. This stream will read from Kafka and generate two files, one with personal data and another with address data.

For this, we will use a graph stage that will write the files in parallel. We will disable auto-commit for the Kafka consumption and commit only at the end.

Before creating the stage, let’s do our first refactoring, by moving the Account case class to a GraphMessages object, which will hold all the case classes we will use in our code:

package com.alexandreesl.graph

import akka.kafka.ConsumerMessage

object GraphMessages {

  case class Account(cod: Long, name: String, document: String,
                     age: Int, civilStatus: String,
                     phone: String, birthday: String,
                     country: String, state: String,
                     city: String, street: String,
                     streetNum: Long, neighBorhood: String)

  case class InputMessage(acc: Account,
                          offset: ConsumerMessage.CommittableOffset)

  case class AccountPersonalData(cod: Long, name: String,
                                 document: String, age: Int,
                                 civilStatus: String,
                                 phone: String, birthday: String)

  case class AccountAddressData(cod: Long, country: String,
                                state: String, city: String,
                                street: String, streetNum: Long,
                                neighBorhood: String)

}

We also update our Json protocol accordingly, since we will use JSON marshaling for the other classes as well:

package com.alexandreesl.json

import com.alexandreesl.graph.GraphMessages.{Account, AccountAddressData, AccountPersonalData}
import spray.json.DefaultJsonProtocol

object JsonParsing extends DefaultJsonProtocol {

  implicit val accountFormat = jsonFormat13(Account)
  implicit val accountPersonalFormat = jsonFormat7(AccountPersonalData)
  implicit val accountAddressFormat = jsonFormat7(AccountAddressData)

}

Finally, let’s create our graph stage. Notice the ~> symbol? That symbol is used inside the graph stage builder to create the stage flow. This allows us to code flows in a visual manner, making it a lot simpler to design stream flows.

package com.alexandreesl.graph

import java.nio.file.{Paths, StandardOpenOption}

import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage
import akka.stream.{ActorMaterializer, FlowShape}
import akka.stream.scaladsl.{Broadcast, FileIO, Flow, GraphDSL, Source, Zip}
import akka.util.ByteString
import com.alexandreesl.graph.GraphMessages.{Account, AccountAddressData, AccountPersonalData, InputMessage}
import spray.json._
import com.alexandreesl.json.JsonParsing._

object AccountWriterGraphStage {

  val personal = Paths.get("personal.csv")
  val address = Paths.get("address.csv")

  def graph(implicit system: ActorSystem, materializer: ActorMaterializer) = Flow.fromGraph(GraphDSL.create() { implicit builder =>

    import GraphDSL.Implicits._

    val flowPersonal = Flow[InputMessage].map(msg => {
      Source.single(AccountPersonalData(msg.acc.cod,
        msg.acc.name, msg.acc.document, msg.acc.age,
        msg.acc.civilStatus, msg.acc.phone,
        msg.acc.birthday).toJson.compactPrint + "\n")
        .map(t => ByteString(t))
        .runWith(FileIO.toPath(personal,
          Set(StandardOpenOption.CREATE, StandardOpenOption.APPEND)))
      msg.acc
    })

    val flowAddress = Flow[InputMessage].map(msg => {
      Source.single(AccountAddressData(msg.acc.cod,
        msg.acc.country, msg.acc.state,
        msg.acc.city, msg.acc.street, msg.acc.streetNum,
        msg.acc.neighBorhood).toJson.compactPrint + "\n")
        .map(t => ByteString(t))
        .runWith(FileIO.toPath(address,
          Set(StandardOpenOption.CREATE, StandardOpenOption.APPEND)))
      msg.offset
    })

    val bcastJson = builder.add(Broadcast[InputMessage](2))
    val zip = builder.add(Zip[Account, ConsumerMessage.CommittableOffset])

    bcastJson ~> flowPersonal ~> zip.in0
    bcastJson ~> flowAddress ~> zip.in1

    FlowShape(bcastJson.in, zip.out)

  })

}

In our stage, we created two flows that are executed in parallel by a broadcast, each one passing along one value from the original input message. At the end, we use a zip to generate a tuple from the two objects, which is passed to the next stage. Finally, let’s create our stream, which uses the graph stage as part of it. I promise this will be the last time we will see that big messy main object: in the next section we will start refactoring.

import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, ProducerSettings, Subscriptions}
import akka.kafka.scaladsl.{Consumer, Producer}
import akka.stream.scaladsl.{FileIO, Flow, Framing, Sink}
import akka.stream.{ActorMaterializer, ActorMaterializerSettings}
import akka.util.ByteString
import com.alexandreesl.graph.AccountWriterGraphStage
import com.alexandreesl.graph.GraphMessages.{Account, InputMessage}
import com.typesafe.scalalogging.Logger
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import org.slf4j.LoggerFactory
import spray.json._
import com.alexandreesl.json.JsonParsing._
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.producer.ProducerRecord

import scala.concurrent.Future

object Main extends App {

  implicit val system: ActorSystem = ActorSystem("akka-streams-lab")
  implicit val materializer: ActorMaterializer = ActorMaterializer(ActorMaterializerSettings(system))
  implicit val ec = system.dispatcher
  private val logger = Logger(LoggerFactory.getLogger("Main"))
  val configProducer = system.settings.config.getConfig("akka.kafka.producer")
  val producerSettings =
    ProducerSettings(configProducer, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")
  val configConsumer = system.settings.config.getConfig("akka.kafka.consumer")
  val consumerSettings =
    ConsumerSettings(configConsumer, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("group1")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  logger.info("Starting streams...")

  private def convertToClass(csv: Array[String]): Account = {
    Account(csv(0).toLong,
      csv(1), csv(2),
      csv(3).toInt, csv(4),
      csv(5), csv(6),
      csv(7), csv(8),
      csv(9), csv(10),
      csv(11).toLong, csv(12))
  }

  private val flow = Flow[String].filter(s => !s.contains("COD"))
    .map(line => {
      convertToClass(line.split(","))
    })

  FileIO.fromPath(Paths.get("input1.csv"))
    .via(Framing.delimiter(ByteString("\n"), 4096)
      .map(_.utf8String))
    .via(flow)
    .map(value => new ProducerRecord[String, String]("accounts", value.toJson.compactPrint))
    .runWith(Producer.plainSink(producerSettings))

  Consumer
    .committableSource(consumerSettings, Subscriptions.topics("accounts"))
    .mapAsync(10)(msg =>
      Future.successful(InputMessage(msg.record.value.parseJson.convertTo[Account], msg.committableOffset))
    ).via(AccountWriterGraphStage.graph)
    .mapAsync(10) { tuple =>
      val acc = tuple._1
      logger.info(s"persisted Account: $acc")
      tuple._2.commitScaladsl()
    }.runWith(Sink.ignore)

  logger.info("Stream system is initialized!")

}

That’s it! Since we started both streams in the main application, it is possible to test them simply by running the application, which makes the first stream enqueue 2 messages that are then dequeued by the other stream. If we look at the logs, we will see the following:

[INFO] [12/27/2018 23:59:52.395] [akka-streams-lab-akka.actor.default-dispatcher-3] [SingleSourceLogic(akka://akka-streams-lab)] Assigned partitions: Set(accounts-0). All partitions: Set(accounts-0)
23:59:52.436 [akka-streams-lab-akka.actor.default-dispatcher-2] INFO Main - persisted Account: Account(1,"alexandre eleuterio santos lourenco 1","43456754356",36,"Single","+5511234433443","21/06/1982","Brazil","São Paulo","São Paulo","false st",3134,"neigh 1")
23:59:52.449 [akka-streams-lab-akka.actor.default-dispatcher-2] INFO Main - persisted Account: Account(2,"alexandre eleuterio santos lourenco 2","43456754376",37,"Single","+5511234433444","21/06/1983","Brazil","São Paulo","São Paulo","false st",3135,"neigh 1")

And if we inspect the files, we will see that it wrote the data accordingly, as we can see below:

personal.csv:

{"age":36,"birthday":"\"21/06/1982\"","civilStatus":"\"Single\"","cod":1,"document":"\"43456754356\"","name":"\"alexandre eleuterio santos lourenco 1\"","phone":"\"+5511234433443\""}
{"age":37,"birthday":"\"21/06/1983\"","civilStatus":"\"Single\"","cod":2,"document":"\"43456754376\"","name":"\"alexandre eleuterio santos lourenco 2\"","phone":"\"+5511234433444\""}

address.csv:

{"city":"\"São Paulo\"","cod":1,"country":"\"Brazil\"","neighBorhood":"\"neigh 1\"","state":"\"São Paulo\"","street":"\"false st\"","streetNum":3134}
{"city":"\"São Paulo\"","cod":2,"country":"\"Brazil\"","neighBorhood":"\"neigh 1\"","state":"\"São Paulo\"","street":"\"false st\"","streetNum":3135}

Now, let’s refactor this code to be more maintainable and introduce error handling.

Implementing error handling

To make error handling easier, we will move our streams to actors. Then we will create a supervisor, defining what to do when an error occurs.

Let’s begin by creating the first actor, called KafkaImporterActor, and moving the stream into it:


package com.alexandreesl.actor

import java.nio.file.Paths

import akka.actor.{Actor, ActorLogging, ActorSystem, Props}
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Flow, Framing}
import akka.util.ByteString
import com.alexandreesl.actor.KafkaImporterActor.Start
import com.alexandreesl.graph.GraphMessages.Account
import com.alexandreesl.json.JsonParsing._
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
import spray.json._

import scala.concurrent.ExecutionContextExecutor

class KafkaImporterActor extends Actor with ActorLogging {

  implicit val actorSystem: ActorSystem = context.system
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val dispatcher: ExecutionContextExecutor = context.system.dispatcher

  private def convertToClass(csv: Array[String]): Account = {
    Account(csv(0).toLong,
      csv(1), csv(2),
      csv(3).toInt, csv(4),
      csv(5), csv(6),
      csv(7), csv(8),
      csv(9), csv(10),
      csv(11).toLong, csv(12))
  }

  private val flow = Flow[String].filter(s => !s.contains("COD"))
    .map(line => {
      convertToClass(line.split(","))
    })
  private val configProducer = actorSystem.settings.config.getConfig("akka.kafka.producer")
  private val producerSettings =
    ProducerSettings(configProducer, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")

  override def preStart(): Unit = {
    self ! Start
  }

  override def receive: Receive = {
    case Start =>
      FileIO.fromPath(Paths.get("input1.csv"))
        .via(Framing.delimiter(ByteString("\n"), 4096)
          .map(_.utf8String))
        .via(flow)
        .map(value => new ProducerRecord[String, String]("accounts", value.toJson.compactPrint))
        .runWith(Producer.plainSink(producerSettings))
  }

}

object KafkaImporterActor {

  val name = "Kafka-Importer-actor"

  def props = Props(new KafkaImporterActor)

  case object Start

}

In this actor, we just moved our configuration and created a receive method. In that method, we handle a message case that is fired at actor startup, making the actor start the stream as soon as it is instantiated.

Now, let’s do the same to the other stream:


package com.alexandreesl.actor

import akka.actor.{Actor, ActorLogging, ActorSystem, Props}
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import com.alexandreesl.actor.KafkaExporterActor.Start
import com.alexandreesl.graph.AccountWriterGraphStage
import com.alexandreesl.graph.GraphMessages.{Account, InputMessage}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import spray.json._
import com.alexandreesl.json.JsonParsing._

import scala.concurrent.{ExecutionContextExecutor, Future}

class KafkaExporterActor extends Actor with ActorLogging {

  implicit val actorSystem: ActorSystem = context.system
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val dispatcher: ExecutionContextExecutor = context.system.dispatcher
  private val configConsumer = actorSystem.settings.config.getConfig("akka.kafka.consumer")
  private val consumerSettings =
    ConsumerSettings(configConsumer, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("group1")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  override def preStart(): Unit = {
    self ! Start
  }

  override def receive: Receive = {
    case Start =>
      Consumer
        .committableSource(consumerSettings, Subscriptions.topics("accounts"))
        .mapAsync(10)(msg =>
          Future.successful(InputMessage(msg.record.value.parseJson.convertTo[Account], msg.committableOffset))
        ).via(AccountWriterGraphStage.graph)
        .mapAsync(10) { tuple =>
          val acc = tuple._1
          log.info(s"persisted Account: $acc")
          tuple._2.commitScaladsl()
        }.runWith(Sink.ignore)
  }

}

object KafkaExporterActor {

  val name = "Kafka-Exporter-actor"

  def props = Props(new KafkaExporterActor)

  case object Start

}

Finally, since we moved the code to actors, we can now simplify the main object to just this:


import akka.actor.ActorSystem
import com.alexandreesl.actor.{KafkaExporterActor, KafkaImporterActor}
import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

object Main extends App {

  implicit val system: ActorSystem = ActorSystem("akka-streams-lab")
  private val logger = Logger(LoggerFactory.getLogger("Main"))

  logger.info("Starting streams...")

  system.actorOf(KafkaImporterActor.props, KafkaImporterActor.name)
  system.actorOf(KafkaExporterActor.props, KafkaExporterActor.name)

  logger.info("Stream system is initialized!")

}

If we run the application again, we will see that it runs just like before, proving our refactoring was a success.

Now that our code is better organized, let’s introduce a supervisor to handle errors. We will define a supervisor strategy for our actors and backoff policies to make the actors restart gradually more slowly as errors repeat, for example, to wait for Kafka to recover from a shutdown.

To do this, we change our main object code like this:


import akka.actor.{ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.pattern.{Backoff, BackoffSupervisor}
import com.alexandreesl.actor.{KafkaExporterActor, KafkaImporterActor}
import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

import scala.concurrent.duration._

object Main extends App {

  implicit val system: ActorSystem = ActorSystem("akka-streams-lab")
  private val logger = Logger(LoggerFactory.getLogger("Main"))

  logger.info("Starting streams...")

  private val supervisorStrategy = OneForOneStrategy() {
    case ex: Exception =>
      logger.info(s"exception: $ex")
      SupervisorStrategy.Restart
  }

  private val importerProps: Props = BackoffSupervisor.props(
    Backoff.onStop(
      childProps = KafkaImporterActor.props,
      childName = KafkaImporterActor.name,
      minBackoff = 3.seconds,
      maxBackoff = 30.seconds,
      randomFactor = 0.2
    ).withSupervisorStrategy(supervisorStrategy)
  )

  private val exporterProps: Props = BackoffSupervisor.props(
    Backoff.onStop(
      childProps = KafkaExporterActor.props,
      childName = KafkaExporterActor.name,
      minBackoff = 3.seconds,
      maxBackoff = 30.seconds,
      randomFactor = 0.2
    ).withSupervisorStrategy(supervisorStrategy)
  )

  system.actorOf(importerProps, "Kafka-importer")
  system.actorOf(exporterProps, "Kafka-exporter")

  logger.info("Stream system is initialized!")

}

In our code, we defined backoff policies that start restarting after 3 seconds, up to a maximum of 30 seconds, randomly scaling up the time between retries after each retry. As the supervisor policy, we defined a OneForOne strategy, meaning that if one of the actors fails, only the faulty actor is affected by the policy.

Finally, we define a simple policy where any errors that occur will be logged and the actor will be restarted. Since errors in the stream will also escalate to the encapsulating actor, this means that errors in the stream will also make the actor fail, causing a restart.

To make this escalation work, we need to change our actors so that the errors inside the streams propagate. To do this, we change the actors as follows, adding code to check the status of the streams’ futures:

KafkaImporterActor


package com.alexandreesl.actor

import java.nio.file.Paths

import akka.actor.{Actor, ActorLogging, ActorSystem, PoisonPill, Props}
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Flow, Framing}
import akka.util.ByteString
import com.alexandreesl.actor.KafkaImporterActor.Start
import com.alexandreesl.graph.GraphMessages.Account
import com.alexandreesl.json.JsonParsing._
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
import spray.json._

import scala.concurrent.ExecutionContextExecutor
import scala.util.{Failure, Success}

class KafkaImporterActor extends Actor with ActorLogging {

  implicit val actorSystem: ActorSystem = context.system
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val dispatcher: ExecutionContextExecutor = context.system.dispatcher

  private def convertToClass(csv: Array[String]): Account = {
    Account(csv(0).toLong,
      csv(1), csv(2),
      csv(3).toInt, csv(4),
      csv(5), csv(6),
      csv(7), csv(8),
      csv(9), csv(10),
      csv(11).toLong, csv(12))
  }

  private val flow = Flow[String].filter(s => !s.contains("COD"))
    .map(line => {
      convertToClass(line.split(","))
    })
  private val configProducer = actorSystem.settings.config.getConfig("akka.kafka.producer")
  private val producerSettings =
    ProducerSettings(configProducer, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")

  override def preStart(): Unit = {
    self ! Start
  }

  override def receive: Receive = {
    case Start =>
      val done = FileIO.fromPath(Paths.get("input1.csv"))
        .via(Framing.delimiter(ByteString("\n"), 4096)
          .map(_.utf8String))
        .via(flow)
        .map(value => new ProducerRecord[String, String]("accounts", value.toJson.compactPrint))
        .runWith(Producer.plainSink(producerSettings))
      done onComplete {
        case Success(_) =>
          log.info("I completed successfully, I am so happy :)")
        case Failure(ex) =>
          log.error(ex, "I received a error! Goodbye cruel world!")
          self ! PoisonPill
      }
  }

}

object KafkaImporterActor {

  val name = "Kafka-Importer-actor"

  def props = Props(new KafkaImporterActor)

  case object Start

}

KafkaExporterActor


package com.alexandreesl.actor

import akka.actor.{Actor, ActorLogging, ActorSystem, PoisonPill, Props}
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import com.alexandreesl.actor.KafkaExporterActor.Start
import com.alexandreesl.graph.AccountWriterGraphStage
import com.alexandreesl.graph.GraphMessages.{Account, InputMessage}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import spray.json._
import com.alexandreesl.json.JsonParsing._

import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.util.{Failure, Success}

class KafkaExporterActor extends Actor with ActorLogging {

  implicit val actorSystem: ActorSystem = context.system
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val dispatcher: ExecutionContextExecutor = context.system.dispatcher
  private val configConsumer = actorSystem.settings.config.getConfig("akka.kafka.consumer")
  private val consumerSettings =
    ConsumerSettings(configConsumer, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("group1")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  override def preStart(): Unit = {
    self ! Start
  }

  override def receive: Receive = {
    case Start =>
      val done = Consumer
        .committableSource(consumerSettings, Subscriptions.topics("accounts"))
        .mapAsync(10)(msg =>
          Future.successful(InputMessage(msg.record.value.parseJson.convertTo[Account], msg.committableOffset))
        ).via(AccountWriterGraphStage.graph)
        .mapAsync(10) { tuple =>
          val acc = tuple._1
          log.info(s"persisted Account: $acc")
          tuple._2.commitScaladsl()
        }.runWith(Sink.ignore)
      done onComplete {
        case Success(_) =>
          log.info("I completed successfully, I am so happy :)")
        case Failure(ex) =>
          log.error(ex, "I received a error! Goodbye cruel world!")
          self ! PoisonPill
      }
  }

}

object KafkaExporterActor {

  val name = "Kafka-Exporter-actor"

  def props = Props(new KafkaExporterActor)

  case object Start

}

Finally, let’s test it out. We begin by shutting down Kafka without stopping our application, by running:

docker-compose down

If we look at the logs, we will see that the streams start complaining about not being able to connect to Kafka. After some time, we will get an actor-terminated error, caused by the poison pill we made the actor swallow:

[INFO] [12/28/2018 22:23:54.236] [akka-streams-lab-akka.actor.default-dispatcher-4] [akka://akka-streams-lab/system/kafka-consumer-1] Message [akka.kafka.KafkaConsumerActor$Stop$] without sender to Actor[akka://akka-streams-lab/system/kafka-consumer-1#1868012916] was not delivered. [2] dead letters encountered. If this is not an expected behavior, then [Actor[akka://akka-streams-lab/system/kafka-consumer-1#1868012916]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings ‘akka.log-dead-letters’ and ‘akka.log-dead-letters-during-shutdown’.
[ERROR] [12/28/2018 22:23:54.236] [akka-streams-lab-akka.actor.default-dispatcher-5] [akka://akka-streams-lab/user/Kafka-exporter/Kafka-Exporter-actor] I received a error! Goodbye cruel world!
akka.kafka.ConsumerFailed: Consumer actor terminated
at akka.kafka.internal.SingleSourceLogic.$anonfun$preStart$1(SingleSourceLogic.scala:66)
at akka.kafka.internal.SingleSourceLogic.$anonfun$preStart$1$adapted(SingleSourceLogic.scala:53)
at akka.stream.stage.GraphStageLogic$StageActor.internalReceive(GraphStage.scala:230)
at akka.stream.stage.GraphStageLogic$StageActor.$anonfun$callback$1(GraphStage.scala:198)
at akka.stream.stage.GraphStageLogic$StageActor.$anonfun$callback$1$adapted(GraphStage.scala:198)
at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:454)
at

If we just keep watching, we will see this cycle repeating endlessly: the streams are restarted, they fail to connect to Kafka, and the poison pill is swallowed again.

To make the application come back again, let’s restart our Kafka cluster with:

docker-compose up -d

We will see that after Kafka returns, the streams resume normal operation.

And that concludes our error handling code. Of course, that is not all we can do in this field. Another interesting error handling technique that can be used in some cases is recovering, where we can define another stream to be executed in case of a failure, acting as a circuit breaker. This can be seen in more detail here.
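To give a rough idea of what recovering looks like, here is a minimal sketch – my own assumption, not part of this lab’s project – using Akka Streams’ recoverWithRetries operator, which switches to a fallback source when the main one fails:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object RecoverExample extends App {

  implicit val system: ActorSystem = ActorSystem("recover-example")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // a source that fails midway through the stream
  val failing = Source(1 to 5).map { n =>
    if (n == 4) throw new IllegalStateException("boom") else n
  }

  failing
    .recoverWithRetries(attempts = 1, {
      // when the failure happens, switch to this fallback source instead of failing the whole stream
      case _: IllegalStateException => Source(List(-1, -2))
    })
    .runWith(Sink.foreach(println)) // prints 1, 2, 3, -1, -2
}

The attempts parameter caps how many times the fallback can be attached before the stream finally fails.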

Finally, let’s test our packaging plugin by running the application from the terminal. Let’s open a terminal and run the following:

sbt stage

This will stage our application, including a shell script to launch it. We can run it by typing:

./target/universal/stage/bin/akka-stream-lab

After running the command, we will see that our application runs just like it did in IntelliJ:

(Screenshot: the application running from the terminal)

Automated testing the stream

Finally, to wrap it up, we will see how to test our streams. Automated tests are important for a codebase’s sturdiness, and they also allow CI pipelines to be implemented efficiently.

Streams can be tested by using probes to run the streams and check the results. Let’s start by creating a test for the converter flow that generates accounts from CSV lines – the rest of the code would just be testing third-party libraries, so we will focus on our own code only – and next, we will test our graph stage.

In our tests, we will use several traits to add support to the features we need. It is good practice to combine all of them into a single trait, so our test classes won’t have one big line of trait declarations.

So let’s begin by creating our trait:


package com.alexandreesl.test

import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll, Matchers, WordSpecLike}

trait TestEnvironment extends WordSpecLike with Matchers
  with BeforeAndAfter with BeforeAndAfterAll

Before coding the test, we will move the flow in KafkaImporterActor to its companion object, allowing us to reference it in the test:


package com.alexandreesl.actor

import java.nio.file.Paths

import akka.actor.{Actor, ActorLogging, ActorSystem, PoisonPill, Props}
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Flow, Framing}
import akka.util.ByteString
import com.alexandreesl.actor.KafkaImporterActor.Start
import com.alexandreesl.graph.GraphMessages.Account
import com.alexandreesl.json.JsonParsing._
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
import spray.json._

import scala.concurrent.ExecutionContextExecutor
import scala.util.{Failure, Success}

class KafkaImporterActor extends Actor with ActorLogging {

implicit val actorSystem: ActorSystem = context.system
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val dispatcher: ExecutionContextExecutor = context.system.dispatcher

private val configProducer = actorSystem.settings.config.getConfig("akka.kafka.producer")
private val producerSettings =
ProducerSettings(configProducer, new StringSerializer, new StringSerializer)
.withBootstrapServers("localhost:9092")

override def preStart(): Unit = {
self ! Start
}

override def receive: Receive = {
case Start =>
val done = FileIO.fromPath(Paths.get("input1.csv"))
.via(Framing.delimiter(ByteString("\n"), 4096)
.map(_.utf8String))
.via(KafkaImporterActor.flow)
.map(value => new ProducerRecord[String, String]("accounts", value.toJson.compactPrint))
.runWith(Producer.plainSink(producerSettings))
done onComplete {
case Success(_) =>
log.info("I completed successfully, I am so happy :)")
case Failure(ex) =>
log.error(ex, "I received a error! Goodbye cruel world!")
self ! PoisonPill
}

}

}

object KafkaImporterActor {

private def convertToClass(csv: Array[String]): Account = {
Account(csv(0).toLong,
csv(1), csv(2),
csv(3).toInt, csv(4),
csv(5), csv(6),
csv(7), csv(8),
csv(9), csv(10),
csv(11).toLong, csv(12))
}

val flow = Flow[String].filter(s => !s.contains("COD"))
.map(line => {
convertToClass(line.split(","))
})

val name = "Kafka-Importer-actor"

def props = Props(new KafkaImporterActor)

case object Start

}

Next, we will implement a different equals method on the Account class so it will work properly in test assertions:


package com.alexandreesl.graph

import akka.kafka.ConsumerMessage

object GraphMessages {

case class Account(cod: Long, name: String, document: String, age: Int, civilStatus: String,
phone: String, birthday: String, country: String, state: String,
city: String, street: String, streetNum: Long, neighBorhood: String) {
override def equals(that: Any): Boolean =
that match {
case that: Account => that.canEqual(this) && that.cod == this.cod
case _ => false
}
}

case class InputMessage(acc: Account, offset: ConsumerMessage.CommittableOffset)

case class AccountPersonalData(cod: Long, name: String, document: String, age: Int, civilStatus: String,
phone: String, birthday: String)

case class AccountAddressData(cod: Long, country: String, state: String,
city: String, street: String, streetNum: Long, neighBorhood: String)

}

Finally, let’s code the test:


package com.alexandreesl.test.importer

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.testkit.{TestKit, TestProbe}
import com.alexandreesl.actor.KafkaImporterActor
import com.alexandreesl.graph.GraphMessages.Account
import com.alexandreesl.test.TestEnvironment

import scala.concurrent.duration._

class KafkaImporterActorSpec extends TestKit(ActorSystem("MyTestSystem")) with TestEnvironment {
implicit val materializer = ActorMaterializer()
implicit val ec = system.dispatcher

override def afterAll() = TestKit.shutdownActorSystem(system)

"A csv line in a file " when {
val csv = "1,\"alexandre eleuterio santos lourenco 1\",\"43456754356\",36,\"Single\",\"+5511234433443\",\"21/06/1982\",\"Brazil\",\"São Paulo\",\"São Paulo\",\"false st\",3134,\"neigh 1\""

" should convert to Account case class " in {

val probe = TestProbe()

Source.single(csv)
.via(KafkaImporterActor.flow)
.to(Sink.actorRef(probe.ref, "completed"))
.run()

probe.expectMsg(2.seconds, Account(1, "alexandre eleuterio santos lourenco 1",
"43456754356", 36, "Single", "+5511234433443", "21/06/1982",
"Brazil", "São Paulo", "São Paulo", "false st", 3134, "neigh 1"))
probe.expectMsg(2.seconds, "completed")

}

}

}

In this code, we create a probe that waits for a message containing the Account converted by the flow, followed by a "completed" message, which the sink emits at the end. The 2-second timeout controls how long the probe will wait for a message to arrive.

Now, let’s code our second test. Before writing the test itself, let’s do a little refactoring on KafkaExporterActor, exposing part of the stream to the spec. This way we will test all of our custom code:


package com.alexandreesl.actor

import akka.actor.{Actor, ActorLogging, ActorSystem, PoisonPill, Props}
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink}
import com.alexandreesl.actor.KafkaExporterActor.Start
import com.alexandreesl.graph.AccountWriterGraphStage
import com.alexandreesl.graph.GraphMessages.{Account, InputMessage}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import spray.json._
import com.alexandreesl.json.JsonParsing._
import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.util.{Failure, Success}

class KafkaExporterActor extends Actor with ActorLogging {

implicit val actorSystem: ActorSystem = context.system
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val dispatcher: ExecutionContextExecutor = context.system.dispatcher
private val configConsumer = actorSystem.settings.config.getConfig("akka.kafka.consumer")
private val consumerSettings =
ConsumerSettings(configConsumer, new StringDeserializer, new StringDeserializer)
.withBootstrapServers("localhost:9092")
.withGroupId("group1")
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

override def preStart(): Unit = {
self ! Start
}

override def receive: Receive = {
case Start =>
val done = Consumer
.committableSource(consumerSettings, Subscriptions.topics("accounts"))
.mapAsync(10)(msg =>
Future.successful(InputMessage(msg.record.value.parseJson.convertTo[Account], msg.committableOffset))
)
.via(KafkaExporterActor.flow)
.runWith(Sink.ignore)
done onComplete {
case Success(_) =>
log.info("I completed successfully, I am so happy :)")
case Failure(ex) =>
log.error(ex, "I received a error! Goodbye cruel world!")
self ! PoisonPill
}

}

}

object KafkaExporterActor {

private val logger = Logger(LoggerFactory.getLogger("KafkaExporterActor"))

def flow()(implicit actorSystem: ActorSystem,
materializer: ActorMaterializer) =
Flow[InputMessage].via(AccountWriterGraphStage.graph)
.mapAsync(10) { tuple =>
val acc = tuple._1
logger.info(s"persisted Account: $acc")
tuple._2.commitScaladsl()
}

val name = "Kafka-Exporter-actor"

def props = Props(new KafkaExporterActor)

case object Start

}

Finally, let’s code our test:


package com.alexandreesl.test.exporter

import akka.Done
import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage.CommittableOffset
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.testkit.{TestKit, TestProbe}
import com.alexandreesl.actor.KafkaExporterActor
import com.alexandreesl.graph.GraphMessages.{Account, InputMessage}
import com.alexandreesl.test.TestEnvironment
import org.mockito.Mockito.{times, verify, when}
import org.scalatest.mockito.MockitoSugar.mock

import scala.concurrent.Future
import scala.concurrent.duration._

class KafkaExporterActorSpec extends TestKit(ActorSystem("MyTestSystem")) with TestEnvironment {
implicit val materializer = ActorMaterializer()
implicit val ec = system.dispatcher

override def afterAll() = TestKit.shutdownActorSystem(system)

"A Account object " when {

val offset = mock[CommittableOffset]
when(offset commitScaladsl) thenReturn Future.successful(Done)
val account = Account(1, "alexandre eleuterio santos lourenco 1",
"43456754356", 36, "Single", "+5511234433443", "21/06/1982",
"Brazil", "São Paulo", "São Paulo", "false st", 3134, "neigh 1")
val inputMessage = InputMessage(account, offset)

" should pass through the graph stage " in {

val probe = TestProbe()
Source.single(inputMessage)
.via(KafkaExporterActor.flow)
.to(Sink.actorRef(probe.ref, "completed"))
.run()

probe.expectMsg(2.seconds, Done)
probe.expectMsg(2.seconds, "completed")
verify(offset, times(1)) commitScaladsl

}

}

}

In this code, we again wait for the probe to receive messages. First, we receive the Done object from the Kafka commit – here we use a mock object so the tests can run without Kafka – and next we receive our good old "completed" message. Finally, we verify that our mock was called, to ensure that the flow commits messages back to Kafka after processing.

Of course, this was just a tiny taste of what can be done with Akka’s TestKit. Probes can be used not only for sinks but also for sources, and just as we test streams, it is also possible to test actors communicating with each other in an actor-based solution, for example.
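Just to illustrate the source side, here is a minimal sketch – an assumption of mine, requiring the akka-stream-testkit module as a test dependency and running inside a spec like the ones above, which already provides the implicit system, materializer and Matchers – that drives our converter flow with a TestSource probe and asserts the output with a TestSink probe:

import akka.stream.scaladsl.Keep
import akka.stream.testkit.scaladsl.{TestSink, TestSource}
import com.alexandreesl.actor.KafkaImporterActor
import com.alexandreesl.graph.GraphMessages.Account

// materialize both ends of the stream as test probes
val (publisher, subscriber) = TestSource.probe[String]
  .via(KafkaImporterActor.flow)
  .toMat(TestSink.probe[Account])(Keep.both)
  .run()

// push a csv line into the stream by hand and assert the converted Account comes out
publisher.sendNext(csv) // the same csv line declared in KafkaImporterActorSpec
subscriber.requestNext().cod shouldBe 1
publisher.sendComplete()
subscriber.expectComplete()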

In the references section, you can find links to the documentation for this and the other subjects covered in this article.

Going beyond

Of course, this brief article can’t possibly cover every detail of Akka and Akka Streams. Among the subjects not covered here, we can highlight a few:

  • Akka FSM (Finite State Machine): allows us to implement state machines in actor solutions;
  • Akka HTTP: allows us to call and expose HTTP endpoints (see the sketch after this list);
  • Akka Persistence: allows us to implement a persistence layer for the messages flowing through Akka, in order to achieve better recoverability in case of failures;
  • Akka Event Bus: a built-in publish-subscribe event bus inside Akka, which allows us to broadcast messages inside actor solutions.
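As a small taste of one of these, the sketch below – my own minimal example, assuming the akka-http dependency is added to the project – exposes a single GET endpoint with Akka HTTP’s routing DSL:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer

object HelloHttp extends App {

  implicit val system: ActorSystem = ActorSystem("hello-http")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // a single route answering GET /hello with a plain text response
  val route = path("hello") {
    get {
      complete("Hello from Akka HTTP!")
    }
  }

  // binds the route on localhost:8080 (Akka HTTP 10.1.x style)
  Http().bindAndHandle(route, "localhost", 8080)
}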

In the references section, you can find links for these topics and more!

Thanks

Special thanks to Iago David Santos (LinkedIn), who reviewed this article and pointed out a few things. Thanks, Iago!

Conclusion

And that concludes our article. With a rich toolkit and proven sturdiness, Akka Streams is a great option to consider when coding integrations, APIs and more. Thank you for following me on another article, until next time!


Scala: using functional programming on the JVM – part 3

Standard

Hi, dear readers! Welcome to my blog. In this post, the last one in this series, we will continue to explore more features of the Scala language. If you haven’t read the previous posts, please go to the “programming languages” menu option to find the whole series. So, without further delay, let’s begin!

Collections

Collections, as the name implies, are data structures where we can store and organize data. There are various types of collections in the Scala language, all of them common to most programming languages and with the standard behavior of their kind, such as lists, sets and maps.

In the next sections, we will see the major methods that Scala offers us to work with its collections.

So, let’s fire up the Scala REPL and begin!

Filter

As the name implies, filter can be used to filter data from a collection, generating a subset. Let’s begin by creating a List:

val mylist = List[Integer](1,2,3,4,5)

Next, we create a function that returns whether a number is even:

def isEven(n:Integer) = n % 2 == 0

And finally, we use the filter function, printing the even numbers on the console:

scala> mylist.filter(n => isEven(n)).foreach(println(_))

2

4

As we can see, it printed only 2 and 4 from our list, proving that our filtering was successful.

One important thing to note about this and the other methods is that none of them changes the original collection; they always create and return a new one, since they are designed to work with immutable data. We can check this by printing the list:

scala> print(mylist)

List(1, 2, 3, 4, 5)

Find

The find method is similar to filter, but instead of returning a subset, it returns only one element from the collection. The return is an Option, typed with the same type as the collection’s elements.

For this example, we will use the same collection as before. If we wanted to return only the element 2 and print it on the console, all we have to do is this:

scala> println(mylist.find(n => n == 2).getOrElse(0))

2

Map

Map is another common collection method in programming languages. Its objective is to take a collection and transform its elements into new elements, possibly of a different type, generating a new collection. Let’s see an example.

In our example, we will take the numbers from our previous list and create a new list, where the numbers are transformed into strings in the format “the number is x”. To do this transformation and print the result on the console, we can do the following:

scala> mylist.map(n => "the number is " + n).foreach(println(_))

the number is 1

the number is 2

the number is 3

the number is 4

the number is 5

Flatmap

Another interesting method is flatMap. It is similar to map, but with one difference: when used against complex objects with nested collections, this method flattens the results, generating a single flat collection. Let’s see an example.

First we create a class:

case class classA(val a: String, val b : List[String])

Then, we create a list with objects from our class:

scala> val myobjectlist = List(classA("A",List("A","B","C")),classA("B",List("A","C")),classA("C",List("C")),classA("D",List("A","B")))

myobjectlist: List[classA] = List(classA(A,List(A, B, C)), classA(B,List(A, C)), classA(C,List(C)), classA(D,List(A, B)))

Now, let’s see the result in the REPL if we try to map our list using only the b attribute:

scala> val mapobjectlist = myobjectlist.map(n => n.b)

mapobjectlist: List[List[String]] = List(List(A, B, C), List(A, C), List(C), List(A, B))

As we can see above, the result is a list of lists. This adds extra complexity when iterating over our results, since we will need to access each internal list individually in order to obtain all the elements.

Now let’s see the same operation, using flatMap this time:

scala> val mapobjectflatlist = myobjectlist.flatMap(n => n.b)

mapobjectflatlist: List[String] = List(A, B, C, A, C, C, A, B)

Now the result is flattened into a single List, allowing us to iterate over the elements much more easily.

Reduce

Another useful method when working with collections is reduce. With reduce, as in map’s case, we apply a transformation to a list, but in this case, instead of generating a new collection, we aggregate the collection, producing a single value.

The simplest example we can demonstrate is summing up the values. If we wanted to sum all the values of our numeric list, all we need to do is this:

scala> println(mylist.reduce((sum,n) => sum+n))

15

An important thing to note is that, in this case, the order in which the numbers are iterated is from left to right. If we would like to make this ordering explicit or reverse it, we could use the reduceLeft or reduceRight methods instead.
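A quick sketch of that difference, using subtraction – which is not associative – so the ordering becomes visible:

val nums = List(1, 2, 3, 4, 5)

// reduceLeft folds from the left: ((((1 - 2) - 3) - 4) - 5)
println(nums.reduceLeft(_ - _)) // -13

// reduceRight folds from the right: (1 - (2 - (3 - (4 - 5))))
println(nums.reduceRight(_ - _)) // 3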

Fold

Fold is pretty similar to reduce, but with a fundamental difference: while reduce requires the result to be of the same type as the source elements, fold doesn’t. Let’s see an example to understand it better.

Let’s suppose that, unlike our previous example, we wanted to generate a string from the numbers of our numeric collection, each wrapped in parentheses. We can do this with the following:

scala>  val foldlistStr = mylist.fold("")((sum,n) => sum+"("+n+")")

foldlistStr: Comparable[_ >: Integer with String <: Comparable[_ >: Integer with String <: java.io.Serializable] with java.io.Serializable] with java.io.Serializable = (1)(2)(3)(4)(5)

scala> println(foldlistStr)

(1)(2)(3)(4)(5)

scala>

As we can see, in this case we not only had to declare the folding function, but also an empty string as the initial value. That is the accumulator, which is then used at each iteration to build up the aggregation. This is also what allows Scala to infer the type of the result of our folding operation.

Conclusion

And so we conclude our trip through the Scala language. I hope I could give the reader a glimpse of the language and all its power. While it is not as popular as languages such as Java or C#, it is definitely a good language worth considering, especially for distributed systems where it can be used with tools such as Akka.

Thank you for following me on my series, see you next time!

Scala: using functional programming on the JVM – part 2

Standard

Hi, dear readers! Welcome to my blog. In this post, we will continue to explore more features of the Scala language, such as abstract classes, traits and optionals. If you haven’t read the previous post, please go to the “programming languages” menu option to find the whole series. So, without further delay, let’s begin!

Abstract classes

Abstract classes in Scala are just like in any other OO language: they are classes that have methods without implementation, which must be implemented by other classes in order to be used.

In Scala, we can create an abstract class like this, for example:

abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

In this code, we are creating an abstract class MyAbstractClass and declaring a method called methodA, which takes a string as parameter and returns a Set of strings.

In order to implement the class, we could have a class as follows:

class MyAbstractClassImpl extends MyAbstractClass {
 
 def methodA(str: String): Set[String] = ???

}

In this code, we are extending the abstract class – in Scala, like Java, we can’t have multiple class inheritance, so we can extend only one class – and providing an empty implementation for the method with the ??? keyword. This is the equivalent of a Java method that throws an error signalling a missing implementation; in Scala’s case, a NotImplementedError. We can see this if we try to instantiate the class and call the method, which will give us the following output:

scala.NotImplementedError: an implementation is missing

at scala.Predef$.$qmark$qmark$qmark(Predef.scala:284)

at MyAbstractClassImpl.methodA(MyAbstractClassImpl.scala:3)

at Main$.delayedEndpoint$Main$1(Myscript.scala:17)

at Main$delayedInit$body.apply(Myscript.scala:1)

at scala.Function0.apply$mcV$sp(Function0.scala:34)

at scala.Function0.apply$mcV$sp$(Function0.scala:34)

at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)

at scala.App.$anonfun$main$1$adapted(App.scala:76)

at scala.collection.immutable.List.foreach(List.scala:378)

at scala.App.main(App.scala:76)

at scala.App.main$(App.scala:74)

at Main$.main(Myscript.scala:1)

at Main.main(Myscript.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

........omitted........

We will see how Scala’s inheritance mechanisms work in more detail later on. For now, let’s move on to our next topic: traits.

Traits

Traits can be thought of as interfaces. With traits, we can create several different contracts to standardize our classes, while also providing default implementations for any method that requires one – just like default methods in Java 8 onwards.

To create a trait with two methods, one with an implementation and one without, we can code it like this:

trait MyLogger {
 
 def logPrintln(msg: String): Unit = println(msg)

 def log(msg: String): Unit

}

In this code, we declared two methods that receive a string as parameter and return Unit, one with an implementation and one without. To test multiple trait inheritance, let’s create another trait as follows:

trait MyMathLibrary {
 
 def add(a: Double, b: Double): Double = a + b

}

If we wanted our previous class to implement our traits as well, we could just change the code as follows:

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {
 
def methodA(str: String): Set[String] = Set[String]("a","b","c")

def log(msg: String): Unit = { 

 println("this log is the same as the other method")
 println(msg)

 }

}

In the code, we see that we chained the traits with the with keyword. We also provided an implementation for the abstract class’s method, so we don’t receive a not-implemented error anymore.

 

Sealed traits & classes

Another cool feature of Scala is sealed classes and traits. If we want to prohibit a class or trait from being extended outside of its own source file, we use the keyword sealed. This is particularly useful when implementing libraries, in order to prevent users of the library from changing its behavior.

To seal a class or trait, we just change it like this:

sealed abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

Now, if we try to compile our code, we will receive the following error:

MyAbstractClassImpl.scala:1: error: illegal inheritance from sealed class MyAbstractClass

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {

                                  ^

one error found

This shows that our seal was successful. To allow our class to compile again without removing the seal, the only way is to move the abstract class to the same file as the implementation, like the following:

sealed abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {
 
def methodA(str: String): Set[String] = Set[String]("a","b","c")

def log(msg: String): Unit = {

println("this log is the same as the other method")
 println(msg)

}

}

If we try to compile again, we will see that our class now compiles normally.

Optionals

Optionals in Scala are called Options. With Options, we can create code that is more resilient, since we won’t need to worry about shielding our code from null values.

When working with Options, we can instantiate the Option type using two alternatives:

  • Some(value): the Some keyword allows us to wrap a value in an Option;
  • None: the None keyword allows us to represent the null value, that is, the absence of a value;

Also, with Options, we have two ways to get a value:

  • get: with this method, we receive the value inside the Option, or a NoSuchElementException is thrown if there is no value;
  • getOrElse(value): with this method, we receive the value inside the Option, or the value passed as parameter if there is none. This way, we can guarantee a default value in case the data doesn’t exist;

Let’s see an example. In our REPL, let’s create a Map:

val mymap = Map(
 ("1", "value 1"),
 ("2", "value 2")
 )

Next, we get values from the map. If we try to get values that do and don’t exist in the map and use the getOrElse method, we receive this output on the console:

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73).
Type in expressions for evaluation. Or try :help.

scala> val mymap = Map(
 | ("1", "value 1"),
 | ("2", "value 2")
 | )
mymap: scala.collection.immutable.Map[String,String] = Map(1 -> value 1, 2 -> value 2)

scala> val value1 = mymap.get("1")
value1: Option[String] = Some(value 1)

scala> val value2 = mymap.get("2")
value2: Option[String] = Some(value 2)

scala> val value3 = mymap.get("3")
value3: Option[String] = None

scala> val value4 = mymap.get("4")
value4: Option[String] = None

scala> println(value1.getOrElse("X"))
value 1

scala> println(value2.getOrElse("X"))
value 2

scala> println(value3.getOrElse("X"))
X

scala> println(value4.getOrElse("X"))
X

scala>

This shows that Options are a viable way of dealing with optional values in the Scala language.

Error handling

Like any other language, Scala also has an error handling system. Like Java, Scala uses exceptions to encapsulate errors. Previously, we saw the ??? keyword and how we receive a NotImplementedError if we try to call a method that uses it. If we wanted to write explicitly what the keyword encapsulates, we could do this:

def methodA(str: String): Set[String] = throw new NotImplementedError()

We can see that it is pretty straightforward for anyone with a Java background. Catching exceptions is also very similar to Java, as in the following code, supposing that our method throws several types of exceptions:

try {
 methodA("test")
} catch {
 case e: IOException => println("IO exception")
 case e: Exception => println("general exception")
 case _ => println("general error")
}

Of course, we also have the finally block, which can be used as follows:

try {
 methodA("test")
} catch {
 case e: IOException => println("IO exception")
 case e: Exception => println("general exception")
 case _ => println("general error")
} finally {
 println("this executes no matter what")
}

Did you notice the "_"? That pattern was used to catch not only exceptions but also errors. In Scala, we have an exception hierarchy that is very similar to its Java counterpart, with two classes, Error and Exception, that extend a root class called Throwable.

However, there is a key difference: Scala doesn’t have checked exceptions. That means exceptions are not declared in method signatures, and we are not obligated to catch the exceptions thrown by a method. This can be considered a bad thing, especially when we don’t know all the details of the code we are consuming, but it gives us the flexibility to catch exceptions wherever we want to.

Inheritance on Scala

In Scala, when combining inheritance with generics, we have three kinds of variance, as follows:

  • Invariant: invariance means that only the exact type is allowed;
  • Covariant: covariance means that the exact type and its subclasses are allowed;
  • Contravariant: contravariance means that the exact type and its superclasses are allowed;

When using generics in Scala, we use square brackets ([]). When declaring the generic type, we can indicate whether it is covariant or contravariant using the "+" and "-" symbols, respectively. So, if we wanted to create a generic class to be used with a class and its subclasses, we could declare it as:

class mygenericclass[+T](val id: T)

And on the opposite side, if we wanted the class to be used with a class and its superclasses, we could declare it as:

class mygenericclass[-T](val id: T)

For functions, however, there is a rule that must always be remembered: all the parameters are contravariant, that is, they accept values of the declared type or its supertypes, and the return type is always covariant, in other words, it accepts values of the declared type or its subtypes.
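A minimal sketch of this rule – the Animal and Dog classes below are just illustrative:

class Animal
class Dog extends Animal

// parameters are contravariant: a function that handles any Animal
// can be used where a function over Dog is expected
val animalHandler: Animal => String = a => "handled: " + a
val dogHandler: Dog => String = animalHandler // compiles

// returns are covariant: a function that returns a Dog
// can be used where a function returning Animal is expected
val dogFactory: String => Dog = _ => new Dog
val animalFactory: String => Animal = dogFactory // compiles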

Implicits

One last feature we will visit in this lab is implicits. With implicits, we can wrap existing classes with new features, without needing to extend or overload the original class. Even classes from the standard library can be wrapped this way!

Let’s see an example. In the REPL, we create a class like this:

case class myclass(val a:String, val b:String)

Now, let’s try to instantiate and use a print method on the class:

scala> val instance = new myclass("a","b")

instance: myclass = myclass(a,b)

scala> instance.print

<console>:13: error: value print is not a member of myclass

       instance.print

                ^

scala>

Of course, we got an error, since this method doesn’t exist. Now, we create a wrapper class:

implicit class myclasswrapper(mycl:myclass) { def print = println(mycl.a+mycl.b) }

 

Notice the implicit keyword? It means our class was created as an implicit, so if we try to invoke the print method again:

scala> instance.print

ab

It now works, as Scala is implicitly converting our class to a myclasswrapper. Please note that, before Scala 2.10, we would need to create a method with the implicit keyword and do the wrapping by hand, instead of using this handy class-level declaration.
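Just to illustrate – this is a sketch of the old style, not something we need to write anymore – the pre-2.10 approach would look roughly like this:

import scala.language.implicitConversions

// a plain wrapper class, with no implicit modifier on the class itself
class myclassoldwrapper(mycl: myclass) {
  def printValues = println(mycl.a + mycl.b)
}

// the conversion had to be declared by hand as an implicit method
implicit def toMyclassoldwrapper(mycl: myclass): myclassoldwrapper =
  new myclassoldwrapper(mycl)

instance.printValues // prints "ab", now via the implicit method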

It is important to be careful, however, not to abuse implicits, since we can change the behavior of basically everything in the language, making an application very unpredictable if the feature is overused!

Conclusion

And that concludes the second part of our Scala series. Next, in our last part, we will learn about collections and everything we can gain from them. Thank you for your attention, until next time!

Scala: using functional programming on the JVM – part 1

Standard

Hello, dear readers! Welcome to my blog. In this post, we will talk about Scala, a powerful language that combines the object-oriented paradigm with the functional paradigm. Scala is used in several modern solutions, such as Akka.

Scala is a JVM-based language, which means that Scala programs are compiled to Java bytecode and then run on the JVM. This guarantees that the robust JVM is used in the background, leaving us free to use the rich Scala language for programming.

This is a 3-part series focused on learning the basics of the language. In this first part, we will set up our environment and learn about the Scala type system, vars, vals, classes, case classes, objects, companion objects and pattern matching. In the other parts, we will learn about other features such as traits, optionals, error handling, inheritance in Scala, collection operations such as map, fold, reduce and more. Please don’t miss out!

So, without further delay, let’s begin our journey on the Scala language!

Setting up

In order to prepare our lab environment, first we need to install Scala. You can download the latest version of Scala – this lab uses Scala 2.12.1 – from this link. If you are using a Mac and Homebrew, the installation is as simple as running the following command:

brew install scala

In order to test the installation, run the command:

scala -version

This will print something like the following:

Scala code runner version 2.12.1 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

REPL

The REPL is an interactive shell for running Scala programs. The name stands for the sequence of operations it performs: Read-Eval-Print-Loop. It reads information input by the user, evaluates the instruction, prints the result and starts over (loops). In order to use the Scala REPL, all we have to do is type scala in a terminal. This will open the REPL shell, like the following snippet:

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73).

Type in expressions for evaluation. Or try :help.

scala>

When we are done with the REPL, all we have to do is press Ctrl+C. Another way of running Scala programs is by creating Scala scripts (.scala files). When using Scala scripts, we first compile the script using the scalac command.

This hints at an important thing to notice about Scala: Scala is not dynamically typed. It has some syntactic similarities with languages like Python, but we have to remember that it is statically typed, as we will see in the next section.

Scala type system

As we mentioned before, Scala is compiled and statically typed, as opposed to languages such as Python, Clojure etc. This means that when we write programs in Scala, the compiler infers the type of a variable (immutable or not) from the type of the value assigned to it. Let’s see this in action.

Let’s open the Scala REPL. We type var number=0 and hit enter. The following will be printed on our console:

scala> var number=0

number: Int = 0

As we can see, the variable was defined as an integer, since we assigned a number to it. The reader could be thinking “but this is exactly like a dynamically typed language!”. It appears so at first, but here is the catch: if we try to assign a value of another type to the variable, this happens:

scala> number="a string"

:12: error: type mismatch;

 found   : String("a string")

 required: Int

       number="a string"

              ^
scala>

The interpreter throws an error, saying that the variable we defined previously is an integer, so we can’t change it to a string. This is fundamentally different from dynamically typed languages, where we can change the type of a variable as much as we like.

This could be seen as a weak point depending on the point of view, but it is better seen as a design choice: using a strongly typed scheme, we have more confidence in knowing exactly what to expect from each variable in use in the system.

This is particularly important in the functional paradigm, where we normally use more immutable variables than mutable ones, as we will discuss in the next section. One last thing before we go: although we can rely on type inference when creating variables, we can also explicitly define the type at creation time, as with the following variable:

scala> var number2: Int = 1

number2: Int = 1

scala>

Var vs. Val

In Scala, we can declare variables using two keywords: var and val. The creation code for the two options is essentially the same, but there’s a primary difference between them: vars can have their values changed during their lifecycle, while vals can’t.

That means vals are immutable. The closest equivalent in Java code is a constant: once declared, its value will never change again.

When working with the functional programming paradigm, we essentially use immutable values most of the time. With immutability, we have the assurance that our functions will always behave as intended, since a function won’t change the data, meaning new runs with the same parameters always return the same results.

Let’s test whether vals really can’t be changed. Let’s create a string-typed val with the following code:

val mystring = "this is a string"

Then, we try to change the string. When we do this, we will receive the following:

scala> mystring = "this is a new string"

:12: error: reassignment to val

       mystring = "this is a new string"

                ^

scala>

The interpreter has complained that we are trying to change a val, proving that vals are indeed immutable.

Classes

In Scala, everything is an object. That’s why, despite the fact that Scala allows us to develop using the functional paradigm, we can’t say that Scala is a pure functional programming language, like Haskell, for example.

In Scala’s object hierarchy, the root class for all classes is called Any. This class has two subclasses: AnyVal and AnyRef. AnyVal is the root class for value types such as integers, floats etc. – all primitives in Scala are internally wrapped as objects. AnyRef is the root for classes that are not value types, like the classes we will develop in this lab, for example.

So, let’s create our first class! To do this, let’s create a file called Myclass.scala and enter the following code:

class Myclass(val myvalue1: Int, val myvalue2: String)

That’s right. All we have to do is write this one line of code, and we have a complete class at our disposal! On this line, we created a class called Myclass with two attributes: myvalue1 and myvalue2. Not only that, with this line we also created a constructor that receives the two attributes as parameters, plus getter accessors. All of this with just one line!

The reason Scala made the attributes settable only at object creation is that we declared them as immutable. If we had declared them as vars, Scala would have created setter accessors as well.
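For instance, a quick sketch with a hypothetical mutable variant of the class:

class MyMutableClass(var myvalue1: Int, var myvalue2: String)

val mutable = new MyMutableClass(1, "a")
mutable.myvalue1 = 2 // allowed: a setter was generated because the attribute is a var
println(mutable.myvalue1) // prints 2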

Since we are talking about constructors, it is important to know that we can also overload the constructor by defining auxiliary constructors with the keyword this. For example, if we would like the option of a constructor that doesn’t require the attributes, using default values instead, we could change the class like this:

class Myclass(val myvalue1: Int, val myvalue2: String) {

  def this() = this(0, "")

}

Case classes

Another interesting thing about classes is case classes. With case classes, we get a class that already has the hashCode, equals and toString methods implemented. How do we do this? Simply by modifying our class as follows:

case class Myclass(val myvalue1: Int, val myvalue2: String) {

  def this() = this(0, "")

}

That’s all we have to do: we just include the keyword case and the methods get a default implementation. That is another good example of how Scala can simplify the developer’s life.
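A quick sketch of the generated methods in action, run for example in the REPL:

val first = Myclass(1, "a")  // no "new" needed: case classes also get an apply factory method
val second = Myclass(1, "a")

println(first.toString)                    // Myclass(1,a) - generated toString
println(first == second)                   // true - generated equals compares the attributes
println(first.hashCode == second.hashCode) // true - generated hashCode is consistent with equals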

Objects

We talked earlier about how everything in Scala is an object of a class. However, there are cases when we want a class to have only one instance in the entire system. We commonly call this type of class a singleton. To achieve this in Scala, we declare objects.

Objects have bodies just like classes, except that they can’t be instantiated, since they already are instances. Let’s create a simple Hello World script in order to learn how to create objects.

Let’s create a file called Myscript.scala. In the file, we code this:

object Myscript extends App {

print("Hello World!")

}

And then we compile with scalac Myscript.scala. When running with scala Myscript, we get the following on the console:

Hello World!%

The App trait that we extended is the hint for Scala that this object is the entry point for our application to run. We will see more about inheritance in future parts of this series.

Companion objects

Companion objects are like the ones we just saw, with one big difference: these objects must have the same name as a class, be declared in the same file as that class, and they have access to the attributes and methods of that class, even the private ones.

One use of companion objects is to create factory methods. An example of this is the case classes we saw before, which get convenience methods created for us: internally, when we declare a case class, Scala creates a companion object for that class.
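A small sketch of a companion object acting as a factory – the Person class here is just an illustrative assumption, and both declarations must live in the same source file:

class Person private (val name: String, val age: Int)

object Person {
  // factory method: the companion can call the private constructor
  def apply(name: String, age: Int): Person =
    new Person(name.trim, math.max(age, 0))
}

val person = Person("  Alexandre  ", 36) // no "new": this goes through the factory
println(person.name) // prints "Alexandre", already trimmed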

Pattern matchers

The last feature we will talk about is pattern matching. With pattern matching, we can run pieces of code based on case statements, similar to switch clauses in Java. Let’s see an example.

We will use the Myclass class we created earlier. Let’s suppose we have a scenario where we want to print a different message depending on the value of the myvalue1 attribute, and print the value itself if it doesn’t match any of the clauses. We can do this by coding the following:

object Myscript extends App {

case class Myclass(val myvalue1: Int, val myvalue2: String)

val myclass = new Myclass(1,"Myvalue2")

val result = myclass match {
 case Myclass(1, _) => "this is value 1"
 case Myclass(2, _) => "this is value 2"
 case m => s"$m"
 }

print(result)

}

In the code above, we stated that if we have an instance with the value 1 as the first attribute – the second one is matched with the "_" keyword, which means that we accept any value for that attribute – we output the string “this is value 1”, the string “this is value 2” for the value 2, and we output the instance itself for any other value. If we run the code above, we will receive this message on the terminal:

this is value 1%

This shows that our code is correct. One important good practice to notice is that, when using pattern matching and binding the matched content to a variable – the case of our last clause – always use names that start with a lower-case letter. That is because when a name starts with an upper-case letter, the Scala compiler will try to find an existing value with that name instead of creating a new binding. So, always remember to use lower-case variable names in these cases.
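A tiny sketch of the difference – the names below are illustrative:

val Limit = 10

5 match {
  // upper-case: refers to the existing val Limit, so this clause only matches the value 10
  case Limit => println("it is exactly the limit")
  // lower-case: creates a new binding that matches anything
  case other => println(s"got $other")
}
// prints "got 5"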

Conclusion

And that concludes our first trip into the Scala language. In the next parts, we will see more interesting features of the language, such as traits, inheritance and optionals. Stay tuned!

Thank you for your attention, until next time.