Scala: using functional programming on the JVM – part 2

Hi, dear readers! Welcome to my blog. In this post, we will continue exploring features of the Scala language, such as abstract classes, traits and optionals. If you haven’t read the previous post, please go to the “programming languages” menu option to find all the posts of the series. So, without further delay, let’s begin!

Abstract classes

Abstract classes in Scala are just like in any other OO language: classes with methods left without implementation, which must be implemented by subclasses before they can be used.

In Scala, we can create an abstract class like this, for example:

abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

In this code, we are creating an abstract class, MyAbstractClass, and declaring a method called methodA, which takes a string as a parameter and returns a Set of strings.

In order to implement the class, we could write a class as follows:

class MyAbstractClassImpl extends MyAbstractClass {
 
 def methodA(str: String): Set[String] = ???

}

In this code, we are extending the abstract class – in Scala, like in Java, we can’t have multiple class inheritance, so we can extend only one class – and providing an empty implementation for the method with ???. Despite looking like a keyword, ??? is actually a method from Scala’s Predef object that throws a NotImplementedError, the equivalent of writing a Java method that just throws the error. We can see this if we try to instantiate the class and call the method, which gives us the following output:

scala.NotImplementedError: an implementation is missing
  at scala.Predef$.$qmark$qmark$qmark(Predef.scala:284)
  at MyAbstractClassImpl.methodA(MyAbstractClassImpl.scala:3)
  at Main$.delayedEndpoint$Main$1(Myscript.scala:17)
  at Main$delayedInit$body.apply(Myscript.scala:1)
  at scala.Function0.apply$mcV$sp(Function0.scala:34)
  at scala.Function0.apply$mcV$sp$(Function0.scala:34)
  at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
  at scala.App.$anonfun$main$1$adapted(App.scala:76)
  at scala.collection.immutable.List.foreach(List.scala:378)
  at scala.App.main(App.scala:76)
  at scala.App.main$(App.scala:74)
  at Main$.main(Myscript.scala:1)
  at Main.main(Myscript.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
........omitted........

In the next post of the series, we will see how Scala’s inheritance mechanisms work in more detail. For now, let’s move on to our next topic: traits.

Traits

Traits can be thought of as interfaces. With traits, we can create several different contracts to standardize our classes, while also providing default implementations for any method that needs one – just like default methods from Java 8 onwards.

To create a trait with 2 methods, one with an implementation and one without, we can code this:

trait MyLogger {
 
 def logPrintln(msg: String): Unit = println(msg)

 def log(msg: String): Unit

}

In this code, we declared 2 methods that receive a string as a parameter and return Unit (Scala’s void), one with an implementation and one without. To test inheriting from multiple traits, let’s create another trait as follows:

trait MyMathLibrary {
 
 def add(a: Double, b: Double): Double = a + b

}

If we wanted our previous class to implement our traits as well, we could just change the code as follows:

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {

  def methodA(str: String): Set[String] = Set[String]("a", "b", "c")

  def log(msg: String): Unit = {
    println("this log is the same as the other method")
    println(msg)
  }

}

In this code we see that we chained the traits with the with keyword. We also provided an implementation for the abstract class’s method, so we don’t receive a not-implemented error anymore.

Sealed traits & classes

Another cool feature of Scala is sealed classes and traits. If we want to prohibit a class or trait from being extended outside of its own source file, we use the keyword sealed. This is particularly useful when implementing libraries, in order to prevent users of the library from changing its behavior.

To seal a class or trait, we just change it like this:

sealed abstract class MyAbstractClass {
 def methodA(str: String): Set[String]
}

Now, if we try to compile our code, we will receive the following error:

MyAbstractClassImpl.scala:1: error: illegal inheritance from sealed class MyAbstractClass

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {

                                  ^

one error found

Showing that our seal was successful. To allow our class to compile again without removing the seal, the only way is to move the abstract class to the same file as the implementation, like the following:

sealed abstract class MyAbstractClass {
  def methodA(str: String): Set[String]
}

class MyAbstractClassImpl extends MyAbstractClass with MyLogger with MyMathLibrary {

  def methodA(str: String): Set[String] = Set[String]("a", "b", "c")

  def log(msg: String): Unit = {
    println("this log is the same as the other method")
    println(msg)
  }

}

If we try to compile again, we will see that our class now compiles normally.

Optionals

Optionals in Scala are called options. With options, we can create code that is resilient, since we won’t need to worry about shielding our code from null values.

When working with options, we can instantiate the Option type using 2 alternatives:

  • Some(value): the Some case class allows us to wrap a value in an optional;
  • None: the None object represents the null value, that is, the absence of a value;

Also, with options, we have two ways to get a value:

  • get: using this method, we receive the value inside the option, or a NoSuchElementException if the option is None;
  • getOrElse(value): using this method, we receive the value inside the option, or the value passed as a parameter if the option is None. This way, we can guarantee a default value in case the data doesn’t exist;

Let’s see an example. In our REPL, let’s create a Map:

val mymap = Map(
 ("1", "value 1"),
 ("2", "value 2")
 )

Next we get values from the map. If we try to get keys that exist and keys that don’t exist in the map, combined with the getOrElse method, we receive this output on the console:

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73).
Type in expressions for evaluation. Or try :help.

scala> val mymap = Map(
 | ("1", "value 1"),
 | ("2", "value 2")
 | )
mymap: scala.collection.immutable.Map[String,String] = Map(1 -> value 1, 2 -> value 2)

scala> val value1 = mymap.get("1")
value1: Option[String] = Some(value 1)

scala> val value2 = mymap.get("2")
value2: Option[String] = Some(value 2)

scala> val value3 = mymap.get("3")
value3: Option[String] = None

scala> val value4 = mymap.get("4")
value4: Option[String] = None

scala> println(value1.getOrElse("X"))
value 1

scala> println(value2.getOrElse("X"))
value 2

scala> println(value3.getOrElse("X"))
X

scala> println(value4.getOrElse("X"))
X

scala>

This shows that options are a viable way of dealing with optional values in the Scala language.

Error handling

As in any other language, Scala also has an error handling system. Like Java, Scala uses exceptions to encapsulate errors. Previously we saw ??? and how we receive a NotImplementedError if we try to call a method defined with it. If we wanted to write explicitly what ??? encapsulates, we could do this:

def methodA(str: String): Set[String] = throw new NotImplementedError()

We can see that it is pretty straightforward for anyone with a Java background. Catching exceptions is also very similar to Java, like in the following code, supposing that our method throws several types of exceptions:

import java.io.IOException

try {
  methodA("test")
} catch {
  case e: IOException => println("IO exception")
  case e: Exception => println("general exception")
  case _ => println("general error")
}

Of course, we also have the finally block, which could be used as follows:

try {
  methodA("test")
} catch {
  case e: IOException => println("IO exception")
  case e: Exception => println("general exception")
  case _ => println("general error")
} finally {
  println("this executes no matter what")
}

Did you notice the “_”? That pattern was used to catch not only exceptions, but also errors. In Scala we have an exception hierarchy that is pretty similar to its Java counterpart, with two classes, Error and Exception, that extend a root class called Throwable.

However, there is a key difference: Scala doesn’t have checked exceptions. That means we don’t have exceptions marked on a method’s signature as throwable, nor do we have the obligation to catch any exceptions that a method throws. This can be considered a bad thing, especially when we don’t know all the details of the code we are consuming, but it gives us the flexibility to catch exceptions wherever we want.

Inheritance on Scala

In Scala, when we mix inheritance with generic types, we have 3 kinds of variance, as follows:

  • Invariant: invariant means that only the exact declared type is allowed;
  • Covariant: covariant means that the declared type and its subtypes are allowed;
  • Contravariant: contravariant means that the declared type and its supertypes are allowed;

When using generics in Scala we use square brackets ([]). When declaring the generic type, we can indicate that it is covariant or contravariant using the “+” and “-” symbols, respectively. So, if we wanted to create a generic class to be used with a given type and its subtypes, we could declare it as:

class mygenericclass[+T](val id: T)

And on the opposite side, if we wanted the class to be used with a given type and its supertypes, we could declare it as follows (note that here the parameter can’t be a val, since a contravariant type can’t appear as the type of a public field):

class mygenericclass[-T](id: T)

With functions, however, there is a rule that must always be remembered: all of a function’s parameters are contravariant, that is, they accept values of the declared type or its supertypes, while the return is always covariant, in other words, it accepts values of the declared type or its subtypes.
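
To make this more concrete, here is a minimal sketch of the variance annotations in action (the Animal/Dog hierarchy is just an illustration):

class Animal
class Dog extends Animal

// covariant: a Box[Dog] can be used where a Box[Animal] is expected
class Box[+T](val value: T)

// contravariant: a Printer[Animal] can be used where a Printer[Dog] is expected
class Printer[-T] {
  def print(value: T): Unit = println(value)
}

val animalBox: Box[Animal] = new Box[Dog](new Dog) // compiles thanks to +T
val dogPrinter: Printer[Dog] = new Printer[Animal] // compiles thanks to -T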

Implicits

The last feature we will visit in this lab is implicits. With implicits, we can wrap classes that already exist with new features, without needing to extend or overload the original class. Even classes from the standard library can be wrapped this way!

Let’s see an example. On the REPL, we create a class like this:

case class myclass(val a:String, val b:String)

Now, let’s try to instantiate and use a print method on the class:

scala> val instance = new myclass("a","b")

instance: myclass = myclass(a,b)

scala> instance.print

<console>:13: error: value print is not a member of myclass

       instance.print

                ^

scala>

Of course, we got an error, since this method doesn’t exist. Now, we create a wrapper class:

implicit class myclasswrapper(mycl:myclass) { def print = println(mycl.a+mycl.b) }

Notice the implicit keyword? It means our class was created as an implicit, so if we try to invoke the print method again:

scala> instance.print

ab

It will now work, as Scala implicitly converts our class to a myclasswrapper. Please note that, before Scala 2.10, we would need to create a method marked with the implicit keyword and do the wrapping by hand, instead of this useful class-level declaration.

It is important, however, to be careful not to abuse implicits: since we can change the behavior of basically everything in the language, an application can become very unpredictable if the feature is overused!

Conclusion

And that concludes the second part of our Scala series. Next, in our last part, we will learn about collections and all the ways we can benefit from them. Thank you for your attention, until next time!

Scala: using functional programming on the JVM – part 1

Hello, dear readers! Welcome to my blog. In this post, we will talk about Scala, a powerful language that combines the object paradigm with the functional paradigm. Scala is used in several modern solutions, such as Akka.

Scala is a JVM-based language, which means that Scala programs are compiled to Java bytecode and then run on the JVM. This guarantees that the robust JVM is used in the background, leaving us free to use the rich Scala language for programming.

This is a 3-part series focused on learning the basics of the language. In this first part we will set up our environment and learn about the Scala type system, vars, vals, classes, case classes, objects, companion objects and pattern matching. In the other parts, we will learn other features such as traits, optionals, error handling, inheritance in Scala, collection-related operations such as map, fold, reduce and more. Please don’t miss out!

So, without further delay, let’s begin our journey on the Scala language!

Setting up

In order to prepare our lab environment, first we need to install Scala. You can download the latest version of Scala – this lab uses Scala 2.12.1 – on this link. If you are using a Mac with Homebrew, the installation is as simple as running the following command:

brew install scala

In order to test the installation, run the command:

scala -version

This will print something like the following:

Scala code runner version 2.12.1 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

REPL

The REPL is an interactive shell for running Scala programs. The name stands for the sequence of operations it performs: Read-Eval-Print-Loop. It reads information input by the user, evaluates the instruction, prints the result and starts over (loops). In order to use the Scala REPL, all we have to do is type scala on a terminal. This will open the REPL shell, like the following snippet:

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73).

Type in expressions for evaluation. Or try :help.

scala>

When we are done with the REPL, all we have to do is press Ctrl+C. Another way of running Scala programs is by creating Scala scripts (.scala files). When using Scala scripts, we first compile the script using the scalac command.

This hints at an important thing to notice about Scala: Scala is not dynamically typed. It has some syntax similarities with languages like Python, but we have to remember that it is statically typed, as we will see in the next section.

Scala type system

As we said before, Scala is compiled, as opposed to dynamically typed languages such as Python, Clojure etc. This means that when we write programs in Scala, the compiler infers the type of a variable (immutable or not) from the type of the value attributed to it. Let’s see this in action.

Let’s open the Scala REPL. We type var number=0 and hit enter. The following will be printed on our console:

scala> var number=0

number: Int = 0

As we can notice, the variable was defined as an integer, since we attributed a number to it. The reader could be thinking “but this is exactly like a dynamically typed language!”. It appears so at first, but here is the catch: if we try to change the variable to another type of value, this happens:

scala> number="a string"

:12: error: type mismatch;

 found   : String("a string")

 required: Int

       number="a string"

              ^
scala>

The compiler throws an error, saying that the variable we defined previously is an integer, so we can’t change it to a string, for instance. This is fundamentally different from dynamically typed languages, where we can change the type of a variable as much as we like.

This could be seen as a weak point depending on the point of view, but it is better seen as a design choice: with a strongly typed scheme, we have more security about knowing exactly what to expect from each variable in use in the system.

This is particularly important in the functional paradigm, where we normally use more immutable variables than mutable ones, as we will talk about in the next section. One last thing before we go: although we can rely on the compiler’s inference to create variables, we can also explicitly define the type at creation, like with the following variable:

scala> var number2: Int = 1

number2: Int = 1

scala>

Var vs. Val

In Scala, we can declare variables using 2 keywords: var and val. The creation code for the 2 options is essentially the same, but there’s a primary difference between the 2: vars can have their values changed during their lifecycle, while vals can’t.

That means vals are immutable. The closest equivalent we have in Java code is a constant, meaning that once declared, its value will never be changed again.

When working with the functional programming paradigm, we essentially use immutables most of the time. With immutables, we have the security that our functions will always behave as intended, since a function won’t change the data, making new runs with the same parameters always return the same results.

Let’s test whether vals really can’t be changed. Let’s create a string-typed val, with the following code:

val mystring = "this is a string"

Then, we try to change the string. When we do this, we will receive the following:

scala> mystring = "this is a new string"

:12: error: reassignment to val

       mystring = "this is a new string"

                ^

scala>

The compiler complains that we are trying to change a val, proving that vals are indeed immutable.

Classes

In Scala, everything is an object. That’s why, despite the fact that Scala allows us to develop using the functional paradigm, we can’t say that Scala is a pure functional programming language, like Haskell, for example.

In Scala’s object hierarchy, the root class for all classes is called Any. This class has 2 subclasses: AnyVal and AnyRef. AnyVal is the root class for value types such as integers, floats etc – Scala’s primitives are internally wrapper objects. AnyRef is for classes that are not value types, like the classes we will develop in this lab, for example.

So, let’s create our first class! To do this, let’s create a file called Myclass.scala and enter the following code:

class Myclass(val myvalue1: Int, val myvalue2: String)

That’s right. All we have to do is write this one line of code, and we have a complete class at our disposal! On this line, we created a class called Myclass, with 2 attributes: myvalue1 and myvalue2. Not only that: with this line we also created a constructor that receives the 2 attributes as parameters, plus getter accessors. All of this with just one line!

The reason Scala made the attributes settable only at object creation is that we declared them as immutable (vals). Had we declared them as vars, Scala would have created setter accessors as well.

Since we are talking about constructors, it is important to know that we can also overload the constructor, by defining it with the keyword this. For example, if we would like the option of a constructor that doesn’t receive the attributes, using default values instead, we could change the class like this:

class Myclass(val myvalue1: Int, val myvalue2: String) {

  def this() = this(0, "")

}

Case classes

Another interesting thing about classes is case classes. With case classes, we get a class that already has the hashCode, equals and toString methods implemented. How do we do this? Simply by modifying our class as follows:

case class Myclass(val myvalue1: Int, val myvalue2: String) {

  def this() = this(0, "")

}

That’s all we have to do: we just include the keyword case and the methods get a default implementation. That is another good example of how Scala can simplify the developer’s life.

Objects

We talked earlier about how everything in Scala is an object. However, there are cases when we want a class to have only one instance in the entire system. We commonly call this type of class a singleton. To achieve this in Scala, we declare objects.

Objects have bodies just like classes do; the difference is that they can’t be instantiated, since they already are instances. Let’s create a simple Hello World script in order to learn how to create objects.

Let’s create a file called Myscript.scala. In the file, we code this:

object Myscript extends App {

  print("Hello World!")

}

And then we compile it with scalac Myscript.scala. When running it with scala Myscript, we get the following on the console:

Hello World!%

The App trait that we extended is the hint for Scala that this object is the main entry point of our Scala application. We will see more about inheritance in future parts of this series.

Companion objects

Companion objects are like the ones we just saw, with one big difference: these objects must have the same name as a class, be declared in the same file as that class, and they have access to the attributes and methods of that class, even the private ones.

One use of companion objects is to create factory methods. An example of this use is the case classes we saw before: internally, when we declare a case class, Scala creates a companion object for that class, with an apply factory method that lets us construct instances without the new keyword.
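
A minimal hedged sketch of the idea (the Circle class is an invented example):

class Circle private (val radius: Double) {
  private def area: Double = math.Pi * radius * radius
}

object Circle {
  // the companion can call the private constructor, acting as a factory
  def apply(radius: Double): Circle = new Circle(radius)

  // it can also access private members of the class
  def describe(c: Circle): String = s"circle of area ${c.area}"
}

val circle = Circle(2.0) // no 'new' needed, thanks to apply
println(Circle.describe(circle))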

Pattern matchers

The last feature we will talk about is pattern matchers. With pattern matchers, we can run pieces of code based on case statements, similar to switch statements in Java. Let’s see an example.

We will use the Myclass class we created earlier. Let’s suppose we have a scenario where we want to print something different depending on the value of the myvalue1 attribute, and print the value itself if it doesn’t fit any of the clauses. We can do this by coding the following:

object Myscript extends App {

  case class Myclass(val myvalue1: Int, val myvalue2: String)

  val myclass = new Myclass(1, "Myvalue2")

  val result = myclass match {
    case Myclass(1, _) => "this is value 1"
    case Myclass(2, _) => "this is value 2"
    case m => s"$m"
  }

  print(result)

}

In the code above, we stated that if we have a class with the value 1 as the first attribute – the second one is matched with the “_” wildcard, which means we accept any value for that attribute – we output the string “this is value 1”; the string “this is value 2” is output for the value 2; and we output the values of the class itself for any other value. If we run the code above, we will receive this message on the terminal:

this is value 1%

Showing that our code is correct. One important thing to notice, due to good practices recommended for Scala: when using pattern matchers and binding the matched content to a variable – the case of our last clause – always use lower-case names. That is because when a name starts with an upper-case letter, the Scala compiler will try to find an existing value with that name, instead of creating a new binding. So, always remember to use lower-case variables in these cases.
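
A short sketch of this pitfall (the names are illustrative):

val Limit = 10 // upper-case: treated as a reference to an existing value in patterns

5 match {
  case Limit => println("matched the existing Limit value") // compares with Limit, binds nothing
  case limit => println(s"bound a new variable: $limit")    // lower-case: always matches and binds
}
// prints: bound a new variable: 5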

Conclusion

And that concludes our first trip through the Scala language. In the next parts, we will see more interesting features of the language, such as traits, inheritance and optionals. Stay tuned!

Thank you for your attention, until next time.

Refactoring: Improving the Design of Existing Code (book review)

Hi, dear readers! Welcome to my blog. In this post, I will review a famous book by Martin Fowler, which focuses on refactoring techniques. But after all, why does refactoring matter?

Definition

According to Fowler, refactoring consists of modifying code in order to improve its readability and capacity to change, without changing its behavior. When refactoring, our objective is to make the code easier for humans to read, while also improving its structure and design, making changes motivated by business rules easier to implement. Other benefits are that cleaner code makes it easier to spot bugs, alongside speeding up the development of new code on top of a well-organized production codebase.

When to refactor?

Fowler argues that refactoring should be done in 3 situations:

  • When you add a new functionality;
  • When you find a bug;
  • When you do a code review;

In these situations, you are already forced to make changes to the code structure, which makes them ideal opportunities for refactoring.

Pitfalls

When refactoring, there are some common pitfalls that can hinder the work. The most common ones are databases and the code’s interfaces.

Database schemas can be hard to change, especially if the database is old, with millions of rows. This produces a ripple effect on the code that manipulates the database, making it more difficult to change that code. The interfaces (APIs, libraries or even a single class inside a component) can also be a challenge to refactor, since a change to an interface can cascade into changes across lots of client code.

In order to solve this problem, the better approach for databases is to isolate the database logic in its own layer, allowing the “dirtier” code to evolve in a more controlled manner. As for the interfaces issue, the better approach is to allow the old and new interfaces to coexist while a migration effort is conducted.

When not to refactor?

According to Fowler, there is one situation when you shouldn’t refactor: when the code is so bad that it is better rewritten from scratch. It is difficult to measure when the code is bad enough to be rewritten. Some good hints are if the code is infested with bugs, or if it is identified to have so many refactoring points that fixing it would end up rewriting most of the code.

Refactoring and performance

Sometimes, when refactoring, we can incur refactorings that cause some performance degradation. Of course, it is up to the business to measure how much degradation is bearable in order to meet the requirements, but as a general rule, we can assume that more organized code is easier code to fine-tune. So, if we refactor first and improve the code’s readability and design, it will be easier to do performance tuning later.

Unit testing

Another key point defended by Fowler is the need to develop unit tests for the code. With unit tests, we can develop refactorings in small steps (“baby steps”), receiving rapid feedback from the tests, so if anything breaks we can make fixes quickly during the refactoring process.

Refactoring catalog

Here are brief descriptions of some of the refactoring patterns that I found most interesting. Complete descriptions with examples can be found in Fowler’s book, which you can find via the links at the end of this post.

Extract Method

This refactoring consists of taking some code that can be grouped together and extracting it as its own method, improving readability.
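
To illustrate the idea, a hedged sketch in Scala (the invoice example is mine, not the book’s):

// before: banner and details are printed together inside one method
def printInvoice(customer: String, amount: Double): Unit = {
  println("*** Invoice ***")
  println(s"Customer: $customer")
  println(s"Amount: $amount")
}

// after: the grouped code is extracted into methods with meaningful names
def printBanner(): Unit = println("*** Invoice ***")

def printDetails(customer: String, amount: Double): Unit = {
  println(s"Customer: $customer")
  println(s"Amount: $amount")
}

def printInvoiceRefactored(customer: String, amount: Double): Unit = {
  printBanner()
  printDetails(customer, amount)
}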

Introduce Explaining Variable

This refactoring consists of taking a big, complex conditional and simplifying it by turning its operands into variables, making the conditional more self-explanatory.
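
A hedged Scala sketch (the discount rule is an invented example):

// before: a dense, hard-to-read conditional
def eligibleForDiscount(price: Double, quantity: Int, isMember: Boolean): Boolean =
  price * quantity > 1000.0 && isMember && quantity >= 10

// after: each operand is named, making the rule self-explanatory
def eligibleForDiscountRefactored(price: Double, quantity: Int, isMember: Boolean): Boolean = {
  val isBigOrder = price * quantity > 1000.0
  val isBulkPurchase = quantity >= 10
  isBigOrder && isMember && isBulkPurchase
}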

Replace Method with Method Object

This refactoring applies when you have a method from which some code would be better extracted into its own method, but the code refers to so many variables that the extraction is hindered. In this case, we take all the variables and the method and move them to a new object, creating an easier environment in which to apply the Extract Method refactoring.

Move Method

This refactoring consists of moving a method from one class to another. This makes sense when the old class has fewer uses for the method than the class it is moved to.

Extract Class

This refactoring applies when you have a class doing work that would be better organized if divided in two. In this case, we take the common behavior and data (methods and fields) that could form a new class and move them there, keeping a delegation from the old one.

Remove Middle Man

This refactoring applies when a class has lots of methods that merely delegate to another class, which introduces unnecessary code. In this case, we create an accessor to the instance of the delegate object itself, so the callers can call its methods directly; after creating the accessor, we remove all the delegating methods.

Consolidate Conditional Expression

This refactoring applies when you have several conditionals that return the same value. In this case, we refactor the conditionals into a single one, commonly by creating a method, making the code clearer and simpler.

Remove Control Flag

This refactoring applies when you have a flag variable that controls the behavior inside a loop. By using control commands like break and continue, we can remove the control flag, simplifying the code.

Replace Conditional with Polymorphism

This refactoring applies when you have a method in a class with conditional behavior depending on the type of the object. In this case, we extract each leg of the conditional and create a subclass around each different behavior, until the method turns out to be empty, at which point we make the method abstract in the now superclass of the hierarchy. We may also have to change the constructor of the class into a factory method.
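
A hedged Scala sketch of the result (the employee example is mine):

// before: behavior switches on a type code
def pay(employeeType: String, base: Double): Double = employeeType match {
  case "engineer" => base * 1.2
  case "manager"  => base * 1.5
}

// after: each leg of the conditional becomes a subclass
abstract class Employee(val base: Double) {
  def pay: Double // the method turned abstract in the superclass
}

class Engineer(base: Double) extends Employee(base) {
  def pay: Double = base * 1.2
}

class Manager(base: Double) extends Employee(base) {
  def pay: Double = base * 1.5
}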

Introduce Null Object

This refactoring applies when we have various null checks for data on the callers of an object. In this case, we create an object to represent null, which returns all the default data that should be used when the data is null. This way, we don’t have to make null checks on the callers anymore, since the behavior of the null object covers those circumstances.
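
A hedged Scala sketch (the customer example is invented):

class Customer(val name: String) {
  def plan: String = "premium"
}

// the null object returns the defaults that callers would otherwise guard for
object UnknownCustomer extends Customer("unknown") {
  override def plan: String = "basic"
}

def customerFor(id: Int): Customer =
  if (id == 42) new Customer("Alice") else UnknownCustomer

// no null check needed on the caller's side:
println(customerFor(7).plan) // prints "basic"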

Preserve Whole Object

This refactoring applies when you have a method call that is preceded by calls to several data accessors, getting lots of data from another object to pass as parameters. In this case, we change the method to receive the object itself, moving the data accessor calls inside the method. This refactoring not only simplifies the code on the caller’s side, but also simplifies changes if the method needs more data from the object in the future.

Replace Constructor with Factory Method

This refactoring applies when we want a constructor to carry more behavior than it normally has. This is especially true in class hierarchies, where the object construction must produce the corresponding subclass depending on the type of the object. In this case, we change the constructors to a more restrictive access level (private or protected) and create a static factory method at the top of the hierarchy, allowing the dynamic creation of the objects.

Replace Error Code with Exception

This refactoring comes in handy when we have code that returns error codes when something breaks. Error codes are common in environments such as Unix and C, but in Java we have a much more powerful tool: exceptions. With exceptions, we can easily separate the code that handles the errors from the normal code. So in this case, our best approach is to change the error code returns into exception throws, which makes for a much more readable and organized structure.

Replace Exception with Test

This refactoring occurs when we have code in a try-catch block whose catch block holds logic that could be performed before the error occurs. This is typically found when we have predictable errors that occur in some cases, but we use the catch block as part of our program’s logic. By changing the logic of the catch block into a test (if) placed before the code that breaks, we can remove the try-catch altogether, making more readable and consistent code that doesn’t rely on errors to work.
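
A hedged Scala sketch of the idea:

val values = Array(1, 2, 3)

// before: the catch block is used as part of the program's logic
def valueAtWithCatch(index: Int): Int =
  try values(index)
  catch { case _: ArrayIndexOutOfBoundsException => 0 }

// after: the condition is tested before the code that would break
def valueAt(index: Int): Int =
  if (index >= 0 && index < values.length) values(index) else 0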

Extract Subclass

This refactoring applies when we have some methods and fields that are used only by some instances of the class. In this case, we move those methods and fields to a new subclass, where they can be better organized and maintained.

Extract Superclass

This refactoring is the opposite of the previous one, since we create a superclass instead of a subclass. If we have identified common behavior in two different classes, we create a superclass with the common behavior of the two and make both of them subclasses of the created superclass.

Form Template Method

This refactoring applies when we have two classes that have methods with equal or very similar logic that needs to be called in a certain order. In this case, we equalize the method interfaces of both classes, create a superclass, move the common methods of both classes into it, and create an orchestration method for the order of the calls. This improves the code’s reusability and hierarchy organization.

Conclusion

And so we conclude our introduction to Martin Fowler’s Refactoring book. With good didactics and good examples, the book is a must-read that I highly recommend!

Thank you for following me on this post, until next time.

Buy the book now!

Curator: Implementing purge routines on your Elasticsearch cluster

Hi, dear readers! Welcome to my blog. In this post, we will learn how to use the Curator project to create purge routines on an Elasticsearch cluster.

When we have a cluster crunching logs and other data types from our systems, it is necessary to configure processes that manage this data, doing actions like purges and backups. For this purpose, the Curator project comes in handy.

Curator is a Python tool that allows several types of actions. In this post, we will focus on 2 actions: purge and backup. To install Curator, we can use pip, like the command below:

sudo pip install elasticsearch-curator

Once installed, let’s begin preparing our cluster to make the backups, by creating a backup repository. A backup repository is an Elasticsearch feature that processes backups and saves them on a persistent store. In this case, we will configure the backups to be stored in an Amazon S3 bucket. First, let’s install the AWS Cloud plugin for Elasticsearch, by running the following command on each of the cluster’s nodes:

bin/plugin install cloud-aws

And before we restart our nodes, we configure the AWS credentials for the cluster to connect to AWS, by setting them in the elasticsearch.yml file:

cloud:
  aws:
    access_key: <access key>
    secret_key: <secret key>

Finally, let’s configure our backup repository, using the Elasticsearch REST API:

PUT /_snapshot/elasticsearch_backups
{
  "type": "s3",
  "settings": {
    "bucket": "elastic-bckup",
    "region": "us-east-1"
  }
}

In the command above, we created a new backup repository called “elasticsearch_backups”, also defining the bucket where the backups will be created. With our repository created, let’s create our YAMLs to configure Curator.

The first YAML is “curator-config.yml”, where we configure details such as the cluster address. A configuration example could be as follows:

client:
  hosts:
    - localhost
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 240
  master_only: False
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

The other YAML is “curator-action.yml”, where we configure the action list to be executed by Curator. In the example, we have indexes of Twitter data, with the prefix “twitter”, where we first create a backup of the indexes that are more than 2 days old and, after the backup, we purge the data:

actions:
  1:
    action: snapshot
    description: >-
      Make backups of indices older than 2 days.
    options:
      repository: elasticsearch_backups
      name: twitter-%Y.%m.%d
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2
      exclude:
  2:
    action: delete_indices
    description: >-
      Delete indices older than 2 days (based on index name).
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: twitter-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
      exclude:

With the YAMLs configured, we can execute Curator, with the following command:

curator --config curator-config.yml curator-action.yml

The command will generate a log from the actions performed, showing that our configurations were a success:

2016-08-27 16:14:36,576 INFO Action #1: snapshot
2016-08-27 16:14:40,814 INFO Creating snapshot "twitter-2016.08.27" from indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,725 INFO Snapshot twitter-2016.08.27 successfully completed.
2016-08-27 16:15:34,725 INFO Action #1: completed
2016-08-27 16:15:34,725 INFO Action #2: delete_indices
2016-08-27 16:15:34,769 INFO Deleting selected indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.14
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.25
2016-08-27 16:15:34,860 INFO Action #2: completed
2016-08-27 16:15:34,861 INFO Job completed.

That’s it! Now all that is left is to schedule this command to execute from time to time – once per day, for example – and we will have automated backups and purges.
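
To run it once per day, for example, we could add a crontab entry like the sketch below (the paths are assumptions – adjust them to your environment):

# hypothetical crontab entry: run Curator every day at 1 AM
0 1 * * * /usr/local/bin/curator --config /path/to/curator-config.yml /path/to/curator-action.yml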

Thank you for following me on this post, until next time.

Elastalert: implementing rich monitoring with Elasticsearch

Hi, dear readers! Welcome to my blog. In this post, we will take a tour of an open source project developed by Yelp, called Elastalert. Focused on enriching Elasticsearch’s role as a monitoring tool, it allows us to query Elasticsearch, sending alerts to different types of tools, such as e-mail boxes, Telegram chats, JIRA issues and more. So, without further delay, let’s dive deep into the tool!

Set up

In order to set up Elastalert, we need to clone the project’s Git repository and install it with Python. If the reader doesn’t have Python or Git installed, I recommend following the instructions here for Python and here for Git. For this tutorial, I am using a Unix OS, but the instructions are similar for other environments such as Linux. Also, in this tutorial I am using virtualenv, in order to keep my Python interpreter “clean”. The reader can find instructions to install virtualenv here.

To display the alerts, we will use a Telegram channel, which will receive alerts sent by a Telegram bot. In order to prepare the bot, we need a Telegram account and the Bot Father (@BotFather) to create the bot; then we create a public channel on Telegram and add the bot to the channel’s admins. The instructions to make these configurations can be found here. In order to ease the steps for the reader, I left the bot created for this lab (@elastalerthandson) published, for anyone who wants to use it on their own Telegram channels for testing!

With all the tools installed and ready, let’s begin by cloning the Elastalert Git repository. To do this, we run the following command, in the folder of our choice:

git clone https://github.com/Yelp/elastalert.git

After running the command, we will see that a folder called “elastalert” was created. Before we proceed, we will also create a virtualenv environment, where we will install Elastalert. We do this by running:

virtualenv virtualenvelastalert

After creating the virtual environment – which will create a folder called “virtualenvelastalert” – we need to activate it before we proceed with the install. To do this, we run the following command, assuming the reader is in the same folder as the previous command:

source virtualenvelastalert/bin/activate

After activating, we will notice that the name of our virtual environment is now written as a prefix on the shell, meaning that it is activated. Now, to install Elastalert, we navigate to the folder created previously by our git clone command and type the following:

python setup.py install
sudo pip install -r requirements.txt

That’s it! Now that we have Elastalert installed, let’s continue the setup by creating the Elasticsearch index that will be used as a metadata repository by Elastalert.

Creating the metadata index

In order to create Elastalert’s index, we run the command:

elastalert-create-index

The command-line tool will ask us for some settings, such as the name we want for the index and the ip/port of our Elasticsearch cluster. After providing the settings, the tool will create the index.

Creating the configuration files

All the configuration of Elastalert is made through YAML files. The main configuration file for the tool is called “config.yaml” by default and is located in the same folder where we start Elastalert – which we will do in a few moments. For our main configuration file, let’s create a file called “config.yaml” like the following:

rules_folder: rules_folder

run_every:
  seconds: 40

buffer_time:
  minutes: 15

es_host: 192.168.99.100

es_port: 9200

writeback_index: elastalert_status

alert_time_limit:
  days: 2

In the config above, we defined:

  • The rules_folder property, which defines the folder where our rules will be (all YAML files in the folder will be processed);
  • The run_every property, which makes Elastalert run all the rules on a 40-second frequency;
  • The buffer_time property, which makes Elastalert cache the last period of time defined by the range of the property. This approach is used when the queries made on Elasticsearch are not on real-time data;
  • The host IP of the Elasticsearch node used to query the alerts;
  • The host port of the Elasticsearch node used to query the alerts;
  • The index used to store the metadata, which we created in the previous section;
  • The maximum period of time Elastalert will hold an alert whose delivery has failed, making retries during that period;

Now, let’s create the “rules_folder” folder and create 3 YAML files inside it, which will hold our rules:

  • twitter_flatline.yaml
  • twitter_frequency.yaml
  • twitter-blacklist.yaml

With these rules, we will test 3 types of rules Elastalert can manage:

  • The flatline rule, which will alert when the number of documents found by a search drops below a threshold;
  • The frequency rule, which will alert when a given number of documents is reached within a certain period of time;
  • The blacklist rule, which will alert when any document containing a word from a list is found in the timeframe collected by the tool;

Of course, there are other rule types alongside those we will cover in this lab, like the spike rule, which can detect abnormal growth or shrinkage of the data across a time period, or the whitelist rule, which alerts on any document whose field value is not in a given list. More information about rules and their types can be found in the references at the end of this post.
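
As an illustration only – check Elastalert’s documentation for the authoritative field reference – a spike rule could look like this sketch, reusing our Telegram settings:

name: Twitter spike rule

type: spike

index: twitter-*

# alert when the document count at least triples compared with the previous timeframe
spike_height: 3
spike_type: "up"

timeframe:
  minutes: 15

alert:
- "telegram"

telegram_bot_token: 184186982:AAGpJRyWQ2Rb_RcFXncGrJrBrSK7BzoVFU8

telegram_room_id: "@elastalerthandson"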

For this lab, we will use an Elasticsearch index with Twitter data. The reader can find more information about how to set up an ELK environment in my ELK series. The Logstash configuration file used in this lab is as follows:

input {
      twitter {
        consumer_key => "XXXXXXXXXXXXXXXXX"
        consumer_secret => "XXXXXXXXXXXXXXXX"
        keywords => ["coca cola","java","elasticsearch","amazon"]
        oauth_token => "XXXXXXXXXXXXXXX"
        oauth_token_secret => "XXXXXXXXXXXXXX"
    }
}

output {
      stdout { codec => rubydebug }
      elasticsearch {
            hosts => [ "192.168.99.100:9200" ]
            index => "twitter-%{+YYYY.MM.dd}"
        }
}

With our ELK stack set up and running, let’s begin creating the rules. First, we create the frequency rule, by configuring the respective YAML file with the following code:

name: Twitter frequency rule

type: frequency

index: twitter-*

num_events: 3

timeframe:
  minutes: 15

realert:
  hours: 2

filter:
- query:
   query_string:
    query: "message:amazon"


alert:
- "telegram"

telegram_bot_token: 184186982:AAGpJRyWQ2Rb_RcFXncGrJrBrSK7BzoVFU8

telegram_room_id: "@elastalerthandson"

In this file we configured a frequency rule. The rule is set up with the following properties:

  • name: This property defines the rule’s name. It acts as the rule’s ID;
  • type: This property defines the type of rule we are creating;
  • index: This property defines the index on Elasticsearch that we want to search;
  • num_events: The number of documents that must be found in order to fire the alert;
  • timeframe: The time period that will be queried to check the rule;
  • realert: This property defines the time period during which Elastalert will stop re-alerting the rule after the first match, preventing users from being flooded with alerts;
  • filter: This property is where we configure the query that will be sent to Elasticsearch in order to check the rule;
  • alert: This property is a list of targets to which we want our alerts to be sent. In our case, we just defined the telegram target;
  • telegram_bot_token: In this property we set the access token of our bot, as received from the Botfather;
  • telegram_room_id: In this property we define the ID of the channel the alerts should be sent to;

As we can see, it is a very straightforward and simple configuration file. For the flatline config, we configure our respective YAML as follows:

name: Twitter flatline rule

type: flatline

index: twitter-*

threshold: 30

timeframe: 
 minutes: 5

realert:
 minutes: 30

use_count_query: true

doc_type: logs

alert:
- "telegram"

telegram_bot_token: 184186982:AAGpJRyWQ2Rb_RcFXncGrJrBrSK7BzoVFU8

telegram_room_id: "@elastalerthandson"

The configuration is pretty much the same as the previous file, with the exception of 3 new properties:

  • threshold: This property defines the minimum number of documents the rule expects to find; if fewer are found, an alert is sent;
  • use_count_query: This property makes Elastalert use the count API from Elasticsearch. This API returns just the number of documents for the rule to be validated, eliminating the need to process the query data;
  • doc_type: This property is needed by the aforementioned count API, in order to query the document count for a specific document type;

Finally, let’s configure our final rule, coding the last YAML as follows:

name: Twitter blacklist rule

type: blacklist

index: twitter-*

compare_key: message

blacklist:
- "android"
- "java"

realert:
  hours: 4

filter:
- query:
   query_string:
    query: "*"

alert:
- "telegram"

telegram_bot_token: 184186982:AAGpJRyWQ2Rb_RcFXncGrJrBrSK7BzoVFU8

telegram_room_id: "@elastalerthandson"

In this file, the new properties that we needed to configure are:

  • compare_key: This property defines the field of the documents that Elastalert will check against the blacklist;
  • blacklist: This property is the list of words that Elastalert will compare against the documents, in order to check whether any document has a blacklisted word;

And that concludes our configuration. Now, let’s run Elastalert!

Running Elastalert

To run Elastalert, all we need to do is run a command like this, in the same folder as our YAML structure – where “config.yaml” is located:

elastalert --start NOW --verbose

In the command above, we set the flag “--start” to define that we want Elastalert to start the measurings from now on, and the “--verbose” flag to print all the info log messages.

The simplest of the rules to test is the flatline rule. All we have to do is wait for about 5 minutes with Elasticsearch running and Logstash stopped – so no documents are streaming. After the wait, we can see that an alert is received on our channel.

And, as time passes, we will receive other alerts as well, like the frequency alert.

Conclusion

And so we conclude our tutorial about Elastalert. With simple usage, we can construct really powerful alerts for our Elasticsearch system, reinforcing the search engine’s role in a monitoring ecosystem. Thank you for following me on another post, until next time.

Thoughtworks technology radar is out!

Hi, dear readers! Welcome to my blog. In this post, I want to bring your attention to the new technology radar from Thoughtworks, which has just come out. With interesting insights like the rise of open source tools, the enormous explosion in container adoption and applications based on the reactive programming model, the radar is a good source of information to keep up to date with the IT world.

The radar can be found at:

https://www.thoughtworks.com/pt/radar

Elasticsearch: Consumindo dados real-time com ELK

Hi, dear readers! Welcome to my blog. In this post, I am happy to share with you another accomplishment in my career. The publisher “Casa do Código” has just released a new book called “Elasticsearch: Consumindo dados real-time com ELK”, proudly written by me. The book is a much more complete overview than the one from my series about ELK on this blog (see part 1, part 2 and final) and, as the preview suggests, a dissection of the 3 major tools of the Elastic toolset: Elasticsearch, Kibana and Logstash. In this book, you will find:

  • A simple and didactic explanation of each of the tools, with a practical case of using the ELK stack to monitor the logs of some Spring Boot microservices;
  • A deep explanation of Elasticsearch’s advanced indexing features and its powerful search capabilities. This covers other practical examples as well, such as consuming a Twitter public stream and indexing the stream with Elasticsearch;
  • A full chapter dedicated to Elasticsearch administration, teaching how to purge, backup, monitor and configure other aspects, like security, on a cluster;

The book is currently sold as an ebook, but there are plans to sell it in a printed version as well, through the publisher’s web store. If you liked the content, it would be a great honour for me if you bought my book! Thank you.

The book can be acquired at the following link:

https://www.casadocodigo.com.br/products/livro-elasticsearch