Curator: Implementing purge routines on your Elasticsearch cluster

Hi, dear readers! Welcome to my blog. In this post, we will learn how to use the Curator project to create purge routines on an Elasticsearch cluster.

When we have a cluster crunching logs and other types of data from our systems, we need to configure a process that manages this data, performing actions such as purges and backups. For this purpose, the Curator project comes in handy.

Curator is a Python tool that supports several types of actions. In this post, we will focus on two of them: purge and backup. To install Curator, we can use pip, as in the command below:

sudo pip install elasticsearch-curator

Once it is installed, let’s begin preparing our cluster to make the backups by creating a backup repository. A backup repository is an Elasticsearch feature that processes backups and saves them to a persistent store. In this case, we will configure the backups to be stored in an Amazon S3 bucket. First, let’s install the AWS Cloud plugin for Elasticsearch by running the following command on each of the cluster’s nodes:

bin/plugin install cloud-aws

Before we restart our nodes, we configure the AWS credentials that the cluster will use to connect to AWS, by setting them in the elasticsearch.yml file:

cloud:
  aws:
    access_key: <access key>
    secret_key: <secret key>

Finally, let’s configure our backup repository, using the Elasticsearch REST API:

PUT /_snapshot/elasticsearch_backups
{
  "type": "s3",
  "settings": {
    "bucket": "elastic-bckup",
    "region": "us-east-1"
  }
}

With the command above, we created a new backup repository called “elasticsearch_backups” and defined the bucket where the backups will be stored. With our repository created, let’s create the YAML files that configure Curator.
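If you prefer to script the repository registration instead of typing the request by hand, something along these lines could work. This is only a minimal sketch: the helper names `build_repository_request` and `register_repository` are hypothetical, and it assumes a node reachable at http://localhost:9200 without authentication.

```python
import json
import urllib.request


def build_repository_request(base_url, repository, bucket, region):
    """Build (but do not send) the PUT request that registers an S3 repository."""
    body = {"type": "s3", "settings": {"bucket": bucket, "region": region}}
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def register_repository(request):
    """Send the request and return the decoded JSON answer from the cluster."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (assumption: the cluster answers on localhost:9200):
# request = build_repository_request(
#     "http://localhost:9200", "elasticsearch_backups", "elastic-bckup", "us-east-1"
# )
# print(register_repository(request))
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before anything is sent to the cluster.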

The first YAML is “curator-config.yml”, where we configure details such as the cluster address. A configuration example could be as follows:

client:
  hosts:
    - localhost
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 240
  master_only: False
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

The other YAML file is “curator-action.yml”, where we configure an action list to be executed by Curator. In this example, we have indices holding data from Twitter, with the prefix “twitter”. We first create a backup of the indices that are more than 2 days old and, after the backup, we purge the data:

actions:
  1:
    action: snapshot
    description: >-
      Make backups of indices older than 2 days.
    options:
      repository: elasticsearch_backups
      name: twitter-%Y.%m.%d
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    # Select indices created more than 2 days ago
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2
      exclude:
  2:
    action: delete_indices
    description: >-
      Delete indices older than 2 days (based on index name).
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    # Keep only indices whose name starts with "twitter-"
    - filtertype: pattern
      kind: prefix
      value: twitter-
      exclude:
    # Among those, select the ones whose date in the name is more than 2 days old
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
      exclude:
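To get a feel for what the pattern and age filters of the second action select, here is a small approximation in plain Python. It is not Curator’s own code, just a sketch of the idea under my assumptions: the function name `indices_older_than` is hypothetical, and the date is read from the part of the index name that follows the prefix.

```python
from datetime import datetime, timedelta


def indices_older_than(index_names, prefix, timestring, days, now):
    """Return the names that start with `prefix` and whose date suffix
    is more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    selected = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # plays the role of the "pattern" filter with kind: prefix
        try:
            index_date = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # the suffix does not match the timestring, so skip it
        if index_date < cutoff:
            selected.append(name)  # plays the role of the "age" filter
    return selected


names = ["twitter-2016.08.14", "twitter-2016.08.25", "twitter-2016.08.26"]
print(indices_older_than(names, "twitter-", "%Y.%m.%d", 2, datetime(2016, 8, 27, 16, 14, 36)))
```

Note that an index named for a given day counts from midnight of that day, which is why an index dated two calendar days back is already selected when the job runs in the afternoon.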

With the YAMLs configured, we can execute Curator, with the following command:

curator --config curator-config.yml curator-action.yml

The command will generate a log from the actions performed, showing that our configurations were a success:

2016-08-27 16:14:36,576 INFO Action #1: snapshot
2016-08-27 16:14:40,814 INFO Creating snapshot "twitter-2016.08.27" from indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,725 INFO Snapshot twitter-2016.08.27 successfully completed.
2016-08-27 16:15:34,725 INFO Action #1: completed
2016-08-27 16:15:34,725 INFO Action #2: delete_indices
2016-08-27 16:15:34,769 INFO Deleting selected indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.14
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.25
2016-08-27 16:15:34,860 INFO Action #2: completed
2016-08-27 16:15:34,861 INFO Job completed.

That’s it! Now we just need to schedule this command to run periodically, once per day for example, and we will have automated backups and purges.
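One possible way to wrap the Curator call for a scheduler is a tiny Python script like the one below. Treat it as a sketch: the function names are hypothetical, the file names are the ones used in this post, and it assumes the `curator` executable is on the PATH of the user that runs the job.

```python
import subprocess


def build_curator_command(config_path, action_path, dry_run=False):
    """Assemble the Curator command line as a list of arguments."""
    command = ["curator", "--config", config_path]
    if dry_run:
        # With --dry-run, Curator only logs what it would do
        command.append("--dry-run")
    command.append(action_path)
    return command


def run_curator(config_path, action_path, dry_run=False):
    """Run Curator and return its exit code (0 means success)."""
    completed = subprocess.run(build_curator_command(config_path, action_path, dry_run))
    return completed.returncode


# Example usage, e.g. from a script called once per day by cron:
# exit_code = run_curator("curator-config.yml", "curator-action.yml")
```

Running it once with `dry_run=True` before scheduling is a cheap way to check which indices would be selected without creating snapshots or deleting anything.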

Thank you for following me on this post, until next time.
