Hi, dear readers! Welcome to my blog. In this post, we will learn how to use the Curator project to create purge routines on an Elasticsearch cluster.
When we have a cluster crunching logs and other data from our systems, we need processes that manage this data, performing actions such as purges and backups. The Curator project comes in handy for this purpose.
Curator is a Python tool that supports several types of actions. In this post, we will focus on two of them: purge and backup. To install Curator, we can use pip, as in the command below:
sudo pip install elasticsearch-curator
Once Curator is installed, let’s prepare our cluster for backups by creating a backup repository. A backup repository is an Elasticsearch feature that processes backups and saves them to persistent storage. In this case, we will configure the backups to be stored in an Amazon S3 bucket. First, let’s install the AWS Cloud plugin for Elasticsearch by running the following command on each of the cluster’s nodes:
bin/plugin install cloud-aws
Before restarting our nodes, we configure the AWS credentials the cluster will use to connect to AWS by adding them to the elasticsearch.yml file:
cloud:
    aws:
        access_key: <access key>
        secret_key: <secret key>
Finally, let’s create our backup repository using the Elasticsearch REST API:
PUT /_snapshot/elasticsearch_backups
{
  "type": "s3",
  "settings": {
    "bucket": "elastic-bckup",
    "region": "us-east-1"
  }
}
In the command above, we created a new backup repository called “elasticsearch_backups” and defined the S3 bucket where the backups will be stored. With our repository created, let’s write the YAML files that configure Curator.
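If you prefer to script this step instead of typing it into a REST console, the same PUT request can be built with Python’s standard library. This is a minimal sketch, assuming the cluster answers on http://localhost:9200 without TLS or authentication (the same address used in the Curator client settings); `build_repository_request` and `create_repository` are hypothetical helper names, not part of Elasticsearch or Curator.

```python
import json
import urllib.request

# Assumption: cluster reachable at localhost:9200, no TLS, no auth.
ES_URL = "http://localhost:9200"


def build_repository_request(repo_name, bucket, region, base_url=ES_URL):
    """Build (but do not send) PUT /_snapshot/<repo_name> for an S3 repository."""
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region},
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_repository(request):
    """Send the request; needs a running cluster with the cloud-aws plugin installed."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from sending it makes it easy to inspect (or log) exactly what will be sent to the cluster before calling `create_repository`.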
The first YAML file is “curator-config.yml”, where we configure client details such as the cluster address. A configuration example could be as follows:
client:
  hosts:
    - localhost
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 240
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
The other YAML file is “curator-action.yml”, where we configure the list of actions Curator will execute. In this example, we have indices of Twitter data named with the prefix “twitter”. We first create a backup of the indices that are more than 2 days old and, once the backup is done, we purge that data:
actions:
  1:
    action: snapshot
    description: >-
      Make backups of indices older than 2 days.
    options:
      repository: elasticsearch_backups
      name: twitter-%Y.%m.%d
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2
      exclude:
  2:
    action: delete_indices
    description: >-
      Delete indices older than 2 days (based on index name).
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: twitter-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
      exclude:
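To make the two filters of the delete action more concrete, here is a simplified Python sketch of what they select: the pattern filter keeps names starting with “twitter-”, and the age filter (source: name) parses the date from the rest of the name with the timestring and keeps indices dated earlier than “now minus 2 days”. This is not Curator’s code; `select_indices_to_purge` is a hypothetical helper, and it assumes the date directly follows the prefix and that dates are in UTC.

```python
from datetime import datetime, timedelta, timezone


def select_indices_to_purge(index_names, prefix, timestring, unit_count_days, now=None):
    """Simplified sketch of the pattern (kind: prefix) and age (source: name) filters."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Point of reference: anything dated before this instant counts as "older".
    cutoff = now - timedelta(days=unit_count_days)
    selected = []
    for name in index_names:
        # pattern filter: kind: prefix, value: twitter-
        if not name.startswith(prefix):
            continue
        # age filter: source: name, timestring: '%Y.%m.%d'
        try:
            index_date = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # no parsable date in the name, so the index is left alone
        if index_date.replace(tzinfo=timezone.utc) < cutoff:
            selected.append(name)
    return selected
```

With `now` set to 2016-08-27 16:14 UTC, the cutoff is 2016-08-25 16:14, so from a list such as `twitter-2016.08.14`, `twitter-2016.08.25`, `twitter-2016.08.26` and `logs-2016.08.01` the sketch selects only the first two.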
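Note that the snapshot action filters only by age, so it backs up every index older than 2 days, while the delete action is also restricted to the “twitter-” prefix. With both YAML files configured, we can run Curator with the following command: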
curator --config curator-config.yml curator-action.yml
Curator logs each action it performs, and the output below shows that both actions completed successfully:
2016-08-27 16:14:36,576 INFO Action #1: snapshot
2016-08-27 16:14:40,814 INFO Creating snapshot "twitter-2016.08.27" from indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,725 INFO Snapshot twitter-2016.08.27 successfully completed.
2016-08-27 16:15:34,725 INFO Action #1: completed
2016-08-27 16:15:34,725 INFO Action #2: delete_indices
2016-08-27 16:15:34,769 INFO Deleting selected indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.14
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.25
2016-08-27 16:15:34,860 INFO Action #2: completed
2016-08-27 16:15:34,861 INFO Job completed.
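Because these runs will eventually happen unattended, it can help to check the log programmatically. The following is a minimal sketch, assuming the log lines look like the ones above (the format may differ with other `logformat` settings or Curator versions); `summarize_curator_log` is a hypothetical helper that reports whether the job finished and which indices were deleted.

```python
import re

# Matches lines such as "... INFO ---deleting index twitter-2016.08.14"
DELETED_RE = re.compile(r"---deleting index (\S+)")


def summarize_curator_log(lines):
    """Return (job_completed, deleted_indices) for an iterable of Curator log lines."""
    deleted = []
    completed = False
    for line in lines:
        match = DELETED_RE.search(line)
        if match:
            deleted.append(match.group(1))
        if "Job completed." in line:
            completed = True
    return completed, deleted
```

Applied to the log above, it would report that the job completed and that `twitter-2016.08.14` and `twitter-2016.08.25` were deleted.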
That’s it! Now all that is left is to schedule this command to run periodically (once a day, for example), and we will have automated backups and purges.
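If the scheduler (cron, for example) should call a small Python script rather than the raw command, a wrapper could look like the sketch below. It assumes the `curator` executable is on the PATH and uses Curator’s `--dry-run` option, which logs what would be done without changing the cluster, so a new action file can be previewed before it is scheduled. `build_curator_command` and `run_curator` are hypothetical names, and the exit-code check assumes Curator reports a failed run with a non-zero status.

```python
import subprocess


def build_curator_command(config_path, action_path, dry_run=False):
    """Return the argument list for a single Curator run."""
    cmd = ["curator", "--config", config_path]
    if dry_run:
        # Preview mode: Curator logs the actions without executing them.
        cmd.append("--dry-run")
    cmd.append(action_path)
    return cmd


def run_curator(config_path, action_path, dry_run=False):
    """Run Curator once and return its exit code (non-zero signals a failure)."""
    completed = subprocess.run(build_curator_command(config_path, action_path, dry_run))
    return completed.returncode
```

A daily job would then call `run_curator("curator-config.yml", "curator-action.yml")` and alert on a non-zero return value.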
Thank you for following along with this post. Until next time.