Hi, dear readers! Welcome to my blog. In this post, we will learn how to use the Curator project to create purge routines on an Elasticsearch cluster.
When we have a cluster crunching logs and other types of data from our systems, we need to configure a process that manages this data, performing actions such as purges and backups. For this purpose, the Curator project comes in handy.
Curator is a Python tool that supports several types of actions. In this post, we will focus on two of them: purge and backup. To install Curator, we can use pip, with the command below:
sudo pip install elasticsearch-curator
Once it is installed, let’s begin preparing our cluster for backups by creating a backup repository. A backup repository is an Elasticsearch feature that processes backups and saves them to a persistent store. In this case, we will configure the backups to be stored in an Amazon S3 bucket. First, let’s install the AWS Cloud plugin for Elasticsearch by running the following command on each of the cluster’s nodes:
bin/plugin install cloud-aws
Before we restart our nodes, we set the AWS credentials that the cluster will use to connect to AWS, by adding them to the elasticsearch.yml file:
cloud:
    aws:
        access_key: <access key>
        secret_key: <secret key>
Finally, let’s configure our backup repository, using the Elasticsearch REST API:
PUT /_snapshot/elasticsearch_backups
{
    "type": "s3",
    "settings": {
        "bucket": "elastic-bckup",
        "region": "us-east-1"
    }
}
In the command above, we created a new backup repository called “elasticsearch_backups”, and also defined the bucket where the backups will be stored. With our repository created, let’s create our YAMLs to configure Curator.
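If we prefer to register the repository from a script instead of a REST console, the same request can be sketched in Python using only the standard library. This is a minimal sketch: the function name build_repository_request is made up for illustration, and the address http://localhost:9200 is an assumption based on the host and port used in the Curator configuration in this post.

```python
import json
import urllib.request


def build_repository_request(base_url, repo_name, bucket, region):
    """Build (but do not send) the PUT request that registers an S3 repository."""
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region},
    }
    return urllib.request.Request(
        url="{0}/_snapshot/{1}".format(base_url, repo_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed cluster address; adjust it to match your own nodes.
request = build_repository_request(
    "http://localhost:9200", "elasticsearch_backups", "elastic-bckup", "us-east-1"
)

# Sending is left commented out, since it needs a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it lets us inspect the URL and body before touching the cluster.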
The first YAML is “curator-config.yml”, where we configure details such as the cluster address. A configuration example could be as follows:
client:
  hosts:
    - localhost
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 240
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
The other YAML is “curator-action.yml”, where we configure a list of actions to be executed by Curator. In this example, we have indices holding data from Twitter, named with the prefix “twitter”. We first create a backup of the indices that are more than 2 days old and, after the backup, we purge the data:
actions:
  1:
    action: snapshot
    description: >-
      Make backups of indices older than 2 days.
    options:
      repository: elasticsearch_backups
      name: twitter-%Y.%m.%d
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2
      exclude:
  2:
    action: delete_indices
    description: >-
      Delete indices older than 2 days (based on index name).
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: twitter-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
      exclude:
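To get a feeling for what the prefix and name-based age filters of the second action select, and for how the snapshot name pattern is expanded, here is a rough Python sketch of the idea. It is not Curator's actual implementation: the function names are made up for illustration, and it assumes that the date comes right after the prefix in the index name and that an index counts as "older" when its date falls before the current time minus 2 days.

```python
from datetime import datetime, timedelta


def select_old_indices(names, prefix, timestring, days, now):
    """Return names with the given prefix whose embedded date is older than `days`."""
    cutoff = now - timedelta(days=days)
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue  # mimics the "pattern" filter with kind: prefix
        try:
            stamp = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # the name carries no date in the expected format
        if stamp < cutoff:  # mimics the "age" filter with direction: older
            selected.append(name)
    return selected


def snapshot_name(pattern, now):
    """Expand the strftime-style placeholders of the snapshot name option."""
    return now.strftime(pattern)


# Made-up index list and run time, used only for illustration.
now = datetime(2016, 8, 27, 16, 14, 36)
indices = [
    "twitter-2016.08.14",
    "twitter-2016.08.25",
    "twitter-2016.08.26",
    "logs-2016.08.01",
]
old_indices = select_old_indices(indices, "twitter-", "%Y.%m.%d", 2, now)
print(old_indices)  # ['twitter-2016.08.14', 'twitter-2016.08.25']
print(snapshot_name("twitter-%Y.%m.%d", now))  # twitter-2016.08.27
```

Note that the index from 2016.08.26 is kept because it is less than 2 days old, and the "logs" index is ignored because it does not match the prefix.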
With the YAMLs configured, we can execute Curator, with the following command:
curator --config curator-config.yml curator-action.yml
The command will generate a log from the actions performed, showing that our configurations were a success:
2016-08-27 16:14:36,576 INFO Action #1: snapshot
2016-08-27 16:14:40,814 INFO Creating snapshot "twitter-2016.08.27" from indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,725 INFO Snapshot twitter-2016.08.27 successfully completed.
2016-08-27 16:15:34,725 INFO Action #1: completed
2016-08-27 16:15:34,725 INFO Action #2: delete_indices
2016-08-27 16:15:34,769 INFO Deleting selected indices: [u'twitter-2016.08.14', u'twitter-2016.08.25']
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.14
2016-08-27 16:15:34,769 INFO ---deleting index twitter-2016.08.25
2016-08-27 16:15:34,860 INFO Action #2: completed
2016-08-27 16:15:34,861 INFO Job completed.
That’s it! Now we just need to schedule this command to run periodically (once per day, for example) and we will have automated backups and purges.
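One possible way to do the scheduling is to wrap the command in a small Python script and call that script from cron. The sketch below is only a suggestion: the function names, the script path and the cron time in the comments are hypothetical. It also uses Curator's --dry-run flag, which is handy for checking which indices would be affected before running the actions for real.

```python
import subprocess


def build_curator_command(config_path, action_path, dry_run=False):
    """Build the argument list for the Curator command line."""
    command = ["curator", "--config", config_path]
    if dry_run:
        command.append("--dry-run")  # only log what would be done
    command.append(action_path)
    return command


def run_curator(config_path, action_path, dry_run=False):
    """Run Curator and return its exit code (0 means success)."""
    command = build_curator_command(config_path, action_path, dry_run)
    return subprocess.run(command).returncode


# Example call, left commented out because it needs Curator and a cluster:
# exit_code = run_curator("curator-config.yml", "curator-action.yml")
#
# Hypothetical crontab entry to run a script containing that call every
# day at 02:00:
# 0 2 * * * /usr/bin/python3 /opt/curator/run_curator.py
```

A wrapper like this is optional, since cron can also call the curator command directly, but it gives us a single place to add logging or alerting later.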
Thank you for following me on this post, until next time.