Ingest index recommendations

Wayne_Taylor · May 10, 2017, 2:22am

Dear ES Team,

I have now got a few months' worth of data into ES via Logstash.

Some of the volumes are small, e.g. several hundred MB per day, whereas some are bigger (still not large), e.g. 10-15 GB per day.

Based on the volumes, is it better to use one bigger index, e.g. by month rather than YYYY.MM.DD, or just one big overall index? I did see some posts and also this after some searching:

Seems like the recommendation here is to use LS to merge the daily indexes into one monthly index.

Looking for some guidance on recommendations here.

Thanks
Wayne

theuntergeek · May 10, 2017, 2:38am

Curator actually does do this now, since Elasticsearch introduced the reindex API.

Wayne_Taylor · May 10, 2017, 2:40am

Thanks @theuntergeek. On my question about moving away from indexing by YYYY.MM.DD: should I do this, especially for the smaller indexes, or leave it as is and have Curator handle it?

theuntergeek · May 10, 2017, 3:26am

I would perhaps rethink the need for daily indices and use the Rollover API (which is also supported in Curator now) to only "rollover" indices when they have hit a certain number of documents and/or a month/week/number of days in age. This approach could reduce the increase in shard count associated with a lot of small, daily indices.
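
To put rough numbers on the shard-count point, here is a minimal Python sketch. The 5 primaries and 1 replica per index are assumptions (the Elasticsearch 5.x defaults), as is the 90-day retention. The rollover case assumes the same 90 days end up in a single index with the same shard settings.

```python
def total_shards(index_count, primaries=5, replicas=1):
    """Total shards (primaries plus replica copies) across index_count indices."""
    return index_count * primaries * (1 + replicas)


# Assumed: 90 days of retention, one index per day.
daily = total_shards(index_count=90)

# Assumed: the same 90 days held in a single rolled-over index.
rolled = total_shards(index_count=1)

print(daily, rolled)
```

Your real numbers depend on your own shard settings and retention, but the ratio shows why many small daily indices add up quickly.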

Wayne_Taylor · May 18, 2017, 2:47pm

@theuntergeek I tried that approach, have some questions, and wasn't sure which forum to post in:

Added an alias:

POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "myindex-*", "alias" : "purchases" } }
  ]
}

My action yml file:

actions:
  1:
    action: rollover
    description: rollover purchase logs
    options:
      name: purchases
      conditions:
        max_age: 90d
      extra_settings:
        index.number_of_shards: 3
        index.number_of_replicas: 1
      timeout_override:
      continue_if_exception: False
      disable_action: False

Message from curator:

$ curator --config curator.yml --dry-run rollover.yml

2017-05-18 09:38:57,565 INFO Preparing Action ID: 1, "rollover"
2017-05-18 09:38:57,693 INFO Trying Action ID: 1, "rollover": rollover purchase logs
2017-05-18 09:38:57,731 ERROR "alias" must only reference one index: {u'myindex-2017.05.08': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.09': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.06': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.07': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.04': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.05': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.02': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.03': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.01': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.30': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.11': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.10': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.13': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.12': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.15': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.14': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.17': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.16': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.18': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.01': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.29': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.28': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.27': {u'aliases': {u'purchases': {}}}}
2017-05-18 09:38:57,731 ERROR Failed to complete action: rollover. <type 'exceptions.ValueError'>: Unable to perform index rollover with alias "purchases". See previous logs for more details.

theuntergeek · May 18, 2017, 3:10pm

Rollover aliases must be a 1:1 match. The official documentation has this example of a rollover capable index + alias:

PUT /logs-000001 
{
  "aliases": {
    "logs_write": {}
  }
}

# Add > 1000 documents to logs-000001

POST /logs_write/_rollover 
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000
  }
}

The index has to end in an incrementable number.
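
As a rough illustration of the two constraints above (one index per rollover alias, and an index name ending in a number), here is a small Python sketch. Both function names are hypothetical helpers, not part of Curator or any Elasticsearch client. The input shape for the alias check is assumed to match the dictionary printed in the Curator error.

```python
import re


def alias_is_rollover_ready(alias_response):
    """True when the alias resolves to exactly one index.

    alias_response is assumed to look like the dict in the Curator error:
    {index_name: {"aliases": {alias_name: {}}}, ...}
    """
    return len(alias_response) == 1


def next_rollover_name(index_name):
    """Increment the trailing number of a name such as 'logs-000001'.

    Elasticsearch zero-pads the new number to six digits.
    """
    match = re.match(r"^(.*-)(\d+)$", index_name)
    if match is None:
        raise ValueError("index name must end in '-' followed by a number")
    prefix, number = match.groups()
    return "{}{:06d}".format(prefix, int(number) + 1)
```

A daily name such as myindex-2017.05.18 raises ValueError here because the part after the last dash is not a plain number, which is why the daily indices cannot be rolled over directly.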

Wayne_Taylor · May 18, 2017, 3:25pm

Hi @theuntergeek, understood, but as I mentioned originally, I create indexes by day and the recommendation was to use rollover. So now I am confused. I am trying to understand how to solve this for my historical data.

My approach going forward is to put everything into one big index and then use rollover.

Wayne

theuntergeek · May 18, 2017, 3:47pm

How do you query your historical data now? If you only use Kibana, then you should be able to define your index pattern so that it matches both your historical data and a newer rollover-friendly pattern.

You could keep rules in Curator that will slowly purge out your historical data as the indices are currently named without it hurting anything else.
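
Curator's delete_indices action with an age filter handles this selection for you. Purely to illustrate the logic, here is a Python sketch that picks out the daily indices past a retention window while skipping rollover-style names. The function name, the myindex- prefix default, and the 30-day window in the usage lines are assumptions for the example.

```python
from datetime import date, datetime


def indices_older_than(index_names, days, today, prefix="myindex-"):
    """Return daily index names whose date suffix is more than `days` old."""
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            # Not a daily index, e.g. a rollover index like myindex-000001.
            continue
        if (today - day).days > days:
            old.append(name)
    return sorted(old)


names = ["myindex-2017.04.01", "myindex-2017.05.18", "myindex-000001"]
print(indices_older_than(names, days=30, today=date(2017, 5, 18)))
```

Because the rollover indices never parse as dates, a rule like this leaves them alone while the old daily indices age out.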

Wayne_Taylor · May 19, 2017, 11:15am

@theuntergeek because the timeseries volume is low, we keep several months available, and yes, they're actively being used. It's mainly used for us to look at trending in Timelion for historical patterns.

I understand that right now data is ingested by LS into ES in daily indices. I could change this to one big fat index and then roll that over. How do I move all the old documents, though?

Seems like the best suggestion may be to use the reindex API via Curator. I'll give that a whirl.

Wayne
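
For anyone following the same route, here is a minimal Python sketch of how the daily indices could be grouped into one reindex request body per month. The function name and the myindex- prefix are assumptions for illustration. It only builds the bodies (the source/dest shape accepted by POST /_reindex) and does not send anything. Note that reindex does not copy settings or mappings, so the destination index would need to be created or templated first.

```python
from collections import defaultdict


def monthly_reindex_bodies(daily_indices, prefix="myindex-"):
    """Group daily index names by month and build one _reindex body per month."""
    by_month = defaultdict(list)
    for name in daily_indices:
        suffix = name[len(prefix):]        # e.g. "2017.05.18"
        month = suffix.rsplit(".", 1)[0]   # e.g. "2017.05"
        by_month[month].append(name)
    return {
        prefix + month: {
            "source": {"index": sorted(names)},
            "dest": {"index": prefix + month},
        }
        for month, names in by_month.items()
    }


bodies = monthly_reindex_bodies(
    ["myindex-2017.05.02", "myindex-2017.05.01", "myindex-2017.04.30"]
)
for dest, body in sorted(bodies.items()):
    print(dest, body)
```

Once the document counts in a monthly index have been verified, the daily indices it was built from can be deleted.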

