Ingest index recommendations


(Wayne Taylor) #1

Dear ES Team,

I now have a few months' worth of data in ES, ingested via Logstash.

Some of the volumes are small, e.g. several hundred MB per day, whereas some are bigger (still not large), e.g. 10-15 GB per day.

Given these volumes, is it better to use one bigger index, e.g. by month instead of YYYY.MM.DD, or just one big index overall? I did see some posts and also this after some searching:

It seems the recommendation there is to use LS to merge the daily indices into one monthly index.

Looking for some guidance on recommendations here.

Thanks
Wayne


(Aaron Mildenstein) #2

Curator actually does do this now, since Elasticsearch introduced the reindex API.


(Wayne Taylor) #3

Thanks @theuntergeek. Regarding my question about moving away from indexing by YYYY.MM.DD: should I change this, especially for the smaller indices, or leave it as is and have Curator handle it?


(Aaron Mildenstein) #4

I would perhaps rethink the need for daily indices and use the Rollover API (which is also supported in Curator now) to only "rollover" indices when they have hit a certain number of documents and/or reached a certain age (a month, a week, or some number of days). This approach could limit the growth in shard count that comes with a lot of small, daily indices.
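
To put rough numbers on the shard-count point, here is a small back-of-the-envelope sketch. The retention period, shard and replica settings, and rollover age are assumptions picked for illustration, not figures from this cluster.

```python
import math

def shard_count(num_indices: int, primaries: int, replicas: int) -> int:
    """Total shards (primaries plus replica copies) across num_indices indices."""
    return num_indices * primaries * (1 + replicas)

# Assumed for illustration: 90 days retained, 3 primaries and 1 replica per index.
days_retained = 90
daily = shard_count(days_retained, primaries=3, replicas=1)

# Assumed rollover condition: a new index every 30 days,
# so the same 90 days fit in about 3 indices.
rollover_age_days = 30
rolled = shard_count(math.ceil(days_retained / rollover_age_days), primaries=3, replicas=1)

print(f"daily indices: {daily} shards, rollover indices: {rolled} shards")
```

With these assumed settings the daily layout carries 540 shards against 18 for the rolled-over layout; the real ratio depends on your own retention and rollover conditions.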


(Wayne Taylor) #5

@theuntergeek I tried this approach. I have some questions and wasn't sure which forum to post in:

Added an alias:

POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "myindex-*", "alias" : "purchases" } }
  ]
}

My action yml file:

actions:
  1:
    action: rollover
    description: rollover purchase logs
    options:
      name: purchases
      conditions:
        max_age: 90d
      extra_settings:
        index.number_of_shards: 3
        index.number_of_replicas: 1
      timeout_override:
      continue_if_exception: False
      disable_action: False

Message from curator:

$ curator --config curator.yml --dry-run rollover.yml

2017-05-18 09:38:57,565 INFO Preparing Action ID: 1, "rollover"
2017-05-18 09:38:57,693 INFO Trying Action ID: 1, "rollover": rollover purchase logs
2017-05-18 09:38:57,731 ERROR "alias" must only reference one index: {u'myindex-2017.05.08': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.09': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.06': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.07': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.04': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.05': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.02': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.03': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.01': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.30': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.11': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.10': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.13': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.12': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.15': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.14': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.17': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.16': {u'aliases': {u'purchases': {}}}, u'myindex-2017.05.18': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.01': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.29': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.28': {u'aliases': {u'purchases': {}}}, u'myindex-2017.04.27': {u'aliases': {u'purchases': {}}}}
2017-05-18 09:38:57,731 ERROR Failed to complete action: rollover. <type 'exceptions.ValueError'>: Unable to perform index rollover with alias "purchases". See previous logs for more details.
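
The error above comes down to the alias resolving to many indices instead of one. A minimal sketch of that check, in plain Python, is shown here. The function names are hypothetical and this is not Curator's actual code; the sample dictionary is a trimmed copy of the structure printed in the log.

```python
def indices_for_alias(alias_response: dict, alias: str) -> list:
    """Return the sorted index names that carry `alias` in an alias lookup response."""
    return sorted(
        index
        for index, body in alias_response.items()
        if alias in body.get("aliases", {})
    )

def check_rollover_alias(alias_response: dict, alias: str) -> str:
    """Return the single index behind `alias`, or raise if there is not exactly one."""
    names = indices_for_alias(alias_response, alias)
    if len(names) != 1:
        raise ValueError(
            f'alias "{alias}" must reference exactly one index, found {len(names)}'
        )
    return names[0]

# Trimmed sample in the same shape as the dictionary in the log above.
alias_response = {
    "myindex-2017.05.17": {"aliases": {"purchases": {}}},
    "myindex-2017.05.18": {"aliases": {"purchases": {}}},
}

try:
    check_rollover_alias(alias_response, "purchases")
except ValueError as err:
    print(err)
```

Because the alias was added with the wildcard "myindex-*", every daily index carries it, so a check like this fails before any rollover is attempted.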


(Aaron Mildenstein) #6

A rollover alias must be a 1:1 match, i.e. it must point to exactly one index. The official documentation has this example of a rollover-capable index + alias:

PUT /logs-000001 
{
  "aliases": {
    "logs_write": {}
  }
}

# Add > 1000 documents to logs-000001

POST /logs_write/_rollover 
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000
  }
}

The index name has to end in a number that can be incremented, such as the -000001 above.
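
To illustrate the naming rule, here is a sketch of how the next index name can be derived from the current one. The function is hypothetical and only mimics the documented behaviour (a trailing "-number" that is incremented and zero-padded to six digits); it is not Elasticsearch code.

```python
import re

def next_rollover_name(index_name: str) -> str:
    """Derive the next name for an index ending in '-<number>', e.g. logs-000001."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError(
            f"index name [{index_name}] does not end in '-' followed by a number"
        )
    prefix, number = match.groups()
    # Increment the counter and zero-pad it to six digits.
    return f"{prefix}{int(number) + 1:06d}"

print(next_rollover_name("logs-000001"))

# A daily name such as myindex-2017.05.18 does not fit the pattern.
try:
    next_rollover_name("myindex-2017.05.18")
except ValueError as err:
    print(err)
```

This is why a date-stamped daily index cannot serve as the starting point for rollover without being given an explicit new index name.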


(Wayne Taylor) #7

Hi @theuntergeek, understood. But as I mentioned in my original post, I index by day, and the recommendation was to use rollover, so now I am confused. I'm trying to understand how to handle this for my historical data.

My approach going forward is to write into one big index and then use rollover.

Wayne


(Aaron Mildenstein) #8

How do you query your historical data now? If you only use Kibana, then you should be able to define your index pattern so that it matches both your historical data and a newer, rollover-friendly naming pattern.

You could keep rules in Curator that will slowly purge your historical data under its current index names, without that hurting anything else.
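
As a rough picture of what such an age-based purge rule selects, here is a plain-Python sketch. The function name, the 30-day cutoff, and the sample index names are assumptions for illustration; in practice a Curator age filter would do this selection for you.

```python
from datetime import date, datetime, timedelta

def daily_indices_older_than(names, prefix, max_age_days, today):
    """Return daily indices (prefix + YYYY.MM.DD) dated before the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            # Not a daily index, e.g. a rollover index such as myindex-000001.
            continue
        if day < cutoff:
            old.append(name)
    return sorted(old)

names = ["myindex-2017.04.01", "myindex-2017.05.18", "myindex-000001"]
to_purge = daily_indices_older_than(
    names, prefix="myindex-", max_age_days=30, today=date(2017, 5, 18)
)
print(to_purge)
```

The rollover-style index is skipped because its suffix is not a date, which is what lets the old and new naming schemes coexist under one purge rule.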


(Wayne Taylor) #9

@theuntergeek because the time-series volume is low, we do keep several months available, and yes, they're actively being used. It's mainly used for us to look at trending in Timelion for historical patterns.

Right now, data is ingested by LS into ES in daily indices. I could change this to one big fat index and then roll that over. How do I move all the old documents, though?

It seems the best suggestion may be to use the reindex API via Curator. I'll give that a whirl.
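
One way to plan such a reindex is to group the daily indices by month and build one reindex request body per group. The sketch below only constructs the JSON bodies and sends nothing. The function name and the "myindex-monthly-" destination prefix are hypothetical; the source/dest layout follows the reindex API's request body.

```python
import json
from collections import defaultdict

def monthly_reindex_bodies(daily_names, prefix, dest_prefix):
    """Group daily indices (prefix + YYYY.MM.DD) by month into reindex bodies."""
    by_month = defaultdict(list)
    for name in daily_names:
        if not name.startswith(prefix):
            raise ValueError(f"unexpected index name: {name}")
        # "2017.05.18" -> "2017.05"
        month = name[len(prefix):].rsplit(".", 1)[0]
        by_month[month].append(name)
    return [
        {
            "source": {"index": sorted(names)},
            "dest": {"index": f"{dest_prefix}{month}"},
        }
        for month, names in sorted(by_month.items())
    ]

daily = ["myindex-2017.05.01", "myindex-2017.04.30", "myindex-2017.04.29"]
bodies = monthly_reindex_bodies(daily, "myindex-", "myindex-monthly-")
for body in bodies:
    # Each body would be sent as the payload of a POST _reindex request.
    print(json.dumps(body))
```

It is worth checking document counts in each destination index before deleting any of the daily source indices.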

Wayne

