I am looking for a writeup on when/how Elasticsearch decides to create new indices.
I went to list all indices on my single-node ELK stack server expecting to find exactly two: the one being fed by Logstash and the one created by Kibana. What I found was literally dozens of Logstash indices split by date, all open, and with no obvious rhyme or reason as to when they were created. (The Kibana one was there too, so at least I got that part right.)
Where can I read up on this so I can stop it (or at least control it)?
The only time Elasticsearch will create an index is if it is asked to do so.
That may be explicitly, via a create-index request (optionally with settings and mappings), or implicitly, when a document is indexed into an index that doesn't exist yet.
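To illustrate the implicit case: if the Logstash elasticsearch output is left on its classic default index pattern, `logstash-%{+YYYY.MM.dd}`, the target index name is derived from each event's `@timestamp` in UTC, so the first event of each UTC day asks for an index that doesn't exist yet. The Python below is only a sketch of that naming rule, not Logstash code. The function name `logstash_index_name` is made up for illustration, and it assumes the default pattern is in effect (newer versions may use rollover aliases or data streams instead).

```python
from datetime import datetime, timezone


def logstash_index_name(event_timestamp: datetime, prefix: str = "logstash") -> str:
    """Mimic the default 'logstash-%{+YYYY.MM.dd}' pattern (illustrative only)."""
    # The date part is taken from the event timestamp expressed in UTC.
    utc = event_timestamp.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


# Two events a couple of minutes apart, on either side of midnight UTC.
events = [
    datetime(2019, 3, 1, 23, 59, tzinfo=timezone.utc),
    datetime(2019, 3, 2, 0, 1, tzinfo=timezone.utc),
]

# Each distinct name is a separate index; the second one gets created
# implicitly when its first document arrives.
print([logstash_index_name(ts) for ts in events])
# ['logstash-2019.03.01', 'logstash-2019.03.02']
```

Under that assumption, a new index per day is expected behaviour rather than something tied to restarts.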
All yellow (expected since I am single node)
All open (?)
All from logstash (they all follow the same naming convention "logstash-date")
All pri 5
All rep 1
Doc count is all over the place - as small as 11k, as large as 27k
Store size all over the place - 74 MB to 105 MB
OK, so Logstash is causing them to be created. Presumably each index holds a unique subset of the data I ingest. Since the names don't give me any context clues, how do I determine which data ends up in which index? I can tell you I didn't actively ask for any indices to be created. Whatever happened was triggered by Logstash at some point AFTER I plugged in the pipeline.
Better yet, how can I tell logstash to give me an index name that is actually meaningful?
The obvious guess is that a new index request happens every time the Logstash thread is restarted. Is there a way to tell Logstash NOT to ask for a new index and instead feed into the latest current index?
I am happy to post my pipeline, but it isn't very exciting.
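On the question of which data ends up in which index: if the date suffix in the names follows the `YYYY.MM.dd` convention (an assumption, since the exact names aren't shown here), each index should hold the events whose `@timestamp` falls on that UTC day. A small Python sketch of reading the name back into a time window follows; the helper `index_time_range` is hypothetical and not part of any Elastic tooling.

```python
from datetime import datetime, timedelta, timezone


def index_time_range(index_name: str, prefix: str = "logstash-"):
    """Return the (start, end) UTC window a daily index name would cover."""
    if not index_name.startswith(prefix):
        raise ValueError(f"{index_name!r} does not start with {prefix!r}")
    # Parse the 'YYYY.MM.dd' suffix as midnight UTC of that day.
    day = datetime.strptime(index_name[len(prefix):], "%Y.%m.%d")
    start = day.replace(tzinfo=timezone.utc)
    # The window is half-open: start <= @timestamp < end.
    return start, start + timedelta(days=1)


start, end = index_time_range("logstash-2019.03.01")
print(start.isoformat(), "to", end.isoformat())
```

For control over the name itself, the elasticsearch output has an `index` option. Setting it to a fixed string with no date pattern should send everything to a single index, at the cost of no longer being able to drop old data one day at a time.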