I'm implementing Elasticsearch for storing log data from different sources. It's log data from Windows DHCP and NPS/RADIUS servers, but also syslog data from various sources. I have a Logstash instance that filters the data and generates a lot of different fields for each source.
Now I have to decide whether to put the data into one common index or into separate indices. At the moment I have the following indices (using ILM):
filebeat-dhcp-{now/d}-00001
filebeat-nps-{now/d}-00001
filebeat-...
logstash-%{[vendor]}-00001
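Each of these hangs off an ILM rollover alias; bootstrapping the first one looks roughly like this (console syntax, alias names are placeholders for my real ones):

```
# Create the first DHCP index and point the write alias at it.
# The date-math name must be URL-encoded outside Kibana Dev Tools.
PUT %3Cfilebeat-dhcp-%7Bnow%2Fd%7D-00001%3E
{
  "aliases": {
    "filebeat-dhcp": { "is_write_index": true }
  }
}
```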
It would be less work for me to use one common index, or at least only one Filebeat index and one Logstash index. But this "common" index would contain a large number of fields. Is that a problem for Elasticsearch performance? Also, I ran into the 1024-field limit when upgrading from 6 to 7, so fewer fields would be preferable?
Using ILM and thinking about your index management definitely puts you very much on the right track; it's what we'd recommend in such a situation.
How many fields are you expecting the common index to have, roughly? Generally I wouldn't say more fields are such a big problem. I'd expect 10-20 new fields per log type (dhcp, nps, etc.), plus a number of common fields like the timestamp. But hitting the 1024 limit is on a different level entirely.
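If you want a rough count of what you have today, the field capabilities API lists every mapped field across an index pattern, e.g.:

```
# Lists all mapped fields across the matching indices;
# the number of keys under "fields" gives a rough field count.
GET filebeat-*/_field_caps?fields=*
```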
Yes, it will be about 6-10 types with 10-20 fields each. The 1024 problem occurred because I used the preloaded Filebeat template, which got larger when I bulk-changed field names.
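If it ever gets tight again, my understanding is that the limit itself can also be raised, per index or in the template (the value here is just an example, not a recommendation):

```
# Raise the mapped-field ceiling for existing indices;
# 2000 is an arbitrary example value.
PUT filebeat-*/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```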
So, for my setup I should:
use ILM
use my own index template (not one that comes with the installation)
With this I can use one common index for all my logging data. Right?
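Something like this is what I have in mind for the template (legacy v7 template syntax; all names and the raised field limit are placeholders):

```
# Custom template for one common logging index, tied to an ILM policy.
PUT _template/logs-common
{
  "index_patterns": ["logs-common-*"],
  "settings": {
    "index.lifecycle.name": "logs-policy",
    "index.lifecycle.rollover_alias": "logs-common",
    "index.mapping.total_fields.limit": 2000
  }
}

# Bootstrap the first index behind the write alias.
PUT logs-common-00001
{
  "aliases": {
    "logs-common": { "is_write_index": true }
  }
}
```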