I have the problem that we regularly run into the limit of 1000 fields per index. Can I prevent this from happening by mapping only the few fields I really need as analyzed and setting all the others to not_analyzed?
So this means I have no way of putting a large number of different fields into Elasticsearch without the mapping metadata growing out of control?
I talked to the developers and they told me that they need all the data in Elasticsearch for "manual" review in Kibana, but only very few fields need to be searchable or graphable.
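To make the requirement concrete, here is a rough sketch of what I have in mind (my own example, assuming a recent Elasticsearch version with typeless mappings and the Python client; the index name `app-logs` and the three fields are made up): only a handful of explicitly mapped fields count against the field limit, while dynamic mapping is switched off so everything else the developers log just stays in `_source` and can still be viewed in Kibana.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# With "dynamic": false, only the explicitly mapped fields below are indexed
# and counted against index.mapping.total_fields.limit. All other fields in
# the logged documents still end up in _source and can be viewed in Kibana,
# but they are not searchable or aggregatable.
es.indices.create(
    index="app-logs",  # hypothetical index name
    body={
        "mappings": {
            "dynamic": False,
            "properties": {
                "@timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "message": {"type": "text"},
            },
        }
    },
)
```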
In this project the developers have to log each and every variable from their software along with its value. It is a very special project where policies like this cannot be changed. The policy is so fixed that the Elastic Stack might even be replaced entirely with another tool if that tool can handle logging all of this data. I really hope that doesn't happen.
I didn't want to increase the limit, because if I allow for, say, 1500 fields, next month they will hit that limit again, and so on. Therefore I want an approach that scales to far more than 1000 fields per index.
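For reference, the knob I'd rather not keep turning is `index.mapping.total_fields.limit` (default 1000). Raising it would look roughly like the sketch below (again assuming the Python client and the hypothetical `app-logs` index), but as I said, it only postpones the problem.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Raising the per-index field limit works, but only buys time until the
# next batch of new fields arrives.
es.indices.put_settings(
    index="app-logs",  # hypothetical index name
    body={"index.mapping.total_fields.limit": 1500},
)
```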
Basically, every field adds to the cluster state, which is stored in memory, so a large cluster state can cause excessive memory use.
It also increases cardinality, i.e. the diversity of the data. This means that Lucene needs more resources to store the same amount of data and Elasticsearch needs more resources to query it.
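If you want to see the first effect for yourself, here is a small sketch (my own, assuming the Python client and a hypothetical `app-logs` index) that pulls the mapping metadata out of the cluster state; every dynamically created field shows up there and has to be held in memory on every node.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The cluster state contains the full mapping of every index. Each new field
# grows this structure, and it is kept in memory on every node and
# re-published to the cluster whenever it changes.
state = es.cluster.state(metric="metadata", index="app-logs")  # hypothetical index
mapping = state["metadata"]["indices"]["app-logs"]["mappings"]
print(f"Approximate mapping size in the cluster state: {len(str(mapping))} characters")
```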