I have done some studying about types by reading the two articles below, and I concluded that using one type per index rather than n types per index has more benefits and fewer downsides. So in my environment I am thinking of creating an index with a flat mapping definition and consolidating the fields of the various log types. Splitting every type of log into its own index costs too many shards per node for me.
But a question came up: will search and aggregation performance be the same in the two cases below? Flattening the mapping breaks the context of a log, but I would like advice on whether setting a field value explicitly avoids sparseness.
For example,
Case 1 (having type1 and type2 in one index)
Index : A
Logs include below fields
log A
F1 : a
F2 : b
log B
F1 : a
F3 : c
Index A will be
docA (type1)
F1 : a
F2 : b
F3 : nil
docB (type2)
F1 : a
F2 : nil
F3 : c
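The consolidated mapping for Case 1 could be sketched roughly as below (field names F1–F3 are the placeholders from the example; using the keyword type is my assumption):

```python
# Hypothetical flat mapping that consolidates the fields of both log types.
# Field names F1-F3 come from the example; "keyword" is an assumed type.
flat_mapping = {
    "mappings": {
        "properties": {
            "F1": {"type": "keyword"},
            "F2": {"type": "keyword"},
            "F3": {"type": "keyword"},
        }
    }
}

# Documents of either log type fit the same mapping; a missing field is
# simply absent from the document source rather than stored as nil.
doc_a = {"F1": "a", "F2": "b"}  # log A: no F3
doc_b = {"F1": "a", "F3": "c"}  # log B: no F2
```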
Case 2 (type1 only)
Index : A
Logs include below fields
logA
F1 : a
F2 : b
logB
F1 : a
F3 : c
Index A will be
docA (type1)
F1 : a
F2 : b
F3 : "Not Defined" <- explicitly set a string
docB(type1)
F1 : a
F2 : "Not Defined" <- explicitly set a string
F3 : c
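Case 2's explicit placeholder could be implemented as a small normalization step before indexing (a sketch; the NOT_DEFINED string and the fixed field list are assumptions from the example above):

```python
NOT_DEFINED = "Not Defined"        # explicit placeholder string from the example
ALL_FIELDS = ("F1", "F2", "F3")    # the consolidated field set (assumed fixed)

def normalize(log: dict) -> dict:
    """Fill every field missing from a log with the explicit placeholder."""
    return {field: log.get(field, NOT_DEFINED) for field in ALL_FIELDS}

print(normalize({"F1": "a", "F2": "b"}))
# -> {'F1': 'a', 'F2': 'b', 'F3': 'Not Defined'}
```

Note that this stores a real value in every document, so the fields are no longer sparse, but at the cost of indexing a placeholder term for every document that lacks the field.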
It's the same as 1 index with 50 shards versus 50 indices with 1 shard each. ES does the same amount of work.
Why does it break the context?
Ultimately, sparseness doesn't apply to field mappings; it applies to the values in those fields. Splitting things out into their own indices may or may not increase sparseness; it's hard to say. However, we have done a lot of work in Lucene and ES to improve the handling of sparse values, so I am not sure this type change is really going to cause a negative impact.