Question about the sparseness

Hello.

My environment is Elasticsearch 5.4.0.

I have been studying mapping types by reading the two articles below.

I concluded that using one type per index, rather than n types per index, has fewer downsides. So in my environment I am thinking of creating an index with a flat mapping definition and consolidating the fields of the various log types. Splitting each type of log into its own index would cost me too many shards per node.

But this raised a question: will search and aggregation performance be the same in the two cases below? Flattening the mapping breaks the context of a log, but I would like advice on whether explicitly setting a field value avoids sparseness.

For example,

Case 1 (type 1 and type 2 in one index)

Index : A

The logs include the fields below:
log A
F1 : a
F2 : b

log B
F1 : a
F3 : c

Index A will be
docA (type1)
F1 : a
F2 : b
F3 : nil

docB (type2)
F1 : a
F2 : nil
F3 : c

Case 2 (type 1 only)

Index : A

The logs include the fields below:
log A
F1 : a
F2 : b

log B
F1 : a
F3 : c

Index A will be

docA (type1)
F1 : a
F2 : b
F3 : "Not Defined" <- explicitly set to a string

docB (type1)
F1 : a
F2 : "Not Defined" <- explicitly set to a string
F3 : c
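To make the two cases concrete, here is a minimal Python sketch of how the documents could be built before indexing. It is only an illustration: the helper name consolidate and the list ALL_FIELDS are hypothetical, and it assumes that "nil" in Case 1 means the field is simply left out of the document.

```python
# Hypothetical list of every field in the flat mapping (from the example above).
ALL_FIELDS = ["F1", "F2", "F3"]


def consolidate(log, placeholder=None):
    """Return a flat document for one log record.

    placeholder=None  -> missing fields are left out (assumed meaning of "nil", Case 1).
    placeholder="..." -> every missing field is filled with that string (Case 2).
    """
    doc = dict(log)  # copy so the original log record is not modified
    if placeholder is not None:
        for field in ALL_FIELDS:
            doc.setdefault(field, placeholder)
    return doc


log_a = {"F1": "a", "F2": "b"}
log_b = {"F1": "a", "F3": "c"}

# Case 1: fields that a log does not have stay absent.
case1 = [consolidate(log_a), consolidate(log_b)]

# Case 2: fields that a log does not have get an explicit string value.
case2 = [consolidate(log_a, "Not Defined"), consolidate(log_b, "Not Defined")]

print(case1)
print(case2)
```

In Case 2 every document carries a value for every field, so each placeholder is indexed and stored like any other string value.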

Thank you for reading.

It's the same as 1 index with 50 shards versus 50 indices with 1 shard each. ES does the same amount of work.

Why does it break the context?

Ultimately, sparseness doesn't apply to field mappings; it applies to the values in those fields. Splitting things out into their own indices may or may not increase sparseness; it's hard to say. However, we have done a lot of work in Lucene and ES to improve the handling of sparse values, so I am not sure this type change is really going to have a negative impact.
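As a rough illustration of sparseness being a property of values rather than mappings, the sketch below computes, for a sample of documents, the fraction that actually carry a value for each field. The function name field_density and the sample documents are assumptions made for this example; it runs on plain Python dicts and does not call Elasticsearch.

```python
def field_density(docs, fields):
    """Return, per field, the fraction of docs that have a non-null value for it."""
    total = len(docs)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for doc in docs if doc.get(field) is not None) / total
        for field in fields
    }


# Sample documents modelled on the question: log A has F1 and F2, log B has F1 and F3.
docs = [
    {"F1": "a", "F2": "b"},
    {"F1": "a", "F3": "c"},
]

density = field_density(docs, ["F1", "F2", "F3"])
print(density)
```

A field with a density close to 1.0 is dense; a field with a low density is sparse, whichever type or index the documents live in.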

@warkolm

Thank you for the reply. You are right, it does not break the context.

However, I also found a GitHub thread saying that types will be deprecated in a future release.

I guess this is another reason I should use one type per index.
