Guidance on Using Multiple Indexes vs One Index for Time Series Data from Multiple Sources

nik9000 · October 23, 2015, 5:55pm

Like @otisg says, types might be the right thing here. It's what they are for. Each index has a fixed cost in heap and in cluster state maintenance time, so types can help here. The drawbacks come around scoring (IDF information is shared), aggregations (types can hide sparseness and make the doc values less efficient), and field types (if two devices send a field with the same name but a different type then you'll have a bad time). If you can live with these things then I think you should look at types.

Sparseness is the biggest problem I can think of - I don't know the doc value data structures super well but think of it like this: they pick the number of bits they use to represent integers for an entire segment. If you have two devices, one that only stores data in the 0-255 range and another that stores in the 0-64k range, and you mix them into one segment, then the whole segment will need two bytes per integer rather than one. But if you only use these values for searching (not sorting or scripts or aggregations) then this shouldn't come up.
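To put rough numbers on that, here is a small Python sketch of the simplified model described above: one bit width chosen for a whole block of integers, based on the largest value in it. It is not how Lucene actually encodes doc values (the real format is more involved), and the device readings and function names are made up for illustration.

```python
def bits_needed(max_value):
    """Bits required to represent every integer in 0..max_value."""
    return max(1, max_value.bit_length())


def packed_size_bytes(values):
    """Size of a block in which every value uses the width of the largest one."""
    width = bits_needed(max(values))
    total_bits = width * len(values)
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical readings: device A stays in 0-255, device B goes up to 65535.
device_a = [17, 200, 255, 3] * 250        # 1000 values, 8 bits each
device_b = [40000, 65535, 12, 900] * 250  # 1000 values, 16 bits each

# Stored separately, each block uses its own width.
separate = packed_size_bytes(device_a) + packed_size_bytes(device_b)

# Mixed into one block, device A's values are widened to 16 bits too.
mixed = packed_size_bytes(device_a + device_b)

print(separate, mixed)
```

Under this toy model the mixed block costs 4000 bytes against 3000 bytes for the two separate blocks, because the small-range values pay for the width of the large-range ones.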

You might have some success with the total_shards_per_node setting - it's a bit touchy because it's a hard limit, but it could be useful to make sure that the shards you are actively writing to are more evenly spread out.
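As a rough sketch of what that might look like: the full setting name is `index.routing.allocation.total_shards_per_node`, and it can be sent as the body of an index settings update. The Python below only builds that body and works out the smallest limit that still lets every shard be assigned. The shard counts, node count, and helper names are assumptions for illustration, not a recommendation.

```python
import json
import math


def shards_per_node_settings(limit):
    """Body for an index settings update that caps this index's shards per node."""
    return {"index.routing.allocation.total_shards_per_node": limit}


def minimum_safe_limit(primaries, replicas, data_nodes):
    """Smallest limit that still lets every shard copy be assigned.

    Because the setting is a hard limit, anything lower leaves shards unassigned.
    """
    total_shards = primaries * (1 + replicas)
    return math.ceil(total_shards / data_nodes)


# Hypothetical index: 4 primaries, 1 replica each, on a 3-node cluster.
limit = minimum_safe_limit(primaries=4, replicas=1, data_nodes=3)
body = shards_per_node_settings(limit)

print(json.dumps(body))
```

With these numbers there are 8 shard copies over 3 nodes, so the lowest workable limit is 3. Note that this leaves no headroom: if one node drops out, the remaining two can hold only 6 of the 8 copies, which is the "touchy" part.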

In general you'll get more IO load on indexing with one large index than with four smaller indexes. But your mileage may vary. It's complex.
