Sparse index

Spacefish · August 1, 2017, 11:51am

Hello,

I have an ES index with approx. 10 million documents. It is running on a three-node cluster with 6 GB of Java heap per node.

When I use this index in Kibana I see a considerable slowdown: a count aggregation over all documents, for example, takes up to 20 seconds.

There is no I/O bottleneck, but I see 100% CPU usage on all nodes for the duration of the query.

The data is really sparse. We have approx. 8000 different fields, but only 20-30 are used per document.
Documents share 8 common fields, and there are some 100 different document "types" defined by the combination of the remaining "sparse" fields (not types in the ES sense).

I already disabled norms for these fields, but document_ids are required.

I thought about creating some 100 different indices (one index per document "type") and using a myindex-* pattern in Kibana, but I guess this will be even slower!

We already tried creating an index with the 8 common fields plus 2 fields called "name" and "value". But this is pretty unwieldy when plotting the data, as you don't get a nice dropdown in Kibana (with 8000 rows in our case) to select the field you are interested in. Furthermore, the relation between different values belonging to the same "document" is lost.
Using different queries for different series in the same chart in Kibana is complicated and unwieldy as well.
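
To make the "name"/"value" layout concrete, here is a minimal Python sketch of the flattening step. The field names (`timestamp`, `host`, `temp_c`, `rpm`) and the helper `to_name_value_docs` are hypothetical stand-ins, not the real schema. It shows how one sparse document turns into several rows that no longer share a single document.

```python
# Hypothetical stand-ins for the 8 common fields; the real names are not given in this thread.
COMMON_FIELDS = {"timestamp", "host"}


def to_name_value_docs(doc, common_fields=COMMON_FIELDS):
    """Split one sparse document into one name/value document per sparse field."""
    common = {k: v for k, v in doc.items() if k in common_fields}
    return [
        {**common, "name": key, "value": value}
        for key, value in doc.items()
        if key not in common_fields
    ]


# One source document with two sparse fields (made-up sample data).
doc = {"timestamp": "2017-08-01T11:51:00", "host": "node-1", "temp_c": 21.5, "rpm": 900}
rows = to_name_value_docs(doc)
```

Each resulting row carries the common fields plus a single "name"/"value" pair, so nothing in a row says that `temp_c` and `rpm` came from the same source document unless an extra id field is copied into every row.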

What's the general suggestion for indexing or preprocessing such "sparse" data? The index-per-"type" approach?

jpountz · August 1, 2017, 12:43pm

I would move the types that have the most documents to their own indices, and keep the long tail of types that have only small numbers of documents in a shared index.
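
As a rough illustration only, the routing decision could be sketched in Python like this. The type names, document counts, the `min_docs` threshold, and the `myindex-shared` name are all assumptions made for the example; only the `myindex-*` pattern comes from the question above.

```python
from collections import Counter


def build_index_router(type_counts, min_docs, prefix="myindex"):
    """Return a function mapping a document "type" to an index name.

    Types with at least `min_docs` documents get a dedicated index;
    everything else (including unknown types) goes to one shared index.
    """
    dedicated = {t for t, n in type_counts.items() if n >= min_docs}

    def index_for(doc_type):
        if doc_type in dedicated:
            return f"{prefix}-{doc_type}"
        return f"{prefix}-shared"

    return index_for


# Made-up per-type document counts, for illustration only.
type_counts = Counter(
    {"sensor_a": 4_000_000, "sensor_b": 3_500_000, "rare_x": 1200, "rare_y": 300}
)
index_for = build_index_router(type_counts, min_docs=1_000_000)
```

With these sample numbers, `sensor_a` and `sensor_b` would each get their own index and the rare types would land in `myindex-shared`, so a single `myindex-*` pattern in Kibana would still cover everything. Index names must be lowercase, so the type names would need normalising first if they are not already.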

Spacefish · August 1, 2017, 12:51pm

That's a good idea, thanks!

system · August 29, 2017, 12:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.