I have sparse data based on a field named type. Assume:
if type = A, fields x and y appear
if type = B, fields w and z appear
if type = C, no extra fields appear (about 70% of the data, roughly 2 billion documents)
I know that sparsity affects Elasticsearch performance.
I planned to overcome this problem by setting index: false on the sparse fields in the main index and creating another index for indexing them.
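To illustrate, this is roughly the setup I have in mind. Just a sketch using the official Python client (8.x style); the index names, field types and connection details are made up, not something I have already built:

```python
# Sketch of the proposed setup (Python client, 8.x style).
# Index names, field types and the client URL are assumptions for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Main index: the sparse fields keep their values but are not indexed here.
es.indices.create(
    index="events-main",
    mappings={
        "properties": {
            "type": {"type": "keyword"},
            "x": {"type": "keyword", "index": False},
            "y": {"type": "keyword", "index": False},
            "w": {"type": "keyword", "index": False},
            "z": {"type": "keyword", "index": False},
        }
    },
)

# Secondary index: only documents of type A or B would be written here,
# with x, y, w and z fully indexed and searchable.
es.indices.create(
    index="events-sparse",
    mappings={
        "properties": {
            "type": {"type": "keyword"},
            "x": {"type": "keyword"},
            "y": {"type": "keyword"},
            "w": {"type": "keyword"},
            "z": {"type": "keyword"},
        }
    },
)
```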
Is my strategy correct?
Thank you.
The data structures most affected by sparsity are:
- norms, which are on by default on indexed text fields
- doc values, which are on by default on keyword, date, ip and numeric fields
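For reference, both can be turned off per field in the mapping when they aren't needed. A minimal sketch with the Python client (8.x style); the index and field names here are just examples:

```python
# Minimal sketch: disabling norms and doc_values per field in a mapping
# (Python client, 8.x style; index and field names are hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="events-slim",
    mappings={
        "properties": {
            # norms are on by default for indexed text fields; turn them off
            # if you don't need length normalization in scoring.
            "description": {"type": "text", "norms": False},
            # doc_values are on by default for keyword/date/ip/numeric fields;
            # turn them off if you never sort or aggregate on the field.
            "x": {"type": "keyword", "doc_values": False},
        }
    },
)
```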
I don't think that will help; wouldn't your other index be sparse as well?
If you really need to support sparse data, I'd suggest that you use Elasticsearch 6.x, which has much better support for sparsity.
@jpountz What is the cost of sparsity? Is it disk space only, assuming norms and doc_values are enabled, or does it consume heap space as well?
In other words, if there are many sparse fields across the documents in an index, would it consume additional disk space compared to the same index with dense data?