Does sparse data in non-indexed fields affect ES performance?

I have sparse data, where the fields that are present depend on a field named type.
Assume:

if (type = A), fields x and y appear
if (type = B), fields w and z appear
if (type = C), no extra fields appear (about 70% of the data, ~2 billion documents)

I know that sparsity affects Elasticsearch performance.
I plan to work around this problem by setting index to false on the sparse fields and creating another index to index them.
Is this a sound strategy?
Thank you.
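
For concreteness, the first half of that plan could look roughly like the mapping built below. This is only a sketch: the field types given for x, y, w and z are assumptions (the post does not say what they hold), and the typeless mapping layout follows Elasticsearch 7.x and later, whereas older versions nest properties under a type name. Note that "index": false only turns off the inverted index for a field; it does not by itself disable doc values.

```python
import json

# Hypothetical field types: the post does not say what x, y, w and z contain.
SPARSE_FIELDS = {"x": "keyword", "y": "long", "w": "keyword", "z": "long"}


def build_mapping(sparse_fields):
    """Build a mapping body in which the sparse fields are not indexed."""
    # The discriminating field stays indexed so documents can be filtered by it.
    properties = {"type": {"type": "keyword"}}
    for name, field_type in sparse_fields.items():
        # "index": False disables the inverted index only; doc values keep
        # their default setting unless "doc_values" is set explicitly.
        properties[name] = {"type": field_type, "index": False}
    return {"mappings": {"properties": properties}}


mapping = build_mapping(SPARSE_FIELDS)
print(json.dumps(mapping, indent=2))
```

The printed JSON is what would be sent as the body of an index-creation request.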

Sparsity mostly affects:

  • norms, which are on by default on indexed text fields
  • doc values, which are on by default on keyword, date, ip and numeric fields
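
As a rough illustration of the two settings listed above, the sketch below builds a mapping that switches norms off on text fields and doc values off on keyword, date, ip and numeric fields. The field names and types are assumptions made for the example, and the typeless mapping layout follows Elasticsearch 7.x and later. Both switches have a cost: without norms, field length no longer contributes to scoring, and without doc values the field cannot be used for sorting or aggregations, so whether this is acceptable depends on the queries being run.

```python
import json

# Field types that have doc values enabled by default (numeric types
# are spelled out here; this list is not exhaustive).
DOC_VALUES_TYPES = {
    "keyword", "date", "ip",
    "long", "integer", "short", "byte", "double", "float",
}


def trim_sparse_field(field_type):
    """Return a field mapping with the sparsity-sensitive structure disabled."""
    field = {"type": field_type}
    if field_type == "text":
        field["norms"] = False        # drops length normalization data
    elif field_type in DOC_VALUES_TYPES:
        field["doc_values"] = False   # drops the columnar store for this field
    return field


# Hypothetical sparse fields and types, chosen for illustration only.
sparse = {"x": "text", "y": "long"}
properties = {name: trim_sparse_field(t) for name, t in sparse.items()}
body = {"mappings": {"properties": properties}}
print(json.dumps(body, indent=2))
```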

I don't think it will help: wouldn't your other index then be sparse as well?
If you really need to support sparse data, I'd suggest that you use Elasticsearch 6.x, which has much better support for sparsity.


@jpountz What is the cost of sparsity? Is it disk space only, assuming norms and doc_values are enabled? Or does it consume heap space as well?

Meaning, if there are too many sparse fields across documents in an index, would it consume additional disk space compared to the same index with dense data?
