I have sparse data based on a field named type. Assume:
if type = A, fields x and y appear
if type = B, fields w and z appear
if type = C, no extra fields appear (about 70% of the data, roughly 2 billion documents)
I know that sparsity affects Elasticsearch performance.
I planned to overcome this problem by setting index: false on the sparse fields in the main index and creating another index for indexing them.
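To illustrate, this is roughly the setup I have in mind. Just a sketch using the official Python client (8.x style); the index names, field types and connection details are made up, not something I have already built:

```python
# Sketch of the proposed setup (Python client, 8.x style).
# Index names, field types and the client URL are assumptions for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Main index: the sparse fields keep their values but are not indexed here.
es.indices.create(
    index="events-main",
    mappings={
        "properties": {
            "type": {"type": "keyword"},
            "x": {"type": "keyword", "index": False},
            "y": {"type": "keyword", "index": False},
            "w": {"type": "keyword", "index": False},
            "z": {"type": "keyword", "index": False},
        }
    },
)

# Secondary index: only documents of type A or B would be written here,
# with x, y, w and z fully indexed and searchable.
es.indices.create(
    index="events-sparse",
    mappings={
        "properties": {
            "type": {"type": "keyword"},
            "x": {"type": "keyword"},
            "y": {"type": "keyword"},
            "w": {"type": "keyword"},
            "z": {"type": "keyword"},
        }
    },
)
```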
Is my strategy correct?
Thank you.
The data structures most affected by sparsity are:
- norms, which are on by default on indexed text fields
- doc values, which are on by default on keyword, date, ip and numeric fields
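For reference, both can be turned off per field in the mapping when they aren't needed. A minimal sketch with the Python client (8.x style); the index and field names here are just examples:

```python
# Minimal sketch: disabling norms and doc_values per field in a mapping
# (Python client, 8.x style; index and field names are hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="events-slim",
    mappings={
        "properties": {
            # norms are on by default for indexed text fields; turn them off
            # if you don't need length normalization in scoring.
            "description": {"type": "text", "norms": False},
            # doc_values are on by default for keyword/date/ip/numeric fields;
            # turn them off if you never sort or aggregate on the field.
            "x": {"type": "keyword", "doc_values": False},
        }
    },
)
```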
I don't think that will help; wouldn't your other index be sparse as well?
If you really need to support sparse data, I'd suggest that you use Elasticsearch 6.x, which has much better support for sparsity.
@jpountz What is the cost of sparsity? Is it disk space only, assuming norms and doc_values are enabled, or does it consume heap space as well?
In other words, if there are many sparse fields across the documents in an index, would it consume additional disk space compared to the same index with dense data?