How does elasticsearch store repeated values across documents?

JonathanAaron · March 1, 2018, 7:28pm

Say I have a document

{"name":"first","class": "Type-A"}

Then another document

{"name":"second","class": "Type-A"}

Does it store Type-A twice, or does it reference repeated values? Let's say I have a million docs like this referring to Type-A. Does ES optimize this somehow?

Thanks,

Jonathan

dadoonet · March 2, 2018, 1:06am

Two things. First, Elasticsearch basically builds an inverted index out of that. The inverted index will contain something like:

Type-A: 1, 2

Here, 1 and 2 are the (internal) document IDs. That's schematic, as other things are also added.
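To make the inverted-index idea concrete, here is a toy Python sketch; it is not how Lucene actually lays out its files. It assumes each value is indexed as a single term, as a keyword field would do. A text field would first run the value through an analyzer (the standard analyzer would split Type-A into "type" and "a"). The point is that the term Type-A is kept once in the term dictionary, and only its list of document IDs grows as more documents share it.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by a simple integer ID.
docs = {
    1: {"name": "first", "class": "Type-A"},
    2: {"name": "second", "class": "Type-A"},
}

def build_inverted_index(docs, field):
    """Map each distinct value of `field` to the sorted list of doc IDs containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        value = docs[doc_id].get(field)
        if value is not None:
            # The value is used as a single term (keyword-style, no analysis).
            index[value].append(doc_id)
    return dict(index)

index = build_inverted_index(docs, "class")
print(index)  # {'Type-A': [1, 2]}
```

With a million documents referring to Type-A, the dictionary would still hold the single entry Type-A; only the postings list would hold a million IDs, which Lucene stores compressed (delta-encoded) rather than as raw integers.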

But Elasticsearch will also add a stored field named _source, which will contain:

{"name":"first","class": "Type-A"}

And

{"name":"second","class": "Type-A"}

These are stored as is, although _source is compressed by default.
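Lucene compresses stored fields (including _source) in blocks that span several documents, so keys and values repeated across neighboring documents, such as "class": "Type-A", mostly compress away. The sketch below only illustrates that effect: it uses Python's zlib (DEFLATE) as a stand-in, whereas Elasticsearch's default codec uses LZ4 and best_compression uses DEFLATE, so real ratios will differ. The documents are hypothetical.

```python
import json
import zlib

def source_sizes(n_docs):
    """Return (raw_bytes, compressed_bytes) for n_docs JSON sources compressed together."""
    # Hypothetical documents that all share the same "class" value, as in the question.
    sources = [
        json.dumps({"name": f"doc-{i}", "class": "Type-A"}) for i in range(n_docs)
    ]
    raw = "\n".join(sources).encode("utf-8")
    # zlib implements DEFLATE; it is only a stand-in for Lucene's stored-field compression.
    compressed = zlib.compress(raw, 6)
    return len(raw), len(compressed)

raw, packed = source_sizes(1000)
print(f"raw={raw} bytes, compressed={packed} bytes, ratio={raw / packed:.1f}x")
```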

Normally, you don't really have to think about all this.

Does ES optimize this somehow?

And yes, ES optimizes all of that as much as possible.

JonathanAaron · March 2, 2018, 4:11pm

Thanks for the response. My team is facing space issues, and we want to add another attribute to our documents to query by, but that involves updating billions of documents. So I'm trying to estimate the space cost. It seems the inverted index space would be trivial. Is there any way to figure out how Elasticsearch compresses things?
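One rough way to estimate the _source part of that cost is to compress a representative sample with and without the new attribute and extrapolate the per-document difference. This is a back-of-the-envelope sketch, not an Elasticsearch tool: it again uses zlib (DEFLATE) as a stand-in for the real codec, and the field name "region", its values, the sample, and the 2-billion document count are all hypothetical.

```python
import json
import zlib

def compressed_size(docs):
    """Bytes after serializing docs as JSON lines and DEFLATE-compressing them as one block."""
    raw = "\n".join(json.dumps(d) for d in docs).encode("utf-8")
    return len(zlib.compress(raw, 6))

def estimate_extra_bytes(sample, new_field, new_values, total_docs):
    """Extrapolate the extra compressed _source bytes from adding new_field to total_docs docs."""
    before = compressed_size(sample)
    after = compressed_size([{**d, new_field: v} for d, v in zip(sample, new_values)])
    per_doc = (after - before) / len(sample)
    return per_doc * total_docs

# Hypothetical sample: 10,000 existing docs and a low-cardinality "region" attribute.
sample = [{"name": f"doc-{i}", "class": "Type-A"} for i in range(10_000)]
regions = ["eu-west", "us-east", "ap-south"]
values = [regions[i % len(regions)] for i in range(len(sample))]

extra = estimate_extra_bytes(sample, "region", values, 2_000_000_000)
print(f"~{extra / 1e9:.1f} GB of extra compressed _source")
```

This covers only _source. The new field's inverted index (and doc values, if enabled) adds its own cost, usually small for a low-cardinality value. The most reliable number comes from indexing a representative sample into a test index, with and without the field, and comparing store.size from GET _cat/indices?v.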

dadoonet · March 2, 2018, 5:15pm

My team is facing space issues

Well, very often the cost of the added complexity and trade-offs is much higher than the cost of buying new hardware (disks). But I can't decide that for you.

Is there any way to figure out how Elasticsearch compresses things?

More about compression here: Index Modules | Elasticsearch Reference [6.2] | Elastic
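The relevant setting on that page is index.codec. As a sketch, assuming the Elasticsearch 6.x settings layout, the function below only builds the JSON body for creating an index with best_compression; it does not talk to a cluster, and the index name in the comment is hypothetical.

```python
import json

# Values documented for index.codec in Elasticsearch 6.x.
SUPPORTED_CODECS = ("default", "best_compression")

def create_index_body(codec="best_compression"):
    """Build a create-index request body that sets the stored-fields codec.

    "default" compresses stored fields with LZ4; "best_compression" uses DEFLATE,
    trading slower stored-field access for a smaller index.
    """
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    return {"settings": {"index": {"codec": codec}}}

# This JSON would be the body of a request such as PUT /my-index
# (my-index is a hypothetical index name).
print(json.dumps(create_index_body(), indent=2))
```

index.codec is a static setting, so on an existing index it can only be changed while the index is closed, and segments already written keep their old compression until they are merged or the data is reindexed.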

You can also read this section, which gives a lot of advice:

system · March 30, 2018, 5:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Are ENUM stored efficiently?	Elasticsearch	4	6934	July 5, 2017
How to store and search for same values in a field	Elasticsearch	1	339	May 14, 2020
Multiple fields with different values in a same document	Elasticsearch	6	593	July 5, 2017
Question about compression of field with same value for every document	Elasticsearch	1	310	July 24, 2019
Document compression - duplicate fields	Elasticsearch	2	362	November 18, 2020