Why do we need field-data?

Utkarsh_Pyne · April 24, 2017, 8:43am

An inverted index maps terms to documents, whereas field-data/doc values map documents to terms. In the inverted index for a field, the unique values of that field are the keys, whereas in field-data the document IDs are the keys. During aggregations we create buckets based on field values, which we don't know beforehand.

My question is: isn't there a function like getKeys() that returns all the keys of the inverted index, i.e. the unique values of a field in that index? We could then use each value to look up its entry in the inverted index, traverse the list of documents mapped to that key, and update the buckets.

I know this is not how it happens today, and I'm guessing there is a good reason for not doing so. I'm interested in knowing that reason.
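For concreteness, the approach proposed in the question could be sketched roughly as below. This is only a toy model: the dictionary `inverted_index`, the function `terms_agg_via_inverted_index` and the `matching_docs` parameter are hypothetical names made up for illustration, not Elasticsearch or Lucene APIs.

```python
from collections import Counter

# Hypothetical toy inverted index for a single field: term -> list of doc IDs.
inverted_index = {"A": [1, 3], "B": [2]}


def terms_agg_via_inverted_index(index, matching_docs=None):
    """Count documents per term by walking every term's postings list.

    matching_docs is an optional set of doc IDs that matched a query;
    None means "all documents".
    """
    buckets = Counter()
    for term, postings in index.items():  # the "getKeys()" step
        for doc_id in postings:           # traverse docs mapped to this key
            if matching_docs is None or doc_id in matching_docs:
                buckets[term] += 1
    return buckets


all_docs = terms_agg_via_inverted_index(inverted_index)
only_doc_3 = terms_agg_via_inverted_index(inverted_index, matching_docs={3})
```

Note that even when only document 3 matched the query, this sketch still visits every term and every posting in the field.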

Mark_Harwood · April 24, 2017, 9:04am

Speed.

That's pretty much it. To use a book analogy - if you want to know what's on pages 1, 5 and 7 of a book you turn to pages 1, 5 and 7 directly and read them. You don't go to the index at the back of the book, scan the alphabetic list of all words and for each word scan their list of page-mentions to see if they include 1, 5 or 7.
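The book analogy can be put into numbers with a small sketch. It assumes a made-up data set of 1,000 documents and 100 distinct terms, and counts "steps" in a deliberately naive way; real Lucene structures are more sophisticated, so treat this as an illustration of the access pattern rather than a benchmark.

```python
# Hypothetical toy data: 1,000 docs, each holding one of 100 distinct terms.
NUM_DOCS = 1000
doc_values = {doc_id: f"term{doc_id % 100}" for doc_id in range(NUM_DOCS)}

# Build the term-keyed view of the same data.
inverted_index = {}
for doc_id, term in doc_values.items():
    inverted_index.setdefault(term, []).append(doc_id)

wanted = {1, 5, 7}  # the "pages" we want to read

# Document-keyed lookup: turn straight to each page.
direct_steps = 0
direct_result = {}
for doc_id in wanted:
    direct_steps += 1
    direct_result[doc_id] = doc_values[doc_id]

# Term-keyed scan: walk the whole back-of-book index looking for those pages.
scan_steps = 0
scan_result = {}
for term, postings in inverted_index.items():
    for doc_id in postings:
        scan_steps += 1
        if doc_id in wanted:
            scan_result[doc_id] = term

print(direct_steps, scan_steps)
```

Both routes produce the same values for the three documents, but the scan touches every posting in the field while the direct lookup touches only the documents asked for.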

Utkarsh_Pyne · April 24, 2017, 1:19pm

Mark, this makes sense for multi-level aggregations, where the first level divides documents into buckets and the second level works within those buckets; there, an index keyed by document ID is much cheaper. But for single-level aggregations, how will field-data make a difference? Let's say we had 3 docs with only one field X, which can have two unique values A and B, so the inverted index would look like:

A-> 1,3
B-> 2

Field-data will look like:

1->A
2->B
3->A

Now if I do an aggregation, I have to traverse all 3 documents to retrieve the values from field-data, and it similarly requires 3 operations to get the same result from the inverted index.

Mark_Harwood · April 24, 2017, 1:27pm

The most common use case is a user who searches for smartphone in the field text and then sorts the matching docs by doc values in the field popularity, or groups them by ranges of doc values in the field price or by manufacturer.

We use the inverted index to quickly find the list of doc IDs that contain "smartphone" but then use doc values (previously using fielddata) to quickly retrieve those price and popularity values for the thousands of documents that match.
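A rough sketch of that two-step flow is shown below. The documents, the `price_bucket` helper and the bucket boundary of 300 are invented for illustration; the field names text, price, popularity and manufacturer follow the example above. It models the idea only and does not use any Elasticsearch API.

```python
from collections import Counter

# Hypothetical documents, keyed by doc ID.
docs = {
    1: {"text": "cheap smartphone deal", "price": 199, "manufacturer": "Acme", "popularity": 5},
    2: {"text": "laptop sleeve", "price": 25, "manufacturer": "Bolt", "popularity": 9},
    3: {"text": "flagship smartphone", "price": 899, "manufacturer": "Bolt", "popularity": 8},
    4: {"text": "smartphone case", "price": 15, "manufacturer": "Acme", "popularity": 3},
}

# Inverted index on the text field: term -> doc IDs.
inverted = {}
for doc_id, doc in docs.items():
    for token in doc["text"].split():
        inverted.setdefault(token, []).append(doc_id)

# Doc-values-style columns: field -> {doc ID -> value}.
doc_values = {
    field: {doc_id: doc[field] for doc_id, doc in docs.items()}
    for field in ("price", "manufacturer", "popularity")
}


def price_bucket(price):
    """Hypothetical two-range bucketing for the price aggregation."""
    return "under_300" if price < 300 else "300_and_over"


# Step 1: the inverted index answers "which docs contain 'smartphone'?"
matches = inverted.get("smartphone", [])

# Step 2: doc values answer "what are the values for exactly those docs?"
by_manufacturer = Counter(doc_values["manufacturer"][d] for d in matches)
by_price_range = Counter(price_bucket(doc_values["price"][d]) for d in matches)
sorted_by_popularity = sorted(
    matches, key=lambda d: doc_values["popularity"][d], reverse=True
)
```

The work in step 2 is proportional to the number of matching documents, not to the number of distinct prices or manufacturers in the whole index.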

system · May 22, 2017, 1:39pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
