Version 2.0: using doc values for result list

Tzahi · July 25, 2015, 9:51pm

AFAIK, in the current version query results are always returned from the stored fields in the fdt file. To get values from doc values (the dvd file), you have to use the aggregation framework.

AFAIK in the coming version 2.0, all fields will be stored by default as doc values (minus analysed text fields).

Question: will query results be read from the dvd file (doc values) rather than from the slower fdt file?

Thanks, Tzahi

jpountz · July 27, 2015, 4:42pm

Query results will still be loaded from the stored fields, meaning the fdt file. It is indeed slower if your index fits entirely in RAM, but it also knows about the JSON structure and provides much better latency if you have lots of fields or an index larger than your main memory.
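To illustrate the trade-off described above, here is a toy cost model, not Lucene code. It assumes that stored fields keep all fields of a document together (roughly one lookup per hit), while doc values are laid out per field (roughly one lookup per field per hit). The function names and the unit-cost model are hypothetical.

```python
def stored_fields_lookups(num_hits: int) -> int:
    """Row-oriented model: one lookup fetches every field of a hit."""
    return num_hits


def doc_values_lookups(num_hits: int, num_fields: int) -> int:
    """Column-oriented model: each requested field is a separate lookup per hit."""
    return num_hits * num_fields


if __name__ == "__main__":
    hits = 10
    for fields in (1, 5, 50):
        # Columns printed: fields requested, stored-field lookups, doc-value lookups
        print(fields, stored_fields_lookups(hits), doc_values_lookups(hits, fields))
```

Under this model the two approaches cost the same for a single field and diverge as more fields are requested. When the index does not fit in memory, each lookup may turn into a disk seek, which is where the difference in latency would show.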

Tzahi · July 28, 2015, 7:01am

Thanks for the answer.

I am trying to implement a join in ES (sort of). What I need is to retrieve a single field (the _IDs of linked documents) and the _ID of the containing document. I am using a _count + "facets" aggregation.

Is this the best/only way to retrieve a single field value?

Thanks again, Tzahi.

jpountz · July 28, 2015, 7:25am

If you want to retrieve all matching _id values, then this option has the downside that it does not support pagination, so your only option is to retrieve every id in a single request, which will not scale if you have a large index.
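For reference, a minimal sketch of the kind of request being discussed, built as plain Python dictionaries so that it does not depend on any client library. The field name `linked_ids` and the aggregation name `all_links` are hypothetical, and `"size": 0` on the terms aggregation is assumed to mean "return every bucket", as in the 1.x line; check this against the version you run.

```python
def build_terms_request(field: str, agg_name: str = "all_links") -> dict:
    """Request body asking for every distinct value of `field`, and no hits."""
    return {
        "size": 0,  # no hits, only the aggregation
        "aggs": {agg_name: {"terms": {"field": field, "size": 0}}},
    }


def extract_keys(response: dict, agg_name: str = "all_links") -> list:
    """Pull the bucket keys out of a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical field name, for illustration only.
request_body = build_terms_request("linked_ids")
```

Because all buckets come back in one response, memory use grows with the number of distinct values, which is the scaling limit mentioned above.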

In the future, we might be able to optimize the single-field use case by going to doc values instead of stored fields, but this is not implemented today. Additionally, it might prove challenging for some fields. For instance, for dates we store a formatted date in stored fields, while doc values only store the timestamp in milliseconds.
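The date example can be made concrete with a small sketch. It assumes a hypothetical field that was indexed as the string `2015-07-28`; doc values would then hold only `1438041600000`, and the best a doc-values-based fetch could do is re-format that number with some default pattern.

```python
from datetime import datetime, timezone


def millis_to_iso(millis: int) -> str:
    """Format epoch milliseconds as an ISO-8601 UTC string."""
    seconds, ms = divmod(millis, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{ms:03d}Z"


original_in_source = "2015-07-28"   # what stored fields would return
doc_value = 1438041600000           # what doc values hold for the same date
rebuilt = millis_to_iso(doc_value)  # "2015-07-28T00:00:00.000Z"
```

The rebuilt string denotes the same instant but no longer matches what the user originally sent, so the two sources would not be interchangeable for such fields.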

I assume that you already know of parent/child relations, which allow you to perform (limited) joins?

Tzahi · July 28, 2015, 2:29pm

Thanks so much again.
As you said, parent/child does not support many-to-many links, so it does not help in my case.
I intend to store the source _ID, link type and target _ID in a single multi-valued field. This will give the full link while accessing a single location in the doc values column.
Is it possible to write a custom aggregation that will scale better than facets for my use case?
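
One possible encoding for such a field, as a sketch only: each link becomes one string value, so a document with several links holds several values in the same field. The `|` separator and the function names are hypothetical, and the sketch assumes that IDs and link types never contain the separator.

```python
SEPARATOR = "|"  # hypothetical; must not occur in IDs or link types


def encode_link(source_id: str, link_type: str, target_id: str) -> str:
    """Pack one link into a single field value."""
    parts = (source_id, link_type, target_id)
    if any(SEPARATOR in part for part in parts):
        raise ValueError("separator found inside a link component")
    return SEPARATOR.join(parts)


def decode_link(value: str) -> tuple:
    """Unpack a field value back into (source_id, link_type, target_id)."""
    source_id, link_type, target_id = value.split(SEPARATOR)
    return source_id, link_type, target_id


# A document with two outgoing links carries two values in the field.
links_field = [
    encode_link("doc1", "cites", "doc7"),
    encode_link("doc1", "cites", "doc9"),
]
```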

Thanks again for this inspiring product.

jpountz · July 28, 2015, 4:17pm

Many-to-many relations are very hard to deal with in a distributed system. At least the one-to-many case can colocate related data in the same partition, but this is generally not possible with many-to-many relations. To be honest, I think the only way to tackle such a problem would be to use some heuristics and, e.g., only follow relations of the top matches of the first query (similarly to how the top_children query did).
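
A rough sketch of that heuristic as two request bodies, again as plain Python dictionaries. The `links` field, the `top_n` default and the function names are hypothetical; only the overall two-step shape comes from the suggestion above.

```python
def first_query(user_query: dict, top_n: int = 10) -> dict:
    """Step 1: fetch only the best `top_n` matches."""
    return {"size": top_n, "query": user_query}


def follow_links(first_response: dict, field: str = "links") -> dict:
    """Step 2: look up the documents linked from those top matches."""
    target_ids = []
    for hit in first_response["hits"]["hits"]:
        for target in hit["_source"].get(field, []):
            if target not in target_ids:  # de-duplicate, keep order
                target_ids.append(target)
    return {"size": len(target_ids), "query": {"ids": {"values": target_ids}}}
```

The cost of the second step is bounded by `top_n` rather than by the total number of matches, at the price of ignoring links from lower-ranked documents.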
