Scan query that returns document values only is heavily accessing the *.FDT file

Hi all,

I have a test index with 43 million documents. Each document has a string
document value (about 5-10 characters per document).

Mapping is:

{
  "myindex" : {
    "mappings" : {
      "num_type" : {
        "_type" : {
          "store" : true
        },
        "properties" : {
          "doc_value" : {
            "type" : "string",
            "doc_values_format" : "default"
          },
          "int1" : {
            "type" : "integer",
            "index" : "analyzed",
            "store" : true
          },
          "int2" : {
          .
          .
          .

I need to retrieve only the document values, for queries whose result set
may contain about 100,000 documents. I do not need ranking or anything else
that would slow this down.

My understanding is that if the query is only a filter, ranking is not
computed and the query is faster.

Here is a small python program to test it:

import elasticsearch

es = elasticsearch.Elasticsearch()

# Open a scan-type scroll; the query is filter-only, so no ranking is needed.
results = es.search(
    "myindex",
    "num_type",
    {
        "fields": ["doc_value"],
        "size": 1000,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"r_int3": 929}},
            }
        },
    },
    scroll="10s",
    search_type="scan",
)

# Pull batches until the scroll returns no more hits.
while True:
    results = es.scroll(results["_scroll_id"], scroll="10s")
    if len(results["hits"]["hits"]) <= 0:
        break

The query runs quite slowly, and I see a huge number of accesses to the
*.fdt (stored fields data) file.

But I am asking for a document value field, so why does ES access the *.fdt file?

Thanks a lot in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89480f13-b00e-4e3f-a538-15fdbd18f073%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Doc values are stored in the .fdt files.

Jörg

On Sun, Nov 23, 2014 at 11:52 PM, Tzahi jakubovitz tzahij@hotmail.com wrote:

[...]


Thanks.
Sorry, I did not stress that these are document values and not field values.
Document values are stored in the DVD file, which is a small, compressed format.
I defined the field this way to avoid having to access and parse the Lucene
document from the huge FDT file (in my test, the FDT file is 1000 times bigger
than the DVD file). See
https://lucene.apache.org/core/4_3_1/core/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.html

I am still trying to avoid accessing the FDT file; it makes my query too slow.

Thanks again.
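
To put numbers on the FDT-versus-DVD comparison, a minimal sketch like the following could total the on-disk size of segment files by extension. The function name and the directory in the usage note are hypothetical; the real location depends on the node's data path layout, and segments written as compound files (.cfs) would not show separate .fdt or .dvd entries.

```python
import os
from collections import defaultdict


def segment_sizes_by_extension(index_dir):
    """Return {extension: total_bytes} for every file under index_dir."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Group by lower-cased extension, e.g. ".fdt" or ".dvd".
            ext = os.path.splitext(name)[1].lower()
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)
```

For example, `sizes = segment_sizes_by_extension("/path/to/shard/index")` (a placeholder path), followed by comparing `sizes.get(".fdt", 0)` with `sizes.get(".dvd", 0)`.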


Oh, sorry. Yes, doc values are in .dvd files.

I assume that ES still puts the hidden "type" and "uid" fields in .fdt. But I'm
also surprised; there should not be much disk access for that.
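
One thing that may be worth testing: in the 1.x search API, a name listed under "fields" that is not a stored field is, as far as I know, extracted from _source, and _source itself lives in the stored fields (.fdt) files. The "fielddata_fields" option instead returns the field data representation of a field, which for a field mapped with doc values should come from the doc values files. Below is a minimal sketch of such a request body, assuming the cluster version supports "fielddata_fields" and "_source": false. The helper name build_doc_values_body is hypothetical; the field names come from the original test program. The hidden fields mentioned above may still be read from .fdt for each hit.

```python
def build_doc_values_body(field, filter_field, filter_value, size=1000):
    """Build a filter-only search body that requests `field` via fielddata_fields."""
    return {
        "_source": False,             # assumption: skips loading _source from stored fields
        "fielddata_fields": [field],  # ask for the field data / doc values representation
        "size": size,                 # batch size, as in the original program
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
    }
```

The result of `build_doc_values_body("doc_value", "r_int3", 929)` would replace the dict passed to `es.search(...)` in the test program; the values should then appear under each hit's "fields" key.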

Jörg

On Mon, Nov 24, 2014 at 10:04 AM, Tzahi jakubovitz tzahij@hotmail.com wrote:

[...]
