First query is slow, the rest are fast. How can I speed it up?


#1

I'm using this kind of mapping (it's a shortened version, to keep the question simple) on a parent-child relationship where items is the parent and user_items is the child.

curl -XPUT 'localhost:9200/myindex?pretty=true' -d '{
  "mappings": {
    "items": {
       "dynamic": "strict",
       "properties" : {
            "title" : { "type": "string" },
            "body" : { "type": "string" }
}},
    "user_items": {
      "dynamic": "strict",
      "_parent": {"type": "items" },
      "properties" : {
            "user_id" : { "type": "integer" },
            "source_id" : { "type": "integer" }
}}}}'

And this is the type of query I usually run:

curl -XGET 'localhost:9200/myindex/items/_search?pretty=true' -d '{
    "query": {
      "bool": {
         "must": [
            {
               "query_string": {
                  "fields": ["title", "body"],
                  "query": "mercado"
               }
            },
            {
               "has_child": {
                  "type": "user_items",
                  "query": {
                     "term": {
                        "user_id": 655
    }}}}]}}}'

This query searches the title and body fields for the string mercado, but only in items that have a user_items child with the given user_id (655 in this case).

I read that the first query is so slow because it gets cached, and
the following queries are fast because they work with the cached
content.

I read that I can make the first query faster by using eager loading to preload my data (using "loading" : "eager", right?), but I don't know what I have to preload. Do I have to use eager on title and body as follows?

{
  "mappings": {
    "items": {
       "dynamic": "strict",
       "properties" : {
            "title" : { "type": "string" ,
                        "fielddata": {
                            "loading" : "eager"}},
            "body" : { "type": "string",
                        "fielddata": {
                            "loading" : "eager"}}
}},
    "user_items": {
      "dynamic": "strict",
      "_parent": {"type": "items" },
      "properties" : {
            "user_id" : { "type": "integer" },
            "source_id" : { "type": "integer" }
}}}}

Any other recommendation for speeding up or caching the first query is welcome. Thanks in advance.

PS: I'm using ES 2.3.2 on a Linux machine and I have a total of 25,396,369 documents.


(Michael McCandless) #2

I think that's the right place to set eager: https://www.elastic.co/guide/en/elasticsearch/guide/current/preload-fielddata.html#eager-fielddata
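
Since the query also has a has_child clause, part of the first-query cost may come from the parent/child join rather than from title and body. As a rough sketch only: the ES 2.x docs describe an eager_global_ordinals loading option on _parent, which moves that work from the first query to refresh time. The index name myindex_v2 is made up for illustration, and a fresh index is used because I'm not sure this setting can be changed on an existing type.

```shell
# Sketch only: "myindex_v2" is a hypothetical index name.
ES_URL='localhost:9200/myindex_v2'

# Same shortened mapping as in the question, plus eager global ordinals
# on the _parent field of the child type.
MAPPING='{
  "mappings": {
    "items": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "string" },
        "body":  { "type": "string" }
      }
    },
    "user_items": {
      "dynamic": "strict",
      "_parent": {
        "type": "items",
        "fielddata": { "loading": "eager_global_ordinals" }
      },
      "properties": {
        "user_id":   { "type": "integer" },
        "source_id": { "type": "integer" }
      }
    }
  }
}'

# Wrapped in a function so nothing is sent until it is called explicitly.
create_index() {
  curl -XPUT "$ES_URL?pretty=true" -d "$MAPPING"
}
```

Calling create_index sends the request. The trade-off, as far as I understand it, is that refreshes get more expensive, so whether this helps depends on how often the index is refreshed.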

However, it's much better to switch to doc values instead, which are disk-based / off-heap. This lets the OS manage the "hot" pages itself, without paying a huge penalty for whoever first needs the values for this field.
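
For illustration, a minimal sketch of doc values set explicitly in a mapping. As far as I know, in 2.x they are already the default for numeric and not_analyzed string fields, and analyzed strings such as title and body can't use them, so the explicit flags below only make the default visible. The index name myindex_dv is hypothetical.

```shell
# Sketch only: "myindex_dv" is a hypothetical index name.
DV_URL='localhost:9200/myindex_dv'

# doc_values is spelled out on the numeric fields; title and body stay
# analyzed strings, which (in 2.x) do not support doc values.
DV_MAPPING='{
  "mappings": {
    "items": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "string" },
        "body":  { "type": "string" }
      }
    },
    "user_items": {
      "dynamic": "strict",
      "_parent": { "type": "items" },
      "properties": {
        "user_id":   { "type": "integer", "doc_values": true },
        "source_id": { "type": "integer", "doc_values": true }
      }
    }
  }
}'

# Wrapped in a function so nothing is sent until it is called explicitly.
create_dv_index() {
  curl -XPUT "$DV_URL?pretty=true" -d "$DV_MAPPING"
}
```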

Mike McCandless


#3

Thanks for the reply, @mikemccand

I have a small question about doc values.

Can I set doc_values to false when all the fields use the following analyzer?

curl -XPUT 'localhost:9200/myindex/_settings?pretty=true' -d '{
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer" : "standard",
          "filter":  [ "lowercase", "asciifolding"],
          "char_filter" : "html_strip"
}}}}'

(The body and the title contain some HTML, and I want to remove all the tags from them; that's why I'm using html_strip.)

I don't know if I misunderstood the docs, but it seems that you can't use doc_values : false when you are using analyzed fields.

