Elasticsearch query optimizations


(roopednra) #1

Is there any way to optimize a query in Elasticsearch? I am using the query
below. It takes 15-20s on average, and is sometimes a little faster at 4-5s.

My server configuration: CentOS 6.3, 8 cores, 16 GB RAM

{
  "fields": [
    "_id",
    "aff_id",
    "post_uri",
    "blog_cat",
    "cat_score",
    "secondary_cat",
    "secondary_cat_score",
    "title",
    "_score"
  ],
  "min_score": 0.0134,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "title"
            ],
            "query": "Archery OR Athletics OR Badminton OR Basketball OR Beach Volleyball OR Boxing OR Canoe Slalom OR Canoe Sprint OR Cycling BMX OR Cycling Mountain Bike OR Cycling Road OR Cycling Track OR Diving OR Equestrian / Dressage OR Equestrian / Eventing OR Equestrian / Jumping OR Fencing OR Football OR Golf OR Gymnastics Artistic"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  }
}

I read an article on Elasticsearch query optimization and tried changing the
query as shown below, but it made no difference.

{
  "fields": [
    "aff_id",
    "post_uri",
    "blog_cat",
    "cat_score",
    "secondary_cat",
    "secondary_cat_score",
    "title"
  ],
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "url.cat": "sports"
              }
            },
            {
              "range": {
                "main_cat.sports": {
                  "gte": ".15"
                }
              }
            }
          ]
        }
      },
      "filter": {
        "query": {
          "query_string": {
            "fields": [
              "body",
              "title"
            ],
            "query": "Archery OR Athletics OR Badminton OR Basketball OR Beach Volleyball OR Boxing OR Canoe Slalom OR Canoe Sprint OR Cycling BMX OR Cycling Mountain Bike OR Cycling Road OR Cycling Track OR Diving OR Equestrian / Dressage OR Equestrian / Eventing OR Equestrian / Jumping OR Fencing OR Football OR Golf OR Gymnastics Artistic"
          }
        }
      }
    }
  },
  "from": 0,
  "size": 1000
}

Any help would be greatly appreciated. Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0969654e-f982-4cd3-9990-fa0a2daf303d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly) #2

A couple of suggestions:

  1. You probably want the range condition to move down into the filter part
    as well (so bool it with the query_string filter).

  2. The term query (url.cat=sports) can likely move down into the filter
    section too (again, bool it with the query_string filter).

  3. The query_string/query filter is not cached by default, but you can turn
    caching on by setting _cache: true, for example:

{
  "query": {
    "filtered": {
      "filter": {
        "fquery": {
          "query": {
            "query_string": {
              "query": "blahblah"
            }
          },
          "_cache": true
        }
      }
    }
  }
}

Once the filters are cached (and remain static), the succeeding query calls
should be a lot faster.
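
One practical reading of "remain static" is that the filter sent on each call has to be identical for the cached entry to be reused. Below is a minimal Python sketch of that idea, assuming the term list is assembled by the application and that a reordered query string would not reuse the cached entry (an assumption, not something confirmed in this thread). The helper names stable_query_string and cached_query_filter are hypothetical.

```python
import json


def stable_query_string(terms):
    """Join terms with OR in a deterministic order (deduplicated, sorted)."""
    unique = sorted({t.strip() for t in terms if t.strip()})
    return " OR ".join(unique)


def cached_query_filter(terms, fields):
    """Wrap a query_string query in an fquery filter with _cache enabled.

    The fquery/_cache layout follows the example in this post; the
    helper itself is only an illustration.
    """
    return {
        "fquery": {
            "query": {
                "query_string": {
                    "fields": list(fields),
                    "query": stable_query_string(terms),
                }
            },
            "_cache": True,
        }
    }


# The same set of terms in a different order (and with a duplicate)
# produces the same filter body.
a = cached_query_filter(["Boxing", "Archery", "Golf"], ["title"])
b = cached_query_filter(["Golf", "Boxing", "Archery", "Boxing"], ["title"])
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))
```

Sorting and deduplicating only normalizes the request text; whether it helps in practice depends on how often the term list actually changes.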



(roopednra) #3

@Binh Ly, thanks for your response.

I changed the query as you suggested. With "_cache": true, performance
improves a little after the second or third hit.

I am testing in the Sense Chrome plugin.

I tried to apply your third point to my query, but it is not working for me.

{
  "query": {
    "filtered": {
      "filter": {
        "fquery": {
          "query": {
            "query_string": {
              "query": "blahblah"
            },
            "bool": 
            {
                 "must": [
                    {
                       "term": {
                          "url.cat": {
                             "value": "sports"
                          }
                       }
                    },
                    {
                       "range": {
                          "main_cat.sports": {
                             "gte": ".15"
                          }
                       }
                    }
                 ]
          }
          },
          "_cache": true
        }
      }
    }
  }
}

But the query below works. Am I doing something wrong? Can you please advise?

POST _search?preference=_primary
{
  "fields": [
    "aff_id",
    "post_uri",
    "blog_cat",
    "cat_score",
    "secondary_cat",
    "secondary_cat_score",
    "title"
  ],
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "default_field": "title",
          "query": "blahblah"
        }
      },
      "filter": {
        "fquery": {
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "url.cat": {
                      "value": "sports"
                    }
                  }
                },
                {
                  "range": {
                    "main_cat.sports": {
                      "gte": ".15"
                    }
                  }
                }
              ]
            }
          },
          "_cache": true
        }
      }
    }
  },
  "from": 0,
  "size": 1000
}

(In reply to Binh Ly's message of Wednesday, 12 February 2014 20:47:25 UTC+5:30.)



(Binh Ly) #4

The rough syntax should be something like this:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": ...
            },
            {
              "range": ...
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "blahblah"
                  }
                },
                "_cache": true
              }
            }
          ]
        }
      }
    }
  }
}



(Binh Ly) #5

Should be roughly something like this:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": { }
            },
            {
              "range": {}
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "blahblah"
                  }
                },
                "_cache": true
              }
            }
          ]
        }
      }
    }
  }
}
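
To make the skeleton above concrete, here is a rough Python sketch that fills the empty term and range clauses with the values from the original question (url.cat = sports, main_cat.sports >= .15) and prints the resulting request body. The function name build_search_body and its parameters are hypothetical. The range bound is passed as a number rather than the string ".15" used earlier in the thread, which is an assumption about the field's mapping.

```python
import json


def build_search_body(query_text, fields, category, min_score, size=1000):
    """Build a filtered query whose three conditions all live in one bool filter."""
    must_filters = [
        # Exact match on the category field named in the question.
        {"term": {"url.cat": category}},
        # Lower bound on the category score field named in the question.
        {"range": {"main_cat.sports": {"gte": min_score}}},
        # query_string wrapped as a filter, with caching switched on.
        {
            "fquery": {
                "query": {
                    "query_string": {
                        "fields": list(fields),
                        "query": query_text,
                    }
                },
                "_cache": True,
            }
        },
    ]
    return {
        "query": {"filtered": {"filter": {"bool": {"must": must_filters}}}},
        "from": 0,
        "size": size,
    }


body = build_search_body(
    "Archery OR Athletics OR Badminton",  # shortened term list for illustration
    ["body", "title"],
    "sports",
    0.15,
)
print(json.dumps(body, indent=2))
```

With every condition in the filter section, hits are matched but not ranked by relevance. If scoring on the title matters (the first query in this thread used min_score), the query_string part would need to stay in the "query" section of the filtered query instead.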


