Per-language fields and the total number of fields in a query: performance concern


(Eduard Dudar) #1

Hi there,

Are there any limits on the number of fields, or best practices, when building a query that contains A LOT of fields as a result of multilingual fields? I have 4 properties: name, description, keywords, and transcript. For each of them, I have a shingles field and a language-analyzed field per language. I target 15 languages. In total that comes to 120+ fields (4 properties × 2 analyses × 15 languages, plus _all) in a query like this:
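To make the field count concrete, here is a minimal Python sketch of the naming scheme, assuming the `<property>_shingles_<lang>` / `<property>_lang_<lang>` pattern used in the query (the `keyword` prefix is taken from the query's field names). The list of 15 language codes is hypothetical; the post does not say which languages are used.

```python
from itertools import product

# Field prefixes as they appear in the query (the post calls the third property "keywords").
PROPERTIES = ["name", "description", "keyword", "transcript"]
# Two analysis variants per property: a shingles field and a language-analyzed field.
VARIANTS = ["shingles", "lang"]
# Hypothetical set of 15 language codes; the actual languages are not listed.
LANGUAGES = ["en", "de", "fr", "es", "it", "pt", "nl", "sv",
             "ru", "pl", "tr", "ar", "ja", "ko", "zh"]


def field_names(properties, variants, languages):
    """Return every per-language field name, e.g. 'name_shingles_en'."""
    return [f"{p}_{v}_{lang}" for p, v, lang in product(properties, variants, languages)]


FIELDS = field_names(PROPERTIES, VARIANTS, LANGUAGES)

if __name__ == "__main__":
    print(len(FIELDS))  # 4 * 2 * 15 = 120
    print(FIELDS[:3])   # first three names for the 'name' shingles variant
```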

GET video/video/_search
{
   "query": {
      "filtered": {
         "query": {
            "dis_max": {
               "tie_breaker": 0.2,
               "queries": [
                  {
                     "query_string": {
                        "query": "...",
                        "fields": [ "name_shingles_en^10.0", "name_lang_en^2.5", "name_shingles_*^5.0", "name_lang_*^1.25" ],
                        "use_dis_max": false,
                        "analyze_wildcard": true
                     }
                  },
                  {
                     "query_string": {
                        "query": "...",
                        "fields": [ "description_shingles_en^5.0", "description_lang_en^1.25", "description_shingles_*^2.5", "description_lang_*^0.625" ],
                        "use_dis_max": false,
                        "analyze_wildcard": true
                     }
                  },
                  {
                     "query_string": {
                        "query": "...",
                        "fields": [ "keyword_shingles_en^2.0", "keyword_lang_en^0.5", "keyword_shingles_*^1.0", "keyword_lang_*^0.25" ],
                        "use_dis_max": false,
                        "analyze_wildcard": true
                     }
                  },
                  {
                     "query_string": {
                        "query": "...",
                        "fields": [ "transcript_shingles_en^1.0", "transcript_lang_en^0.25", "transcript_shingles_*^0.5", "transcript_lang_*^0.125" ],
                        "use_dis_max": false,
                        "analyze_wildcard": true
                     }
                  },
                  {
                     "query_string": {
                        "query": "...",
                        "fields": [ "_all^0.1" ],
                        "analyze_wildcard": true
                     }
                  }
               ]
            }
         },
         "filter": {...}
      }
   }
}
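For reference, the two ways this query combines scores can be sketched in Python. As the Elasticsearch guide describes it, `dis_max` takes the best clause score plus `tie_breaker` times the sum of the other matching clauses' scores, while `"use_dis_max": false` makes each `query_string` combine its fields as a `bool` query, summing per-field scores (the most_fields approach). This is a simplified model under stated assumptions: it ignores Lucene's coord factor and query normalization, and the per-field scores below are made-up numbers.

```python
# Simplified model of the query's scoring (ignores coord and query normalization).

def bool_should_score(field_scores):
    """use_dis_max: false -> fields combined like a bool 'should': scores are summed."""
    return sum(field_scores)


def dis_max_score(clause_scores, tie_breaker):
    """dis_max: best clause score + tie_breaker * sum of the other matching clause scores."""
    matching = [s for s in clause_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical per-field scores (boosts already applied) for one document.
per_clause_field_scores = {
    "name": [2.0, 1.0],        # e.g. name_shingles_en and name_lang_en both match
    "description": [1.0],
    "keyword": [],             # no keyword field matches
    "transcript": [0.5, 0.25],
    "_all": [0.25],
}

clause_scores = [bool_should_score(s) for s in per_clause_field_scores.values()]
score = dis_max_score(clause_scores, tie_breaker=0.2)
print(round(score, 4))  # 3.0 + 0.2 * (1.0 + 0.75 + 0.25) = 3.4
```

The `tie_breaker` of 0.2 lets documents that match several properties rank slightly above those matching only the best one, without letting many weak matches outweigh one strong match.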

Under a load of 800-900 RPS, CPU usage on the cluster skyrockets, and so does response time, compared with the same query without the _* fields. The typical response time of about 15-20 ms grows to 150+ ms, an order of magnitude worse. Since everything here closely follows what the ES guide recommends, I'm wondering whether there is a limit on the number of fields, or something else I'm missing.
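A back-of-the-envelope comparison of how many fields each variant of the query searches, as a hedged Python sketch. It assumes each `_*` pattern expands to the 15 per-language fields of its property and that the explicitly listed `_en` field is deduplicated against the wildcard match (both are assumptions, and the language list is hypothetical). It only counts fields; it does not model per-field cost, the extra terms shingles produce, or caching, so it hints at why the work grows roughly tenfold rather than diagnosing it.

```python
from fnmatch import fnmatch

PROPERTIES = ["name", "description", "keyword", "transcript"]
# Hypothetical 15 language codes; the actual languages are not listed in the post.
LANGUAGES = ["en", "de", "fr", "es", "it", "pt", "nl", "sv",
             "ru", "pl", "tr", "ar", "ja", "ko", "zh"]
# All indexed per-language fields, plus _all.
INDEX_FIELDS = [f"{p}_{v}_{lang}" for p in PROPERTIES
                for v in ("shingles", "lang") for lang in LANGUAGES] + ["_all"]


def expand(patterns, index_fields):
    """Expand field patterns (boost suffix, if any, stripped) against the index, deduplicated."""
    expanded = set()
    for pattern in patterns:
        name = pattern.split("^")[0]
        expanded.update(f for f in index_fields if fnmatch(f, name))
    return expanded


def clause_patterns(prop, with_wildcards):
    """Field patterns for one property's query_string clause, with or without the _* fields."""
    patterns = [f"{prop}_shingles_en", f"{prop}_lang_en"]
    if with_wildcards:
        patterns += [f"{prop}_shingles_*", f"{prop}_lang_*"]
    return patterns


def total_fields(with_wildcards):
    """Fields searched across the four property clauses plus the _all clause."""
    total = sum(
        len(expand(clause_patterns(p, with_wildcards), INDEX_FIELDS)) for p in PROPERTIES
    )
    return total + len(expand(["_all"], INDEX_FIELDS))


with_wc = total_fields(True)      # 4 * 30 + 1 = 121
without_wc = total_fields(False)  # 4 * 2 + 1 = 9
print(with_wc, without_wc, round(with_wc / without_wc, 1))  # 121 9 13.4
```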

https://www.elastic.co/guide/en/elasticsearch/guide/1.x/one-lang-fields.html
https://www.elastic.co/guide/en/elasticsearch/guide/1.x/_best_fields.html (name, description, etc.): dis_max
https://www.elastic.co/guide/en/elasticsearch/guide/1.x/most-fields.html (shingles + language): bool

Thanks!

