Elasticsearch | Max_Expansions and copy_to field in Multi_Match query

pranav24 · March 29, 2019, 8:50am

Hi,
I am running into a few issues with max_expansions, and there is limited information available about it.
I have created two indices. Both contain a copy_to field, and the copy_to target has the same name in both (MONOLITHIC_FIELD). Index "index_one" has over 100 fields and index "index_two" has 50 fields.
The mapping is similar in both indices (although a few fields differ between them):

{
  "settings": {
    "number_of_shards": "2",
    "analysis": {
      "filter": {
        "indexFilter": {
          "type": "pattern_capture",
          "preserve_original": "true",
          "patterns": ["([@,$,%,&,!,.,#,^,*]+)", "([\\w,.]+)", "([\\w,@]+)", "([-]+)", "(\\w+)"]
        }
      },
      "analyzer": {
        "indexAnalyzer": {
          "filter": ["indexFilter", "lowercase"],
          "tokenizer": "whitespace"
        },
        "searchAnalyzer": {
          "filter": ["lowercase"],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "ID": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "FILE_NAME": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "copy_to": ["MONOLITHIC_FIELD"],
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "TEXT_ID": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "copy_to": ["MONOLITHIC_FIELD"],
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "SCAN_COPY": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "copy_to": ["MONOLITHIC_FIELD"],
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        }
      }
    }
  }
}

The indexed documents are:

POST http://11.43.1.130:9993/index_one/_doc/DATA13185

{
  "SCAN_COPY": "DATA13185.pdf",
  "TEXT_ID": "DATA13185",
  "FILE_NAME": "DATA13185.pdf",
  "ID": "439132553aab42f484fa515b2e52d899"
}

POST http://11.43.1.130:9993/index_two/_doc/DATA13185_c2a40abf-916a-4529-9d07-3ed1796c2c19

{
  "TEXT_ID": "DATA13185",
  "ID": "439132553aab42f484fa515b2e52d899"
}

I run a single query against both of these indices at once.
The query is:

{
  "from": 0,
  "query": {
    "bool": {
      "should": [
        { "bool": {
            "should": [
              { "multi_match": {
                  "query": "data13185",
                  "boost": "10",
                  "operator": "and",
                  "analyzer": "whitespace",
                  "fuzziness": "AUTO:4,7",
                  "prefix_length": 1,
                  "max_expansions": 50,
                  "fields": ["MONOLITHIC_FIELD"] } }
            ],
            "minimum_should_match": 1 } }
      ],
      "minimum_should_match": 1,
      "filter": [ { "terms": { "ID.keyword": ["439132553aab42f484fa515b2e52d899"] } } ]
    }
  },
  "highlight": {
    "type": "unified",
    "fields": { "*": { "require_field_match": "false" } }
  }
}

The highlight results are:

"hits": {
  "total": 1,
  "max_score": 40.16386,
  "hits": [
    {
      "_index": "index_two",
      "_type": "_doc",
      "_id": "DATA13185_c2a40abf-916a-4529-9d07-3ed1796c2c19",
      "highlight": {
        "TEXT_ID": ["<em>DATA13185</em>"],
        "ID": ["<em>439132553aab42f484fa515b2e52d899</em>"],
        "ID.keyword": ["<em>439132553aab42f484fa515b2e52d899</em>"]
      }
    }
  ]
}
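
For reference, here is a minimal Python sketch of how the "AUTO:4,7" fuzziness setting in the query resolves to a maximum edit distance. It follows the rule given in the Elasticsearch fuzziness documentation for AUTO:[low],[high]: terms shorter than low must match exactly, terms from low up to high - 1 characters allow one edit, and terms of high or more characters allow two edits. The function name auto_fuzziness is hypothetical and only for illustration; it is not part of Elasticsearch or any client library.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Return the maximum edit distance that AUTO:low,high allows for a term.

    Hypothetical helper for illustration only, not an Elasticsearch API.
    The defaults mirror the documented default AUTO:3,6.
    """
    length = len(term)
    if length < low:
        return 0  # short term: must match exactly
    if length < high:
        return 1  # medium-length term: one edit allowed
    return 2      # long term: two edits allowed


# "data13185" has 9 characters, so AUTO:4,7 allows up to two edits.
print(auto_fuzziness("data13185", low=4, high=7))
```

With two edits allowed and only a one-character fixed prefix (prefix_length is 1), a large number of indexed terms can qualify as fuzzy candidates, which is the situation max_expansions is meant to limit.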

The search matches in only one of the indices, "index_two", even though max_expansions is set to 50. I know max_expansions works at the shard level, but that should not cause this issue.
However, when I change the max_expansions value to 150, the documents in both indices are matched.

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": [
        { "bool": {
            "should": [
              { "multi_match": {
                  "query": "data13185",
                  "boost": "10",
                  "operator": "and",
                  "analyzer": "whitespace",
                  "fuzziness": "AUTO:4,7",
                  "prefix_length": 1,
                  "max_expansions": 150,
                  "fields": ["MONOLITHIC_FIELD"] } }
            ],
            "minimum_should_match": 1 } }
      ],
      "minimum_should_match": 1,
      "filter": [ { "terms": { "ID.keyword": ["439132553aab42f484fa515b2e52d899"] } } ]
    }
  },
  "highlight": {
    "type": "unified",
    "fields": { "*": { "require_field_match": "false" } }
  }
}

The search results are:

"hits": {
  "total": 2,
  "max_score": 90.1923,
  "hits": [
    {
      "_index": "index_one",
      "_type": "_doc",
      "_id": "DATA13185",
      "_score": 90.1923,
      "highlight": {
        "SCAN_COPY": ["<em>DATA13185.pdf</em>"],
        "TEXT_ID": ["<em>DATA13185</em>"],
        "ID": ["<em>439132553aab42f484fa515b2e52d899</em>"],
        "FILE_NAME": ["<em>DATA13185.pdf</em>"],
        "ID.keyword": ["<em>439132553aab42f484fa515b2e52d899</em>"]
      }
    },
    { **The second hit is the same as the one in the previous search result. I have omitted it because of the character limit on posts.** }
  ]
}
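
One possible way to compare what each shard actually executes for the two max_expansions values is the validate API with rewrite=true and all_shards=true, which the Elasticsearch documentation describes as returning the rewritten Lucene query per shard for queries such as fuzzy ones. The Python sketch below only assembles the request path and body; the helper name build_validate_request is hypothetical, it assumes these parameters are available in the Elasticsearch version in use, and it does not contact a cluster. The filter and highlight parts of the original query are left out to keep the comparison focused on the multi_match clause.

```python
import json


def build_validate_request(indices, max_expansions):
    """Assemble the path and JSON body for a _validate/query call.

    Hypothetical helper for illustration; it builds the request only
    and sends nothing.
    """
    path = "/" + ",".join(indices) + "/_validate/query?rewrite=true&all_shards=true"
    body = {
        "query": {
            "multi_match": {
                "query": "data13185",
                "operator": "and",
                "analyzer": "whitespace",
                "fuzziness": "AUTO:4,7",
                "prefix_length": 1,
                "max_expansions": max_expansions,
                "fields": ["MONOLITHIC_FIELD"],
            }
        }
    }
    return path, body


# Print one request per max_expansions value so the two can be compared.
for value in (50, 150):
    path, body = build_validate_request(["index_one", "index_two"], value)
    print("GET", path)
    print(json.dumps(body, indent=2))
```

Sending both requests to the cluster (for example with curl against http://11.43.1.130:9993) and comparing the per-shard explanations for "index_one" may show whether the term "data13185" is among the expanded terms when max_expansions is 50.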

pranav24 · April 1, 2019, 5:58am

@elastic Please provide any information

pranav24 · April 3, 2019, 6:08am

@elastic Please provide suggestions if any

system · May 1, 2019, 6:08am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
