Hi,
I am running into a few issues with max_expansions, and there is very little documentation on it.
I have created two indices. Both contain a copy_to target field with the same name (MONOLITHIC_FIELD). Index "index_one" has over 100 fields and index "index_two" has 50 fields.
The mapping is similar in both indices (although a few fields differ between them):
{
  "settings": {
    "number_of_shards": "2",
    "analysis": {
      "filter": {
        "indexFilter": {
          "type": "pattern_capture",
          "preserve_original": "true",
          "patterns": ["([@,$,%,&,!,.,#,^,*]+)", "([\\w,.]+)", "([\\w,@]+)", "([-]+)", "(\\w+)"]
        }
      },
      "analyzer": {
        "indexAnalyzer": {
          "filter": [ "indexFilter", "lowercase" ],
          "tokenizer": "whitespace"
        },
        "searchAnalyzer": {
          "filter": [ "lowercase" ],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "ID": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "FILE_NAME": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "copy_to": [ "MONOLITHIC_FIELD" ],
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "TEXT_ID": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "copy_to": [ "MONOLITHIC_FIELD" ],
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "SCAN_COPY": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } },
          "copy_to": [ "MONOLITHIC_FIELD" ],
          "analyzer": "indexAnalyzer",
          "search_analyzer": "searchAnalyzer"
        }
      }
    }
  }
}
The documents indexed are:
POST : http://11.43.1.130:9993/index_one/_doc/DATA13185
{
"SCAN_COPY": "DATA13185.pdf",
"TEXT_ID": "DATA13185",
"FILE_NAME": "DATA13185.pdf",
"ID": "439132553aab42f484fa515b2e52d899" }
POST : http://11.43.1.130:9993/index_two/_doc/DATA13185_c2a40abf-916a-4529-9d07-3ed1796c2c19
{ "TEXT_ID": "DATA13185",
"ID": "439132553aab42f484fa515b2e52d899" }
I run a single query against both of these indices at once.
The query is:
{
  "from": 0,
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "data13185",
                  "boost": "10",
                  "operator": "and",
                  "analyzer": "whitespace",
                  "fuzziness": "AUTO:4,7",
                  "prefix_length": 1,
                  "max_expansions": 50,
                  "fields": ["MONOLITHIC_FIELD"]
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": [
        { "terms": { "ID.keyword": ["439132553aab42f484fa515b2e52d899"] } }
      ]
    }
  },
  "highlight": {
    "type": "unified",
    "fields": { "*": { "require_field_match": "false" } }
  }
}
The search results, with highlights, are:
"hits": {
"total": 1,
"max_score": 40.16386,
"hits": [
{
"_index": "index_two",
"_type": "_doc",
"_id": "DATA13185_c2a40abf-916a-4529-9d07-3ed1796c2c19",
"highlight": {
"TEXT_ID": [
"<em>DATA13185</em>"
],
"ID": [
"<em>439132553aab42f484fa515b2e52d899</em>"
],
"ID.keyword": [ "<em>439132553aab42f484fa515b2e52d899</em>"]
} } ] }
The search matches in only one of the indices, "index_two", even though max_expansions is set to 50. I know max_expansions is applied at the shard level, but that should not cause this issue.
However, when I change max_expansions to 150, the documents in both indices are matched:
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "data13185",
                  "boost": "10",
                  "operator": "and",
                  "analyzer": "whitespace",
                  "fuzziness": "AUTO:4,7",
                  "prefix_length": 1,
                  "max_expansions": 150,
                  "fields": ["MONOLITHIC_FIELD"]
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": [
        { "terms": { "ID.keyword": ["439132553aab42f484fa515b2e52d899"] } }
      ]
    }
  },
  "highlight": {
    "type": "unified",
    "fields": { "*": { "require_field_match": "false" } }
  }
}
The search results are:
"hits": {
"total": 2,
"max_score": 90.1923,
"hits": [
{
"_index": "index_one",
"_type": "_doc",
"_id": "DATA13185",
"_score": 90.1923,
"highlight": {
"SCAN_COPY": [ "<em>DATA13185.pdf</em>" ],
"TEXT_ID": [ "<em>DATA13185</em>" ],
"ID": [ "<em>439132553aab42f484fa515b2e52d899</em>" ],
"FILE_NAME": [ "<em>DATA13185.pdf</em>" ],
"ID.keyword": [ "<em>439132553aab42f484fa515b2e52d899</em>" ]
} },
{ **The second hit is the same as the one in the previous search result. I have left it out because of the character limit.** } ] }
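One way to reason about the difference between max_expansions 50 and 150 is to count how many indexed terms are fuzzy candidates for "data13185" at all, since max_expansions limits the number of terms the query expands to. The Python sketch below is a simplified model and not Lucene's implementation: it uses plain Levenshtein distance (no transpositions), the helper names are made up, and the term list is entirely hypothetical (it is not taken from my indices). It does not show which candidates get dropped when the limit is exceeded, only how many compete for the slots.

```python
def auto_fuzziness(term, low=3, high=6):
    """Edits allowed by fuzziness AUTO:low,high for a term of this length."""
    length = len(term)
    if length < low:
        return 0          # shorter than low: must match exactly
    if length < high:
        return 1          # from low up to high - 1: one edit
    return 2              # high or longer: two edits


def levenshtein(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_candidates(query, terms, max_edits, prefix_length):
    """Distinct terms sharing the query prefix and within max_edits of it."""
    prefix = query[:prefix_length]
    return sorted(t for t in set(terms)
                  if t.startswith(prefix) and levenshtein(query, t) <= max_edits)


query = "data13185"
max_edits = auto_fuzziness(query, low=4, high=7)   # AUTO:4,7 as in the query

# Hypothetical term dictionary for MONOLITHIC_FIELD on a single shard.
terms = ["data13185", "data13185.pdf", "pdf"]
terms += [f"data13{n:03d}" for n in range(100, 200)]   # data13100 .. data13199

candidates = fuzzy_candidates(query, terms, max_edits, prefix_length=1)
print(max_edits, len(candidates))
```

With this made-up dictionary the query allows 2 edits and has 100 candidate terms, which is more than 50 and fewer than 150. Whether the real shards look anything like this has to be checked against the actual terms in MONOLITHIC_FIELD.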