Thank you so much! It is working and I no longer see the warning.
But I see some odd behavior. For example:
GET /index_66.175.209.189/_search
{
  "query": {
    "match": {
      "content": {
        "query": "13#abc",
        "analyzer": "my_analyzer"
      }
    }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "content": { "pre_tags": ["<em>"], "post_tags": ["</em>"] }
    }
  }
}
In the response, I expect every matching file to show the hit highlighted as <em>13#abc</em>.
But only the first file shows the correct hit. The other files do not show the exact hit value.
Hit response for the file test00001.doc:
"meta" : { },
"file" : {
"extension" : "doc",
"content_type" : "text/plain; charset=ISO-8859-1",
"created" : "2020-03-18T23:26:02.269+0000",
"last_modified" : "2020-03-18T23:26:02.269+0000",
"last_accessed" : "2020-03-24T02:43:19.709+0000",
"indexing_date" : "2020-03-24T16:43:10.348+0000",
"filesize" : 144,
"filename" : "test00001.doc",
"url" : "file:///var/www/html/file-scanner/ESFiles/test00001.doc"
},
"path" : {
"root" : "ca5776bfff1151c16ccddbc7a154d40",
"virtual" : "/test00001.doc",
"real" : "/var/www/html/file-scanner/ESFiles/test00001.doc"
}
},
"highlight" : {
"content" : [
"phil-22 pete-34 john-34\n swati@gmail.com \n 123-35-5252 \n 12-32-3525 \n <em>13#abc</em> abc!"
]
}
},
The responses for the other 15,000+ files look like this:
"meta" : { },
"file" : {
"extension" : "doc",
"content_type" : "text/plain; charset=ISO-8859-1",
"created" : "2020-03-18T23:34:28.593+0000",
"last_modified" : "2020-03-18T23:34:28.593+0000",
"last_accessed" : "2020-03-24T02:43:11.799+0000",
"indexing_date" : "2020-03-24T16:43:02.220+0000",
"filesize" : 189,
"filename" : "test04552.doc",
"url" : "file:///var/www/html/file-scanner/ESFiles/test04552.doc"
},
"path" : {
"root" : "ca5776bfff1151c16ccddbc7a154d40",
"virtual" : "/test04552.doc",
"real" : "/var/www/html/file-scanner/ESFiles/test04552.doc"
}
},
"highlight" : {
"content" : [
"john-34\n swati@gmail.com \n bcd01942@gmail.co.in\nbfajk903141@outlook.gov\n123-35-5252 \n 12-32-3525 \n <em>13</em>"
]
}
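The same happens when I try a regexp (pattern) query:
from elasticsearch_dsl import Search
# Raw string so Python passes the backslash before '#' through to the regexp unchanged
s = Search(using=client, index=["index_66.175.209.189"]).query("regexp", content=r"[0-9]{2}\#[a-zA-Z]+")
response = s.execute()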
Clearly the special characters are being recognized, but not for all files.
Could you please tell me whether this is caused by a mapping issue?
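One way to narrow this down is to compare how "13#abc" is tokenized at index time (through the mapping of the content field) with how my_analyzer tokenizes it at query time. The following is only a minimal sketch, not a confirmed diagnosis: it assumes a 7.x-style elasticsearch-py client object that is already connected, and the helper names tokens_for and compare_analysis are hypothetical, made up for illustration. It relies on the _analyze API, which accepts either an "analyzer" or a "field" together with "text".

```python
def tokens_for(client, index, text, analyzer=None, field=None):
    """Return the token strings the _analyze API produces for `text`.

    Pass `analyzer` to test a named analyzer, or `field` to use whatever
    analyzer the mapping assigns to that field.
    """
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    if field is not None:
        body["field"] = field
    # Assumption: 7.x-style client call that takes the request as `body`.
    result = client.indices.analyze(index=index, body=body)
    return [t["token"] for t in result["tokens"]]


def compare_analysis(client, index, text, field, analyzer):
    """Compare the tokens for `field` (mapping) with those from `analyzer`."""
    index_tokens = tokens_for(client, index, text, field=field)
    query_tokens = tokens_for(client, index, text, analyzer=analyzer)
    return {
        "index_time": index_tokens,
        "query_time": query_tokens,
        "same": index_tokens == query_tokens,
    }
```

It could be called as compare_analysis(client, "index_66.175.209.189", "13#abc", "content", "my_analyzer"). If the two token lists differ, that would suggest the content field is not mapped to my_analyzer, or that the documents were indexed before the analyzer was applied, which might explain why only "13" is highlighted in most files.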
-Lisa