Hey,
I have a field locality in my index, and one of the documents has the value 'ashok vihar phase 2'.
The corresponding settings are:
{"settings": {
    "analysis": {
        "analyzer": {
            "custom_analyzer": {
                "tokenizer": "standard",
                "filter": ["shingle_filter", "remove_duplicates"],
            }
        },
        "filter": {
            "shingle_filter": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 4,
                "output_unigrams": False,
                "output_unigrams_if_no_shingles": True,
            }
        }
    }
}}
So it should ideally create 6 shingles, i.e. 'ashok vihar', 'ashok vihar phase', 'ashok vihar phase 2', 'vihar phase'
and so on.
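For reference, here is a rough Python sketch of how I arrive at 6 shingles. It only mimics the shingle sizes from the settings above; the helper name shingles and the whitespace split (standing in for the standard tokenizer) are my own simplifications, not anything from Elasticsearch.

```python
def shingles(text, min_size=2, max_size=4):
    """Return word n-grams of `text` with sizes min_size..max_size.

    Hypothetical helper: a plain-Python approximation of a shingle filter
    with output_unigrams disabled, not the Elasticsearch implementation.
    """
    tokens = text.split()  # crude stand-in for the standard tokenizer
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only keep shingles that fit inside the token list.
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result


expected = shingles("ashok vihar phase 2")
print(len(expected))
for shingle in expected:
    print(shingle)
```

With 4 tokens this yields three 2-word, two 3-word and one 4-word shingle, which is where my count of 6 comes from.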
When my search input is 'ashok vihar 2'
and I use the explain API to see how it is matching, I get:
{
  "value": 4.618802,
  "description": "sum of:",
  "details": [
    {
      "value": 4.618802,
      "description": "weight(Synonym(locality_shingle:ashok vihar locality_shingle:ashok vihar 2) in 89) [PerFieldSimilarity], result of:",
      "details": [
        {
          "value": 4.618802,
          "description": "score from ScriptedSimilarity(weightScript=[null], script=[Script{type=inline, lang='painless', idOrCode='double norm = 1.0/Math.sqrt(doc.length); return query.boost * norm;', options={}, params={}}]) computed from:",
          "details": [
            {
              "value": 1.0,
              "description": "weight",
              "details": []
            },
            {
              "value": 8.0,
              "description": "query.boost",
              "details": []
            },
            {
              "value": 17042,
              "description": "field.docCount",
              "details": []
            },
            {
              "value": 27606,
              "description": "field.sumDocFreq",
              "details": []
            },
            {
              "value": 27606,
              "description": "field.sumTotalTermFreq",
              "details": []
            },
            {
              "value": 111,
              "description": "term.docFreq",
              "details": []
            },
            {
              "value": 111,
              "description": "term.totalTermFreq",
              "details": []
            },
            {
              "value": 1.0,
              "description": "doc.freq",
              "details": []
            },
            {
              "value": 3,
              "description": "doc.length",
              "details": []
            }
          ]
        }
It creates weight(Synonym(locality_shingle:ashok vihar locality_shingle:ashok vihar 2)). I'm unable to understand how the shingle matching works here, and how doc.length turns out to be 3 when I expected 6 shingles. Also, despite my having created a separate field for shingles, it seems to be using the Lucene SynonymQuery.
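To make the numbers concrete, here is a small Python sketch that re-runs the arithmetic of the painless script shown in the explain output (norm = 1.0/Math.sqrt(doc.length), score = query.boost * norm). The function name scripted_score is my own; only the formula and the input values come from the output above.

```python
import math


def scripted_score(query_boost, doc_length):
    """Mirror the similarity script from the explain output (hypothetical helper)."""
    norm = 1.0 / math.sqrt(doc_length)
    return query_boost * norm


# Values reported by the explain API: query.boost = 8.0, doc.length = 3.
score = scripted_score(query_boost=8.0, doc_length=3)
print(round(score, 6))

# What the score would be if doc.length were 6, one per expected shingle.
score_if_six = scripted_score(query_boost=8.0, doc_length=6)
print(round(score_if_six, 6))
```

The first value agrees with the reported 4.618802, so the score itself follows from doc.length being 3; what I can't explain is why the length is 3 rather than 6.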