Unexpected Shingle Behaviour

Hey,
I have a field called locality in my index, and one of the documents has the value 'ashok vihar phase 2'. The corresponding settings are:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["shingle_filter", "remove_duplicates"]
                }
            },
            "filter": {
                "shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 4,
                    "output_unigrams": false,
                    "output_unigrams_if_no_shingles": true
                }
            }
        }
    }
}

So it should ideally create 6 shingles, i.e. 'ashok vihar', 'ashok vihar phase', 'ashok vihar phase 2', 'vihar phase', 'vihar phase 2' and 'phase 2'.
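
To make that expectation concrete, here is a minimal Python sketch of the shingling I have in mind. It is not Elasticsearch code: `shingles` is a hypothetical helper, and splitting on whitespace is only a stand-in for the standard tokenizer.

```python
def shingles(text, min_size=2, max_size=4):
    """Return word n-grams of min_size..max_size tokens, joined by spaces."""
    tokens = text.split()  # assumption: whitespace split approximates the standard tokenizer
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only keep shingles that fit inside the token list
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


doc_shingles = shingles("ashok vihar phase 2")
for s in doc_shingles:
    print(s)
```

For this document the helper yields the six shingles listed above.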

When my search input is 'ashok vihar 2' and I use the Explain API to see how it is matching, I get:

{
                        "value": 4.618802,
                        "description": "sum of:",
                        "details": [
                            {
                                "value": 4.618802,
                                "description": "weight(Synonym(locality_shingle:ashok vihar locality_shingle:ashok vihar 2) in 89) [PerFieldSimilarity], result of:",
                                "details": [
                                    {
                                        "value": 4.618802,
                                        "description": "score from ScriptedSimilarity(weightScript=[null], script=[Script{type=inline, lang='painless', idOrCode='double norm = 1.0/Math.sqrt(doc.length); return query.boost * norm;', options={}, params={}}]) computed from:",
                                        "details": [
                                            {
                                                "value": 1.0,
                                                "description": "weight",
                                                "details": []
                                            },
                                            {
                                                "value": 8.0,
                                                "description": "query.boost",
                                                "details": []
                                            },
                                            {
                                                "value": 17042,
                                                "description": "field.docCount",
                                                "details": []
                                            },
                                            {
                                                "value": 27606,
                                                "description": "field.sumDocFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 27606,
                                                "description": "field.sumTotalTermFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 111,
                                                "description": "term.docFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 111,
                                                "description": "term.totalTermFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 1.0,
                                                "description": "doc.freq",
                                                "details": []
                                            },
                                            {
                                                "value": 3,
                                                "description": "doc.length",
                                                "details": []
                                            }
                                        ]
                                    }

It creates weight(Synonym(locality_shingle:ashok vihar locality_shingle:ashok vihar 2)). I can't understand how the shingle matching works, or how doc.length turns out to be 3. Also, despite my having created a separate field for shingles, it seems to be using the Lucene SynonymQuery.
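
One possible reading, sketched below as an unverified assumption: if all shingles that start at the same word are emitted at the same token position, then the query 'ashok vihar 2' has two terms stacked at position 0, which would line up with the Synonym(...) clause, and the document has three distinct positions, which would line up with doc.length = 3 if stacked tokens are not counted towards the length. The helper `shingles_by_position` is hypothetical and only mimics that grouping; it does not call Elasticsearch or Lucene. The last lines just redo the arithmetic of the similarity script from the explain output.

```python
import math


def shingles_by_position(text, min_size=2, max_size=4):
    """Group shingles by the index of the word they start at."""
    tokens = text.split()  # assumption: whitespace split approximates the standard tokenizer
    groups = {}
    for start in range(len(tokens)):
        found = [
            " ".join(tokens[start:start + size])
            for size in range(min_size, max_size + 1)
            if start + size <= len(tokens)
        ]
        if found:
            groups[start] = found
    return groups


query_positions = shingles_by_position("ashok vihar 2")
doc_positions = shingles_by_position("ashok vihar phase 2")
print(query_positions)
print(len(doc_positions))  # number of distinct start positions in the document

# Script from the explain output: query.boost * 1.0 / sqrt(doc.length)
score = 8.0 * (1.0 / math.sqrt(3))
print(round(score, 6))
```

Under that assumption, position 0 of the query holds 'ashok vihar' and 'ashok vihar 2', the document has three start positions, and 8.0 / sqrt(3) reproduces the reported 4.618802.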
