Fieldnorm with shingle filter

ss123 · June 19, 2015, 4:01am

Hi, I am trying to understand how Elasticsearch calculates the fieldnorm for documents indexed with a shingle analyzer. The values I get differ from what I would expect. This is the analyzer I used:

{
  "index" : {
    "analysis" : {
      "filter" : {
        "shingle_filter" : {
          "type" : "shingle",
          "max_shingle_size" : 3
        }
      },
      "analyzer" : {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["word_delimiter", "lowercase", "shingle_filter"]
        }
      }
    }
  }
}

This is the mapping used:

{
  "docs": {
    "properties": {
      "text" : {"type": "string", "analyzer" : "my_analyzer"}
    }
  }
}

And I posted a few documents:

{"text" : "the"}
{"text" : "the quick"}
{"text" : "the quick brown"}
{"text" : "the quick brown fox jumps"}
...
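
To make the analyzer output concrete, here is a rough Python sketch of the token stream for one of these documents. It is not the Lucene implementation: `shingle_tokens` is a hypothetical helper, it splits on whitespace instead of using the standard tokenizer, and it assumes the shingle filter defaults (`min_shingle_size` of 2, unigrams included). It also assumes, as I understand Lucene's shingle filter, that each unigram advances the position by 1 while the shingles starting at the same word get a position increment of 0.

```python
def shingle_tokens(text, min_size=2, max_size=3):
    """Return (token, position_increment) pairs for a whitespace-split text.

    Assumption: unigrams get increment 1, shingles that start at the same
    word get increment 0 (they overlap the unigram's position).
    """
    words = text.lower().split()
    out = []
    for i in range(len(words)):
        out.append((words[i], 1))
        for n in range(min_size, max_size + 1):
            if i + n <= len(words):
                out.append((" ".join(words[i:i + n]), 0))
    return out


tokens = shingle_tokens("the quick brown")
print([t for t, _ in tokens])
# ['the', 'the quick', 'the quick brown', 'quick', 'quick brown', 'brown']

print(len(tokens))                   # 6 tokens in total
print(sum(inc for _, inc in tokens)) # 3 tokens that advance the position
```

So the analyzer emits 6 tokens for "the quick brown", but only 3 of them occupy a new position.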

When using the following query with the explain API,

{
  "query": {
    "match": {
      "text" : "the"
    }
  }
}

I get the following fieldnorms (other details omitted for brevity):

"_source": {
  "text": "the quick"
},
"_explanation": {
  "value": 0.625,
  "description": "fieldNorm(doc=0)"
}

"_source": {
  "text": "the quick brown fox jumps over the"
},
"_explanation": {
  "value": 0.375,
  "description": "fieldNorm(doc=0)"
}

The values suggest that ES counts 2 terms for the first document ("the quick") and 7 terms for the second document ("the quick brown fox jumps over the"), i.e. the shingles are excluded. Is it possible to configure ES to calculate the field norm with the shingled terms too (i.e. all terms returned by the analyzer)?
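
As a sanity check on the term counts inferred above, the sketch below reproduces the two fieldnorm values in Python. It assumes the classic TF-IDF similarity, where the norm is boost / sqrt(number of terms) and is then squeezed into a single byte; the two conversion functions are my port of Lucene's `SmallFloat.floatToByte315` and `byte315ToFloat`, and `field_norm` is a hypothetical helper name, not an Elasticsearch API.

```python
import math
import struct


def float_to_byte315(f):
    """Port of Lucene's SmallFloat.floatToByte315 (lossy one-byte float)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    if small <= (63 - 15) << 3:
        return 0 if bits <= 0 else 1
    if small >= ((63 - 15) << 3) + 0x100:
        return 255
    return small - ((63 - 15) << 3)


def byte315_to_float(b):
    """Port of Lucene's SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms, boost=1.0):
    """Norm as stored in the index: 1/sqrt(n), rounded down by the byte encoding."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))


print(field_norm(2))  # 0.625 -> matches "the quick"
print(field_norm(7))  # 0.375 -> matches "the quick brown fox jumps over the"
```

Under these assumptions, 1/sqrt(2) ≈ 0.707 is stored as 0.625 and 1/sqrt(7) ≈ 0.378 is stored as 0.375, which is consistent with only the 2 and 7 unigram positions being counted. If all shingled tokens were counted, `field_norm` would be called with larger term counts and would return smaller values.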
