Hi!
I have indexed a lot of documents. Two of them are very similar: they have
the same name and the same city. The only difference is that one of the two
has some fields containing a lot of text. When I search over the _all field,
I would expect both results to get a very similar score. But the one with
the text-heavy fields gets a significantly lower score than the one with the
short text. In my case the scores are ~0.2 for the document with the long
texts and ~0.6 for the one with the short text.
So, how can I make sure these two documents get a similar score?
Robert.
My analyzers:
{'analysis':{
'analyzer':{
'indexAnalyzer':{
'type':'custom',
'tokenizer':'standard',
'filter':['lowercase','mynGram']
},
'searchAnalyzer':{
'type':'custom',
'tokenizer':'standard',
'filter':['standard','lowercase','mynGram']
}
},
'filter':{
'mynGram':{
'type':'nGram',
'min_gram':2,
'max_gram':50
}
}
}}
My mapping:
{
'name':{
'type':'string',
'include_in_all':true,
'boost':5
},
'city':{
'type':'string',
'include_in_all':true
},
'someTextField':{
'type':'string',
'include_in_all':true
},
'someOtherTextField':{
'type':'string',
'include_in_all':true,
}
}
My documents:
{
'name':'Wirtschaftsinformatik',
'city':'Hamburg',
'someTextField':'',
'someOtherTextField':''
}
{
'name':'Wirtschaftsinformatik',
'city':'Hamburg',
'someTextField':'A long text. Bla bla bla.',
'someOtherTextField':'Another long text. Bla bla bla.'
}
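To illustrate what I suspect is going on, here is a rough Python sketch. It rests on assumptions I have not verified against my index: that the score contains a field-length normalization factor of about 1/sqrt(number of terms), as in Lucene's classic TF-IDF similarity, and that my analyzer can be approximated by splitting on non-word characters, lowercasing, and n-gramming each word. The helper names (ngrams, all_field_terms, length_norm) are made up for this example, and the name boost is ignored.

```python
import math
import re


def ngrams(token, min_gram=2, max_gram=50):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


def all_field_terms(doc):
    """Approximate the terms that end up in _all for one document.

    Assumption: a regex word split plus lowercasing stands in for the
    standard tokenizer and lowercase filter.
    """
    terms = []
    for value in doc.values():
        for word in re.findall(r"\w+", value.lower()):
            terms.extend(ngrams(word))
    return terms


def length_norm(num_terms):
    """Assumed length normalization: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms) if num_terms else 0.0


short_doc = {
    "name": "Wirtschaftsinformatik",
    "city": "Hamburg",
    "someTextField": "",
    "someOtherTextField": "",
}
long_doc = {
    "name": "Wirtschaftsinformatik",
    "city": "Hamburg",
    "someTextField": "A long text. Bla bla bla.",
    "someOtherTextField": "Another long text. Bla bla bla.",
}

for label, doc in (("short", short_doc), ("long", long_doc)):
    count = len(all_field_terms(doc))
    print(label, count, length_norm(count))
```

If this assumption holds, every extra word in the text fields adds more n-gram terms to _all and shrinks the normalization factor, so the document with the long texts would score lower even though name and city match equally well. My real texts are much longer than the placeholders above, so the gap there would be larger.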