Hi,
How can I get ES to ignore term frequency, since it is not relevant for
the type of documents I'm using?
I'm using two ES types to handle two kinds of data that need different
analyzers.
I'm trying to query the 2 types using a multi_match but I would like to
ignore the term frequency.
I tried using "index_options" : "docs" on my fields but I'm still getting
different scores depending on the term frequency.
Mapping:
curl -XPOST "localhost:9200/myindex" -d '
{
  "settings":{
    "index":{
      "analysis":{
        "filter" : {
          "name_nGram" : {
            "max_gram" : 100,
            "min_gram" : 2,
            "type" : "edge_ngram"
          },
          "strip_hydrid_sign_filter":{
            "pattern":"\u00D7",
            "replacement":"",
            "type": "pattern_replace"
          }
        },
        "analyzer":{
          "name_index" : {
            "filter" : [
              "lowercase","asciifolding","name_nGram"
            ],
            "tokenizer" : "keyword"
          },
          "full_name_index" : {
            "filter" : [
              "lowercase","asciifolding"
            ],
            "tokenizer" : "keyword"
          },
          "scientificname_index" : {
            "filter" : [
              "lowercase","asciifolding","strip_hydrid_sign_filter","name_nGram"
            ],
            "tokenizer" : "keyword"
          },
          "name_search" : {
            "filter" : [
              "lowercase","asciifolding"
            ],
            "tokenizer" : "keyword"
          },
          "scientificname_search" : {
            "filter" : [
              "lowercase","asciifolding","strip_hydrid_sign_filter"
            ],
            "tokenizer" : "keyword"
          }
        }
      }
    }
  },
  "mappings" : {
    "taxon" : {
      "properties" : {
        "taxonname" : {
          "type" : "multi_field",
          "fields":{
            "taxonname":{
              "type" : "string",
              "index_analyzer" : "full_name_index",
              "search_analyzer" : "name_search",
              "omit_norms" : true,
              "index_options" : "docs"
            },
            "ngrams":{
              "type" : "string",
              "index_analyzer" : "scientificname_index",
              "search_analyzer" : "scientificname_search",
              "omit_norms" : true,
              "index_options" : "docs"
            }
          }
        }
      }
    },
    "vernacular" : {
      "properties" : {
        "vernacularname" : {
          "type" : "multi_field",
          "fields":{
            "vernacularname":{
              "type" : "string",
              "index_analyzer" : "full_name_index",
              "search_analyzer" : "name_search",
              "omit_norms" : true,
              "index_options" : "docs"
            },
            "ngrams":{
              "type" : "string",
              "index_analyzer" : "name_index",
              "search_analyzer" : "name_search",
              "omit_norms" : true,
              "index_options" : "docs"
            }
          }
        }
      }
    }
  }
}'
Data:
curl -XPUT "localhost:9200/myindex/taxon/1" -d '{
  "taxonname":"Carex capitata"
}'
curl -XPUT "localhost:9200/myindex/taxon/2" -d '{
  "taxonname":"Carex heleonastes"
}'
curl -XPUT "localhost:9200/myindex/taxon/3" -d '{
  "taxonname":"Carex buckleyi"
}'
curl -XPUT "localhost:9200/myindex/vernacular/1" -d '{
  "vernacularname":"carex de Richardson"
}'
curl -XPUT "localhost:9200/myindex/vernacular/2" -d '{
  "vernacularname":"carex du lac Tahoe"
}'
Query:
curl "localhost:9200/myindex/_search?search_type=dfs_query_then_fetch&pretty=1" -d '{
  "query":{
    "bool":{
      "should":[
        {
          "multi_match" : {
            "query" : "carex",
            "fields" : [ "taxonname", "taxonname.ngrams" ]
          }
        },
        {
          "multi_match" : {
            "query" : "carex",
            "fields" : ["vernacularname", "vernacularname.ngrams"]
          }
        }
      ]
    }
  }
}'
This gives vernacularname matches a better score than taxonname matches,
since the term frequencies differ.
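A possible explanation worth checking, sketched here as plain arithmetic: with "index_options" : "docs" the term frequency should already be treated as constant, so the remaining difference may come from the inverse document frequency rather than the term frequency. The sketch assumes Lucene's classic TF-IDF similarity, where idf = 1 + ln(numDocs / (docFreq + 1)), and assumes that "carex" only matches the ngrams sub-fields (3 taxon documents, 2 vernacular documents, 5 documents in total). The idf helper is hypothetical and not part of ES.

```shell
# Hypothetical helper: classic Lucene idf = 1 + ln(numDocs / (docFreq + 1)).
# Assumption: this is the similarity in use; other similarities differ.
idf() {
  awk -v n="$1" -v df="$2" 'BEGIN { printf "%.4f\n", 1 + log(n / (df + 1)) }'
}

# Assumed statistics for the sample data: 5 documents in the index.
idf 5 3   # "carex" in taxonname.ngrams (3 matching docs)      -> 1.2231
idf 5 2   # "carex" in vernacularname.ngrams (2 matching docs) -> 1.5108
```

Under these assumptions the rarer term gets the larger idf, which would be consistent with vernacularname scoring higher. Running the query with "explain" : true should show whether this is what actually happens.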
So, how can I ignore term frequency so that vernacularname and taxonname
get the same score? Or is there a better way to achieve that?
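One direction that might achieve this, shown only as an untested sketch: wrap each multi_match clause in a constant_score query, so that every match of a clause gets the same fixed score regardless of term statistics. This assumes an ES version in which constant_score accepts a "query" element. The boost of 1 is an arbitrary choice, and the QUERY variable is just a name used for this illustration.

```shell
# Sketch only: each should-clause is wrapped in constant_score so that
# matches from either type receive the same fixed score.
# Assumption: constant_score accepts a "query" element in this ES version.
QUERY='{
  "query":{
    "bool":{
      "should":[
        {
          "constant_score" : {
            "query" : {
              "multi_match" : {
                "query" : "carex",
                "fields" : [ "taxonname", "taxonname.ngrams" ]
              }
            },
            "boost" : 1
          }
        },
        {
          "constant_score" : {
            "query" : {
              "multi_match" : {
                "query" : "carex",
                "fields" : [ "vernacularname", "vernacularname.ngrams" ]
              }
            },
            "boost" : 1
          }
        }
      ]
    }
  }
}'

# Send the query; print a hint instead of failing if ES is not reachable.
curl -s "localhost:9200/myindex/_search?pretty=1" -d "$QUERY" \
  || echo "request failed (is ES running on localhost:9200?)"
```

Since each document here can only match one of the two clauses, both types would be expected to end up with equal scores, at the cost of losing any ranking within a type.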
Thanks