Hi,
I'm trying to build a user search that matches on people's names and other
metadata such as where they work (so a user could query "tim cook apple").
My mapping (pasted below) has first_name, last_name, and affils fields;
affils is usually a list of affiliations like "apple". Each field is indexed
both as full tokens and as an additional "partial" sub-field that holds
ngrams.
To query it, I originally used a bool/should query with one clause per
field (as in
http://elasticsearch-users.115913.n3.nabble.com/help-needed-with-the-query-td3177477.html#a3178856).
The issue is that someone named "Tim Cook" who works at "Tim Cook Design"
comes out on top, even for the query "Tim Cook Apple". In our case we want
each query token to count toward the score only once, so that for the query
"Tim Cook", Tim Cook at Tim Cook Design scores the same as Tim Cook at
Apple. I switched to a query_string query (pasted below), which does better
but still favors those cases: "Tim Cook Apple" now works, but "Tim Cook"
still gives more weight to the one at Tim Cook Design.
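To make the scoring I'm after concrete, here is a toy Python sketch. It is only an illustration: the flat weight of 1.0 per field, the whitespace tokenizer, and the function names (score_sum, score_once) are all made up for this example, and it does not reproduce Lucene's actual TF/IDF scoring.

```python
# Toy model only: flat, hand-picked weights, NOT Lucene's real scoring.
FIELD_WEIGHTS = {"first_name": 1.0, "last_name": 1.0, "affils": 1.0}


def tokens(value):
    """Lowercase and whitespace-split a string or a list of strings."""
    values = value if isinstance(value, list) else [value]
    return {tok for v in values for tok in v.lower().split()}


def score_sum(query, doc):
    """Every field that contains a token adds to the score
    (roughly the effect I see with bool/should over all fields)."""
    return sum(
        weight
        for tok in query.lower().split()
        for field, weight in FIELD_WEIGHTS.items()
        if tok in tokens(doc[field])
    )


def score_once(query, doc):
    """Each token counts once: only its best-matching field contributes."""
    return sum(
        max(
            (w for f, w in FIELD_WEIGHTS.items() if tok in tokens(doc[f])),
            default=0.0,
        )
        for tok in query.lower().split()
    )


apple = {"first_name": "Tim", "last_name": "Cook", "affils": ["Apple"]}
design = {"first_name": "Tim", "last_name": "Cook",
          "affils": ["Tim Cook Design", "Random co"]}

for q in ("Tim Cook", "Tim Cook Apple"):
    print(q, "| sum:", score_sum(q, apple), score_sum(q, design),
          "| once:", score_once(q, apple), score_once(q, design))
```

With the summing version, the Tim Cook Design record wins both queries; with the count-once version, the two records tie on "Tim Cook" and the Apple record wins "Tim Cook Apple", which is the ranking I want.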
So, how can I customize the scoring of query_string (or another multi-field
query) so that a token contributes to the score in only one field?
Thanks! I'm pretty new to elasticsearch/lucene, so sorry if this is obvious.
===========
Mapping setup:
curl -XPUT 'http://127.0.0.1:9200/people_search/?pretty=1' -d '
{
"mappings" : {
"person" : {
"properties" : {
"_id" : {
"type" : "integer",
"index" : "no"
},
"last_name" : {
"fields" : {
"partial" : {
"search_analyzer" : "full_name",
"index_analyzer" : "partial_name",
"type" : "string"
},
"last_name" : {
"type" : "string",
"analyzer" : "full_name"
}
},
"type" : "multi_field"
},
"first_name" : {
"fields" : {
"partial" : {
"search_analyzer" : "partial_name_search",
"index_analyzer" : "partial_name",
"type" : "string"
},
"first_name" : {
"type" : "string",
"analyzer" : "full_name"
}
},
"type" : "multi_field"
},
"affils" : {
"fields" : {
"partial" : {
"search_analyzer" : "full_name",
"index_analyzer" : "partial_name",
"type" : "string"
},
"names" : {
"type" : "string",
"analyzer" : "full_name"
}
},
"type" : "multi_field"
}
}
}
},
"settings" : {
"analysis" : {
"filter" : {
"name_ngrams" : {
"side" : "front",
"max_gram" : 10,
"min_gram" : 2,
"type" : "edgeNGram"
},
"name_ngrams_search" : {
"side" : "front",
"max_gram" : 10,
"min_gram" : 2,
"type" : "edgeNGram"
}
},
"analyzer" : {
"full_name" : {
"filter" : [
"standard",
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "standard"
},
"partial_name" : {
"filter" : [
"standard",
"lowercase",
"asciifolding",
"name_ngrams"
],
"type" : "custom",
"tokenizer" : "standard"
},
"partial_name_search" : {
"filter" : [
"standard",
"lowercase",
"asciifolding",
"name_ngrams_search"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}
'
=============
Insert some data
curl -XPOST 'http://127.0.0.1:9200/_bulk?pretty=1' -d '
{"index" : {"_index" : "people_search", "_type" : "person", "_id" : 1}}
{"_id" : 1, "last_name" : "Cook", "first_name" : "Tim", "affils":["Apple"]}
{"index" : {"_index" : "people_search", "_type" : "person", "_id" : 2}}
{"_id" : 2, "last_name" : "Cook", "first_name" : "Tim", "affils":["Tim Cook Design", "Random co"]}
'
===============
Query:
curl -XPOST 'http://127.0.0.1:9200/people_search/person/_search?search_type=dfs_query_then_fetch' -d '
{
"query" : {
"query_string" : {
"fields" : [
"first_name.partial",
"first_name.first_name^1.5",
"last_name.partial",
"last_name.last_name^1.5",
"affils.partial",
"affils.names^1.5"
],
"query" : "Tim Cook",
"use_dis_max" : true
}
}
}
'
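One shape that might express "each token scores in only one field" is a bool/should with one dis_max clause per query token, each dis_max spanning all six fields with tie_breaker set to 0. The Python sketch below only builds such a request body. It is untested against a cluster: the helper name per_token_dis_max and the client-side whitespace split are assumptions for illustration, and I don't know whether query normalization or the coord factor would still skew the ranking.

```python
import json

# Field names and boosts copied from the query_string attempt above.
FIELDS = {
    "first_name.partial": 1.0,
    "first_name.first_name": 1.5,
    "last_name.partial": 1.0,
    "last_name.last_name": 1.5,
    "affils.partial": 1.0,
    "affils.names": 1.5,
}


def per_token_dis_max(query_text, fields=FIELDS, tie_breaker=0.0):
    """Build a bool/should with one dis_max clause per whitespace token.

    Hypothetical helper: with tie_breaker 0.0, each dis_max keeps only the
    score of its best-matching field for that token.
    """
    should = []
    for token in query_text.split():
        per_field = [
            {"match": {field: {"query": token, "boost": boost}}}
            for field, boost in fields.items()
        ]
        should.append(
            {"dis_max": {"tie_breaker": tie_breaker, "queries": per_field}}
        )
    return {"query": {"bool": {"should": should}}}


body = per_token_dis_max("Tim Cook")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of the same _search request as above in place of the query_string query.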