Problem:
Recently I wanted to run a proximity search on an Elasticsearch index. I wanted to find all docs where ‘measles’ and ‘vaccin*’ occur within 25 words of each other (slop is counted in term positions, not characters), and I also wanted the two to appear in that order.
Elasticsearch’s built-in proximity search wasn’t an option, for two reasons.
- Proximity search doesn’t support wildcards, e.g. (“measles vaccine”)~25 is supported but (“measles vacci*”)~25 or (“measle* vacci*”) is not.
- Proximity search doesn’t respect the order of the words in the phrase, e.g. (“measles vaccine”)~25 and (“vaccine measles”)~25 give the same results.
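To make the requirement concrete, here is a small Python sketch of the matching I was after: an exact first term, a prefix for the second, in order, with a limit on the gap between them. It is only a simplified model and not how Elasticsearch evaluates anything. The function name `within_slop` is made up for illustration, whitespace splitting stands in for a real analyzer, and it assumes the gap is the number of token positions between the two matches.

```python
def within_slop(text, first, second_prefix, slop):
    """Simplified model of an ordered proximity match over two clauses."""
    tokens = text.lower().split()  # crude stand-in for an analyzer
    firsts = [i for i, tok in enumerate(tokens) if tok == first]
    seconds = [i for i, tok in enumerate(tokens) if tok.startswith(second_prefix)]
    # The gap is the number of positions strictly between the two matches;
    # requiring it to be >= 0 enforces the order "first, then second".
    return any(0 <= j - i - 1 <= slop for i in firsts for j in seconds)


doc = "measles can be prevented by a vaccine"
print(within_slop(doc, "measles", "vacci", 5))
print(within_slop(doc, "measles", "vacci", 4))
print(within_slop("vaccine against measles", "measles", "vacci", 25))
```

In `doc` there are five tokens between the two matches, so the first call matches and the second does not; the third fails because the words are in the wrong order.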
Solution:
Here are a few examples that work around these limits using span_near.
- (“measles vacci*”)~25
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": "measles"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "vacci"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 25,
      "in_order": "true",
      "collect_payloads": "true"
    }
  }
}
// in_order toggles between ordered and unordered matching.
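Writing these bodies by hand gets tedious, so here is a minimal Python sketch that builds the same kind of query from a pattern such as "measles vacci*". The helper names (`span_term`, `span_prefix`, `span_near`, `proximity`) are my own and not part of any client library. The sketch leaves out the single-clause span_or wrappers, which should not change what matches, drops collect_payloads, and sends in_order as a JSON boolean; treat those as assumptions to check against your Elasticsearch version.

```python
import json


def span_term(field, value):
    # Exact, unanalyzed term.
    return {"span_term": {field: value}}


def span_prefix(field, prefix):
    # Wraps a prefix query so it can be used as a span clause.
    return {"span_multi": {"match": {"prefix": {field: {"value": prefix}}}}}


def span_near(clauses, slop, in_order=True):
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


def proximity(field, pattern, slop, in_order=True):
    """Turn 'measles vacci*' into a span_near; a trailing * means prefix."""
    clauses = []
    for word in pattern.split():
        if word.endswith("*"):
            clauses.append(span_prefix(field, word[:-1]))
        else:
            clauses.append(span_term(field, word))
    return {"query": span_near(clauses, slop, in_order)}


query = proximity("text", "measles vacci*", 25)
print(json.dumps(query, indent=2))
```

The same helper covers a pattern with two wildcards, e.g. `proximity("text", "measle* vacci*", 0)`.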
- “measle* vacci*”
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "measle"
                      }
                    }
                  }
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "vacci"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 0,
      "in_order": "true",
      "collect_payloads": "true"
    }
  }
}
- Grouping. Now let’s assume you want to find all docs where (canada OR toronto OR “North york”) NEAR (measles OR vaccin*), with the two groups within 30 words of each other, in either order.
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_near": {
                  "clauses": [
                    {
                      "span_term": {
                        "text": "north"
                      }
                    },
                    {
                      "span_term": {
                        "text": "york"
                      }
                    }
                  ],
                  "slop": 0,
                  "in_order": "true",
                  "collect_payloads": "true"
                }
              },
              {
                "span_term": {
                  "text": "toronto"
                }
              },
              {
                "span_term": {
                  "text": "canada"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": "measles"
                }
              },
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "vaccin"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 30,
      "in_order": "false",
      "collect_payloads": "true"
    }
  }
}
// span_term values are not analyzed, so they are written in lowercase here, assuming the field's analyzer lowercases tokens.
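The grouping case can be generated the same way. The sketch below is one possible approach, assuming each group is a list of alternatives, that an alternative containing a space is an exact phrase (span_near with slop 0, in order), and that a trailing * means a prefix. The helper names (`alternative`, `group`, `near`) are hypothetical, and the input is lowercased on the assumption that the field's analyzer does the same.

```python
import json


def span_term(field, value):
    return {"span_term": {field: value}}


def span_prefix(field, prefix):
    return {"span_multi": {"match": {"prefix": {field: {"value": prefix}}}}}


def alternative(field, text):
    """One OR-alternative: a single word, a prefix, or an exact phrase."""
    parts = [
        span_prefix(field, w[:-1]) if w.endswith("*") else span_term(field, w)
        for w in text.lower().split()
    ]
    if len(parts) == 1:
        return parts[0]
    # Multi-word alternative: adjacent words in the given order.
    return {"span_near": {"clauses": parts, "slop": 0, "in_order": True}}


def group(field, alternatives):
    return {"span_or": {"clauses": [alternative(field, a) for a in alternatives]}}


def near(field, groups, slop, in_order=False):
    clauses = [group(field, g) for g in groups]
    return {"query": {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}}


query = near(
    "text",
    [["North york", "toronto", "canada"], ["measles", "vaccin*"]],
    slop=30,
)
print(json.dumps(query, indent=2))
```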
If anyone knows a better solution than this one, please comment. I would also welcome suggestions on how to build a parser that takes a user query such as (quick AND near(foxes OR rats, toronto OR ontario, 30)) and converts it to an Elasticsearch span_near query using the workaround above. Boolean operators and parenthesis precedence are what I am finding hard to handle. Is there an open source PHP library that can help translate user-written queries with parentheses and boolean operators into ES filters?
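As a starting point for the parser question, here is a rough recursive-descent sketch in Python (the structure should port to PHP). Everything in it is an assumption for illustration: the grammar (AND binds tighter than OR, parentheses group, near(a, b, slop) takes two OR-groups and a number), the `Parser` class, and the choice to map plain words to term queries and AND/OR to bool must/should. Inside near(...) only words, prefixes, OR and nested near are accepted, because the clauses have to be span queries.

```python
import json
import re

TOKEN = re.compile(r"\s*([(),]|[^\s(),]+)")


class Parser:
    def __init__(self, text, field="text"):
        self.tokens = TOKEN.findall(text)
        self.pos = 0
        self.field = field

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def parse(self):
        node = self.or_expr(span=False)
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def or_expr(self, span):
        # Lowest precedence: a OR b OR c
        parts = [self.and_expr(span)]
        while self.peek() == "OR":
            self.take()
            parts.append(self.and_expr(span))
        if len(parts) == 1:
            return parts[0]
        return {"span_or": {"clauses": parts}} if span else {"bool": {"should": parts}}

    def and_expr(self, span):
        # Binds tighter than OR: a AND b
        parts = [self.atom(span)]
        while self.peek() == "AND":
            if span:
                raise ValueError("AND is not supported inside near(...)")
            self.take()
            parts.append(self.atom(span))
        return parts[0] if len(parts) == 1 else {"bool": {"must": parts}}

    def atom(self, span):
        token = self.take()
        if token == "(":
            node = self.or_expr(span)
            self.take(")")
            return node
        if token == "near" and self.peek() == "(":
            self.take("(")
            left = self.or_expr(span=True)
            self.take(",")
            right = self.or_expr(span=True)
            self.take(",")
            slop = int(self.take())
            self.take(")")
            return {"span_near": {"clauses": [left, right], "slop": slop, "in_order": False}}
        if token in {")", ",", "AND", "OR"}:
            raise ValueError(f"unexpected token {token!r}")
        word = token.lower()  # assumes the analyzer lowercases tokens
        if word.endswith("*"):
            prefix = {"prefix": {self.field: {"value": word[:-1]}}}
            return {"span_multi": {"match": prefix}} if span else prefix
        return {"span_term" if span else "term": {self.field: word}}


def to_query(text, field="text"):
    return {"query": Parser(text, field).parse()}


query = to_query("(quick AND near(foxes OR rats, toronto OR ontario, 30))")
print(json.dumps(query, indent=2))
```

Each precedence level gets its own method, so parentheses and operator precedence fall out of the call structure rather than needing special handling. Quoted phrases, NOT, and proper error messages are left out.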
@shay banon, @steven, @uri: are there any plans to support such operators in query_string?
References:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-near-query.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-or-query.html