Match every token position in the field when using synonyms


(Dany Gielow) #1

In my Elasticsearch index I have documents that have multiple tokens at the
same position.

I want to get a document back when I match at least one token at every
position.
The order of the tokens is not important. How can I accomplish that?
I use Elasticsearch 0.90.5.

Example:

I index a document like this.

{
    "field":"red car"
}

I use a synonym token filter that adds synonyms at the same positions as
the original token.
So now in the field, there are 2 positions:

  • Position 1: "red"
  • Position 2: "car", "automobile"
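
To make the position layout concrete, here is a minimal Python sketch of how the field ends up with one set of tokens per position. It is only a model of the analysis step, not Elasticsearch code: the `SYNONYMS` map and the `analyze` helper are hypothetical names, and whitespace tokenization is assumed.

```python
# Hypothetical synonym map standing in for the synonym token filter.
SYNONYMS = {"car": ["automobile"]}


def analyze(text):
    """Return one set of tokens per position (whitespace tokenization assumed)."""
    return [{tok, *SYNONYMS.get(tok, [])} for tok in text.lower().split()]


positions = analyze("red car")
for number, tokens in enumerate(positions, start=1):
    # Synonyms share the position of the token they were derived from.
    print(f"Position {number}: {sorted(tokens)}")
```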

My solution for now:

To be able to ensure that all positions match, I index the maximum position
as well.

{
    "field":"red car",
    "max_position": 2
}

I have a custom similarity that extends DefaultSimilarity and returns 1 for
tf(), idf() and lengthNorm(). As a result, the score is the number of
matching terms in the field.

Query:

{
    "query": {
        "custom_score": {
            "query": {
                "match": {
                    "field": "a car is an automobile"
                }
            },
            "script": "_score*100/doc[\"max_position\"].value+_score"
        }
    },
    "min_score": 100
}


Problem with my solution:
The above search should not match the document, because there is no token
"red" in the query string. But it matches: Elasticsearch counts the matches
for "car" and "automobile" as two matches even though both sit at the same
position. That gives a score of 2, so the script score is 2*100/2+2 = 102,
which satisfies the "min_score".
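
The arithmetic can be reproduced outside Elasticsearch. The following Python sketch is a simplified model, assuming the score is exactly the number of distinct matching terms (as described for the custom similarity). All function names are hypothetical. It contrasts the term-counting score with the check that is actually wanted, namely at least one match per position.

```python
# Tokens per position for the indexed document "red car" (from the example).
doc_positions = [{"red"}, {"car", "automobile"}]
max_position = len(doc_positions)


def term_count_score(query_terms):
    """Model of the current score: number of distinct matching terms."""
    all_doc_terms = set().union(*doc_positions)
    return len(set(query_terms) & all_doc_terms)


def script_score(score):
    """Same formula as the custom_score script in the query."""
    return score * 100 / max_position + score


def all_positions_matched(query_terms):
    """Desired behavior: every position has at least one matching token."""
    query = set(query_terms)
    return all(query & tokens for tokens in doc_positions)


query_terms = "a car is an automobile".split()
score = term_count_score(query_terms)

print(score)                               # "car" and "automobile" both count
print(script_score(score))                 # compared against min_score 100
print(all_positions_matched(query_terms))  # position 1 ("red") is unmatched
```

The sketch suggests the fix has to count matched positions rather than matched terms, since two synonyms at one position should contribute only once.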

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/77d52c69-8862-4e10-8036-470bf4ca8189%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Sebastian Briesemeister) #2

I am also very keen on an answer! If you find a solution, let me know!

Sebastian

On Thursday, 16 January 2014 15:12:23 UTC+1, Dany Gielow wrote:



(system) #3