Percolate: Document fields concatenating; causing unexpected matches

This is a re-post of a bug I raised on GitHub about a month ago
(https://github.com/elasticsearch/elasticsearch/issues/2476),
but I haven't heard back about it.

To recreate the issue (against ES 0.20.0 RC1), please see this bash script:

Basically, a document with field1: A and field2: B matches a percolator
query for the exact phrase "A B", even though that phrase does not occur
in either field; it exists only when the fields are concatenated.

Just wondering if a fix for this is on any roadmap. Thanks.

--

On Sun, 2013-01-06 at 17:22 -0800, Laser Jesus wrote:

This is a re-post of a bug I raised at github about a month ago, but I
haven't heard back about it.

To recreate the issue (against ES 0.20.0 RC1), please see this bash
script: "Bash script to recreate a bug in Elastic Search whereby separate
document fields are merged together to create word combinations that
don't actually exist in any individual field" (GitHub)

Basically a document with field1: A and field2: B will match against a
percolator for the exact phrase "A B" even though that exact phrase
does not occur in either field, but only when the fields are
concatenated.

The reason for this is that the values from field1 and field2 are being
concatenated and indexed in the _all field, which is what you are
querying.

You have two choices:

  1. query field1 and field2 separately, instead of the _all field
  2. set the position_offset_gap of the _all field to (e.g.) 100
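
To illustrate why both options help, here is a minimal Python sketch of
position-based phrase matching. It is a simplified toy model, not
Elasticsearch or Lucene code: the whitespace tokenizer and the helper names
(tokenize, build_combined_field, phrase_matches) are assumptions made for
this example. It only shows how values copied into one combined field get
consecutive token positions, and how a gap between values breaks up phrases
that span two fields.

```python
def tokenize(text):
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_combined_field(values, position_offset_gap=0):
    """Return (token, position) pairs for several values indexed into one field.

    Each value's tokens follow the previous value's tokens, shifted by
    position_offset_gap extra positions.
    """
    postings = []
    next_position = 0
    for value in values:
        tokens = tokenize(value)
        for offset, token in enumerate(tokens):
            postings.append((token, next_position + offset))
        next_position += len(tokens) + position_offset_gap
    return postings


def phrase_matches(postings, phrase):
    """True if the phrase's words occur at consecutive positions."""
    words = tokenize(phrase)
    if not words:
        return False
    positions = {}
    for token, position in postings:
        positions.setdefault(token, set()).add(position)
    return any(
        all(start + i in positions.get(word, ()) for i, word in enumerate(words))
        for start in positions.get(words[0], ())
    )


doc = {"field1": "A", "field2": "B"}
phrase = "A B"

# Combined field without a gap: "a" at position 0, "b" at position 1.
no_gap = build_combined_field(doc.values())
print(phrase_matches(no_gap, phrase))    # True (the unexpected match)

# Choice 2: a gap of 100 pushes "b" far away from "a".
with_gap = build_combined_field(doc.values(), position_offset_gap=100)
print(phrase_matches(with_gap, phrase))  # False

# Choice 1: check each field on its own instead of the combined field.
per_field = any(
    phrase_matches(build_combined_field([value]), phrase)
    for value in doc.values()
)
print(per_field)                         # False
```

In this model a phrase that really occurs inside a single field still
matches with the gap in place, because the gap is only added between
values, not between the tokens of one value.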

clint

--