Preventing phrase search from matching across sentence boundaries

Hi

I'd like to know if there is a way to configure analysis so that periods and
commas result in a position increment. The purpose of this is so that
phrase queries will not match across sentence boundaries.

For example, a span_near query over the terms "one" and "two" with a slop of
zero would match a document containing "one two" but not one containing
"one. two".

Regards

Robin

--

Hi Robin,

I can't think of an analysis component that does this out of the box, but it
is a requirement that comes up often. If you're comfortable writing a
TokenFilter yourself, you could write one that inflates the position
increment at whichever characters are of interest. Alternatively, you could
break up your data before indexing it into Elasticsearch, so that each
sentence or part of a sentence becomes a separate value. Multiple values for
a field can be indexed with a large position increment between them (the
position_offset_gap mapping setting, called position_increment_gap in later
versions), so a zero-slop phrase cannot span two values.
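
To make the second option concrete, here is a rough sketch in plain Python.
It is only a toy model of token positions, not Elasticsearch or Lucene code:
the function names and the gap of 100 are assumptions made up for
illustration. It splits text at periods and commas, numbers the tokens with
a jump between fragments, and then checks a zero-slop phrase the way a
position-based match would.

```python
import re

GAP = 100  # assumed gap between values; illustrative, not read from any mapping


def split_into_values(text):
    """Split text at periods and commas so each fragment becomes one field value."""
    parts = re.split(r"[.,]", text)
    return [p.strip() for p in parts if p.strip()]


def token_positions(values, gap=GAP):
    """Assign a position to each token, jumping by `gap` between values."""
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # inflate the position at every value boundary
        for token in value.lower().split():
            pos += 1
            positions.append((token, pos))
    return positions


def phrase_matches(positions, first, second):
    """Zero-slop phrase check: `second` must sit exactly one position after `first`."""
    index = {}
    for token, pos in positions:
        index.setdefault(token, set()).add(pos)
    return any(p + 1 in index.get(second, set()) for p in index.get(first, set()))


for text in ["one two", "one. two"]:
    values = split_into_values(text)
    print(values, phrase_matches(token_positions(values), "one", "two"))
```

The list returned by split_into_values is what you would send as the
multi-valued field. Note that splitting on every period is naive: it will
also break decimals and abbreviations, so real data probably needs a proper
sentence splitter.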

(In reply to Robin Hughes, Tuesday, November 13, 2012 4:13:28 AM UTC+13)

--