Preventing phrase search from matching across sentence boundaries

Robin_Hughes · November 12, 2012, 3:13pm

Hi

I'd like to know if there is a way to configure analysis so that periods and
commas result in a position increment. The purpose of this is so that
phrase queries will not match across sentence boundaries.

For example, a span_near query over span_term clauses for "one" and "two"
with a slop of zero would match a document containing "one two" but not one
containing "one. two".

Regards

Robin

--

Chris_Male · November 12, 2012, 10:42pm

Hi Robin,

I can't think of an analysis component that does this out of the box, but it is
a requirement that comes up often. If you're comfortable writing a
TokenFilter yourself, you could write one that inflates the position
increment at whatever characters are of interest. Alternatively, you could
break up your data before indexing it into Elasticsearch, so that each sentence
or part of a sentence is a separate value. Multiple values for a field can be
indexed with a large position increment between them.
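
To illustrate the second option, here is a rough Python sketch of splitting text on periods and commas before indexing, plus a small simulation of why a zero-slop phrase then cannot cross a boundary. It does not talk to Elasticsearch at all. The helper names (split_sentences, positions, phrase_match) and the gap of 100 are assumptions made up for the example; in Elasticsearch the gap between values of a multi-valued field is a mapping setting (position_increment_gap in current versions), so check what your version uses.

```python
import re

# Assumed gap between values of a multi-valued field (illustrative only).
POSITION_GAP = 100


def split_sentences(text):
    """Split on periods and commas, dropping empty pieces."""
    parts = re.split(r"[.,]", text)
    return [p.strip() for p in parts if p.strip()]


def positions(values, gap=POSITION_GAP):
    """Assign token positions: consecutive inside a value,
    with an extra `gap` added between one value and the next."""
    out = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            out.append((token, pos))
    return out


def phrase_match(tokens, first, second):
    """True if `second` sits at the position right after `first` (slop 0)."""
    where = {}
    for tok, pos in tokens:
        where.setdefault(tok, set()).add(pos)
    return any(p + 1 in where.get(second, ()) for p in where.get(first, ()))


# The values you would send as an array for the field:
values = split_sentences("one. two")  # ["one", "two"]
```

With this, "one two" stays a single value and the phrase matches, while "one. two" becomes two values whose tokens are far apart, so the phrase does not match. The list returned by split_sentences is what you would index as the array of values for the field.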

