I'd like to know if there is a way to configure analysis so periods and
commas result in a position increment. The purpose is that
phrase queries will not match across sentence boundaries.
For example, a span_near query with terms "one" and "two" and a slop of zero would
match a document containing "one two" but not one containing "one. two".
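I can't think of an analysis component that does this out of the box, but it is
a requirement that comes up often. If you're comfortable writing a
TokenFilter yourself, you could write one that inflates the position
increment at whichever characters are of interest. The tokenizer in front of it
has to keep the punctuation (the whitespace tokenizer does), otherwise the
filter never sees it. Alternatively, you could
break up your data before indexing it into Elasticsearch, so that each sentence
or part of a sentence is a separate value. Multiple values for a field can be
indexed with a large position increment between them (position_offset_gap in the
field mapping, renamed position_increment_gap in later versions), so a
zero-slop query cannot match across values.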
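
A rough sketch of the second suggestion, in Python, in case it helps. It splits text at periods and commas and then simulates how positions would be assigned to a multi-valued field, which is not what Elasticsearch actually runs, just an illustration of why the gap stops the match. The names split_sentences, token_positions, adjacent, the "body" field and the gap of 100 are all made up for the example, and the split is naive (it will also break on things like "3.14" or "e.g.").

```python
import re

# Assumed gap between values; pick something larger than any slop you query with.
POSITION_GAP = 100


def split_sentences(text):
    """Split text at periods and commas, dropping empty pieces."""
    parts = re.split(r"[.,]", text)
    return [p.strip() for p in parts if p.strip()]


def token_positions(values, gap=POSITION_GAP):
    """Simulate token positions for a multi-valued field.

    Tokens inside one value get consecutive positions; an extra `gap`
    is added before the first token of each following value.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.append((token, pos))
    return positions


def adjacent(positions, first, second):
    """True if `second` sits exactly one position after `first` (slop 0, in order)."""
    first_positions = {p for t, p in positions if t == first}
    return any(t == second and (p - 1) in first_positions for t, p in positions)


# Hypothetical document body: one array entry per sentence fragment.
doc = {"body": split_sentences("one. two")}
```

With this, adjacent(token_positions(doc["body"]), "one", "two") is False for "one. two" and True for "one two", which is the behaviour asked for.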
On Tuesday, November 13, 2012 4:13:28 AM UTC+13, Robin Hughes wrote the question above.