However, when I try to run my application, which then indexes my data, I get the
following error:
Caused by IllegalArgumentException: enablePositionIncrements=false is not supported anymore
as of Lucene 4.4 as it can create broken token streams
->> 40 | checkPositionIncrement in org.apache.lucene.analysis.util.FilteringTokenFilter
Indeed, checking the 4.4 API docs, setEnablePositionIncrements() is
deprecated, so my question is:
How do I get rid of underscores '_' in a shingle filter without this
The problem is that the default is set to true, and with it set to true, my
shingle filter results include underscores because of the stop filter in
use, which I don't want. Traditionally the way to get rid of this was to
set enablePositionIncrements to false in the stop filter. This is no longer
possible, hence my predicament.
On Wednesday, 4 September 2013 14:49:44 UTC+2, Jörg Prante wrote:
Drop enable_position_increments parameter or set it to true.
In shingle filters, you should set min_shingle_size to 2.
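To make the underscore behaviour described above concrete, here is a minimal Python sketch. It is a toy model, not Lucene's ShingleFilter or StopFilter code: the function names, the sample sentence and the stop list are all made up for illustration. It only shows the idea that a removed stop word leaves a position gap, and that a bigram shingle covering that gap is padded with a filler string ("_" here).

```python
def remove_stop_words(tokens, stop_words):
    """Toy stop filter: drop stop words but record the gap they leave.

    Returns (term, position_increment) pairs. An increment greater than 1
    means one or more stop words were removed just before this term.
    """
    out = []
    skipped = 0
    for term in tokens:
        if term in stop_words:
            skipped += 1
            continue
        out.append((term, 1 + skipped))
        skipped = 0
    return out


def bigram_shingles(stream, filler="_", separator=" "):
    """Toy bigram shingler: every position gap is padded with `filler`."""
    # Expand the stream into one slot per position; None marks a gap.
    slots = []
    for term, increment in stream:
        slots.extend([None] * (increment - 1))
        slots.append(term)

    shingles = []
    for left, right in zip(slots, slots[1:]):
        if left is None and right is None:
            continue  # nothing real to emit for two gaps in a row
        parts = [filler if t is None else t for t in (left, right)]
        shingles.append(separator.join(parts))
    return shingles


# Hypothetical input: "this" is treated as the only stop word.
stream = remove_stop_words("please divide this sentence".split(), {"this"})
print(stream)
print(bigram_shingles(stream))
```

In this model the gap left by "this" is what produces the "divide _" and "_ sentence" shingles; without the recorded gap, the shingler would instead emit "divide sentence", joining two words that were never adjacent in the text.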
the old disabling of position increments was bogus.
for example a stop filter could remove a token and "move" a synonym
from one word to another.
so this option conflated two unrelated things: whether or not a "gap"
should be introduced when a word is removed, and whether any existing
positions (e.g. from synonyms) should be respected.
in my opinion (but i have not thought it over in a while, look at the
issue age) it's possible to prevent the introduction of gaps while
still respecting existing ones: https://issues.apache.org/jira/browse/LUCENE-4065
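The "moved synonym" problem can be sketched with a small Python toy model. This is not Lucene code: the token stream, the stop list and the function names are assumptions chosen for illustration. A synonym is modelled as a token with position increment 0, meaning it sits at the same position as the token before it. The `keep_gaps` flag imitates the old enablePositionIncrements switch.

```python
def stop_filter(stream, stop_words, keep_gaps):
    """Toy stop filter over (term, position_increment) pairs.

    keep_gaps=True  : increments of removed tokens are added to the next
                      surviving token, so absolute positions are preserved.
    keep_gaps=False : removed tokens simply vanish, and surviving tokens
                      keep their own increment (the old, now rejected mode).
    """
    out = []
    skipped = 0
    for term, increment in stream:
        if term in stop_words:
            skipped += increment
            continue
        out.append((term, increment + skipped if keep_gaps else increment))
        skipped = 0
    return out


def absolute_positions(stream):
    """Turn position increments into absolute positions."""
    position = 0
    result = []
    for term, increment in stream:
        position += increment
        result.append((term, position))
    return result


# Hypothetical stream: "usa" was injected as a synonym of "us" (increment 0),
# and "us" happens to be on the stop list.
stream = [("cheap", 1), ("us", 1), ("usa", 0), ("flights", 1)]

print(absolute_positions(stop_filter(stream, {"us"}, keep_gaps=True)))
print(absolute_positions(stop_filter(stream, {"us"}, keep_gaps=False)))
```

With `keep_gaps=True` the synonym "usa" stays at position 2, where "us" used to be. With `keep_gaps=False` it keeps its increment of 0 and ends up at position 1, stacked on "cheap", which is the kind of broken token stream the exception message refers to.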
On Wed, Dec 18, 2013 at 11:54 PM, Michael Cheremuhin micherr@gmail.com wrote:
Hi Jondow,
Is there any progress on the issue?
There is a 'filler_token' setting that is configurable in the shingle token filter. The default is "_". I believe we can change it to "" so that nothing is inserted where a stop word was removed. I'm not sure whether this is the best practice, though.
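As a rough sketch of that suggestion, the snippet below builds index settings as a Python dict and prints them as JSON. The names my_stop, my_shingle and my_shingle_analyzer are hypothetical, and whether filler_token is available depends on the Elasticsearch version in use, so treat this as an assumption to check against the docs for your release rather than a verified configuration.

```python
import json


def build_shingle_settings(filler_token=""):
    """Return index settings with a stop filter followed by a shingle filter.

    The filter and analyzer names are made up for this example.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_stop": {"type": "stop", "stopwords": "_english_"},
                    "my_shingle": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 2,
                        "output_unigrams": False,
                        # Replaces the default "_" used for removed tokens.
                        "filler_token": filler_token,
                    },
                },
                "analyzer": {
                    "my_shingle_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_stop", "my_shingle"],
                    }
                },
            }
        }
    }


# Body that could be sent when creating the index.
print(json.dumps(build_shingle_settings(), indent=2))
```

One thing to watch for: an empty filler probably still leaves the separator in place, so shingles next to a removed stop word may carry a leading or trailing space that needs trimming in a later filter.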