Hi all,
I'm testing my app on elasticsearch-0.17.0-SNAPSHOT (built from
563ad625c0f69f3ff0f4c39f46421b1dc2c91b6f) and in my app I'm doing a
term facet on a field. I noticed a difference in behavior. If the
field contains "foo_bar", in 0.16 it would be tokenized as 2 tokens
["foo", "bar"], but in 0.17 it remains a single token ["foo_bar"]. I
have made no configuration changes to my ES instance; it's a
completely vanilla install from the commit above. My mapping is created
dynamically, without my specifying anything about it.
Hence my question: Did ES / Lucene start tokenizing fields differently?
Are you sure that it gets tokenized into 2 tokens in 0.16? I tested 0.15.2, 0.16.0 (where some analysis behavior changed with the Lucene upgrade), and master, and all of them tokenize it into a single token (using the default, standard analyzer).
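To make the two behaviors under discussion concrete, here is a small Python sketch. It is only an illustration: it is not the test referred to above, and it does not reproduce Lucene's standard analyzer. The function names and regular expressions are hypothetical stand-ins for "a tokenizer that treats the underscore as part of a word" and "a tokenizer that splits on it".

```python
import re


def tokenize_keep_underscore(text):
    """Hypothetical tokenizer: "_" counts as a word character.

    In Python's re module, \\w matches letters, digits and the underscore,
    so "foo_bar" comes out as one token.
    """
    return [t.lower() for t in re.findall(r"\w+", text)]


def tokenize_split_underscore(text):
    """Hypothetical tokenizer: "_" is treated as a separator.

    [^\\W_] means "a word character that is not an underscore".
    """
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


if __name__ == "__main__":
    print(tokenize_keep_underscore("foo_bar"))
    print(tokenize_split_underscore("foo_bar"))
```

Since a term facet counts the terms that were actually indexed, the first behavior would yield a single facet entry for "foo_bar", while the second would yield separate entries for "foo" and "bar".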
On Wednesday, May 11, 2011 at 8:03 PM, tsuna wrote:
> [original message, quoted in full above]