Problem with switching stemmers

Hi all,

I encountered a problem when switching stemmer token filters using
elasticsearch 0.90.2. A curl recreation can be found here:
https://gist.github.com/jl982/6149763.

To summarize, if I create an index using the default settings, insert a
single document that contains the word "drawing", add the kstem token
filter, and then change it to the porter_stem token filter, I can no longer
find this document with the keyword "draw" or "drawing". Any ideas why this
would be the case?

Jianneng

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You must create the index with the character filters, token filters, and
analyzers you want before you put any documents into it.

In your case, the documents are already added, so adding your filters and
analyzers after the fact is a no-op.
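
One way to picture why a later analysis change does not reach documents that are already in the index is the toy inverted index below. It is only a sketch and not how Elasticsearch is implemented: `ToyIndex`, `plain_analyzer`, `stemming_analyzer` and `toy_stem` are made-up names, and `toy_stem` is a crude stand-in for a real stemmer such as kstem or porter_stem. It assumes queries are analyzed with whatever analyzer is current, while terms written at index time stay as they were until the document is indexed again.

```python
from collections import defaultdict


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a trailing 'ing'."""
    return token[:-3] if token.endswith("ing") and len(token) > 4 else token


def plain_analyzer(text):
    # Lowercase and split on whitespace, no stemming.
    return text.lower().split()


def stemming_analyzer(text):
    # Same tokens as plain_analyzer, then the toy stemmer on each one.
    return [toy_stem(t) for t in text.lower().split()]


class ToyIndex:
    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.docs = {}                    # doc_id -> original text
        self.postings = defaultdict(set)  # term -> set of doc_ids

    def index(self, doc_id, text):
        # Terms are produced by the analyzer in effect *now* and then stored.
        self.docs[doc_id] = text
        for term in self.analyzer(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # The query goes through the current analyzer, stored terms do not.
        hits = set()
        for term in self.analyzer(query):
            hits |= self.postings.get(term, set())
        return hits

    def reindex(self):
        # Rebuild every posting from the stored source with the current analyzer.
        self.postings = defaultdict(set)
        for doc_id, text in list(self.docs.items()):
            self.index(doc_id, text)


idx = ToyIndex(plain_analyzer)
idx.index(1, "a drawing")
print(idx.search("drawing"))   # {1}: stored term and query term are both "drawing"

idx.analyzer = stemming_analyzer
print(idx.search("drawing"))   # set(): query becomes "draw", stored term is still "drawing"

idx.reindex()
print(idx.search("draw"))      # {1}: stored term is now "draw" as well
```

In this toy model the mismatch goes away only once the document has been run through the new analyzer again.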

Brian

On Sunday, August 4, 2013 5:28:08 AM UTC-4, Jianneng Li wrote:

I encountered a problem when switching stemmer token filters using
elasticsearch 0.90.2. [...]

Hey Brian,

But if you look at the Admin Indices Update Settings page
(http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings/),
at the bottom of the page, there is a section on "updating index analysis".

In my case, if I skip the step of setting the kstem token filter and go
straight to porter_stem, then both "draw" and "drawing" return the
document. So apparently adding the stemmer does have an effect, but changing
the stemmer doesn't.

Thanks,

Jianneng

On Sun, Aug 4, 2013 at 10:20 AM, InquiringMind <brian.from.fl@gmail.com> wrote:

You must create the index with the character filters, token filters, and
analyzers you want before you put any documents into it.

[...]

Jianneng,

I'm not an ES developer, and am just another user whose view is outside
looking in. The docs have gotten better, but my own experiences have been
necessary to really determine how it works.

That said, I haven't fully explored all the use cases of load + update
settings + reindex + query. When my initial experience was to create
default mappings on disk, I would update them and then try to reindex but
would see no effects. As I learned, that's because ES won't read or re-read
the settings unless the index is updated.

Then I moved on to creating my own simplified "schema" in JSON. I then used
the built-in Jackson stream parser to read it, and created the index
settings and mappings that ES wanted to see: Much more complicated and hard
to get right, but it did help me understand them much better.

And now I have a tool to let me change the mappings around on a whim, and
it's so easy that now, all I really ever do is update my schema, export the
current data (or just delete and reload for performance tests), and
re-create the index using the new settings and mappings that pop out of my
schema generator. In other words, using the typically flawed car analogy,
ES settings and mappings are a lot like a manual gearbox without
synchronizers. It works, but it takes time and patience to learn. My schema
is like an automatic transmission: not as much fine-grained control, but it
does nearly everything that's needed and is simple and easy to use.

So in your case, if you add a reindex step for each existing document after
the update to the settings, maybe you'd find your queries work as you
expect. If not, I have some tools to explore this more in depth; and I hope
to find some spare time to try your scenario.
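
For what it's worth, the sequence I have in mind could be sketched like this. It only builds the list of HTTP requests and sends nothing, so it can be read without a cluster. The index name `test`, the type `doc`, the field `text`, and the choice to redefine the `default` analyzer are all assumptions for illustration (the gist may do it differently), and the helper `analysis_update_plan` is a made-up name. The close / update settings / open order follows the "updating index analysis" section of the update settings page as I understand it, and the re-PUT of each document is the reindex step.

```python
import json


def analysis_update_plan(index, doc_type, docs, stem_filter="porter_stem"):
    """Return (method, path, body) tuples; nothing is sent over the network."""
    # Assumed analyzer definition: standard tokenizer, lowercase, then a stemmer.
    settings = {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", stem_filter],
                }
            }
        }
    }
    plan = [
        ("POST", f"/{index}/_close", None),                    # close before changing analysis
        ("PUT", f"/{index}/_settings", json.dumps(settings)),  # apply the new analyzer
        ("POST", f"/{index}/_open", None),                     # reopen the index
    ]
    # Reindex step: send every existing document again so it is re-analyzed.
    for doc_id, source in docs.items():
        plan.append(("PUT", f"/{index}/{doc_type}/{doc_id}", json.dumps(source)))
    return plan


for method, path, body in analysis_update_plan("test", "doc", {"1": {"text": "drawing"}}):
    print(method, path, body or "")
```

Each tuple maps onto one curl call, so the output can be compared line by line with a curl recreation. Whether this resolves the kstem to porter_stem case is something I have not verified.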

Brian

On Sunday, August 4, 2013 12:00:52 PM UTC-4, Jianneng Li wrote:

In my case, if I skip the step of setting the kstem token filter and go
straight to porter_stem, then both "draw" and "drawing" return the
document. [...]

Correction to the sentence quoted below: as I learned, that's because ES
won't read or re-read the settings unless the index is created (not updated).

On Sunday, August 4, 2013 2:09:42 PM UTC-4, InquiringMind wrote:

That said, I haven't fully explored all the use cases of load + update
settings + reindex + query. When my initial experience was to create
default mappings on disk, I would update them and then try to reindex but
would see no effects. As I learned, that's because ES won't read or re-read
the settings unless the index is updated.


Hey Brian,

Thanks for sharing your experience. By "add a reindex step", did you mean
deleting the old index and inserting all documents into a new one?

Sure, please let me know if you get a chance to dig deeper into this issue.
The main reason I started this topic was that I was wondering whether this is
a bug, or whether I was misunderstanding the behavior.

Jianneng

On Sun, Aug 4, 2013 at 2:11 PM, InquiringMind <brian.from.fl@gmail.com> wrote:

As I learned, that's because ES won't read or re-read the settings unless
the index is created.
