How to know details about how a document is indexed?

Hi,

I would like to know how a given document has been analyzed by ES / Lucene,
for debugging purposes.

Here is my story:

I define a custom analyzer in the index settings when creating my index. It
combines the keyword tokenizer with a lowercase filter (I call it key_lowercase).
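For readers following along, here is a minimal sketch of what such index settings might look like. The actual settings are not shown in this thread; only the name key_lowercase comes from the message, and the rest assumes a custom analyzer built from the keyword tokenizer and the lowercase token filter.

```python
import json

# Assumed index settings: a custom analyzer named key_lowercase.
# The keyword tokenizer emits the whole field value as a single token,
# and the lowercase filter normalizes the case of that token.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "key_lowercase": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# JSON body that would be sent when creating the index.
body = json.dumps(index_settings, indent=2)
print(body)
```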

I push a document into this index. It has a field (prop1.prop2.prop3)
with the value "term1 term2 term3".

When I analyze this value with the _analyze API, I can see that there is
only one token, "term1 term2 term3". That's what I expect.

Then I search on this field with a wildcard like "term1*term3": no
results.

If I search for "term1", I get a result. So it seems that my document
field has been tokenized into 3 tokens.
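The reasoning above can be illustrated with a small simulation. This is only a sketch in plain Python, not Elasticsearch or Lucene code: the two analyzer functions are simplified stand-ins, and the matching helper assumes that a term or wildcard query is compared against each indexed term individually, never across several terms.

```python
from fnmatch import fnmatchcase


def keyword_lowercase(text):
    """Stand-in for keyword tokenizer + lowercase filter: one token for the whole value."""
    return [text.lower()]


def whitespace_lowercase(text):
    """Stand-in for an analyzer that splits on whitespace and lowercases each token."""
    return text.lower().split()


def matching_terms(pattern, terms):
    """Return the indexed terms that match a plain term or a wildcard pattern."""
    return [term for term in terms if fnmatchcase(term, pattern)]


value = "term1 term2 term3"
single = keyword_lowercase(value)    # what key_lowercase should produce
split = whitespace_lowercase(value)  # what a tokenizing analyzer produces

print(matching_terms("term1*term3", single))  # ['term1 term2 term3']
print(matching_terms("term1*term3", split))   # []
print(matching_terms("term1", single))        # []
print(matching_terms("term1", split))         # ['term1']
```

The observed behavior (no hit for the wildcard, a hit for "term1") corresponds to the second case, which is why the field looks as if it had been split into three tokens.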

That's why I would like to know if there is a way to see what ES/Lucene
has actually done with my document.

Any ideas? Would changing the log level help?

Thanks for any help

David.

There isn't a way to do it, aside from getting the mapping back and making sure it's using the analyzer you intended. It's not logged.
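As a sketch of that check, the helper below walks a mapping down to a nested field and reports its analyzer. The mapping shown is a hypothetical response, assumed to be keyed by type name with nested "properties" objects; the names doctype, prop1.prop2.prop3 and key_lowercase are taken from this thread, and the "string" field type is an assumption.

```python
def field_definition(type_mapping, dotted_path):
    """Follow the nested "properties" of a type mapping down a dotted field path."""
    node = type_mapping
    for name in dotted_path.split("."):
        node = node.get("properties", {}).get(name)
        if node is None:
            return None  # the field is not present in the mapping
    return node


# Hypothetical mapping, as it might be returned for the type "doctype".
mapping = {
    "doctype": {
        "properties": {
            "prop1": {
                "properties": {
                    "prop2": {
                        "properties": {
                            "prop3": {"type": "string", "analyzer": "key_lowercase"}
                        }
                    }
                }
            }
        }
    }
}

field = field_definition(mapping["doctype"], "prop1.prop2.prop3")
print(field.get("analyzer") if field else "field not mapped")
```

If the analyzer printed for the nested field is not the one you defined, the mapping in use is not the one you think you applied.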

(In reply to David Pilato's message of Wednesday, June 15, 2011 at 7:34 PM.)

Thanks Shay.

As far as I can see, my mapping seems to be OK.

It seems that my problem doesn't occur on first-level properties of my document (such as prop1) but only on nested properties (such as prop1.prop2.prop3).

I will run more tests tomorrow and post a gist.

Cheers

From: Shay Banon [mailto:shay.banon@elasticsearch.com]
Sent: Wednesday, June 15, 2011 18:48
To: users@elasticsearch.com
Subject: Re: How to know details about how a document is indexed ?

There isn't a way to do it, aside from getting the mapping back and making sure it's using the analyzer you intended. It's not logged.

OK, the problem is solved now.

Sharing the cause in case it helps others.

An old JSON mapping file was still in my config/mappings dir.

So when I put my mapping with the ES API, the new mapping was not "active".

However, when I ask for the current mapping with the ES API (curl -X GET http://localhost:9200/index/doctype/_mapping), ES seems to return the mapping I tried to put and not the active one.

I write "seems to" because I did not test any further once I had found my issue.
I will try to reproduce it in the next few days, and if the problem occurs again, I will post a gist.

Thanks for reading ;)
David
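Since a leftover file under config/mappings was the culprit here, a quick check for such files might look like the sketch below. It assumes the node's configuration directory is ./config (a path relative to the Elasticsearch home, not confirmed in this thread) and simply lists every JSON file found under its mappings subdirectory; the function name is hypothetical.

```python
from pathlib import Path


def find_mapping_files(config_dir):
    """Return the JSON files found under <config_dir>/mappings, sorted by path."""
    mappings_dir = Path(config_dir) / "mappings"
    if not mappings_dir.is_dir():
        return []
    return sorted(p for p in mappings_dir.rglob("*.json") if p.is_file())


if __name__ == "__main__":
    # "config" is an assumed location; adjust it to your installation.
    for path in find_mapping_files("config"):
        print(path)
```

Any file listed for the index or type in question is worth comparing with the mapping sent through the API.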

When you ask for the mapping of a type, elasticsearch returns the mapping that is actually used for it (the one built); it won't return something that it's not using.

(In reply to dadoonet's message of Thursday, June 16, 2011 at 9:58 PM.)

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/How-to-know-details-about-how-a-document-is-indexed-tp3068090p3073323.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.