Retrieving Analyzer Tokens


(stratawing) #1

Is there any way to return the actual tokens generated by an
index_analyzer at search time? I am using the synonym filter
(without expansion) to index a particular field, and would like to get
the standardized value (the right-hand side of the "=>" in my synonyms
file) for each document that is retrieved.

The synonyms file includes, for example:

i-pod, i pod => ipod,

When a user searches for "i-pod" or "i pod", I want to return the
indexed doc, but I also want to return "ipod" as the token that was
created at index time as part of the analysis.
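
To make the desired mapping concrete, here is a minimal local sketch of how an explicit "a, b => c" rule collapses its left-hand-side phrases onto one standardized value. It is not how the Lucene synonym filter works internally (that operates on token streams); the function name parse_synonym_rules is made up for illustration, and it assumes a single value on the right-hand side.

```python
def parse_synonym_rules(lines):
    """Map each left-hand-side phrase of an explicit 'a, b => c' rule to its right-hand side.

    Hypothetical helper for illustration only; assumes one target per rule.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and lines that are not explicit mappings.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        lhs, rhs = line.split("=>", 1)
        # Tolerate a trailing comma after the target, as in the example rule.
        target = rhs.strip().rstrip(",").strip()
        for phrase in lhs.split(","):
            phrase = phrase.strip()
            if phrase:
                mapping[phrase] = target
    return mapping


rules = parse_synonym_rules(["i-pod, i pod => ipod,"])
print(rules)
```

With the rule above, both "i-pod" and "i pod" map to "ipod", which is the value I would like returned with each hit.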

I have my synonyms analyzer up and running, and can get the proper
tokens using the Analyze API.

Many thanks in advance for your thoughts and time.

Cheers!


(stratawing) #2

Here's a gist of the key pieces of my mapping, synonyms file, and a
note on my desired output - in case it helps.


(stratawing) #3

I've read a bit on Lucene tokens since yesterday, and it doesn't seem
possible to recover the tokens without using something like
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/api/core/index.html?org/apache/lucene/index/IndexReader.html

Is this correct?


(stratawing) #4

Wow. Bummer. Zero responses.


(Shay Banon) #5

How about using the analyze API?
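
A rough sketch of what that could look like from Python, using only the standard library. The base URL, index name, and analyzer name are placeholders, and the JSON request body is an assumption about the Elasticsearch version in use (older releases took the analyzer as a query parameter instead). The response shape assumed here is a "tokens" list whose entries carry a "token" string.

```python
import json
from urllib import request


def analyze(base_url, index, analyzer, text):
    """Call the _analyze endpoint and return the parsed JSON response.

    Assumes a version that accepts a JSON body; base_url, index, and
    analyzer are placeholders supplied by the caller.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def token_strings(analyze_response):
    """Pull just the token text out of an _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


# Hand-written sample response, not output captured from a real cluster.
sample = {
    "tokens": [
        {"token": "ipod", "start_offset": 0, "end_offset": 5,
         "type": "SYNONYM", "position": 0}
    ]
}
print(token_strings(sample))
```

Only token_strings runs here; analyze needs a reachable cluster and is shown just to indicate the request shape.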

On Wednesday, March 14, 2012 at 2:03 AM, stratawing wrote:

Wow. Bummer. Zero responses.


(stratawing) #6

Thanks Shay! I was hoping to get the actual doc returned together
with the tokens.
It's not really mission critical. I'll fiddle around with the Analyze API
a bit more to see if I can get the desired output.
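
One client-side workaround would be to run the stored field value of each hit back through the analyzer and attach the result to the hit. The sketch below assumes hits are dicts with a "_source" object; the field name "product" and the stand-in analyzer fake_analyze are hypothetical, and in practice the callable would wrap a real Analyze API request.

```python
def attach_tokens(hits, field, analyze_fn):
    """Return copies of the hits with a 'tokens' list added for one field.

    analyze_fn is any callable taking text and returning a list of tokens;
    it is injected so the real Analyze API call can be swapped in.
    """
    enriched = []
    for hit in hits:
        value = hit.get("_source", {}).get(field, "")
        enriched.append({**hit, "tokens": analyze_fn(value)})
    return enriched


def fake_analyze(text):
    """Stand-in analyzer for illustration: lowercases and applies one synonym rule."""
    synonyms = {"i-pod": "ipod", "i pod": "ipod"}
    lowered = text.lower()
    return [synonyms.get(lowered, lowered)]


# Hypothetical search hits; "product" is an assumed field name.
hits = [
    {"_id": "1", "_source": {"product": "i-pod"}},
    {"_id": "2", "_source": {"product": "I Pod"}},
]
result = attach_tokens(hits, "product", fake_analyze)
for hit in result:
    print(hit["_id"], hit["tokens"])
```

This costs one extra analyze call per hit, so it only seems reasonable for small result pages.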

On Mar 14, 8:27 am, Shay Banon kim...@gmail.com wrote:

How about using the analyze API?

