Tokens not shown by the analysis API are searchable


(vineeth mohan) #1

Hi,

I have an index where I have applied the html_strip character filter
together with the lowercase, stopword and snowball filters.
Even so, I am able to search for words like href, or for words from the
link given in the href.

The strange part is that when I run the same feed through the analysis
API, I do not see these tokens.
Content is the field to which I have applied the aforementioned analyzer.

FEED - https://gist.github.com/3293411
Output of the analysis API - https://gist.github.com/3293432
Command used to create analyzer and index - https://gist.github.com/3293443
Output of TermList plugin - https://gist.github.com/3293448

The token list returned by the analysis API is different from the one
returned by the TermList plugin.

In short, html_strip works when I use the analysis API but does not seem
to work when the document is actually indexed.

Thanks
Vineeth


(Igor Motov) #2

These tokens are coming from the _all field
(http://www.elasticsearch.org/guide/reference/mapping/all-field.html), which
is indexed with the default analyzer unless you configure it otherwise. As
far as I can see from your mapping, you are not overriding the default
analyzer, so _all never goes through html_strip.
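
To make the difference concrete, here is a minimal Python sketch. It is a toy model of the two analysis paths, not Elasticsearch's implementation: strip_html and tokenize are simplified stand-ins for the html_strip char filter and a lowercasing tokenizer, and stopword removal and stemming are left out. The two mapping fragments at the end show the usual ways around the problem (disabling _all, or giving it the same analyzer). The type name "doc" and the analyzer name "html_analyzer" are made up for illustration, so substitute the names from your own mapping.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes only, dropping tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    # Toy stand-in for the html_strip char filter.
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Toy stand-in for a tokenizer followed by a lowercase filter.
    return re.findall(r"[a-z0-9]+", text.lower())


def content_tokens(text):
    # Path taken by the custom analyzer on the "content" field.
    return tokenize(strip_html(text))


def all_tokens(text):
    # Path taken by _all when it uses the default analyzer:
    # no char filter, so markup is tokenized like any other text.
    return tokenize(text)


# Hypothetical mapping fragments; "doc" and "html_analyzer" are assumed names.
CONTENT_FIELD = {"content": {"type": "string", "analyzer": "html_analyzer"}}

# Option 1: do not index _all at all.
MAPPING_ALL_DISABLED = {
    "doc": {"_all": {"enabled": False}, "properties": CONTENT_FIELD}
}

# Option 2: keep _all but analyze it with the same custom analyzer.
MAPPING_ALL_ANALYZED = {
    "doc": {"_all": {"analyzer": "html_analyzer"}, "properties": CONTENT_FIELD}
}

sample = '<a href="http://example.com/page">Read more</a>'
print(content_tokens(sample))
print(all_tokens(sample))
```

With the sample above, the first call yields only the visible link text, while the second also yields href and the pieces of the URL, which is the same pattern the TermList output shows. Note that a search that does not name a field goes against _all, so with option 1 the query has to target content explicitly.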

(In reply to Vineeth Mohan's post of Wednesday, August 8, 2012 4:37:24 AM UTC-4.)

