Tokens not shown by the analysis API are searchable

Hi,

I have an index where I have applied the html_strip character filter along
with the lowercase, stopword and snowball filters.
However, I am still able to search for words like href, and for words that
appear in the link given in the href.

The strange part is that when I run the same feed through the analysis
API, I do not see these tokens.
content is the field to which I have applied the aforementioned analyzer.

FEED - https://gist.github.com/3293411
Output of ANALYSIS API - https://gist.github.com/3293432
Command used to create analyzer and index - https://gist.github.com/3293443
Output of TermList plugin - https://gist.github.com/3293448

Here, the list given by the analysis API is different from the one given by
the termlist plugin.

In short, html_strip works when using the analysis API but does not seem to
work when actual indexing happens.

Thanks
Vineeth

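These tokens are coming from the _all field, which is indexed with the
default analyzer unless you configure it otherwise. The _all field combines
the values of all fields in the document, so the raw HTML from content is
indexed there as well, without html_strip, and a search that does not name a
field runs against _all. As far as I can see from your mapping, you are not
overriding the default analyzer.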
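
To make this concrete, here is a rough sketch of what the index-creation body could look like if _all is either given the same analyzer as content or switched off. It assumes the mapping syntax of Elasticsearch releases from that period. The analyzer name html_content, the type name doc and both helper functions are made up for illustration, since I am not copying your gist.

```python
import json

ANALYZER = "html_content"  # hypothetical analyzer name, not taken from the gist
DOC_TYPE = "doc"           # hypothetical mapping type name


def build_index_body(disable_all=False):
    """Build an index-creation body where _all does not use the default analyzer.

    disable_all=True switches _all off entirely; otherwise _all is analyzed
    with the same html_strip-based analyzer as the content field.
    """
    all_field = {"enabled": False} if disable_all else {"analyzer": ANALYZER}
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER: {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop", "snowball"],
                    }
                }
            }
        },
        "mappings": {
            DOC_TYPE: {
                "_all": all_field,
                "properties": {
                    "content": {"type": "string", "analyzer": ANALYZER}
                },
            }
        },
    }


def content_only_query(text):
    """Build a search body that targets content explicitly instead of _all."""
    return {"query": {"query_string": {"default_field": "content", "query": text}}}


if __name__ == "__main__":
    # Print the bodies so they can be pasted into a curl command.
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(content_only_query("href"), indent=2))
```

If a search for href built with content_only_query returns nothing while the same search without a field still matches, that would confirm the hits come from _all and not from content.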

(In reply to Vineeth Mohan's post of Wednesday, August 8, 2012.)