Elyzer is a handy tool I've made to debug custom analyzers. Elasticsearch lets you see the final output of an analyzer via the _analyze API.
Unfortunately, it doesn't let you pry into what happens at each step. Elyzer helps you do that:
doug$ elyzer --es http://localhost:9200 --index tmdb --analyzer english_bigrams --text "captain picard was cool"
TOKENIZER: standard
{1:captain} {2:picard} {3:was} {4:cool}
TOKEN_FILTER: standard
{1:captain} {2:picard} {3:was} {4:cool}
TOKEN_FILTER: lowercase
{1:captain} {2:picard} {3:was} {4:cool}
TOKEN_FILTER: porter_stem
{1:captain} {2:picard} {3:wa} {4:cool}
TOKEN_FILTER: bigram_filter
{1:captain picard} {2:picard wa} {3:wa cool}
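If you're curious how a trace like this can be produced, here is a minimal Python sketch of one way to do it: call _analyze once per stage, first with the tokenizer alone, then with one more token filter added each time. This is not Elyzer's actual source. It assumes an Elasticsearch version whose _analyze endpoint accepts a JSON body with `tokenizer`, `filter`, and `text` fields, and that custom filters such as bigram_filter are defined in the index settings so they can be referenced by name. The function names (`stage_bodies`, `format_tokens`, `analyze`, `trace`) are made up for illustration.

```python
import json
from urllib import request


def stage_bodies(tokenizer, filters, text):
    """Return (label, request body) pairs: the tokenizer alone, then one more filter per stage."""
    stages = [("TOKENIZER: " + tokenizer, {"tokenizer": tokenizer, "text": text})]
    for i, name in enumerate(filters, start=1):
        # Each stage replays the chain up to and including this filter.
        body = {"tokenizer": tokenizer, "filter": list(filters[:i]), "text": text}
        stages.append(("TOKEN_FILTER: " + name, body))
    return stages


def format_tokens(tokens):
    """Render tokens as {position:token}, using the position Elasticsearch reports."""
    return " ".join("{%d:%s}" % (t["position"], t["token"]) for t in tokens)


def analyze(es_url, index, body):
    """POST one body to <es_url>/<index>/_analyze and return the 'tokens' list.

    Assumption: the endpoint accepts this JSON body shape (version dependent).
    """
    req = request.Request(
        "%s/%s/_analyze" % (es_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["tokens"]


def trace(es_url, index, tokenizer, filters, text, send=analyze):
    """Print the token stream after each stage of the chain."""
    for label, body in stage_bodies(tokenizer, filters, text):
        print(label)
        print(format_tokens(send(es_url, index, body)))
```

With the chain shown above, the call would look like trace("http://localhost:9200", "tmdb", "standard", ["standard", "lowercase", "porter_stem", "bigram_filter"], "captain picard was cool"). The cost of this approach is one HTTP round trip per stage, which is fine for debugging but is exactly the kind of thing a built-in step-by-step output would make unnecessary.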
In addition, please advocate for this pull request, which is still open and would add detailed, step-by-step analysis output to Elasticsearch itself.
Cheers!
-Doug