Elyzer: step-by-step custom analyzer debugging


(Doug Turnbull) #1

Elyzer is a handy tool I've made to debug custom analyzers. Elasticsearch lets you see an analyzer's final output via _analyze. Unfortunately, it doesn't let you pry into what happens at each step of the analysis chain (the tokenizer and each token filter). Elyzer helps you do that:

doug$ elyzer --es http://localhost:9200 --index tmdb --analyzer english_bigrams --text "captain picard was cool"
TOKENIZER: standard
{1:captain} {2:picard}  {3:was} {4:cool}    
TOKEN_FILTER: standard
{1:captain} {2:picard}  {3:was} {4:cool}    
TOKEN_FILTER: lowercase
{1:captain} {2:picard}  {3:was} {4:cool}    
TOKEN_FILTER: porter_stem
{1:captain} {2:picard}  {3:wa}  {4:cool}    
TOKEN_FILTER: bigram_filter
{1:captain picard}  {2:picard wa}   {3:wa cool} 
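
For readers who want to approximate this kind of step-by-step view by hand, here is a minimal sketch in Python. It is not necessarily how Elyzer works internally. The idea is to call _analyze once per step: first with the tokenizer alone, then with one more token filter added each time. The function names (`incremental_analyze_bodies`, `format_tokens`) are hypothetical, and the JSON body shape (`tokenizer`, `filter`, `text`) is an assumption that depends on your Elasticsearch version; older releases used different parameter names. The sketch only builds the request bodies and formats token lists, so it runs without a cluster.

```python
import json


def incremental_analyze_bodies(tokenizer, filters, text):
    """Return (label, body) pairs: the tokenizer alone, then one more filter per step.

    The body keys are an assumed _analyze request shape; check your ES version.
    """
    steps = [("TOKENIZER: " + tokenizer, {"tokenizer": tokenizer, "text": text})]
    for i, name in enumerate(filters, start=1):
        body = {"tokenizer": tokenizer, "filter": list(filters[:i]), "text": text}
        steps.append(("TOKEN_FILTER: " + name, body))
    return steps


def format_tokens(tokens):
    """Render tokens as {position:term}, in the style of the output above.

    Expects dicts with "token" and "position" keys, as in an _analyze response.
    """
    return " ".join("{%d:%s}" % (t["position"], t["token"]) for t in tokens)


if __name__ == "__main__":
    chain = ["standard", "lowercase", "porter_stem", "bigram_filter"]
    for label, body in incremental_analyze_bodies(
        "standard", chain, "captain picard was cool"
    ):
        print(label)
        print(json.dumps(body))
```

Each body would be sent to the index's own _analyze endpoint (for example under `/tmdb/`), since a custom filter such as bigram_filter is defined in that index's settings and will not resolve without it.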

In addition, please advocate for this still-open pull request, which would add detailed step-by-step analysis output to ES itself.

Cheers!
-Doug

