nacho
May 16, 2015, 10:22pm
When analyzing alpha 1a beta, I want the resulting tokens to be [alpha 1 a beta]. Why does myAnalyzer not do the trick?
POST myindex
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "myAnalyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [ "split_on_numerics" ]
        }
      },
      "filter" : {
        "split_on_numerics" : {
          "type" : "word_delimiter",
          "split_on_numerics" : true,
          "split_on_case_change" : false,
          "generate_word_parts" : false,
          "generate_number_parts" : false,
          "catenate_all" : false
        }
      }
    }
  }
}
Now when I run
GET /myindex/_analyze?analyzer=myAnalyzer&text=alpha 1a beta
no tokens are returned. Again, why?
curl -XPUT 'http://localhost:9200/myindex/?pretty' -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "myAnalyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [ "split_on_numerics" ]
        }
      },
      "filter" : {
        "split_on_numerics" : {
          "type" : "word_delimiter",
          "split_on_numerics" : true,
          "split_on_case_change" : false,
          "generate_word_parts" : true,
          "generate_number_parts" : true,
          "catenate_all" : false
        }
      }
    }
  }
}'
curl -XGET 'localhost:9200/myindex/_analyze?pretty&analyzer=myAnalyzer' -d 'alpha 1a beta'
{
  "tokens" : [ {
    "token" : "alpha",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "1",
    "start_offset" : 6,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "a",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "beta",
    "start_offset" : 9,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 4
  } ]
}
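The only difference from the settings in the question is that generate_word_parts and generate_number_parts are now true. To make the expected letter/digit split for the sample text easier to see, here is a rough Python sketch. It is only a simplified model of the split, not how the word_delimiter filter is actually implemented, and the function name split_on_numerics_model is made up for this illustration.

```python
import re


def split_on_numerics_model(text):
    """Simplified model: split on whitespace, then on letter/digit transitions.

    Hypothetical helper for illustration only; it ignores every other
    word_delimiter option (case changes, catenation, protected words, ...).
    """
    tokens = []
    for word in text.split():
        # Each run of letters or run of digits becomes its own token.
        tokens.extend(re.findall(r"[^\W\d_]+|\d+", word))
    return tokens


print(split_on_numerics_model("alpha 1a beta"))
```

For the sample text this yields the same four tokens as the _analyze response above: alpha, 1, a, beta.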