Hello,
so far the default analyzer settings (i.e. untouched) have worked for me in the general case, but they're starting to show their shortcomings.
I want to tune certain aspects, e.g. I want to split words on dots, which isn't done by default:
GET _analyze?text=foo.bar.baz
{
  "tokens": [
    {
      "token": "foo.bar.baz",
      "start_offset": 0,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
I noticed I get the same output when providing an explicit analyzer, e.g. &analyzer=default and &analyzer=standard both yield this result.
Now, in general that's no problem: the documentation about this is OK and I have experience with it.
Basically, the defaults are fine and I just want to change them "a bit". As such, I want to replicate the current settings of the default (standard?) analyzer exactly as they are and only adjust the parts I want to change.
I'm looking at Standard Analyzer | Elasticsearch Guide [1.5] | Elastic and I think this might be what I want, but I'm unable to derive the actual definition of the analyzer from it, so I can't build my custom one on top of that.
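From what I can piece together from those docs, the standard analyzer seems to be the standard tokenizer plus the standard, lowercase and stop token filters. So my best guess at a custom replica with dot-splitting added would be something like the following (the my_index/dot_to_space names are just mine, the \u0020 mapping trick is something I picked up elsewhere, and I'm not sure I got the stopwords default right):

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "dot_to_space": {
          "type": "mapping",
          "mappings": [ ".=>\\u0020" ]
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": [ "dot_to_space" ],
          "tokenizer": "standard",
          "filter": [ "standard", "lowercase", "stop" ]
        }
      }
    }
  }
}
```

But that's exactly the problem: I don't know whether this filter chain really is what the built-in standard analyzer does.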
Or is it possible to tune the default/standard analyzer directly? The docs above say, e.g. for the "Standard Token Filter":
The standard token filter currently does nothing. It remains as a placeholder in case some filtering function needs to be added in a future version
Does "future" mean I can customize it?
I'm totally fine writing a complete custom analyzer, as long as I can somehow verify that the index/search definitions match exactly what is active by default, plus the things I want to add. This is important for me because I don't want to mess things up by changing the way things are analyzed in unexpected ways without noticing it.
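My current idea for that verification is to diff token streams via the _analyze API: run the same text through the built-in analyzer and through an explicitly spelled-out tokenizer/filter chain, and check that the outputs are identical (assuming my guess at the chain is right):

```
GET _analyze?analyzer=standard&text=foo.bar.baz

GET _analyze?tokenizer=standard&filters=standard,lowercase,stop&text=foo.bar.baz
```

If those two always agree over a decent sample of texts, I'd feel reasonably safe basing my custom analyzer on that chain. Is there a better way?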
thanks for any pointers,
- Markus
PS: I'm still using 1.5, just waiting for the big 5 release to upgrade