lasseschou
(lasseschou)
September 21, 2015, 7:32am
1
I want to create custom analyzers for my solution. One of them needs to be very close to the standard analyzer, but handle dots (.) differently.
I've read the documentation on the built-in analyzers and on how to create custom analyzers. But what I'm missing is the following:
An exact description of the Standard analyzer, including character filters, tokenizers and token filters.
An exact description of the Simple analyzer, including character filters, tokenizers and token filters.
Ideally, I'd like to see a complete PUT /my-index {"settings", ...} call (like in the custom analyzers doc).
Thanks!
n0othing
(Robbie Ogburn)
September 22, 2015, 8:12pm
2
Standard Analyzer - built using the Standard Tokenizer with the Standard Token Filter, Lower Case Token Filter, and Stop Token Filter. I could recreate it like so:
{
  "type": "custom",
  "tokenizer": "standard",
  "filter": [ "standard", "lowercase", "stop" ]
}
Simple Analyzer - built using a Lower Case Tokenizer. I could recreate it like so:
{
  "type": "custom",
  "tokenizer": "lowercase"
}
Putting it into a complete example:
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_standard": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "stop"
          ]
        },
        "custom_simple": {
          "type": "custom",
          "tokenizer": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "field1": {
          "type": "string",
          "analyzer": "custom_standard"
        },
        "field2": {
          "type": "string",
          "analyzer": "custom_simple"
        }
      }
    }
  }
}
lasseschou
(lasseschou)
September 23, 2015, 6:28am
3
Thanks so much, very helpful.
I want to create a clone of the standard analyzer, the only difference being that it also splits tokens that contain '.'. Example: www.test.com should be tokenized into www, test and com.
Can you help me create the mapping code for that? Thanks!
Lasse
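A sketch of one way to get that behavior: add a mapping character filter that rewrites dots to spaces before the standard tokenizer runs, and plug it into the recreated standard analyzer from above. The filter and analyzer names here are just placeholders, and this is untested against your data:

PUT /my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "dot_to_space": {
          "type": "mapping",
          "mappings": [ ". => \\u0020" ]
        }
      },
      "analyzer": {
        "custom_standard_dots": {
          "type": "custom",
          "char_filter": [ "dot_to_space" ],
          "tokenizer": "standard",
          "filter": [ "standard", "lowercase", "stop" ]
        }
      }
    }
  }
}

Note the trade-off: because the character filter runs before tokenization, every '.' in the input is replaced, including ones in abbreviations and decimal numbers, so "3.14" would also become the tokens 3 and 14.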