There is no english tokenizer. The english analyzer uses the standard tokenizer.
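For example, a minimal sketch of english2 with the tokenizer swapped to standard (filter chain copied as-is from your template; the exact built-in defaults may differ in 0.90.x):

"english2": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "word_delimiter_filter",
    "english_stemmer",
    "stop_english"
  ]
}

The "english" override in your template will also need a "type": "custom" and a tokenizer of its own; the [null] in your error message most likely comes from that missing type.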
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 4 December 2013 at 10:59:08, Sasha Ostrikov (alexander.ostrikov@gmail.com) wrote:
Sure, this is my configuration (using the Sense plugin for Chrome):
POST _template/temp1
{
  "template": "",
  "order": "5",
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "word_delimiter_filter": {
            "type": "word_delimiter",
            "generate_word_parts": false,
            "catenate_words": true,
            "split_on_numerics": false,
            "preserve_original": true,
            "type_table": [
              "# => ALPHA",
              "@ => ALPHA",
              "% => ALPHA",
              "$ => ALPHA"
            ]
          },
          "stop_english": {
            "type": "stop",
            "stopwords": [
              "english"
            ]
          },
          "english_stemmer": {
            "type": "stemmer",
            "name": "english"
          }
        },
        "analyzer": {
          "english": { // here I'm trying to override the built-in english analyzer
            "filter": [
              "word_delimiter_filter"
            ]
          },
          "english2": { // here I'm trying to configure my own english analyzer that behaves like the built-in one
            "type": "custom",
            "tokenizer": "english",
            "filter": [
              "lowercase",
              "word_delimiter_filter",
              "english_stemmer",
              "stop_english"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "default": {
      "dynamic_templates": [
        {
          "template_textEnglish": {
            "match": "text.English.*",
            "mapping": {
              "type": "string",
              "store": "yes",
              "index": "analyzed",
              "analyzer": "english",
              "term_vector": "with_positions_offsets"
            }
          }
        },
        {
          "template_textEnglish2": {
            "match": "text.English2.*",
            "mapping": {
              "type": "string",
              "store": "yes",
              "index": "analyzed",
              "analyzer": "english2",
              "term_vector": "with_positions_offsets"
            }
          }
        }
      ]
    }
  }
}
and this is the error I get when trying to create a new index:
{
  "error": "IndexCreationException[[test1] failed to create index]; nested: ElasticSearchIllegalArgumentException[failed to find analyzer type [null] or tokenizer for [english]]; ",
  "status": 400
}
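As a side note, once the index is created, the _analyze API is a handy way to sanity-check what an analyzer actually emits (a sketch; test1 is the index name from the error above, and the body is arbitrary sample text):

GET test1/_analyze?analyzer=english2
some text with #hashtag and @mention

It returns the token stream, which makes it easy to verify whether # and @ survive the word_delimiter filter.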
On Tuesday, December 3, 2013 10:42:41 PM UTC+2, Kurt Hurtado wrote:
Hi Sasha,
Would you mind posting the full curl commands or some other representation of the settings and mappings you're creating?
Thanks!
On Tuesday, December 3, 2013 12:05:21 PM UTC-8, Sasha Ostrikov wrote:
Hello friends,
I'm trying to preserve specific characters during tokenization with the word_delimiter filter by defining a type_table (as described in http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html).
My idea is to override the built-in english analyzer by including a custom-configured word_delimiter filter ("type_table": ["# => ALPHA", "@ => ALPHA"]), but I cannot find any way to do it.
I also tried to create a custom english analyzer, but I'm still running into the following problems:
- I don't actually know the default settings of the built-in english analyzer (and I'd really like to preserve them).
- When I set "tokenizer": "english", index creation fails with an error saying the english tokenizer is not found.
I'm using ES 0.90.5.
Hoping for your kind help!
Sasha