Multichar delimiter in path_hierarchy tokenizer


(damienclaveau) #1

Hi,
At the moment, the delimiter in a path_hierarchy tokenizer must be a single
character.

For example, something like this is not allowed:
"tokenizer": {
  "arrow_path_tokenizer": {
    "type": "path_hierarchy",
    "delimiter": "-->"
  }
}

Can somebody please advise which other tokenizer I could configure
to obtain the same behaviour as the path_hierarchy tokenizer, but with my
"-->" delimiter?

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/multichar-delimiter-in-path-hierarchy-tokenizer-tp4051191.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1394126756149-4051191.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


(Clinton Gormley) #2

You could fake it using the pattern_replace character filter, which rewrites
each "-->" to "/" (the default path_hierarchy delimiter) before the tokenizer runs:

curl -XPUT "http://localhost:9200/myindex" -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "arrow": {
          "tokenizer": "path_hierarchy",
          "char_filter": [
            "arrow_to_slash"
          ]
        }
      },
      "char_filter": {
        "arrow_to_slash": {
          "type": "pattern_replace",
          "pattern": "-->",
          "replacement": "/"
        }
      }
    }
  }
}'

curl -XGET "http://localhost:9200/myindex/_analyze?analyzer=arrow" -d '
foo-->bar-->baz
'

{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    },
    {
      "token": "foo/bar",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 1
    },
    {
      "token": "foo/bar/baz",
      "start_offset": 0,
      "end_offset": 15,
      "type": "word",
      "position": 1
    }
  ]
}
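
To make the two steps explicit, here is a rough Python sketch of what the analyzer above appears to do: replace the arrow with a slash, then emit one token per path prefix. This is only an illustration, not Elasticsearch code. The function name arrow_path_tokens and its parameters are made up for this example, it uses a plain string replace rather than a regex, and it only covers simple paths with no leading or trailing delimiter.

```python
def arrow_path_tokens(text, arrow="-->", delimiter="/"):
    """Approximate the 'arrow' analyzer for simple inputs (hypothetical helper).

    Step 1 mimics the pattern_replace char filter: every arrow becomes
    the single-character delimiter.
    Step 2 mimics path_hierarchy: one token per prefix of the path.
    """
    # Assumption: surrounding whitespace (e.g. newlines from the curl body)
    # is not part of the path, so it is stripped here.
    normalized = text.strip().replace(arrow, delimiter)
    parts = normalized.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


if __name__ == "__main__":
    print(arrow_path_tokens("foo-->bar-->baz"))
    # ['foo', 'foo/bar', 'foo/bar/baz']
```

Note that the indexed tokens contain "/" rather than "-->", so anything matched against them exactly (a term query or filter, for instance) would presumably need the same replacement applied first.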

On 6 March 2014 18:25, damienclaveau wrote the question in #1 above.



(system) #3