Hi,
I am pretty new to elasticsearch and I'm facing a problem I can't figure
out.
I'm using logstash to store log files in elasticsearch, following a specific
format. Each log line includes a URL and some other elements that are
translated into fields in the elasticsearch index.
The storing process seems to work pretty well and I am able to browse the
data like I want.
The problem is with the way some fields are parsed when I try to analyze
the data, and more particularly with the delimiters that are used to split
the values into tokens.
One of the fields I want to analyze (named 'category') is composed of
several parts separated by special characters such as '|', and the parts
themselves sometimes contain '-' characters. Example: "category1|cat-egory2".
The '|' should stay a delimiter, but the dash is a problem because it is
part of some of the category names.
I've read some documentation about the word delimiter token filter (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
) and the pattern analyzer (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html
) and tried to apply the instructions. So, before creating any index, I sent
elasticsearch a request to change the delimiter pattern to my own regular
expression ( "pattern":"|\\s+" ), modelled on the whitespace example in the
documentation. It is not very different from the one in the example, so I'm
pretty sure the pattern is correct.
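
One thing that may be worth checking in that pattern: in most regex dialects an unescaped '|' means alternation, not a literal pipe. The sketch below uses Python's `re` module as a stand-in for the Java regex engine that elasticsearch uses, so treat it as an approximation and not as elasticsearch's actual behaviour. The sample string is the one from above; the variable names are made up for illustration.

```python
import re

sample = "category1|cat-egory2"

# The pattern from the request once the JSON string is unescaped: |\s+
# Read as a regex, this is "empty string OR one-or-more whitespace".
unescaped = re.compile(r"|\s+")

# Assumed alternative: escape the pipe so it is matched literally.
escaped = re.compile(r"\|")

# Literal pipe OR whitespace, in case both should act as delimiters.
pipe_or_space = re.compile(r"\||\s+")

# The unescaped pattern matches the empty string at position 0,
# i.e. it never actually looks for a '|' character.
empty_match = unescaped.match(sample).group(0)

# The escaped pattern splits only on '|' and leaves the dash alone.
parts = escaped.split(sample)
parts_with_space = pipe_or_space.split(sample + " other")

print(repr(empty_match))
print(parts)
print(parts_with_space)
```

If the same holds for the Java engine, the pattern would have to be written as "\\|" inside the JSON body (a backslash escaped once for JSON) for the pipe to be treated as a literal delimiter.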
Here is the kind of request I run after the PUT request has been made:
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "category name": {
      "terms": {
        "field": "category"
      }
    }
  }
}
The response reports the number of occurrences of each 'category' term,
with the field values split into separate tokens. But the tokens are not
split according to the pattern I entered.
Instead I get statistics that do not reflect the actual data, because of
the default behaviour of elasticsearch.
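
To make the difference concrete, here is a small Python sketch of the counts I expect versus the kind of counts I get. It is not elasticsearch code: the three sample values are made up, and splitting on `\W+` is only a rough approximation of a default analyzer that breaks on punctuation.

```python
import re
from collections import Counter

# Made-up 'category' values, one per log line.
docs = ["category1|cat-egory2", "cat-egory2", "category1"]


def split_on_pipe(value):
    """Desired tokenization: '|' is the only delimiter."""
    return [token for token in value.split("|") if token]


def split_on_non_word(value):
    """Rough stand-in for a default analyzer: break on any non-word character."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def term_counts(values, tokenize):
    """Count how often each token appears across all values."""
    counts = Counter()
    for value in values:
        counts.update(tokenize(value))
    return counts


wanted = term_counts(docs, split_on_pipe)
observed = term_counts(docs, split_on_non_word)

print(dict(wanted))
print(dict(observed))
```

In the second result 'cat-egory2' no longer appears as a term at all; it is counted as 'cat' and 'egory2', which is the kind of distortion I see in the facet output.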
I would like to know what I'm doing wrong and that's why I'm asking for
your help.
Regards