Tokenizer splits field values


(Daniel Ford) #1

The problem I currently have in Kibana 3 is as follows…

When I search for @fields.Doctype, Kibana 3 displays the results as follows.

Research
Commentary
Product
Document
Idea

The values in the Doctype field are:

Research Document
Product Document
Product Idea
Commentary Document

I am using the "KV" filter in Logstash to extract fields from the cs_uri_query, and new fields may appear each day that I cannot account for in advance.
I have looked at the following:

https://github.com/bpaquet/node-logstash/commit/f9a019157f69d8e8ab3d80f6d3b8b77587ee9d05

What are the implications of not tokenising fields, and how would I tokenise only some fields and exclude all others from being tokenised?

For example, if I do not tokenise the Doctype field, when I search for "Product" will it return nothing or both "Product Document" and "Product Idea"?

I am new to ES and LS, so if there is a better way of getting to my end result then please suggest it.

Thanks,

Dan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Luca Cavanna) #2

You can change the way documents are indexed by updating the mapping through
the put mapping API. Elasticsearch tries to be smart and doesn't require
you to have an explicit mapping, applying sensible defaults. If you are
using Logstash it is probably better to use index templates, since it uses
time-based indices.
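
As a minimal sketch of that approach, an index template can pin the mapping for a field in every future time-based index. The template name, the logstash-* index pattern, and the @fields.Doctype path are assumptions based on this thread, not something you can copy verbatim:

```shell
# Hypothetical template: applies to every index whose name matches logstash-*,
# and maps @fields.Doctype as a single untokenized term.
curl -XPUT 'http://localhost:9200/_template/doctype_template' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "@fields": {
          "properties": {
            "Doctype": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'
```

Because the template is applied at index-creation time, it only affects indices created after you PUT it; existing daily indices keep their old mapping.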

If you don't tokenize a field at all, you are no longer going to find matches
when querying for single words. For instance, if you index "Research
Document" without tokenizing it, you can only find matches for that
document by searching for the whole string containing both words. I would
have a look at the multi_field type
(http://www.elasticsearch.org/guide/reference/mapping/multi-field-type/),
which allows you to index the same field in different ways, so that you have
different variations: for instance, a tokenized version that's good for
search requests, and a non-tokenized one that's good for faceting.

Cheers
Luca



(system) #3