Tokenizer splits field values


(Daniel Ford) #1

The problem I currently have in Kibana 3 is as follows…

When I search for @fields.Doctype, Kibana 3 displays the results as follows.

Research
Commentary
Product
Document
Idea

The values in the Doctype field are:

Research Document
Product Document
Product Idea
Commentary Document

I am using the "KV" filter in Logstash to extract fields from the cs_uri_query, and new fields may appear each day that I cannot account for in advance.
I have looked at the following:

https://github.com/bpaquet/node-logstash/commit/f9a019157f69d8e8ab3d80f6d3b8b77587ee9d05

What are the implications of not tokenising fields, and how would I tokenise only some fields and exclude all others from being tokenised?

For example, if I do not tokenise the Doctype field, when I search for "Product" will it return nothing or both "Product Document" and "Product Idea"?

I am new to ES and LS, so if there is a better way of getting to my end result then please suggest it.

Thanks,

Dan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Luca Cavanna) #2

You can change the way documents are indexed by updating the mapping through
the put mapping API. Elasticsearch tries to be smart and doesn't require
you to have an explicit mapping, applying sensible defaults. If you are
using Logstash it is probably better to use index templates, since it uses
time-based indices.
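
As a minimal sketch of that approach, an index template can pin the mapping for a field in every future time-based index. The template name, the logstash-* index pattern, and the @fields.Doctype path are assumptions based on this thread, not something you can copy verbatim:

```shell
# Hypothetical template: applies to every index whose name matches logstash-*,
# and maps @fields.Doctype as a single untokenized term.
curl -XPUT 'http://localhost:9200/_template/doctype_template' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "@fields": {
          "properties": {
            "Doctype": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'
```

Because the template is applied at index-creation time, it only affects indices created after you PUT it; existing daily indices keep their old mapping.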

If you don't tokenize a field at all, you are no longer going to find matches
when querying for single words. For instance, if you index "Research
Document" without tokenizing it, you can only find matches for that
document by searching for the whole string containing both words. I would
have a look at the multi_field type
(http://www.elasticsearch.org/guide/reference/mapping/multi-field-type/),
which allows you to index the same field in different ways, so that you have
different variations: for instance, a tokenized version that's good for
search requests, and a non-tokenized one that's good for faceting.

Cheers
Luca



(system) #3