How can I make the Kibana Terms panel count frequencies of complete strings rather than of separate words?

I have documents in ES with a "Message" field, which normally holds a
multi-word text string. I am trying to query it with Kibana to see which
strings occur in this field most frequently. What I actually get back is
a table showing the frequency of individual words, not of the whole
strings!
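
To illustrate the behaviour described above, here is a minimal Python sketch that does not talk to Elasticsearch at all. The sample messages and the `simple_analyze` helper are made up for illustration; the helper is only a rough stand-in for an analyzer (lowercase, split on non-alphanumerics), and counting each document once per term is an assumption of the sketch.

```python
import re
from collections import Counter

# Hypothetical sample values of the "Message" field.
messages = [
    "Connection timed out",
    "Connection timed out",
    "Connection refused",
    "Disk quota exceeded",
]


def simple_analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


# Analyzed field: every word becomes its own term, so counting terms
# counts words. Each document is counted once per distinct term.
word_counts = Counter()
for msg in messages:
    word_counts.update(set(simple_analyze(msg)))

# Not-analyzed field: the whole string is a single term.
whole_counts = Counter(messages)

print("per word:  ", word_counts.most_common(3))
print("per string:", whole_counts.most_common(3))
```

In the per-word counts the full message never appears as a key, which matches what the Terms panel shows on an analyzed field.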

Now that I have started to understand *something* about ES, my guess is
that I am supposed to map the "Message" field as { "type": "string",
"index": "not_analyzed" }, so that it is not split into words. On the
other hand, I still want to be able to find documents by searching for
individual words from their Message fields.

My next thought is a multi-field mapping for "Message":
{
  "type": "string",
  "fields": {
    "raw": { "type": "string", "index": "not_analyzed" }
  }
}

That way the analyzed Message field would serve normal queries, and I
would use Message.raw when building my Terms panel.
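
As a sketch of how the pieces could fit together, the Python snippet below builds three request bodies as plain dicts and prints them as JSON: an index-creation body with the multi-field mapping, a word-level match query on Message, and a terms aggregation on Message.raw. It assumes the ES 1.x mapping syntax used in this thread. The type name "logs", the aggregation name "top_messages" and the query text are hypothetical, and the aggregation is a hand-written equivalent of what a Terms panel asks for, not necessarily the request Kibana itself sends.

```python
import json

DOC_TYPE = "logs"  # hypothetical mapping type name

# Index-creation body: "Message" is analyzed for search, and the
# "Message.raw" sub-field keeps the whole string as one term.
index_body = {
    "mappings": {
        DOC_TYPE: {
            "properties": {
                "Message": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }
}

# Word-level search goes against the analyzed field.
search_words = {"query": {"match": {"Message": "timed out"}}}

# Whole-string frequencies come from the not_analyzed sub-field.
count_strings = {
    "size": 0,  # only the buckets are wanted, not the hits
    "aggs": {
        "top_messages": {"terms": {"field": "Message.raw", "size": 10}}
    },
}

for name, body in [
    ("create index", index_body),
    ("search by words", search_words),
    ("count whole strings", count_strings),
]:
    print("# " + name)
    print(json.dumps(body, indent=2))
```

The printed bodies can be sent with any HTTP client; only the field names Message and Message.raw come from the mapping above.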

I need confirmation that I am moving in the right direction and that this
is the optimal and intended way to achieve the goal. It does not look very
elegant, which is why I am asking. Maybe I am missing some other way to
search a string field by separate words while still treating it as a whole
for the purpose of counting. Please advise!
Konstantin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/575eaefb-6be1-4a3a-b015-0051db56587f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I use multi-fields to support several different analysis types as needed,
and also to have the raw version available, as in your example.

(In reply to Konstantin Erman's original post of Monday, October 6, 2014 8:34:34 PM UTC-5.)

Doesn't that substantially inflate the amount of data to be processed and stored at indexing time?

As in most log aggregation systems, indexing happens many orders of magnitude more often than querying, and I'm concerned that using multi-fields instead of simple string fields everywhere may hurt indexing performance.

Maybe there is a way to solve this problem at query time?

Also, each field gets a "primary" name plus other names, formed with a dot and a suffix, for the different representations. How do I choose which representation should be the primary one?

(In reply to Doug Nelson's message of Monday, October 6, 2014 8:50:22 PM UTC-7.)

Remember that Lucene is an inverted index, so you cannot deduce index
volume from input data volume. If a field has many millions of unique
values (high cardinality), it will add to the Elasticsearch index volume,
yes. If a field has only a few thousand values (low cardinality), it adds
close to nothing to the index volume. You can even copy a field to many
other fields and add close to nothing, although this depends on the
analyzer.

There is no alternative to multi-fields: if you want to search analyzed
forms, they must be in the index. If you want to examine the original
field values for frequency, e.g. in aggregations, they must be in the
index unchanged.
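
The cardinality point can be pictured with a toy inverted index in Python. This is only a sketch under simplifying assumptions: `build_inverted_index` is a made-up helper, the field values are synthetic, and real Lucene segments compress both the term dictionary and the posting lists, so the numbers say nothing about actual disk usage.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Toy inverted index for a not_analyzed field: term -> list of doc ids."""
    index = defaultdict(list)
    for doc_id, value in enumerate(values):
        index[value].append(doc_id)
    return index


n_docs = 100_000

# Low cardinality: the same few strings repeat across all documents.
low = ["status-%d" % (i % 5) for i in range(n_docs)]
# High cardinality: every document carries a unique string.
high = ["request-%d" % i for i in range(n_docs)]

low_index = build_inverted_index(low)
high_index = build_inverted_index(high)

# Each distinct value is stored once as a term, however often it occurs.
print("distinct terms, low cardinality: ", len(low_index))
print("distinct terms, high cardinality:", len(high_index))
```

Both toy indexes still hold one posting per document; the difference is in how many distinct terms have to be stored, which is where a high-cardinality raw field costs more.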

Jörg

(In reply to Konstantin Erman's message of Tue, Oct 7, 2014 at 6:59 AM.)