Histogram facets on Elasticsearch

Hello,

My general problem is simple: I would like to build a kind of OLAP cube using
Elasticsearch.
For that I need to aggregate values from my documents, for example to obtain
the data for a histogram or a pie chart.
When I do that over all of my documents, it is a bit slow.
I would like to know whether pre-aggregating before indexing would be a good
idea to improve performance (fewer documents should mean less work per
query).
To do that, I looked into array and nested fields.
My main problem is obtaining an aggregate value from the nested data.
Here is an example of the data:
{
  "TravelsByHour": [
    { "hour": "05:00:00", "count": 2 },
    { "hour": "06:00:00", "count": 7 },
    { "hour": "07:00:00", "count": 3 },
    { "hour": "08:00:00", "count": 1 },
    { "hour": "13:00:00", "count": 1 },
    { "hour": "14:00:00", "count": 3 },
    { "hour": "16:00:00", "count": 1 },
    { "hour": "17:00:00", "count": 1 }
  ],
  "CI": {
    "Station": "401",
    "Name": "Hello",
    "Geo": { "lat": 61.5354531, "lon": 92.161561 }
  },
  "TravelDate": "2012-05-29",
  "Mode": "Bus"
}

This is only an example; the entries could be {"country": "US", "count": 13}
or something else.
The idea is to run a facet on my index to obtain the aggregate values of my
array, but I can't find the proper facet.
I thought the histogram facet was what I needed.
My query is as follows:

{
  "query": { "match_all": {} },
  "facets": {
    "histo": {
      "histogram": {
        "key_field": "TravelsByHour.hour",
        "value_field": "TravelsByHour.count",
        "interval": 1
      }
    }
  }
}

But it doesn't work as I wanted.
I get a cast error that seems to come from a mapping problem, which I am
trying to resolve.

][DEBUG][action.search.type ] [Frost, Cordelia] [eborder][0], node[fI3qDc_6Txa48ccT6xAU5A], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5e894fec]
org.elasticsearch.transport.RemoteTransportException: [Centurius][inet[/199.0.12.126:9300]][search/phase/query]
Caused by: org.elasticsearch.search.SearchParseException: [eborder][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match_all":{}},"facets":{"histo":{"histogram":{"key_field":"TravelsByHour.hour","value_field":"TravelsByHour.count","interval":1}}}}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:573)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:484)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:469)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:462)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:234)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:529)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:518)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.ClassCastException: org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData
    at org.elasticsearch.search.facet.histogram.HistogramFacetParser.parse(HistogramFacetParser.java:121)
    at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:92)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:561)
    ... 10 more
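
A note on the trace above: the ClassCastException suggests that the histogram facet expects numeric field data for its fields, while "TravelsByHour.hour" holds strings such as "05:00:00". One possible workaround, assuming the documents can be pre-processed before indexing, is to store the hour as an integer. The Python below is only a sketch of that pre-processing step; the helper names hour_to_int and with_numeric_hours are made up for illustration and are not part of Elasticsearch.

```python
def hour_to_int(hour_text):
    """Turn an "HH:MM:SS" string into the integer hour (0-23)."""
    hours, minutes, seconds = (int(part) for part in hour_text.split(":"))
    if not (0 <= hours <= 23 and 0 <= minutes <= 59 and 0 <= seconds <= 59):
        raise ValueError("not a valid time of day: %r" % hour_text)
    return hours


def with_numeric_hours(doc):
    """Return a copy of doc whose TravelsByHour entries carry an integer hour."""
    converted = dict(doc)
    converted["TravelsByHour"] = [
        {"hour": hour_to_int(entry["hour"]), "count": entry["count"]}
        for entry in doc["TravelsByHour"]
    ]
    return converted


# Hypothetical sample, shaped like the example document above.
sample = {"TravelsByHour": [{"hour": "05:00:00", "count": 2},
                            {"hour": "17:00:00", "count": 1}]}
numeric = with_numeric_hours(sample)
print(numeric)
```

Since the type of an already mapped field cannot be changed in place, the converted documents would presumably have to go into a freshly created index. This only helps for hour-like keys; for keys such as country or gender a numeric histogram does not apply at all.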

That's not really what matters, though. Here are my questions:
Is the histogram facet the right facet? (Again, hour is just an example; it
could be country, age, gender...)
Is nested data a good idea?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello again,

First, here is a gist that reproduces what I am talking about:
https://gist.github.com/jnaour/a6bc8056c5f55ef13231
I may have found a solution to my problem: the terms stats facet.
But it worked only with one document and not with several...
The only statistic I use is total, to get the total count for a value, and
perhaps the mean in some cases.
Something like a count with GROUP BY over an array...
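
To make the "GROUP BY over an array" idea concrete, here is a small sketch in plain Python (not Elasticsearch code) of the result being asked for: the sum of count per distinct hour across all entries of all documents. The two documents and the function name total_by_key are hypothetical and only mirror the shape of the example data.

```python
from collections import defaultdict

# Hypothetical sample documents, shaped like the example in the first message.
docs = [
    {"TravelsByHour": [{"hour": "05:00:00", "count": 2},
                       {"hour": "06:00:00", "count": 7}]},
    {"TravelsByHour": [{"hour": "06:00:00", "count": 1},
                       {"hour": "07:00:00", "count": 3}]},
]


def total_by_key(documents, array_field, key, value):
    """Sum `value` per distinct `key` over every array entry of every document."""
    totals = defaultdict(int)
    for doc in documents:
        for entry in doc.get(array_field, []):
            totals[entry[key]] += entry[value]
    return dict(totals)


totals = total_by_key(docs, "TravelsByHour", "hour", "count")
for hour in sorted(totals):
    print(hour, totals[hour])
```

The same function works for any key/value pair, for instance "country" and "count", which is the generality the question is after.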

I have another question:
It seems that an array is treated as several documents. I have only put two
docs in my index, but 13 docs are stored (according to the head plugin web
UI). Is that normal?
Overall, do I have to code my own facet to aggregate values and counts from
the documents in my index?
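
On the 13-docs question: with a nested mapping, each object in the array is indexed as its own hidden Lucene document next to the root document, so a low-level document count can be higher than the number of documents that were put. The sketch below just does that arithmetic. The two sample documents (8 and 3 entries) are hypothetical, chosen so the sum matches the 13 reported above; the actual content of the gist may differ.

```python
def lucene_doc_count(documents, nested_field):
    """Count one Lucene doc per root document plus one per nested object."""
    return sum(1 + len(doc.get(nested_field, [])) for doc in documents)


# Hypothetical documents: only the number of nested entries matters here.
doc_a = {"TravelsByHour": [{"hour": "%02d:00:00" % h, "count": 1}
                           for h in (5, 6, 7, 8, 13, 14, 16, 17)]}
doc_b = {"TravelsByHour": [{"hour": "%02d:00:00" % h, "count": 1}
                           for h in (9, 10, 11)]}

print(lucene_doc_count([doc_a, doc_b], "TravelsByHour"))  # 2 roots + 8 + 3 nested = 13
```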

Cheers,

Julien

OK, it seems that I forgot "nested": "TravelsByHour" in my terms stats facet,
but do I have to keep the other stats that I'm not interested in?
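
For reference, a sketch of what such a request could look like, written as a Python dict so it can be dumped to JSON. It assumes the facets API of that Elasticsearch generation, where "nested" sits next to the facet type, and it assumes full field paths resolve inside the nested scope. As far as I know the terms stats facet returns all of its statistics for every term, so the simplest approach is probably to ignore the unwanted ones on the client side. The function names, the facet name "by_key" and the sample response are hypothetical; the response is hand-written to show the shape, not real output.

```python
import json


def terms_stats_request(nested_path, key, value, name="by_key"):
    """Build a search body with one terms_stats facet scoped to a nested path."""
    return {
        "query": {"match_all": {}},
        "facets": {
            name: {
                "terms_stats": {
                    "key_field": nested_path + "." + key,
                    "value_field": nested_path + "." + value,
                },
                "nested": nested_path,
            }
        },
    }


def totals_from_response(response, name="by_key"):
    """Keep only the per-term total and drop the other statistics."""
    return {t["term"]: t["total"] for t in response["facets"][name]["terms"]}


body = terms_stats_request("TravelsByHour", "hour", "count")
print(json.dumps(body, indent=2))

# Hand-written response fragment, for illustration only.
fake_response = {"facets": {"by_key": {"_type": "terms_stats", "terms": [
    {"term": "06:00:00", "count": 2, "min": 1.0, "max": 7.0, "total": 8.0, "mean": 4.0},
    {"term": "05:00:00", "count": 1, "min": 2.0, "max": 2.0, "total": 2.0, "mean": 2.0},
]}}}
print(totals_from_response(fake_response))
```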

Julien

It seems that nested is not necessary in my case; a plain array type is good
enough. And nested was what raised the number of documents...
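
One thing that may be worth checking before dropping nested: a plain (non-nested) array of objects is flattened at index time into one multi-valued field per sub-key, so the link between a given hour and its count is no longer kept. The sketch below only imitates that flattening in Python; flatten_object_array is a made-up helper, and whether the totals stay correct for real data should be verified against the index.

```python
def flatten_object_array(doc, field):
    """Mimic how a non-nested array of objects is indexed: one multi-valued
    field per sub-key, with the pairing between sub-keys lost."""
    flat = {}
    for entry in doc.get(field, []):
        for key, val in entry.items():
            flat.setdefault(field + "." + key, []).append(val)
    return flat


# Hypothetical document with two entries.
doc = {"TravelsByHour": [{"hour": "05:00:00", "count": 2},
                         {"hour": "06:00:00", "count": 7}]}
flat = flatten_object_array(doc, "TravelsByHour")
print(flat)
```

In the flattened form nothing says that 2 belongs to "05:00:00" and 7 to "06:00:00", which is the information a per-hour total relies on.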
