Hello everybody,

I have several questions.

We have a project with two axes: BI (statistics about our aggregated data, visualized on graphs with filters and dynamic interaction between these graphs) and search. At the beginning we chose Elasticsearch just for the search. The easy way to do the BI would be an OLAP cube, but we have some scalability needs, and that led us to consider Elasticsearch as a pseudo OLAP cube using facets and filters.

Question 1: Is it a good idea? Or at least not a bad one?

The fact that filters can be cached independently when combined in a bool filter is an advantage for our problem. We know that facets can be quite processing-intensive, which could lead to performance issues, but we expect that scaling out would mitigate them.

We have about 100 000 000 data instances, and the data will be aggregated.
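A minimal sketch of that pre-aggregation step, assuming each raw instance is a dict carrying the two filter fields and a `value` field (the sample records and the helper name `pre_aggregate` are hypothetical). It collapses identical (filter1, filter2, value) combinations into one document with a `count`, which is the shape of the documents used in the rest of this post.

```python
from collections import Counter
from typing import Dict, Iterable, List


def pre_aggregate(instances: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Collapse raw instances into one document per (filter1, filter2, value)."""
    # Count how many raw instances share the same combination of fields.
    counts = Counter(
        (inst["filter1"], inst["filter2"], inst["value"]) for inst in instances
    )
    # Emit one aggregated document per combination, sorted for stable output.
    return [
        {"filter1": f1, "filter2": f2, "value": value, "count": n}
        for (f1, f2, value), n in sorted(counts.items())
    ]


# Hypothetical raw instances: one "value1" and four "value2" for A/B.
raw = [
    {"filter1": "A", "filter2": "B", "value": "value1"},
    {"filter1": "A", "filter2": "B", "value": "value2"},
    {"filter1": "A", "filter2": "B", "value": "value2"},
    {"filter1": "A", "filter2": "B", "value": "value2"},
    {"filter1": "A", "filter2": "B", "value": "value2"},
]
docs = pre_aggregate(raw)
print(docs)
```

At 100 000 000 instances this grouping would more realistically run in batches inside whatever job feeds the index; the sketch only shows the grouping logic.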

There will be a screen with about 10 graphs on it. Each graph needs its own data, but the graphs are close in terms of filters. We are not sure what the best practice is: an independent data model and an independent query for each graph, or one big query on a single type for all graphs. From a development point of view, it could be better to keep the graphs and their data as independent as possible, so that we could benefit from asynchronous responses and show each graph as soon as it is ready. From a performance point of view, however, one big query on a big type could be better.

Question 2: What is the best practice: one big query on a big type? One query per graph on a big type? One query and one type per graph?
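To make the first two options concrete, here is a sketch that only builds request bodies as Python dicts (nothing is sent to a cluster). It reuses the filtered query / bool filter / terms_stats facet shapes that appear in this post; the graph names and the choice of key fields in `GRAPHS` are hypothetical. Option 1 puts one named facet per graph into a single request; option 2 produces one request per graph with the same shared filter, so each could be sent and rendered on its own.

```python
import json
from typing import Dict, List


def shared_filter(terms: Dict[str, str]) -> Dict[str, object]:
    """Filtered query whose bool filter ANDs one term filter per field."""
    must = [{"term": {field: value}} for field, value in terms.items()]
    return {"filtered": {"filter": {"bool": {"must": must}}}}


def terms_stats_facet(key_field: str, value_field: str) -> Dict[str, object]:
    return {"terms_stats": {"key_field": key_field, "value_field": value_field}}


# Hypothetical graph definitions: graph name -> (key_field, value_field).
GRAPHS = {
    "by_value": ("value", "count"),
    "by_filter2": ("filter2", "count"),
}


def one_big_request(terms: Dict[str, str]) -> Dict[str, object]:
    """Option 1: a single request carrying one named facet per graph."""
    return {
        "query": shared_filter(terms),
        "facets": {name: terms_stats_facet(k, v) for name, (k, v) in GRAPHS.items()},
    }


def one_request_per_graph(terms: Dict[str, str]) -> List[Dict[str, object]]:
    """Option 2: one independent request per graph, same shared filter."""
    return [
        {"query": shared_filter(terms), "facets": {name: terms_stats_facet(k, v)}}
        for name, (k, v) in GRAPHS.items()
    ]


screen_filter = {"filter1": "A"}  # hypothetical screen-wide filter
big = one_big_request(screen_filter)
small = one_request_per_graph(screen_filter)
print(json.dumps(big, indent=2))
print(len(small), "separate requests")
```

Because every per-graph request carries an identical filter, option 2 at least gives the filter cache the same filter to reuse; whether that offsets the extra round trips is exactly the open question.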

To do the BI part, I want to reduce the amount of data as much as possible by aggregating it. In this context, I have a solution that further reduces the number of documents per index/type by using arrays. Imagine that I have a type with these documents:

{
  "filter1" : "A",
  "filter2" : "B",
  "value" : "value1",
  "count" : 1
}

{
  "filter1" : "A",
  "filter2" : "B",
  "value" : "value2",
  "count" : 4
}

Question 3: Would it be better to have the following data instead?

{
  "filter1" : "A",
  "filter2" : "B",
  "data" : [{
      "value" : "value1",
      "count" : 1
    }, {
      "value" : "value2",
      "count" : 4
    }
  ]
}
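A sketch of the conversion from the first layout to the array layout, assuming the flat documents are available as Python dicts (the helper name `to_array_layout` is hypothetical). It groups documents by their (filter1, filter2) pair and moves each value/count pair into the `data` array.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def to_array_layout(flat_docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group flat (filter1, filter2, value, count) docs into one doc per filter pair."""
    grouped: Dict[Tuple[Any, Any], List[Dict[str, Any]]] = defaultdict(list)
    for doc in flat_docs:
        key = (doc["filter1"], doc["filter2"])
        # Keep value and count together inside one array element.
        grouped[key].append({"value": doc["value"], "count": doc["count"]})
    return [
        {"filter1": f1, "filter2": f2, "data": data}
        for (f1, f2), data in grouped.items()
    ]


flat = [
    {"filter1": "A", "filter2": "B", "value": "value1", "count": 1},
    {"filter1": "A", "filter2": "B", "value": "value2", "count": 4},
]
nested = to_array_layout(flat)
print(nested)
```

In the example this turns two documents into one. One thing worth checking before choosing this layout: unless `data` is mapped as a `nested` type, Elasticsearch indexes an array of objects as flattened multi-valued fields, so the pairing between a given `value` and its `count` is not kept for filtering or faceting.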

For the moment, with the first layout, I use a terms_stats facet to get the aggregated value, like this:

{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [{
              "term" : {
                "filter1" : "A"
              }
            }, {
              "term" : {
                "filter2" : "B"
              }
            }
          ]
        }
      }
    }
  },
  "facets" : {
    "value" : {
      "terms_stats" : {
        "key_field" : "value",
        "value_field" : "count"
      }
    }
  }
}
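When the screen has several filter combinations, it can help to generate this body rather than write it by hand. A minimal sketch, assuming the request is assembled client-side as a Python dict (the function name `terms_stats_request` and its `facet_name` parameter are hypothetical); with the filters and fields above it reproduces the same body.

```python
import json
from typing import Any, Dict


def terms_stats_request(
    term_filters: Dict[str, str],
    key_field: str,
    value_field: str,
    facet_name: str = "value",
) -> Dict[str, Any]:
    """Filtered query (bool filter of term filters) plus one terms_stats facet."""
    # One term filter per field; the bool filter ANDs them together.
    must = [{"term": {field: term}} for field, term in term_filters.items()]
    return {
        "query": {"filtered": {"filter": {"bool": {"must": must}}}},
        "facets": {
            facet_name: {
                "terms_stats": {"key_field": key_field, "value_field": value_field}
            }
        },
    }


body = terms_stats_request({"filter1": "A", "filter2": "B"}, "value", "count")
print(json.dumps(body, indent=2))
```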

terms_stats returns many things I don't need, like mean, min, max, etc.

Question 4: Is there a lighter way to obtain my aggregated value, for example a terms facet with key_field and value_field?
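For reference, the sketch below imitates locally what terms_stats computes per key term (document count, plus total, min, max and mean of the value field) over the two sample documents, then keeps only the per-term total, which is the aggregated value needed here. It is an in-memory stand-in, not Elasticsearch code; the function names are hypothetical, and trimming the result this way happens on the client, so the server still computes the other statistics.

```python
from typing import Any, Dict, List


def simulate_terms_stats(
    docs: List[Dict[str, Any]], key_field: str, value_field: str
) -> List[Dict[str, Any]]:
    """Local stand-in for terms_stats: per key term, doc count and value stats."""
    values_by_term: Dict[Any, List[float]] = {}
    for doc in docs:
        values_by_term.setdefault(doc[key_field], []).append(float(doc[value_field]))
    entries = []
    for term, values in values_by_term.items():
        total = sum(values)
        entries.append({
            "term": term,
            "count": len(values),
            "total": total,
            "min": min(values),
            "max": max(values),
            "mean": total / len(values),
        })
    return entries


def totals_only(entries: List[Dict[str, Any]]) -> Dict[Any, float]:
    """Keep only the per-term total and drop min/max/mean."""
    return {entry["term"]: entry["total"] for entry in entries}


flat = [
    {"filter1": "A", "filter2": "B", "value": "value1", "count": 1},
    {"filter1": "A", "filter2": "B", "value": "value2", "count": 4},
]
entries = simulate_terms_stats(flat, "value", "count")
totals = totals_only(entries)
print(totals)
```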

Thank you in advance

Julien Naour

