Facet and grouping


(pric) #1

Hello, I would like to know if it's possible to do this with elasticsearch.

My documents (simplified for the example; the real ones have more fields):

doc 1 -> brand: audi, color: black
doc 2 -> brand: audi, color: white
doc 3 -> brand: audi, color: black
doc 4 -> brand: bmw, color: white

And the facet result I would like to get:

black -> 1 (because there is just one brand, audi)
white -> 2 (because there are two brands, audi and bmw)

Can I do this with ES? Thanks


(Lukáš Vlček) #2

Hi,

I think you might want to look at the Terms Stats facet:
http://www.elasticsearch.org/guide/reference/api/search/facets/terms-stats-facet.html

Regards,
Lukas

In reply to pric's post of Wed, Jul 4, 2012 at 11:45 AM.


(pric) #3

Thanks for your help, Lukáš, but I can't do this with terms stats.

Does anybody have another idea?
Thanks


(Lukáš Vlček) #4

Yeah, you are right. I think there is no specific facet for this kind of
calculation right now.

However, with a little more work you might be able to get this, depending
on your particular use case. For example, you could add a new field to your
documents that combines 'color' and 'car maker' (the brand field), run a
terms facet on that field, and process the result on the client side. But
if you are going to have a lot of combinations this would not work well,
because you would have to pull all the distinct values using the all_terms
option
(http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html)
and then do the final sorting by aggregated counts on the client as well.
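
A minimal sketch of the "combined field" idea, written in Python purely for illustration. The field name `color_brand`, the `|` separator and the helper `with_combined_field` are assumptions, not anything elasticsearch provides; the only point is that the combination is computed before indexing so a plain terms facet can run on it.

```python
# Hypothetical separator; pick one that cannot occur in your field values.
SEPARATOR = "|"

def with_combined_field(doc, fields=("color", "brand"), target="color_brand"):
    """Return a copy of doc with an extra field that joins the given fields."""
    enriched = dict(doc)  # shallow copy, so the original document is untouched
    enriched[target] = SEPARATOR.join(str(doc[name]) for name in fields)
    return enriched

# The four example documents from the first post.
docs = [
    {"brand": "audi", "color": "black"},
    {"brand": "audi", "color": "white"},
    {"brand": "audi", "color": "black"},
    {"brand": "bmw", "color": "white"},
]

# These enriched documents are what you would send to the index.
enriched_docs = [with_combined_field(doc) for doc in docs]
```

Each distinct value of the combined field then stands for one (color, brand) pair, which is what the client-side processing has to count.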

Regards,
Lukas

In reply to pric's post of Wed, Jul 4, 2012 at 1:32 PM.


(pric) #5

Thanks Lukáš, but in my case that will not be possible. I will try to find
another way.


(sujoysett) #6

You can use a script facet like this, with a separator of your choice (I
prefer '~~~'):

{
    "from": 0,
    "size": 0,
    "facets": {
        "test": {
            "terms": {
                "script": "doc['color'].value + '~~~' + doc['brand'].value"
            },
            "global": false
        }
    }
}

It will give you facet results like

"terms": [
    {
        "term": "black~~~audi",
        "count": 2
    },
    {
        "term": "white~~~audi",
        "count": 1
    },
    {
        "term": "white~~~bmw",
        "count": 1
    }
]

Now you can use a recursive program (I used a simple recursive
JavaScript function here) to convert this JSON into hierarchical JSON,
something like the following:

{
    "black": {
        "audi": 2
    },
    "white": {
        "audi": 1,
        "bmw": 1
    }
}

Counting the elements at the desired level gives you the desired count:
for example, 'white' has 2 elements and 'black' has 1.
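
The conversion and counting steps could look roughly like the sketch below. It is written in Python rather than JavaScript, uses a simple loop instead of the recursive function mentioned above, and the name `nest_terms` is made up for this example. It assumes the separator never appears inside a field value.

```python
def nest_terms(terms, separator="~~~"):
    """Turn flat 'a~~~b~~~c' facet terms into nested dicts with counts at the leaves."""
    tree = {}
    for entry in terms:
        *path, leaf = entry["term"].split(separator)
        node = tree
        for key in path:
            # Walk down one level per field, creating levels as needed.
            node = node.setdefault(key, {})
        node[leaf] = entry["count"]
    return tree

# The "terms" array from the facet response shown above.
facet_terms = [
    {"term": "black~~~audi", "count": 2},
    {"term": "white~~~audi", "count": 1},
    {"term": "white~~~bmw", "count": 1},
]

tree = nest_terms(facet_terms)

# Number of distinct brands per color = number of keys under each color.
distinct_brands = {color: len(brands) for color, brands in tree.items()}
```

With the example data, `distinct_brands` holds 1 for 'black' and 2 for 'white', which is the result asked for in the first post.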

I ran into the lack of a hierarchical facet in elasticsearch, and since one
was essential for building JSON for advanced drill-down visualizations, I
built multilevel JSON (4-5 levels of drill-down) this way.

-- Sujoy


(Lukáš Vlček) #7

Hi,

Yeah, this will work, but only up to a certain number of color/brand
combinations. In general it will not be very efficient for a high number
of combinations. The problematic part is ordering (sorting) the aggregated
data, which you need to calculate on the client side: if you want a
DESC-like sort by count, you have to pull all the possible combinations.
And if I understand it correctly, even if you only want the top N items
you still have to process all possible combinations.
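
To make the cost concrete, here is a rough Python sketch of the client-side "top N" step. The function name `top_colors_by_distinct_brands` is hypothetical. Note that its input has to be the complete list of combined terms: the server orders the facet by document count per combination, not by the number of distinct brands per color, so a truncated list could silently drop brands and give wrong totals.

```python
from collections import Counter

def top_colors_by_distinct_brands(terms, n, separator="~~~"):
    """Rank colors by how many distinct brands they appear with; return the top n."""
    distinct = Counter()
    for entry in terms:
        color, _brand = entry["term"].split(separator, 1)
        # Every combined term is one distinct (color, brand) pair.
        distinct[color] += 1
    # The sort happens here, on the client, over ALL combinations.
    return distinct.most_common(n)

# Must contain every combination, not just the first page of facet results.
all_terms = [
    {"term": "black~~~audi", "count": 2},
    {"term": "white~~~audi", "count": 1},
    {"term": "white~~~bmw", "count": 1},
]

top_one = top_colors_by_distinct_brands(all_terms, 1)
```

The work grows with the number of distinct combinations rather than with N, which is the limitation described above.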

Regards,
Lukas

In reply to Sujoy Sett's post of Thu, Jul 5, 2012 at 7:55 AM.


(sujoysett) #8

Hi Lukáš,

Of course there are limitations. I have used this for combinations of at
most five fields (that is, five levels of drill-down), on keyword-analyzed
fields, with an index of approx. 1 lakh (100,000) docs. Response time was
acceptable with a normal UI hosting advanced charts.

For the parsing, iterating and sorting I used the underscore.js utility
library. In the absence of hierarchical faceting, this was the best
workaround I could get, and it worked well. For finding 'top' results fast
on huge datasets, a separate query on each field can give the 'top' values
of each field, and using those to filter the final query can help.
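
One way to read the two-step idea is sketched below in Python, as request bodies only (nothing is sent anywhere). The helper names are made up, and the bodies assume the facet-era query DSL used in this thread (a `filtered` query wrapping `and` and `terms` filters), so check them against the version you actually run.

```python
def top_terms_request(field, n):
    """Step 1: a plain terms facet asking for the n most frequent values of one field."""
    return {"size": 0, "facets": {field: {"terms": {"field": field, "size": n}}}}

def filtered_combination_request(top_values, separator="~~~"):
    """Step 2: the combined script facet, restricted to the top values of each field."""
    fields = list(top_values)
    # Builds e.g. "doc['color'].value + '~~~' + doc['brand'].value".
    script = (" + '%s' + " % separator).join("doc['%s'].value" % f for f in fields)
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": [{"terms": {f: top_values[f]}} for f in fields]},
            }
        },
        "facets": {"combined": {"terms": {"script": script}}},
    }

# Pretend step 1 returned these top values for each field (illustrative data).
request = filtered_combination_request({"color": ["white", "black"], "brand": ["audi"]})
```

This bounds the number of combinations the script facet can produce, at the price of ignoring combinations that involve less frequent values.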

But again, these are custom workarounds, to be used only when the primary
feature is missing.

-- Sujoy.

In reply to Lukáš Vlček's post of Thursday, July 5, 2012 12:15:34 PM UTC+5:30.

