ElasticSearch and grouping


(Nicolas Blanc) #1

Hi the list!

A few days ago I started working with ES and its grouping feature. Today I have
something that works, thanks to Martijn and his experimental grouping code.

I adapted Martijn's code to version 0.19.4 (thanks also to lusini.de).

Based on that code, I added grouping support to more facets (for now
date_histogram, histogram in normal mode (i.e. without script, key/value
or bounds), and range), and it works. I will add more facets soon, as my
company wants to use more of them.

After that, I worked on a version that is not tied to a single shard. First
I tried full global grouping. I have a branch with that, and it works,
but it's a nightmare to get valid facets. So I won't work on this feature
for some time.

My second great adventure was to test my patched version of Martijn's code on
multiple shards, with all the items of a group in the same shard (I use the
parent/child functionality to achieve this). The grouping works, and the facets
are correct too. But I ran into a problem as soon as I wanted the results to be
sorted.
So my question is:
Is there a developer in the group who can explain how I can merge and sort
docs from all shards? It's the last problem that could prevent me from using ES
in production :slight_smile:

Thanks in advance!

--
Nicolas BLANC.

P.S.: If some of you want it, I can share my code that adds grouping to the
Range, Histogram and DateHistogram facets.

--


(Jörg Prante) #2

Hi Nicolas,

well, I would love to have a look at your code!

I don't know the current state of the grouping patch in detail, but the
merge/sort step that builds groups across shards is not easy.

You have to introduce a group function, which need not be a sort function
but rather a collapse function that allows round-robin group building: look
at each shard, apply the group feature extraction, compare with the current
group, collapse or not, and keep iterating over the shards until the last
document is grouped. This is expensive. Sorting groups is going to be even
more expensive, because the whole result set would have to be
examined. :frowning:
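
To make the round-robin idea concrete, here is a minimal sketch in plain Python. It is not ElasticSearch code: the function name `collapse_round_robin`, the hit dictionaries and the `brand` field are all made up for illustration, and the "collapse" step is simply "append to the group with the same key".

```python
from itertools import zip_longest

def collapse_round_robin(shard_hits, group_key):
    """Visit shards in round-robin order and collapse hits into groups.

    shard_hits: list of lists of hit dicts, one inner list per shard
                (hypothetical data layout, not an ES structure).
    group_key:  function that extracts the grouping value from a hit.
    Returns a dict mapping group value -> list of hits, in first-seen order.
    """
    groups = {}
    exhausted = object()  # sentinel for shards that have no more hits
    # Each round takes at most one hit from every shard.
    for round_of_hits in zip_longest(*shard_hits, fillvalue=exhausted):
        for hit in round_of_hits:
            if hit is exhausted:
                continue
            # Collapse: join an existing group or open a new one.
            groups.setdefault(group_key(hit), []).append(hit)
    return groups

shards = [
    [{"id": 1, "brand": "a"}, {"id": 2, "brand": "b"}],
    [{"id": 3, "brand": "a"}],
    [{"id": 4, "brand": "c"}, {"id": 5, "brand": "b"}, {"id": 6, "brand": "a"}],
]
groups = collapse_round_robin(shards, lambda hit: hit["brand"])
```

Every hit from every shard is touched once, which is where the cost comes from: nothing can be cut off early, because a later hit may still belong to an earlier group.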

But I think sorting could work like the facet sorting implementation.
There, each shard performs faceting on its own and pushes its result up to
the node level, together with the required sort parameters, and at that
level the collected facets are sorted again.
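
A rough sketch of that two-level scheme, again in plain Python rather than the real ES internals. It assumes that all documents of a group live on one shard (as in the parent/child setup described in the question), so each shard can finish its groups locally. The names `shard_top_groups` and `merge_shard_results` and the `group` and `price` fields are hypothetical.

```python
import heapq
from itertools import islice

def shard_top_groups(docs, group_field, sort_field, size):
    """Shard level: keep the best doc per group, return the top `size` groups."""
    best = {}
    for doc in docs:
        key = doc[group_field]
        if key not in best or doc[sort_field] > best[key][sort_field]:
            best[key] = doc
    ranked = sorted(best.values(), key=lambda d: d[sort_field], reverse=True)
    return ranked[:size]

def merge_shard_results(per_shard, sort_field, size):
    """Node level: k-way merge of the already sorted shard results."""
    merged = heapq.merge(*per_shard, key=lambda d: d[sort_field], reverse=True)
    return list(islice(merged, size))

shard_a = [
    {"group": "x", "price": 5},
    {"group": "x", "price": 9},
    {"group": "y", "price": 7},
]
shard_b = [
    {"group": "z", "price": 8},
    {"group": "w", "price": 1},
]
per_shard = [shard_top_groups(s, "group", "price", 3) for s in (shard_a, shard_b)]
top = merge_shard_results(per_shard, "price", 3)
```

If a group can span several shards, the merge step is no longer enough: the node would also have to collapse groups with the same key before sorting, which brings back the cost described above.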

Cheers,

Jörg

On Saturday, October 20, 2012 11:59:33 AM UTC+2, Nicolas Blanc wrote:

--


(system) #3