Parent/Child: trying to find my way without a grouping function

Hi all!

I'm currently working on a project to move an important search feature of
my company's website from SQL to ES. The main problem for me is the lack of
a grouping function.

First, a short explanation :slight_smile:

I have messages from users, grouped by thread. In the SQL schema, I have a
table for threads and a table for messages, where each message references
its thread by id (foreign key). On the website, we first show the customer
a list of threads. In this list, depending on the sort order the user
chooses, we show only the first message of each thread.
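
To make the requirement concrete, here is a minimal Python sketch of the
grouping described above, done on plain in-memory data rather than in
Elasticsearch. The field names (thread_id, posted_at, body) and the sample
data are made up for illustration and are not taken from the real schema.

```python
from typing import Dict, List


def first_message_per_thread(messages: List[dict], sort_key: str) -> Dict[int, dict]:
    """Return, for each thread, the message that comes first under sort_key."""
    first: Dict[int, dict] = {}
    for msg in sorted(messages, key=lambda m: m[sort_key]):
        # setdefault keeps only the first message seen for each thread
        first.setdefault(msg["thread_id"], msg)
    return first


# Hypothetical sample data
messages = [
    {"thread_id": 1, "posted_at": "2012-08-02", "body": "reply"},
    {"thread_id": 1, "posted_at": "2012-08-01", "body": "question"},
    {"thread_id": 2, "posted_at": "2012-08-03", "body": "hello"},
]
grouped = first_message_per_thread(messages, "posted_at")
```

This is the result shape that a server-side grouping function would be
expected to produce: one representative message per thread.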

To achieve this, I tried to change my data model, but did not find any
really suitable approach. So, after some nightmares, I tried nested
objects. That is awful when you need to index hundreds of messages per
thread: each time I add a message, the whole thread is reindexed... And by
the way, when I search the index, the object returned is always the full
thread with all its children.

So I gave the parent/child approach a try. It was not quite what I need
either. First I created the thread type, and then the message type, with
its parent pointing to the thread. The top_children query is good for
sorting threads based on their messages... But my main concern is that I
don't know how to retrieve the fields of the message that caused the thread
to be selected. And I absolutely need these fields... With the scope, I am
able to get good facets, but the lack of child fields is a blocker for me.
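
For readers unfamiliar with this setup, the following is a sketch of what
it could look like in the query DSL of that time, written as Python
dictionaries that serialize to the JSON request bodies. The type names
(thread, message), the body field and the search term are assumptions for
illustration; only the _parent mapping and the top_children query
themselves come from the description above.

```python
import json

# Mapping for the child type: each message declares the thread type as its parent.
message_mapping = {
    "message": {
        "_parent": {"type": "thread"}
    }
}

# top_children runs the inner query against child documents (messages)
# and returns the matching parent documents (threads), scored from their children.
top_children_query = {
    "query": {
        "top_children": {
            "type": "message",
            "query": {"term": {"body": "paris"}},  # hypothetical field and term
            "score": "max",
        }
    }
}

print(json.dumps(top_children_query, indent=2))
```

The hits of such a query are thread documents only, which is exactly the
limitation described above.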

Does anyone know how to solve this problem? All my consideration, love,
respect and whatever to whoever helps me!

I also posted my question in the French group, but I saw that the original
one is more active.

Thx in advance,

--
Nicolas BLANC.

--

Hi,

I think you can just issue a second query to get the child document for
each thread.

I do not know your use case in detail, but if you display, say, the 10 most
relevant parents, then once you have them you can issue a second query for
their ten children (and you can use the multi search API for this).
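
As a rough sketch of this two-step idea, the Python snippet below builds a
multi search request body (newline-delimited JSON: one header line and one
search line per thread). The index name "forum" and the fields thread_id
and posted_at are hypothetical; how messages are actually linked to their
thread in the index is an assumption here.

```python
import json
from typing import List


def build_msearch_body(thread_ids: List[str], index: str = "forum") -> str:
    """Build a multi search body asking for a single message per thread."""
    lines = []
    for thread_id in thread_ids:
        # Header line: which index this search goes to
        lines.append(json.dumps({"index": index}))
        # Body line: the first message of this thread under the chosen sort
        lines.append(json.dumps({
            "query": {"term": {"thread_id": thread_id}},  # hypothetical field
            "sort": [{"posted_at": "asc"}],               # hypothetical field
            "size": 1,
        }))
    # The multi search format requires a trailing newline
    return "\n".join(lines) + "\n"


body = build_msearch_body(["t1", "t2"])
```

The whole body is sent in one request, so ten threads cost one extra round
trip rather than ten.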

Would that help?

Regards,
Lukas

--

That can be a solution in some cases... But sometimes the first request can
be long and painful. I don't think it's easy to implement on our side
(repeating the query 10 times, adding a new filter on the parent id each
time).

Is there a way I can contribute to the main Elasticsearch code, to add
grouping by field (on an individual shard)? I am willing to put in all the
time needed to add this key feature, which is missing for me!!!

How can I contribute efficiently?

Thx in advance,

--
Nicolas BLANC.

--

Hi Nicolas,

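I did some work in the past on result grouping. It is not perfect,
but I think it is something you can try out:
https://github.com/martijnvg/elasticsearch-with-local-grouping

Or you can use the patches from the lusini/elasticsearch-grouping-patches
repository on GitHub, which provides patches for Elasticsearch with
grouping support.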

Martijn

--
Kind regards,

Martijn van Groningen

--

Hi Martijn,

I just followed your link, and I will give it a try!

Just a small question: I saw in your doc that I need to have only 1 shard!

Since all messages of a thread are grouped on the same shard (thanks to
parent/child), does your grouping work with multiple shards? (I mean, all
objects to group are inside the same shard.)

Thx in advance,

--
Nicolas BLANC.

--

What if you try to use the "ids" filter in a second query? The first query
returns a list of threads; you then loop over them to collect the thread
ids in an array, and perform a second filtered query (match_all) with the
"ids" filter.

Something like:

    $threads = $threadsIndex->search($threadsQuery);
    $ids = array();
    foreach ($threads as /** @var $thread Elastica_Document */ $thread) {
        // Append each thread id to the list (not overwrite it)
        $ids[] = $thread->getId();
    }
    $messagesQuery = new Elastica_Query_Filtered(
        new Elastica_Query_MatchAll(),
        new Elastica_Filter_Ids(null, $ids)
    );
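
For reference, here is a sketch of the JSON request body that such a
filtered match_all query with an "ids" filter roughly corresponds to,
built as a Python dictionary. The id values are hypothetical. Keep in mind
that the "ids" filter matches on the _id of the documents being searched,
so the ids passed in must be ids of documents in the queried type.

```python
import json
from typing import Iterable


def ids_filtered_query(ids: Iterable[str]) -> dict:
    """match_all query restricted by an ids filter (filtered-query syntax of that era)."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"ids": {"values": list(ids)}},
            }
        }
    }


# Hypothetical ids collected from the first search
query = ids_filtered_query(["12", "15", "42"])
print(json.dumps(query))
```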

--

Nice idea.

Or use the multi get API (_mget).

David :wink:
Twitter : @dadoonet / @elasticsearchfr

--

Hi Nicolas,

No, it doesn't work in a cluster with more than one shard. The result
grouping in my fork is more of an experiment.
Since you do have multiple shards, doing a subsequent search, as
suggested by others, is a good option to explore.

Martijn

--

Thanks to all!

But I already tried to find my way with multiple requests... And it's
really complicated to get what I want in a decent time...

So, as I am crazy... and more! I have just started to work on your code
base, Martijn, to adapt it to a 'more than one shard' configuration. It's
not easy, but I think I really need this feature! Better than asking my
frontend dev team to code an ugly workaround. In a KISS environment, the
simple and stupid thing here is to code the missing feature! So let's go!

Thanks all for your suggestions, nice to see a lot of helpers in the group.

Are there guidelines somewhere to follow when coding in Elasticsearch? I
didn't find any...

--
Nicolas BLANC.

--

If you're using the _parent field already, then I think the result
grouping might already work correctly in a multi shard environment.
When the _parent field is used it makes sure that docs with the same
_parent value end up on the same shard and since
the grouping works locally the result grouping should work. I haven't
tested this, but I think it works.
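
The routing argument above can be illustrated with a small Python sketch:
a child document is routed by its parent id, so it hashes to the same
shard as its parent. The hash function (crc32), the shard count and the
document ids below are stand-ins chosen for illustration, not what
Elasticsearch really uses.

```python
import zlib
from typing import Optional


def routing_value(doc_id: str, parent_id: Optional[str] = None) -> str:
    """A child document is routed by its parent id, any other document by its own id."""
    return parent_id if parent_id is not None else doc_id


def shard_for(routing: str, number_of_shards: int) -> int:
    # crc32 is only a stand-in; Elasticsearch uses its own hash function
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


NUMBER_OF_SHARDS = 5  # hypothetical cluster setting

thread_shard = shard_for(routing_value("thread-42"), NUMBER_OF_SHARDS)
message_shards = [
    shard_for(routing_value(msg_id, parent_id="thread-42"), NUMBER_OF_SHARDS)
    for msg_id in ("msg-1", "msg-2", "msg-3")
]
```

Because every message of a thread lands on the thread's shard, grouping by
thread can be done shard-locally without missing any member of a group.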

Martijn

--

I just imported your work onto the master branch of Elasticsearch... It
seems to work out of the box (in my case) :slight_smile:
I need to adapt some facet code, but you did a very good job for
experimental code :slight_smile:

Thanks for this wonderful set of patches :slight_smile:

Just one last question... What is missing in your code for it to be
accepted into the Elasticsearch master branch? I feel I'm not the only one
who needs this feature, which you have already coded...

--
Nicolas BLANC.

--

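You are not alone :slight_smile:
See the "Field Collapsing/Combining" issue (#256) in the
elastic/elasticsearch repository on GitHub.

Result grouping is going to be added to ES in the near future. Some internal
refactoring needs to happen first. Also, the grouped response I have in my
fork will change.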

Martijn

--

Oops... Do you think the API will stay unchanged?

It's important for me to know whether I need to prepare my team for a
future API change in the grouping support! And by the way, if you need some
developer help on the subject, my company lets me work on this key feature.
So I can contribute! Even if you just need a tester, I'm here :slight_smile:

Thanks for everything (including the time spent answering me)

--
Nicolas BLANC.

2012/8/17 Martijn v Groningen martijn.v.groningen@gmail.com

You are not alone :slight_smile:
Field Collapsing/Combining · Issue #256 · elastic/elasticsearch · GitHub

Result grouping is going to be added to ES in the near future. Some
internal
refactoring need to happen first. Also the grouped response I have in my
fork
will change.

Martijn

On 17 August 2012 14:15, Nicolas Blanc nicolas.blanc@blablacar.com
wrote:

Just import your work on the master branch of elasticsearch... Seems to
work
out of the box (in my case) :slight_smile:
Need to adapt some facet code, but you did a very good job for an
experimental code :slight_smile:

Thx for this wonderfull piece of patches :slight_smile:

Just a last question... What's missing in your code, so it can be
accepted
in the master elasticsearch branch ? I feel i'm not alone to need these
feature, already coded by you...

--
Nicolas BLANC.

2012/8/17 Martijn v Groningen martijn.v.groningen@gmail.com

If you're using the _parent field already, then I think the result
grouping might already work correctly in a multi shard environment.
When the _parent field is used it makes sure that docs with the same
_parent value end up on the same shard and since
the grouping works locally the result grouping should work. I haven't
tested this, but I think it works.

Martijn

On 17 August 2012 11:00, Nicolas Blanc nicolas.blanc@blablacar.com
wrote:

Thx to all!

But i already try to find my way with multiple requests... And it's
really
complicated to obtain what i want in a decent time...

So, as i am crazy... And more! I just start to work on your base code
Martijn, to adapt it to a 'more than one shard' configuration. It's
not
easy, but i think i really need this feature! Better than ask to my
frontend
dev team to code a ugly workaround. In a KISS environnement, the
simple
and
stupid thing here is to code the missing feature! So let's go!

Thx all for your propositions, nice to see a lot of helpers in the
group.

Is existing somewhere some guidelines to follow when coding in
elasticsearch
? Didn't find any...

--
Nicolas BLANC.

Le vendredi 17 août 2012 10:35:36 UTC+2, Martijn v Groningen a écrit :

Hi Nicolas,

No, it doesn't work in a cluster with more than one shard. The result
grouping in my fork is more of an experiment.
Since you do have multiple shards, doing a subsequent search, as
suggested by others, is a good option to explore.

Martijn

On 17 August 2012 03:39, David Pilato da...@pilato.fr wrote:

Nice idea.

Or use multiget.

David :wink:
Twitter : @dadoonet / @elasticsearchfr

On 17 August 2012 at 00:41, Yeah! victor....@gmail.com wrote:

What if you try to use the "ids" filter in a second query? The first query
returns a list of threads; then you loop over them, collect the thread ids
in an array, and perform a second filtered query (match_all) with the "ids"
filter.

Something like:

    $threads = $threadsIndex->search($threadsQuery);
    $ids = array();
    /** @var $thread Elastica_Document */
    foreach ($threads as $thread) {
        // Append each id; a plain assignment would overwrite the array.
        $ids[] = $thread->getId();
    }
    $messagesQuery = new Elastica_Query_Filtered(
        new Elastica_Query_MatchAll(),
        new Elastica_Filter_Ids(null, $ids)
    );

On Thursday, 16 August 2012 at 12:04:43 UTC+3, Nicolas Blanc wrote:

Hi Martijn,

I just followed your link, and I will give it a try!

Just a small question: I saw in your doc that I need to have only 1 shard!

As I group all my threads by shard (thx to parent/child), does your grouping
work on multiple shards? (I mean, all objects to group are inside the same
shard.)

thx in advance,

--
Nicolas BLANC.

On Wednesday, 15 August 2012 at 10:16:13 UTC+2, Martijn v Groningen wrote:

Hi Nicolas,

I did some work in the past on result grouping. It is not perfect,
but I think it is something you can try out:
https://github.com/martijnvg/elasticsearch-with-local-grouping

Or you can use these patches:
GitHub - lusini/elasticsearch-grouping-patches: This repository yields patches for elasticsearch with grouping support

Martijn

--
Kind regards,

Martijn van Groningen

--

On 17 August 2012 15:17, Nicolas Blanc nicolas.blanc@blablacar.com wrote:

Oops... Do you think the API will stay unchanged?

There are more result grouping related features that can be added,
like grouping by script, the sort order of docs inside a group, or
collecting aggregated group statistics (e.g. max price). At the time I
created result grouping in ES I didn't take these features into
account. To accommodate them, the request & response format
needs to be changed, and therefore it is not possible to keep the API in
the experimental fork intact.

It's important for me to know if I need to prepare my team for a future API
change in the grouping support! And by the way, if you need some developer
help on the subject, my company lets me work on this key feature. So I can
contribute! Even if you just need a tester, I'm here :slight_smile:

Cool!

Martijn

--

Hi Martijn,

I am currently working on your code, to extend it to all the facets we use.
But I have some questions...

As far as I understand the code, facets are not calculated against the
topDocs obtained in the main query, but are replayed in
FacetPhase.execute(...).
So my first question is: am I right?

If yes... I was also working on a way to group globally. But
what I saw is that filter+facets are applied in a contextSearch way (so
per shard). And it will be very difficult to do a global job (filter and
facet) when grouping if we are not able to search in the grouped topDocs.
Am I right there too?

Actually, your code for the string terms facet is really not easy to port to
all facets... My first problem is that you only accept the String field type
for grouping, and we use a long one here... But I think I will be able to make
something more adaptable. And as you said, someone in my company asked me
for a special sort inside a group, so I understand the need
for an API change. I can work on a full API proposal, if you think it can
help... Otherwise, I need to wait for you and Shay.

Thx in advance,

--
Nicolas BLANC.

--

As far as I understand the code, facets are not calculated against the
topDocs obtained in the main query, but are replayed in
FacetPhase.execute(...).
So my first question is: am I right?

Yes, but that is the case for all facets.

If yes... I was also working on a way to group globally. But what
I saw is that filter+facets are applied in a contextSearch way (so per
shard). And it will be very difficult to do a global job (filter and facet)
when grouping if we are not able to search in the grouped topDocs. Am I
right there too?

What do you mean by a global job? All facets are computed per
shard, and the individual shard results are merged afterwards.

Actually, your code for the string terms facet is really not easy to port to
all facets... My first problem is that you only accept the String field type
for grouping, and we use a long one here... But I think I will be able to make
something more adaptable.

At that time I only implemented faceting for string fields. But it shouldn't
be that hard to do the same trick with longs in
TermsLongOrdinalsFacetCollector and LongFieldData.

And as you said, someone in my company asked me for
a special sort inside a group, so I understand the need for
an API change. I can work on a full API proposal, if you think it can help...
Otherwise, I need to wait for you and Shay.

At some point the 'sort inside a group' option will definitely be in the API.
I think it is good to know what you and other users think about result
grouping, so if you have an idea of what the result grouping API should look
like, it would be great if you could share it.

Martijn

--

Thx for all your answers, Martijn! And for the time you spent with me.

I read all of your work and blog posts (yours and Mike McCandless's). As far
as I can see, you are deeply involved in grouping in Lucene (+Solr +ES).

What I understand from reading all the information from here and here is that
I need to do grouping in 2 phases.

First I get all topGroups from the shards, corresponding to the request. Then
I merge these topGroups.

After that, in a second phase, I send the search to all shards, with the
merged topGroups.

My first work on the subject is going well (I created a new
TransportSearchGroupThenQueryAndFetchAction class)... But the main concern
is to be able to use "grouped facets", i.e. facets counting groups, not
documents. At this point I do not see how to do this generically for all
facets... It seems a real pain to implement grouped facets for each
facet... If there is no other way, I will do it, but it will take too much
time. I saw your work on the string terms facet, and I'm a little bit
afraid... Do you know if in Solr it's done in each facet?

--
Nicolas BLANC.

--

What I understand from reading all the information from here and here is that
I need to do grouping in 2 phases.

Well, you don't have to. If all documents belonging to a group are
inside the same shard, then everything should just work.

My first work on the subject is going well (I created a new
TransportSearchGroupThenQueryAndFetchAction class)... But the main concern
is to be able to use "grouped facets", i.e. facets counting groups, not
documents. At this point I do not see how to do this generically for all
facets... It seems a real pain to implement grouped facets for each
facet... If there is no other way, I will do it, but it will take too much
time. I saw your work on the string terms facet, and I'm a little bit
afraid... Do you know if in Solr it's done in each facet?

Solr has grouped facets, but grouped facets don't work with the 2-phase
distributed grouping out of the box: the grouped facet counts will be
incorrect. In order to have correct grouped facet counts, the user must index
all docs belonging to the same group into the same Solr shard. The reason
this isn't supported is that otherwise all individual shard group / facet
count combinations would need to be sent over the wire to the Solr shard that
started the whole request, and this can be very expensive.
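
To make the counting problem concrete, here is a small sketch, assuming a grouped facet simply counts distinct groups per facet value on each shard and the per-shard counts are then summed. The `grouped_facet` and `merge` helpers, the `thread` and `lang` fields and the sample documents are hypothetical; this is not Solr's or Elasticsearch's actual implementation.

```python
from collections import Counter


def grouped_facet(docs, group_field, facet_field):
    """Count distinct groups (not documents) per facet value."""
    seen = set()
    counts = Counter()
    for doc in docs:
        key = (doc[facet_field], doc[group_field])
        if key not in seen:
            seen.add(key)
            counts[doc[facet_field]] += 1
    return counts


def merge(shard_counts):
    """Sum per-shard facet counts, as a coordinating node would."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total


docs = [
    {"thread": 1, "lang": "fr"},
    {"thread": 1, "lang": "fr"},
    {"thread": 2, "lang": "fr"},
    {"thread": 2, "lang": "en"},
    {"thread": 3, "lang": "en"},
]

# Reference result, computed over all documents at once.
expected = grouped_facet(docs, "thread", "lang")

# Case 1: every thread lives entirely on one shard.
colocated = [
    [d for d in docs if d["thread"] in (1, 3)],
    [d for d in docs if d["thread"] == 2],
]
# Case 2: the two messages of thread 1 sit on different shards.
scattered = [
    [docs[0], docs[3], docs[4]],
    [docs[1], docs[2]],
]

colocated_counts = merge(grouped_facet(s, "thread", "lang") for s in colocated)
scattered_counts = merge(grouped_facet(s, "thread", "lang") for s in scattered)

print("expected :", dict(expected))
print("colocated:", dict(colocated_counts))
print("scattered:", dict(scattered_counts))
```

With co-located groups the merged counts match the reference; with thread 1 split over two shards, the "fr" value is counted once per shard and ends up too high. Fixing that would require shipping every (group, facet value) pair to the coordinating node, which is the cost described above.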

In a nutshell, with the 2-pass distributed grouping, only the top
matching groups of each shard are sent, in the first phase, to the node
where the overall request started. All the groups from the shards are
merged, and during the second phase the top documents belonging to
the top merged groups are retrieved from the nodes that have groups
that made it into the top merged groups (group stats can possibly also
be computed during the second phase).
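
The two passes described above can be sketched as follows. This is only an illustration with in-memory lists standing in for shards; the function names, the `group` and `score` fields and the sample data are assumptions, not the Solr or Elasticsearch code.

```python
from collections import defaultdict


def first_pass(shard_docs, top_n):
    """Phase 1 on one shard: best score per group, keep the top_n groups."""
    best = {}
    for doc in shard_docs:
        group = doc["group"]
        if group not in best or doc["score"] > best[group]:
            best[group] = doc["score"]
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def merge_top_groups(per_shard, top_n):
    """Coordinator: merge shard-level top groups into the global top groups."""
    best = {}
    for groups in per_shard:
        for group, score in groups:
            if group not in best or score > best[group]:
                best[group] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [group for group, _ in ranked]


def second_pass(shard_docs, top_groups, docs_per_group):
    """Phase 2 on one shard: top docs for each merged top group it holds."""
    wanted = set(top_groups)
    by_group = defaultdict(list)
    for doc in shard_docs:
        if doc["group"] in wanted:
            by_group[doc["group"]].append(doc)
    return {
        group: sorted(ds, key=lambda d: d["score"], reverse=True)[:docs_per_group]
        for group, ds in by_group.items()
    }


def grouped_search(shards, top_n=2, docs_per_group=1):
    """Run both phases and merge the per-shard documents for each top group."""
    top_groups = merge_top_groups([first_pass(s, top_n) for s in shards], top_n)
    collected = {group: [] for group in top_groups}
    for shard in shards:
        for group, docs in second_pass(shard, top_groups, docs_per_group).items():
            collected[group].extend(docs)
    return [
        (g, sorted(collected[g], key=lambda d: d["score"], reverse=True)[:docs_per_group])
        for g in top_groups
    ]


shards = [
    [{"id": "a", "group": "t1", "score": 3.0}, {"id": "b", "group": "t2", "score": 1.0}],
    [{"id": "c", "group": "t1", "score": 5.0}, {"id": "d", "group": "t3", "score": 4.0}],
]
result = grouped_search(shards)
for group, docs in result:
    print(group, [d["id"] for d in docs])
```

Here group `t1` has documents on both shards, which is exactly the situation the second pass exists for: the coordinator has to ask each shard again for the documents of the merged top groups.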

Since ES has routing support, I think there is no need for 2-pass
distributed grouping support. The result grouping can be performed locally.
It is just like documents with the _parent field: all docs with the
same _parent value end up in the same shard, and the top_children query /
has_child filter take advantage of that.

Martijn

--