Bucketing Elasticsearch results?


(Jon-Paul Lussier) #1

Hey, first of all, I'd like to preface this by saying I'm fairly
inexperienced with Elasticsearch and with implementing search in general.
I'm an intermediate Ruby/Rails developer using the Tire gem, and this
question has been a headache for at least 8 hours.

Essentially, I need to query the most relevant N 'listing' documents
(where N is a small number, 3-7) for each 'organization'. Ideally,
this would be some kind of bucketed result set, e.g. hits =>
{:organization_1 => {:matching_listings => '...' }, :organization_2 =>
{:etc => '...'}}

I'm not sure if Elasticsearch /can/ do this; it seems like I might
need to get facet counts for my query over a global scope, and then run
the same query to return a limited number of documents for /each/ facet
term.

Am I totally off base here? Is there something more or less built in to
Elasticsearch to enable this?

Appreciate any help you guys can extend; hopefully I stumble on to a
reasonable solution soon. Thanks.


(vineeth mohan) #2

What are your criteria for "most relevant N" documents?
As in, what qualifies a document as relevant?

Thanks
Vineeth

On Fri, Oct 14, 2011 at 8:08 PM, Jon-Paul Lussier <jonpaul.lussier@gmail.com> wrote:


(vineeth mohan) #3

Are you looking for something like "GROUP BY" in MySQL?


(Jon-Paul Lussier) #4

The query parameters from a user, in this case. The 'most' qualified
would be the highest scoring.


(Jon-Paul Lussier) #5

I believe I'm looking for something that would give me the same result
as group by (more or less, I'm not particularly adept with data
stores); though I'm honestly not exactly sure.


(vineeth mohan) #6

I am not sure we can get this with a single request to ES.

One way would be to use faceted search.
With faceted search you will get the companies that are included in the
results.
Once you have those, you can issue a further request for each company
with the same search query. HTH

Thanks
Vineeth
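
For illustration only, here is a rough Ruby sketch of the two-step approach described above. It only builds the request bodies and does not talk to a cluster. It assumes the terms facet and the `filtered` query as they existed in Elasticsearch around the time of this thread (later releases replaced both). The field name `organization_id`, the facet name `organizations`, the query text, and the hand-written response are all hypothetical.

```ruby
require 'json'

# Hypothetical field name: whatever field stores the organization on a listing.
ORG_FIELD = 'organization_id'

# Step 1: ask only for facet counts per organization (size 0 returns no hits).
def facet_request(user_query, max_orgs = 10)
  {
    'size'   => 0,
    'query'  => user_query,
    'facets' => {
      'organizations' => {
        'terms' => { 'field' => ORG_FIELD, 'size' => max_orgs }
      }
    }
  }
end

# Step 2: the same query, restricted to one organization and limited to n hits.
def per_org_request(user_query, org_id, n = 5)
  {
    'size'  => n,
    'query' => {
      'filtered' => {
        'query'  => user_query,
        'filter' => { 'term' => { ORG_FIELD => org_id } }
      }
    }
  }
end

# Pull the organization ids out of a terms-facet response.
def org_ids_from(facet_response)
  facet_response['facets']['organizations']['terms'].map { |t| t['term'] }
end

# Assumed user query, for illustration only.
user_query = { 'query_string' => { 'query' => 'downtown office' } }

# Hand-written stand-in for what step 1 might return.
fake_response = {
  'facets' => {
    'organizations' => {
      'terms' => [{ 'term' => 7, 'count' => 12 }, { 'term' => 3, 'count' => 4 }]
    }
  }
}

# One follow-up request body per organization found in step 1.
requests = org_ids_from(fake_response).map { |id| per_org_request(user_query, id, 3) }

puts JSON.pretty_generate(facet_request(user_query))
puts JSON.pretty_generate(requests.first)
```

Note that this means one facet request plus one request per organization, so the total number of round trips grows with the number of organizations you want to show.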


(Jon-Paul Lussier) #7

Fair enough, this is what I'm thinking will have to happen. It's a
little unlucky that I have to run the query twice, but so far
Elasticsearch /seems/ fast enough that it won't matter at all.

Thanks a lot for your time and input; if anyone else has some ideas,
feel free to share them. Appreciated.


(Karussell) #8

This needs to be implemented in an efficient manner (but it can be
implemented simply on the client side):
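
As a minimal sketch of what client-side grouping could look like (not necessarily what was meant above), the following Ruby groups an already-fetched list of hits by organization and keeps the N highest-scoring hits in each group. The `Hit` struct and the sample data are hypothetical stand-ins for whatever your client returns.

```ruby
# One search hit: the listing id, the organization it belongs to, and its score.
Hit = Struct.new(:id, :organization, :score)

# Group hits by organization and keep the n highest-scoring hits in each group.
def top_n_per_organization(hits, n)
  hits.group_by(&:organization).each_with_object({}) do |(org, org_hits), buckets|
    buckets[org] = org_hits.sort_by { |h| -h.score }.first(n)
  end
end

# Made-up hits, as if returned by a single search request.
hits = [
  Hit.new(1, :organization_1, 2.4),
  Hit.new(2, :organization_2, 1.9),
  Hit.new(3, :organization_1, 1.7),
  Hit.new(4, :organization_1, 0.8),
  Hit.new(5, :organization_2, 0.3)
]

buckets = top_n_per_organization(hits, 2)
buckets.each do |org, org_hits|
  puts "#{org}: #{org_hits.map(&:id).inspect}"
end
```

With the sample data this prints `organization_1: [1, 3]` and `organization_2: [2, 5]`. The trade-off is that you only group what you fetched: an organization whose best listings fall outside the fetched page will be missing or underfilled, so the request size has to be generous.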

