Search Across Multiple Indexes

On Fri, May 11, 2012 at 4:39 AM, Kinesh Satiya wrote:

Hi,

I have two indexes, and both of them index notes. A note has a noteid, title, description and userid.

My first index is notes (index) / note (mapping), keyed by noteid.

The second holds note content: each document has a list of note ids, the note content, and a list of the userids who have this note in common, and its id is sha1(title). So the note title is common to both indexes.

All I want to do is search across both of these indexes for a given keyword, with individual boost factors for title, description and content.

I know how to do the individual searches with a bool query and boost factors. Currently I first search the first index with the given keyword, then search the second index for content, and then merge the two sets of results.

But I want to search across both of these indexes together so that I get a single relevance ordering.

Thanks
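A single search request can target both indexes by listing them comma-separated in the URL, so one boosted bool query is scored across both in one pass. The sketch below only builds such a request body. It assumes the field names title, description and content, made-up boost values, and a hypothetical name notecontent for the second index; it uses the current Query DSL `match` syntax, which may not match what a 2012-era release accepted.

```python
import json

# Hypothetical boost values; the field names are assumed from the
# description of the two indexes above.
FIELD_BOOSTS = {"title": 3.0, "description": 2.0, "content": 1.0}


def build_cross_index_query(keyword, boosts=FIELD_BOOSTS):
    """Return one bool query body that can be sent to both indexes at once.

    Each field gets its own boosted match clause. A field that is not
    mapped in one of the indexes should simply match nothing there.
    """
    should = [
        {"match": {field: {"query": keyword, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


body = build_cross_index_query("meeting")
# Would be sent as: POST /notes,notecontent/_search
# ("notecontent" is a hypothetical index name)
print(json.dumps(body, indent=2))
```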

On May 13, 2:06 pm, Shay Banon wrote:

I don't really understand what problem you have with searching across the two indices...

On Sun, May 13, 2012 at 8:40 PM, Crwe wrote:

For one, relevance scores from different indexes don't seem to be comparable.

In particular, the idf values seem to be calculated for each index separately. This means that when results from two indexes are merged into a single ranked list, it's really comparing apples to oranges.
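To make the apples-to-oranges point concrete, here is a small worked example assuming the classic Lucene TF-IDF idf formula, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are made up, and notecontent is a hypothetical name for the second index. The same query term ends up with a very different weight depending on which collection's statistics are used.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF inverse document frequency (assumed formula).

    idf = 1 + ln(numDocs / (docFreq + 1))
    """
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the same query term in each index.
notes_idf = classic_idf(doc_freq=10, num_docs=1000)   # rare among notes
content_idf = classic_idf(doc_freq=50, num_docs=100)  # common in note content

print(f"notes: {notes_idf:.2f}, notecontent: {content_idf:.2f}")
# → notes: 5.51, notecontent: 1.67
```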

On May 15, 10:26 pm, Shay Banon wrote:

Actually, idf is computed per shard (by default), not per index. You can set the search type to dfs_query_then_fetch, which adds another phase to the search to compute distributed frequencies (but note that it will be slower).
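The search type is passed as the `search_type` URL parameter of the `_search` endpoint. A minimal sketch using only the Python standard library, assuming a node at the hypothetical address http://localhost:9200 and a hypothetical second index named notecontent. It only builds the request; `urllib.request.urlopen(req)` would send it to a running cluster.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(host, indices, body, search_type="dfs_query_then_fetch"):
    """Build a POST to /<idx1>,<idx2>/_search with the given search_type."""
    # Several indexes can be searched at once by joining them with commas.
    path = "/" + ",".join(indices) + "/_search"
    query = urllib.parse.urlencode({"search_type": search_type})
    return urllib.request.Request(
        url=f"{host}{path}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    "http://localhost:9200",        # hypothetical node address
    ["notes", "notecontent"],       # "notecontent" is a hypothetical name
    {"query": {"match": {"title": "meeting"}}},
)
print(req.get_method(), req.full_url)
# → POST http://localhost:9200/notes,notecontent/_search?search_type=dfs_query_then_fetch
```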

On Sun, May 20, 2012 at 10:24 AM, Crwe wrote:

Oh, so this can happen within a single index, too? Then I'd be interested in using dfs_query_then_fetch. What does "slower" mean in this context? Asymptotically slower? Slower by a constant factor (times the number of shards)?

More generally, is there a resource page for ES that describes the performance implications of various settings (including refresh)?

Thank you in advance.

On May 21, 12:00 am, Shay Banon wrote:

Hard to say how much slower. Effectively, the DFS phase goes to all the relevant shards and extracts the terms associated with the query along with their frequencies; it then aggregates those and executes the search on each shard using the aggregated values.
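The two phases described above can be simulated in a few lines. This toy model is an assumption-laden sketch, not Elasticsearch code: it tracks only document frequency and document count per shard, uses the classic Lucene idf formula, and the shard layout and counts are made up (a real implementation may collect more statistics than these two). Without DFS each shard weights the term with its own local idf; with DFS every shard uses the same idf computed from the summed statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class Shard:
    num_docs: int
    doc_freq: dict  # term -> number of docs on this shard containing it


def classic_idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF idf (assumed): 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def dfs_phase(shards, terms):
    """Phase 1: visit every shard, collect stats for the query terms, sum them."""
    total_docs = sum(s.num_docs for s in shards)
    total_df = {t: sum(s.doc_freq.get(t, 0) for s in shards) for t in terms}
    return total_docs, total_df


def idf_per_shard(shards, term):
    """Default behaviour: each shard scores with its own local statistics."""
    return [classic_idf(s.doc_freq.get(term, 0), s.num_docs) for s in shards]


def idf_with_dfs(shards, term):
    """Phase 2 with DFS: every shard scores with the aggregated statistics."""
    total_docs, total_df = dfs_phase(shards, [term])
    global_idf = classic_idf(total_df[term], total_docs)
    return [global_idf for _ in shards]


# Hypothetical shards of one index with a skewed term distribution.
shards = [Shard(500, {"note": 5}), Shard(500, {"note": 100})]
print("per shard:", idf_per_shard(shards, "note"))
print("with DFS: ", idf_with_dfs(shards, "note"))
```

The skew between the two shards is what makes scores inconsistent even inside a single index, which is why the per-shard default can matter there too.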

On Mon, May 21, 2012 at 5:55 PM, Crwe wrote:

OK, that sounds like each shard will be accessed twice instead of once. Unless the aggregation phase itself is the bottleneck, that means roughly a 0.5x performance hit (independent of the number of shards), which is quite acceptable for me. Good news.

Thanks again, Shay.

Shay Banon wrote:

Since the "search" itself also has a cost, it won't be a full 0.5x hit; it will probably be smaller. A fast network is important here.
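The back-and-forth about "0.5x" can be checked with a toy cost model. Assumptions, all hypothetical: a plain search costs one network round trip plus the query work on the shards; DFS adds a second round trip plus the cost of gathering term statistics; the fetch phase, parallel fan-out and caching are ignored, and the millisecond figures are made up. Throughput only drops to 0.5 when the statistics pass costs as much as a full query pass, and a slower network pushes the ratio down.

```python
def relative_throughput(round_trip_ms, stats_ms, query_ms):
    """Throughput of a DFS search relative to a plain search (toy model).

    plain = one round trip + query work
    dfs   = two round trips + term-statistics work + query work
    """
    plain = round_trip_ms + query_ms
    dfs = 2 * round_trip_ms + stats_ms + query_ms
    return plain / dfs


# Hypothetical (round trip, stats, query) costs in milliseconds.
scenarios = [
    ("fast network", (1, 1, 20)),
    ("slow network", (10, 1, 20)),
    ("stats as costly as query", (1, 20, 20)),
]
for label, args in scenarios:
    print(f"{label}: {relative_throughput(*args):.2f}")
# → fast network: 0.91
# → slow network: 0.73
# → stats as costly as query: 0.50
```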
