I have two indexes that both index notes. A note has a noteid,
title, description and userid.
My first index is notes (index) / note (mapping), keyed by noteid.
The second index holds the note content: a list of note ids, the note
content, and a list of userids who have this note in common. Its id is
sha1(title). So the note title is common to both indexes.
All I want to do is search across both of these indexes for a given
keyword, with individual boost factors for title, description and
content.
I know how to do the individual searches with bool queries and boost
factors.
Currently I first search the first index with the given keyword,
then search the second index for content, and then merge the two sets
of search results.
But I want to search across both of these indexes together so that
the results come back in a single relevance order.
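To make the request concrete, here is a minimal sketch of what a single search over both indexes could look like. The index names (`notes`, `notecontent`), the field names, and the boost values are assumptions for illustration, and the `match` clause syntax may differ between Elasticsearch versions. The only thing relied on is that Elasticsearch accepts a comma-separated list of indexes in the search path.

```python
import json


def build_note_query(keyword, title_boost=3.0, description_boost=2.0,
                     content_boost=1.0):
    """Build a bool query matching the keyword in any of three fields.

    Field names and boost values are hypothetical; adjust to the real mapping.
    """
    def clause(field, boost):
        return {"match": {field: {"query": keyword, "boost": boost}}}

    return {
        "query": {
            "bool": {
                "should": [
                    clause("title", title_boost),
                    clause("description", description_boost),
                    clause("content", content_boost),
                ],
                # At least one of the three fields has to match.
                "minimum_should_match": 1,
            }
        }
    }


# Hypothetical index names, comma-separated to search both in one request.
SEARCH_PATH = "/notes,notecontent/_search"

if __name__ == "__main__":
    print(SEARCH_PATH)
    print(json.dumps(build_note_query("elasticsearch"), indent=2))
```

The body would be sent as one request to the combined path, so both indexes contribute hits to a single ranked list instead of two lists merged by hand.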
I don't really understand what problem you have with searching across
the two indices...
For one, the relevance scores from different indexes do not seem to be
comparable.
In particular, the idf values seem to be calculated for each index
separately. This means that when results from two indexes are merged
into a single ranked list, it's really comparing apples to oranges.
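A small numeric illustration of the point, assuming Lucene's classic TF-IDF similarity, where idf = 1 + ln(numDocs / (docFreq + 1)). The document counts below are made up; they only show how the same term can get very different weights in two indexes.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as in Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one term in two separate indexes.
idf_notes = classic_idf(doc_freq=5, num_docs=1000)      # term is rare here
idf_content = classic_idf(doc_freq=400, num_docs=1000)  # term is common here

if __name__ == "__main__":
    print("idf in notes index:  ", round(idf_notes, 3))
    print("idf in content index:", round(idf_content, 3))
```

With numbers like these, a hit from the first index gets a much larger idf contribution for the same term, so the raw scores from the two indexes are not on the same scale.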
Actually, the idf is computed per shard (by default). You can set the
search type to dfs_query_then_fetch, which adds another phase to the
search to compute distributed frequencies (but note it will be slower).
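As a sketch of how the search type is selected: it is passed as the `search_type` query-string parameter on the search request. The host and index names below are assumptions; the helper only builds the URL and does not contact a cluster.

```python
from urllib.parse import urlencode


def search_url(indexes, search_type="dfs_query_then_fetch",
               host="http://localhost:9200"):
    """Build a search URL for one or more indexes with a given search type.

    The host and the index names are placeholders for illustration.
    """
    path = ",".join(indexes)
    return f"{host}/{path}/_search?{urlencode({'search_type': search_type})}"


if __name__ == "__main__":
    # Hypothetical index names for the two note indexes.
    print(search_url(["notes", "notecontent"]))
```

The query body stays the same as for a normal search; only the parameter changes.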
Oh, so this can happen for a single index, too? I would be interested
in using the dfs_query_then_fetch then -- what does "slower" mean in
this context? Asymptotically slower? Constant factor (x #shards)
slower?
More generally, is there a resource page for ES which describes
performance implications of various settings (incl. refresh)?
Hard to say how much slower. Effectively, what happens is that the DFS
phase goes to all the relevant shards and extracts the terms associated
with the query and their frequencies. It then aggregates those and executes
the search on each shard with the aggregated values.
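The aggregation step described above can be imitated in a few lines. This is a toy model, not Elasticsearch's actual implementation: the per-shard statistics are invented, and the idf formula is the classic Lucene one, assumed here for illustration.

```python
import math
from collections import Counter


def classic_idf(doc_freq, num_docs):
    """idf as in Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def aggregate_shard_stats(shard_stats):
    """Sum document counts and per-term document frequencies over shards."""
    total_docs = 0
    doc_freqs = Counter()
    for stats in shard_stats:
        total_docs += stats["num_docs"]
        doc_freqs.update(stats["doc_freq"])
    return total_docs, dict(doc_freqs)


# Invented statistics for the query term "lucene" on two shards.
shards = [
    {"num_docs": 100, "doc_freq": {"lucene": 2}},
    {"num_docs": 300, "doc_freq": {"lucene": 90}},
]

# Without DFS: every shard scores with its own local idf.
local_idfs = [classic_idf(s["doc_freq"]["lucene"], s["num_docs"]) for s in shards]

# With DFS: statistics are aggregated first, then every shard uses the same idf.
total_docs, doc_freqs = aggregate_shard_stats(shards)
global_idf = classic_idf(doc_freqs["lucene"], total_docs)

if __name__ == "__main__":
    print("local idfs:", [round(v, 3) for v in local_idfs])
    print("global idf:", round(global_idf, 3))
```

In this model the extra cost is one additional round trip to each shard plus a cheap merge of the term statistics.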
OK, that sounds like each shard will be accessed twice instead of
once. Unless the aggregation phase itself is the bottleneck, this
means throughput drops to roughly half (independent of #shards), which is
quite acceptable for me. Good news.