Coming from a strong RDBMS background, I still tend to think in SQL
terms, so I'm sorry for any bad terminology and misconceptions, but I'm
trying to accomplish the following:
Let's say I have an index called "data" and two types, "blogs" and "comments".
"blogs" has the fields [blog_id, user, subject, body, postdate]
"comments" has the fields [comment_id, user, text, postdate, blog_id]
This way I can have a website where the user profile has a link to "blogs"
and another link to "comments", and in each case a simple search on the type,
filtered by user, gets me the results I want.
Now I have a special case: there is a search page where I search "blogs"
within a date range with user "aaa" and body "bbb". Again, so far so good.
The problem is that I want to list only the related comments of this result
set.
In other words, I want to search typeA and return typeB where typeA.id =
typeB.fieldX.
Is this at all possible without restructuring and re-indexing everything?
I'm currently running Elasticsearch 0.19.8 using the HTTP API.
ES does not have anything similar to joins. You have to either denormalize
your data (e.g. comments inside the blog document), structure the data
using nested or parent/child format, or perform multiple queries.
The easiest solution is to perform a second query on the comments type
using blog_id as a filter. You'll have to perform a query for each result
returned from your first blog query, but these should be pretty quick
once the filter is in memory. You could also store document IDs of
individual comments inside the blog type, which will result in faster
lookups and less memory usage than filters (with the disadvantage of
needing to update the blog doc whenever a new comment is added).
-Zach
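The two-query approach can be sketched as plain request bodies. This is a minimal sketch, not a tested recipe: it assumes the `filtered` query with `term`, `range` and `terms` filters and the `query_string` query as available in the 0.19 line, and it assumes `user` and `blog_id` are indexed as exact (not analyzed) values. Instead of one query per blog, it batches one page of blog IDs into a single `terms` filter, which is a substitution on my part. The function names and sample dates are hypothetical.

```python
import json


def blog_search_body(user, body_text, date_from, date_to, size=50):
    """First query: blogs by user and body text within a date range."""
    return {
        "size": size,
        "fields": ["blog_id"],  # only the join key is needed from each hit
        "query": {
            "filtered": {
                "query": {
                    "query_string": {"default_field": "body", "query": body_text}
                },
                "filter": {
                    "and": [
                        {"term": {"user": user}},
                        {"range": {"postdate": {"from": date_from, "to": date_to}}},
                    ]
                },
            }
        },
    }


def comments_for_blogs_body(blog_ids, size=50):
    """Second query: comments whose blog_id is in one page of blog hits."""
    return {
        "size": size,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {"blog_id": list(blog_ids)}},
            }
        },
    }


if __name__ == "__main__":
    first = blog_search_body("aaa", "bbb", "2013-01-01", "2013-01-14")
    # Pretend these IDs came back in the hits of the first query.
    second = comments_for_blogs_body([101, 102, 103])
    print(json.dumps(first, indent=2))   # body for POST /data/blogs/_search
    print(json.dumps(second, indent=2))  # body for POST /data/comments/_search
```

Only the blog IDs of the page being displayed go into the second query, so the cost follows the page size (50 here) rather than the total hit count.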
Zach, thanks for your clarification.
I did have running two queries in mind, but we are talking about loads of
data (otherwise I wouldn't need a dedicated search engine, right?). If I
query "blogs" and get 10,000 hits, only 50 are returned due to paging, and
even if I did return all 10k results, looping over all that data and
constructing a second query would be inefficient.
The parent/child approach sounds interesting, and the docs give a simple
example, but how do I stipulate that blog_tag is related to blog via fieldA
(or am I misunderstanding the use)? Which blog_tags belong to which blogs,
or is blog_tag not a type but a field?
Regards
GX
I'm not sure the performance hit would be terrible, especially if you
include document IDs in the blog type. Direct ID lookups in ES are very fast.
I asked Shay about this a while ago, and Shay said that direct ID lookups
can be expected to be a bit faster than B-Trees
(https://groups.google.com/d/msg/elasticsearch/hx0D2rKWr0s/EnTrQVTQEH8J).
If you set up your routing correctly (such that comments are stored on the
same shard as the blog doc), then you are effectively recreating exactly
how ES does parent/child nesting internally. Although I suppose you'll pay
for network round-trips.
I use this approach on an index with ~250,000 docs and the performance hit
seems to be negligible. Not sure how well it scales, but for my traffic it
works fine.
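The ID-lookup variant could look roughly like this. It is only a sketch: the `comment_ids` field on the blog document is a hypothetical name, and the helper names are made up for illustration. It assumes the multi-get endpoint (`_mget`) with the `ids` shorthand and the `routing` URL parameter on single get requests; check both against the docs for your version.

```python
import json
from urllib.parse import quote, urlencode


def mget_request(index, doc_type, ids):
    """Build (path, body) for one multi-get that fetches all listed IDs."""
    path = "/{}/{}/_mget".format(quote(index), quote(doc_type))
    return path, {"ids": [str(i) for i in ids]}


def get_path(index, doc_type, doc_id, routing=None):
    """Build the path for a single direct lookup, optionally with a routing value."""
    path = "/{}/{}/{}".format(quote(index), quote(doc_type), quote(str(doc_id)))
    if routing is not None:
        # Routing by blog ID sends the lookup to the shard holding that blog's comments,
        # provided the comments were indexed with the same routing value.
        path += "?" + urlencode({"routing": routing})
    return path


if __name__ == "__main__":
    # Hypothetical blog hit that carries the IDs of its own comments.
    blog = {"blog_id": 101, "comment_ids": [9001, 9002]}
    path, body = mget_request("data", "comments", blog["comment_ids"])
    print("POST", path, json.dumps(body))
    print("GET", get_path("data", "comments", 9001, routing=blog["blog_id"]))
```

One multi-get per page of blog hits keeps the number of network round-trips down compared with one request per comment.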
For parent/child (and nested too)...you posted the link that I had in mind
=) If that solution works for you, it'll be a lot easier than doing what I
outlined above.
-Zach
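On the parent/child question: the relation is not inferred from a field such as blog_id. It is declared in the child type's mapping and supplied each time a child document is indexed. A rough sketch follows, assuming the `_parent` mapping, the `parent` URL parameter on index requests, and the `has_child` query (which returns parents, here blogs). As far as I know, the reverse direction (`has_parent`, which would return comments) only arrived in a release later than 0.19.8, so it is left out. Note that this route means re-indexing the comments with their parent set. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Mapping for the child type: every "comments" doc points at a "blogs" parent.
# Body for PUT /data/comments/_mapping, applied before any comments are indexed.
COMMENTS_MAPPING = {"comments": {"_parent": {"type": "blogs"}}}


def index_comment_path(comment_id, blog_id):
    """Path for indexing a comment as a child of the given blog."""
    return "/data/comments/{}?{}".format(comment_id, urlencode({"parent": blog_id}))


def blogs_with_matching_comments(comment_text):
    """has_child query: returns blogs that have at least one matching comment."""
    return {
        "query": {
            "has_child": {
                "type": "comments",
                "query": {
                    "query_string": {"default_field": "text", "query": comment_text}
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(COMMENTS_MAPPING))
    print("PUT", index_comment_path(9001, 101))
    print(json.dumps(blogs_with_matching_comments("bbb"), indent=2))
```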