Query referenced documents


(Liron Cohen) #1

Hello,

This is my first post here and I'm quite new to ES.

After doing A LOT of digging I still couldn't find a good way to implement
what I need.
Let's say I have these documents:
curl -XGET http://localhost:9200/test/customers/1
{"_index": "test","_type": "customers","_id": "1","_version": 1,"exists": true,"_source": {"Name": "MyCust1"}}

curl -XGET http://localhost:9200/test/invoices/1
{"_index": "test","_type": "invoices","_id": "1","_version": 1,"exists": true,"_source": {"custId": "1"}}
(these are examples of course)

As you can see, the invoice document has a field (named custId, with a value
of 1) pointing to a customer document with _id = 1.

Of course, customer is not a child document of invoices, and it'll be used as
a reference in other documents as well.

How can I query the invoices with a certain customer name?
What is the mapping I need to do?
Is it at all possible?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Index invoice with all customer data.
It really makes sense here to build it as a single document as you want to take a 'snapshot' of the invoice when it's generated, right?
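
A minimal sketch of what such a single "snapshot" document could look like, using the example documents from the question. The `customer` key and the `build_invoice_snapshot` helper are assumptions made for illustration, not names from this thread; the code only builds the JSON bodies and does not talk to a cluster.

```python
import json


def build_invoice_snapshot(invoice, customer):
    """Return a copy of the invoice with the customer's fields embedded.

    The "customer" key is a hypothetical name chosen for this sketch.
    """
    doc = dict(invoice)
    doc["customer"] = dict(customer)
    return doc


# Example documents taken from the question.
customer = {"Name": "MyCust1"}
invoice = {"custId": "1"}

snapshot = build_invoice_snapshot(invoice, customer)

# With the customer embedded, invoices can be searched by customer name
# through the nested field path.
query = {"query": {"match": {"customer.Name": "MyCust1"}}}

print(json.dumps(snapshot))
print(json.dumps(query))
```

The snapshot would be indexed as the invoice document itself, so the query above runs against the invoices type only.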

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to the message from Kido cliron1@gmail.com of 23 Sept. 2013, 12:02.)


(Liron Cohen) #3

Hey David,

Visited your site earlier :slight_smile:
Thank you for answering.

As I mentioned: "customer is not a child document of invoices, and it'll be
used as a reference in other documents as well."
But even if it were, a customer has many invoices, and once the customer's
address changes I'll need to sync all referencing docs (like invoices) with
the new details.
If it helps, I can give another example: a tourist index; Locations and
Hotels docs. A location (doc) is used in hotel (doc), but also for
Reservation (doc), BusRoute (doc) etc.
What I'm looking for (if it's possible) is a way to reference docs in other
docs and then be able to query by them.
Something like the "parent field" (
http://www.elasticsearch.org/guide/reference/mapping/parent-field/).
Actually, after reading about it I immediately searched for a "child field"
(which would solve my problem), but couldn't find one.



(Liron Cohen) #4

Okay... after doing some more research I reached the conclusion that it's
not possible in ES (which is a shame, I still think the ES team should
consider adding a _child field).
So I read this
(http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/),
reached the "Denormalization" section, and it got me thinking.
Most of the referenced docs are pretty small (relatively) and rarely change,
so why not include their source in the referring doc and use _update
(http://www.elasticsearch.org/guide/reference/api/update/) with a
partial doc to re-sync the denormalized copies when changes occur?
Tested it out and it works great.
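
For illustration, a partial-doc update of one referring document might be built like this. This is a sketch only: the `customer` key for the embedded copy and the helper name are assumptions, and the endpoint layout follows the index/type/id URLs shown earlier in this thread.

```python
import json


def partial_update_request(index, doc_type, doc_id, customer):
    """Build the path and body for a partial-doc _update of one document.

    Only the embedded copy (hypothetical "customer" key) is sent; the rest
    of the referring document is left as it is.
    """
    path = "/{}/{}/{}/_update".format(index, doc_type, doc_id)
    body = {"doc": {"customer": customer}}
    return path, body


# Hypothetical changed customer data.
path, body = partial_update_request("test", "invoices", "1", {"Name": "MyCust1 Ltd"})

# The pair would be sent as: POST http://localhost:9200<path> with <body>.
print(path)
print(json.dumps(body))
```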

Now, the question is how _update will perform at massive (well, I'm
exaggerating) scale.
Let's say, the referred doc is referenced 1,000 times in the index, in
different types.
Has anyone tried this sort of thing? Any issues I should consider?
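
One possible way to keep the request count down, assuming the bulk API's `update` action is available in the version in use, is to batch all 1,000 partial updates into a single newline-delimited body. The `customer` key, the `bulk_update_body` helper and the list of referring documents are hypothetical; in practice the list would come from a search for documents carrying the changed custId.

```python
import json


def bulk_update_body(index, refs, customer):
    """Build a newline-delimited bulk body of partial-doc updates.

    refs is an iterable of (doc_type, doc_id) pairs for every document
    that embeds the changed customer.
    """
    lines = []
    for doc_type, doc_id in refs:
        # Action line: which document to update.
        lines.append(json.dumps(
            {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the partial document to merge in.
        lines.append(json.dumps({"doc": {"customer": customer}}))
    if not lines:
        return ""
    # The bulk format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical: 1,000 invoices referencing the same customer.
refs = [("invoices", str(i)) for i in range(1, 1001)]
bulk_body = bulk_update_body("test", refs, {"Name": "MyCust1 Ltd"})

# Would be sent as: POST http://localhost:9200/_bulk
print(len(bulk_body.splitlines()))
```

Each update still reindexes the whole referring document on the server side, so this reduces round trips rather than indexing work.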

Is there another way to denormalize data?



(Martijn Van Groningen) #5

Hi Kido,

De-normalizing your data is a valid approach for this use case. The
downside is that if something is referenced N times, you will need to
update / reindex N times.
You can also use the parent / child support that comes with elasticsearch,
which eliminates the issue mentioned in the previous sentence. Multiple
child types can point to the same parent type.
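
A rough sketch of the parent / child setup described above, applied to the customers and invoices from the question. It assumes the `_parent` mapping field and the `has_parent` query with a `parent_type` parameter as they existed in Elasticsearch releases of this period (later versions model parent / child differently); the `reservations` type and both helper names are hypothetical.

```python
import json


def child_mapping(child_type, parent_type):
    """Mapping fragment that declares child_type as a child of parent_type."""
    return {child_type: {"_parent": {"type": parent_type}}}


def has_parent_query(parent_type, field, value):
    """Search body that returns children whose parent matches field=value."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {"match": {field: value}},
            }
        }
    }


# Several child types can point at the same parent type.
mappings = [child_mapping(t, "customers") for t in ("invoices", "reservations")]

# Invoices whose parent customer is named MyCust1. Each child has to be
# indexed with its parent id (e.g. .../test/invoices/1?parent=1) for this
# to match.
invoices_by_customer = has_parent_query("customers", "Name", "MyCust1")

print(json.dumps(mappings))
print(json.dumps(invoices_by_customer))
```

With this layout a change to the customer touches only the customer document; the invoices are not reindexed.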

If you use the p/c queries you get back either the parent hits or the child
hits. Including the top child hits per parent hit, or the parent hit per
child hit, is on the roadmap. In the meantime you can use a workaround: in
addition to your regular request (with a has_child or has_parent query) you
can use the msearch API to retrieve the top children or the parent for each
hit. For each hit you add a search request to the msearch call that, for
example, fetches the top children for that particular parent hit.
This way you can get the job done with just one additional request to your
cluster.
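
The workaround above might be sketched as follows: one msearch body with one small search per parent hit. This assumes the newline-delimited header/body format of the msearch API and selects each parent with an `ids` query inside `has_parent`; the helper name, the `size` default and the parent ids are made up for the example.

```python
import json


def top_children_msearch(index, child_type, parent_type, parent_ids, size=3):
    """Build an msearch body with one search per parent hit.

    Each search fetches up to `size` children of one specific parent.
    """
    lines = []
    for parent_id in parent_ids:
        # Header line: where this individual search runs.
        lines.append(json.dumps({"index": index, "type": child_type}))
        # Body line: children whose parent has exactly this id.
        lines.append(json.dumps({
            "size": size,
            "query": {
                "has_parent": {
                    "parent_type": parent_type,
                    "query": {"ids": {"values": [parent_id]}},
                }
            },
        }))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


# Hypothetical ids of the customers returned by the first (has_child) search.
msearch_body = top_children_msearch("test", "invoices", "customers", ["1", "2"])

# Would be sent as: GET http://localhost:9200/_msearch
print(msearch_body)
```

The responses come back in the same order as the searches, so the i-th response holds the children for the i-th parent hit.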

Martijn

(In reply to the message from Kido cliron1@gmail.com of 23 September 2013, 17:13.)


(Amit Soni) #6

I have a similar use case and I am feeling reluctant to de-normalize the
data since I might have hundreds of thousands of documents that would need
an update as a result of an update to a dependent document.

I am still exploring what would be the right way to solve this use case and
whether parent-child functionality is the right way to go.

-Amit.

(In reply to the message from Martijn v Groningen <martijn.v.groningen@gmail.com> of Wed, Sep 25, 2013, 4:11 AM.)

