How to correctly store a parent/child structure in Elasticsearch


(Павел Поляков) #1

Hi,

I have the following structure in my MySQL database.

https://lh4.googleusercontent.com/-UZJmzHN7BLM/UzMSK5WNrSI/AAAAAAAAAlY/I1lfQfw-IR8/s1600/Screen+Shot+2014-03-26+at+7.44.46+PM.png

What I want to do is copy the content of the database to an
Elasticsearch server. The main goal is to search through the transactions
and use facets.

Something like this (currently implemented with MySQL):

https://lh6.googleusercontent.com/-v-vR0z7g98k/UzMSsZABwaI/AAAAAAAAAlg/Kt_AuN2Lhe4/s1600/Screen+Shot+2014-03-26+at+7.46.55+PM.png

The issue is how I should store the documents in Elasticsearch so that
they can be reindexed quickly.
In the future, I will need to update the ES index as soon as transaction
or bank_account data is updated.

I've looked at the options ES offers for modelling relations
(http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/)
and decided to use parent/child.
I've created one index with two types, transaction and bank_account, where
transaction is the child of bank_account.
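
A minimal sketch of that setup, written as plain Python dictionaries so the request bodies can be inspected without a running cluster. The index name `bank`, the transaction fields `amount` and `description`, and the ids are assumptions for illustration (the real schema is only in the screenshot). The `_parent` mapping entry and the `_parent` key in the bulk action line follow the Elasticsearch 1.x parent/child conventions.

```python
import json

# Hypothetical index name; not taken from the original post.
INDEX = "bank"

# One index, two types. The child type names its parent type through the
# `_parent` mapping entry (Elasticsearch 1.x style).
mapping = {
    "mappings": {
        "bank_account": {
            "properties": {
                "name": {"type": "string", "index": "not_analyzed"},
                "country": {"type": "string", "index": "not_analyzed"},
                "currency": {"type": "string", "index": "not_analyzed"},
            }
        },
        "transaction": {
            "_parent": {"type": "bank_account"},
            "properties": {
                # Assumed transaction fields, for illustration only.
                "amount": {"type": "double"},
                "description": {"type": "string"},
            },
        },
    }
}


def bulk_lines(bank_id, transactions):
    """Build bulk-API lines that index transactions as children of one bank."""
    lines = []
    for tx in transactions:
        action = {"index": {"_index": INDEX, "_type": "transaction",
                            "_id": tx["id"], "_parent": bank_id}}
        source = {k: v for k, v in tx.items() if k != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return "\n".join(lines) + "\n"


payload = bulk_lines("7", [{"id": "1", "amount": 12.5, "description": "coffee"}])
```

With this layout, adding a transaction is a single child document routed to its bank, and changing a bank touches only the one bank_account document.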

But there are open questions:

1. How can I query ES with "has_parent" so that it returns not only the
children but also the parent's information?
In the results I need objects where the bank's country, currency and
name fields are available.
So far I've only managed to get the transaction's _source fields and the
_parent field, which is the id of the bank account.

2. How should I query ES to get facets on the fields country, currency
and name?
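
To make questions 1 and 2 concrete, here is a minimal sketch of the kind of "has_parent" request described above, plus one possible client-side workaround: since there are only 20 banks, they can be loaded once and merged into each hit by its _parent id. This is an assumption-laden illustration, not a confirmed answer. The hit shape (a `_parent` value either at the top level of the hit or under `fields`), the helper names and the sample data are all hypothetical.

```python
from collections import Counter


def has_parent_query(field, value):
    """Request body: transactions whose parent bank_account matches a term."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "bank_account",
                "query": {"term": {field: value}},
            }
        },
        # Ask for the parent id alongside the transaction source.
        "fields": ["_parent", "_source"],
    }


def parent_id_of(hit):
    """Read the parent id from a hit; the exact hit shape is an assumption."""
    raw = hit.get("_parent", hit.get("fields", {}).get("_parent"))
    return raw[0] if isinstance(raw, list) else raw


def attach_bank(hits, banks_by_id):
    """Copy country, currency and name from the parent bank into each row."""
    rows = []
    for hit in hits:
        bank = banks_by_id[parent_id_of(hit)]
        row = dict(hit["_source"])
        for key in ("country", "currency", "name"):
            row[key] = bank[key]
        rows.append(row)
    return rows


def facet_counts(rows, field):
    """Count values of one field over the merged rows (client side only)."""
    return Counter(row[field] for row in rows)


# Hypothetical sample data standing in for a search response.
banks = {
    "7": {"name": "Bank A", "country": "DE", "currency": "EUR"},
    "8": {"name": "Bank B", "country": "US", "currency": "USD"},
}
hits = [
    {"_source": {"amount": 10}, "fields": {"_parent": "7"}},
    {"_source": {"amount": 5}, "_parent": "8"},
    {"_source": {"amount": 1}, "fields": {"_parent": ["7"]}},
]
rows = attach_bank(hits, banks)
by_country = facet_counts(rows, "country")
```

Note that `facet_counts` only covers the rows actually fetched, so it is not a substitute for server-side facets over the full result set.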

3. What other structure could I use so that reindexing is fast? I have
160000 transactions and 20 banks. If I stored the bank information
directly in each transaction document, then changing the country of one
bank would mean reindexing nearly 40000 documents (for example), which
is not acceptable.

I also can't use nested objects (storing the transactions inside the bank
account), because then I could not add transactions to a bank
incrementally: each new insert would cause the whole bank document to be
reindexed.
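
The trade-off between the three layouts can be put into numbers with a small sketch. It is a simplified cost model that only counts how many documents have to be rewritten. It assumes a nested bank document is rewritten together with all of its nested transactions, and the per-bank counts other than the 40000 example are hypothetical.

```python
def docs_rewritten(tx_per_bank, bank_id):
    """Documents rewritten per layout for two events on one bank.

    Simplified model: a nested document is counted as one root document
    plus one entry per nested transaction, all rewritten together.
    """
    n = tx_per_bank[bank_id]
    return {
        # Bank fields copied into every transaction document.
        "denormalized": {"bank_change": n, "new_transaction": 1},
        # Transactions stored inside the bank document.
        "nested": {"bank_change": n + 1, "new_transaction": n + 2},
        # Separate documents linked by _parent.
        "parent_child": {"bank_change": 1, "new_transaction": 1},
    }


# Hypothetical distribution: one large bank with the 40000 transactions
# from the example above, and one small bank.
tx_per_bank = {"large_bank": 40000, "small_bank": 500}
costs = docs_rewritten(tx_per_bank, "large_bank")
```

Under this model only parent/child keeps both events at a single document write, which is the reason for choosing it here.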

Any thoughts?

Regards,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6a0fba83-6e0b-4e0d-aefd-b66bca5fded3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Павел Поляков) #2

I'd still be interested in your opinions on this question.

(In reply to the original post of Wednesday, 26 March 2014, 19:55:29 UTC+2.)


