Best practice to design index/type for Elasticsearch


(Ranjith Sundaraj Chandra) #1

Hi,
I am new to Elasticsearch. I plan to use Elasticsearch for search only and
leave the Oracle DB to hold the canonical data.

Requirement: provide search functionality over the 'application forms'
submitted to our system. Each 'application form' can have many 'items'
associated with it (one to many).
We have to provide two searches: one that fetches 'application form' level
data and one that fetches 'item' level data. Both will have similar search
criteria.

Option 1:
Create two indices with different JSON document structures. The first index
has the 'application form' as the parent and all of its 'items' as children
(similar to the canonical structure). The second index reverses the
hierarchy: each 'item' is the parent and carries all the attributes of its
'application form'.
*Cons:*
Data is duplicated across the two indices.
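
To make the duplication in Option 1 concrete, here is a minimal Python sketch of the two document shapes built from the same canonical rows. The field names (`form_id`, `applicant`, `status`, `item_id`, `name`, `quantity`) are made up for illustration and are not from the real schema.

```python
# Hypothetical canonical rows, as they might come out of the Oracle tables.
forms = [{"form_id": 1, "applicant": "Acme Corp", "status": "SUBMITTED"}]
items = [
    {"item_id": 10, "form_id": 1, "name": "Widget", "quantity": 3},
    {"item_id": 11, "form_id": 1, "name": "Gadget", "quantity": 1},
]


def form_documents(forms, items):
    """First index: one document per form, with its items embedded."""
    docs = []
    for form in forms:
        children = [
            {k: v for k, v in item.items() if k != "form_id"}
            for item in items
            if item["form_id"] == form["form_id"]
        ]
        docs.append({**form, "items": children})
    return docs


def item_documents(forms, items):
    """Second index: one document per item, with the form attributes copied in."""
    by_id = {form["form_id"]: form for form in forms}
    return [{**item, "form": by_id[item["form_id"]]} for item in items]


form_docs = form_documents(forms, items)
item_docs = item_documents(forms, items)
```

Every form attribute appears once in `form_docs` and once per item in `item_docs`, which is the duplication listed under the cons.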

Option 2:
Create one index similar to the canonical structure and support both search
requirements from it.
*Cons:*
I have to write code to process and massage the data to return results at
the 'item' level.
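
The post-processing that Option 2 would need could be sketched roughly as follows. This is only an illustration: the hit structure and the `item_matches` predicate are assumptions, and a real implementation would have to re-apply the item-level search criteria in the same way.

```python
def item_level_results(form_hits, item_matches):
    """Flatten form-level hits into one result row per matching item."""
    results = []
    for form in form_hits:
        # Form attributes to repeat on every item row (everything but the items).
        form_fields = {k: v for k, v in form.items() if k != "items"}
        for item in form.get("items", []):
            if item_matches(item):
                results.append({**item, "form": form_fields})
    return results


# Hypothetical form-level hit, shaped like the canonical structure.
hits = [
    {
        "form_id": 1,
        "applicant": "Acme Corp",
        "items": [
            {"item_id": 10, "name": "Widget", "quantity": 3},
            {"item_id": 11, "name": "Gadget", "quantity": 1},
        ],
    }
]

rows = item_level_results(hits, lambda item: item["quantity"] >= 2)
```

Here `rows` contains only the 'Widget' item, with the form attributes attached to it.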

Questions:

  1. Which option is preferable, and why?
  2. Is there a better solution for this scenario?
  3. If I opt for option 1, do I need to create two JDBC river components, one
    per Elasticsearch index, or should I write one JDBC river component that
    updates both indices?

I appreciate your input.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/92702ef1-4fd1-45a6-a4b2-eb0cbd8c3aad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Regarding the JDBC river, it depends on how many data sources you have. If you
have one for form fields and one for form values, you should use two
instances. If you can use a single SQL select for both form fields and
values, one river instance should be sufficient.
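
As an illustration of the "single SQL select" case, the sketch below joins forms and items in one statement. It uses an in-memory SQLite database purely as a stand-in for the Oracle source, and the table and column names (`application_form`, `item`, `applicant`, `name`) are assumptions. It does not show the river configuration itself.

```python
import sqlite3

# In-memory SQLite as a stand-in for the Oracle source (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE application_form (form_id INTEGER PRIMARY KEY, applicant TEXT);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, form_id INTEGER, name TEXT);
    INSERT INTO application_form VALUES (1, 'Acme Corp');
    INSERT INTO item VALUES (10, 1, 'Widget'), (11, 1, 'Gadget');
    """
)

# One select that returns form and item columns together, one row per item.
SINGLE_SELECT = """
    SELECT f.form_id, f.applicant, i.item_id, i.name AS item_name
    FROM application_form f
    JOIN item i ON i.form_id = f.form_id
    ORDER BY f.form_id, i.item_id
"""

joined_rows = [dict(row) for row in conn.execute(SINGLE_SELECT)]
```

If the data can be fetched in one statement like this, a single instance has everything it needs; if forms and items come from separate sources, that is the case where two instances make sense.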

Besides, you do not need parent/child or two indices. One index with two
mappings should work, where you can follow reference IDs to get additional
attributes.
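
"Following reference IDs" could look roughly like the sketch below. It uses plain Python dictionaries as a stand-in for one index holding two document types; the type names `form` and `item` and the `form_id` reference field are assumptions. Against a real cluster, the second step would be a lookup of the referenced documents by ID.

```python
# In-memory stand-in for one index holding two document types (assumption).
index = {
    "form": {"1": {"applicant": "Acme Corp", "status": "SUBMITTED"}},
    "item": {
        "10": {"form_id": "1", "name": "Widget"},
        "11": {"form_id": "1", "name": "Gadget"},
    },
}


def search_items(index, predicate):
    """Step 1: search the item documents only."""
    return [
        dict(doc, item_id=item_id)
        for item_id, doc in index["item"].items()
        if predicate(doc)
    ]


def with_form_attributes(index, item_hits):
    """Step 2: follow each item's form_id to attach the referenced form."""
    return [dict(hit, form=index["form"][hit["form_id"]]) for hit in item_hits]


item_hits = search_items(index, lambda doc: doc["name"] == "Gadget")
enriched = with_form_attributes(index, item_hits)
```

Each form is stored once, and the item-level search pays for one extra lookup per result instead of carrying a copy of the form attributes.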

Jörg

(In reply to the message from Ranjith Sundaraj Chandra of Mon, Jul 14, 2014 at 2:03 AM.)


(Ranjith Sundaraj Chandra) #3

Thanks Jörg,
I will try this out and get back to you.
Meanwhile, if you have a sample for creating the mapping at the 'item' level,
please share it.

Thanks,
Ranjith
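
No sample was posted in the thread. As a rough idea of what an 'item' level mapping might look like, here is a sketch written as a Python dictionary and serialized to JSON. The field names are hypothetical, and the `string` / `not_analyzed` syntax is the Elasticsearch 1.x style that was current in 2014; later versions use `text` and `keyword` and no longer support several mapping types in one index.

```python
import json

# Hypothetical mapping for an "item" type that references its form by ID.
ITEM_MAPPING = {
    "item": {
        "properties": {
            # Exact-match identifiers: stored as single, unanalyzed terms.
            "item_id": {"type": "string", "index": "not_analyzed"},
            "form_id": {"type": "string", "index": "not_analyzed"},
            # Free-text field: analyzed for full-text search.
            "description": {"type": "string"},
            "quantity": {"type": "integer"},
        }
    }
}

# JSON body that could be sent in a put-mapping request.
mapping_body = json.dumps(ITEM_MAPPING, indent=2)
```

The `form_id` field is what ties an item back to its form document, so it is kept unanalyzed for exact lookups.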

(In reply to the message from Jörg Prante of Monday, July 14, 2014 3:05:32 AM UTC-4.)

