Elasticsearch: Need advice on the architectural design of my cluster

I am quite new to Elasticsearch. I need to build a search system using
data from MongoDB. Here is a high-level overview of my application:

  • There are different users belonging to different organizations.
  • A user can upload multiple datasets. Each dataset is stored as a
    single document in MongoDB. However, each dataset contains an array of
    nodes, which hold the data we are interested in.
  • A user can load one dataset at a time into their workspace and view the
    entire data for that dataset. Datasets are therefore independent of each
    other, and we never need to aggregate across multiple datasets.
  • A user can search within the dataset loaded in their workspace. The
    search should return the matching elements from the nodes array of that
    dataset.

For illustration, here is a single document in the MongoDB datasets collection:

{
  "_id": ObjectId(),
  "setName": "dummy_set",
  "nodes": [
    {
      "id": ObjectId(),
      "label": "some text",
      "content": "more text"
    },
    . . .
  ]
}

For this, the design that I have thought about is:

  • There will be one index in my cluster.
  • Each dataset will be stored in a separate type in the index. The
    name of the type will be the ObjectId of the dataset in MongoDB.
  • Each element in the nodes array of a dataset will become a single
    document in the corresponding type in Elasticsearch.
  • I will use custom routing to make sure a single dataset resides on one
    shard only. For that, I will use the type name (the ObjectId of the
    dataset in MongoDB) as my routing key. I assume I will have to store it
    with each document in Elasticsearch?
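
The mapping described in the list above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the index name `datasets` and the helper `dataset_to_bulk_lines` are hypothetical, plain strings stand in for ObjectId values, and the `_type` and `_routing` metadata keys follow the newline-delimited Bulk API format of Elasticsearch releases that still support mapping types (newer releases spell the routing key `routing` and have no types).

```python
import json


def dataset_to_bulk_lines(dataset, index_name="datasets"):
    """Turn one MongoDB dataset document into Bulk API lines.

    Each element of dataset["nodes"] becomes one Elasticsearch document.
    The dataset id is used both as the type name and as the routing value,
    as in the design described above (hypothetical helper, not a library API).
    """
    dataset_id = str(dataset["_id"])
    lines = []
    for node in dataset["nodes"]:
        # Action/metadata line: where the document goes and how it is routed.
        action = {
            "index": {
                "_index": index_name,
                "_type": dataset_id,     # assumption: releases with mapping types
                "_id": str(node["id"]),
                "_routing": dataset_id,  # assumption: key name varies by release
            }
        }
        # Source line: the searchable fields of the node.
        source = {"label": node["label"], "content": node["content"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return lines


if __name__ == "__main__":
    example = {
        "_id": "dataset-1",
        "setName": "dummy_set",
        "nodes": [{"id": "node-1", "label": "some text", "content": "more text"}],
    }
    # A bulk request body is these lines joined by newlines, ending in a newline.
    print("\n".join(dataset_to_bulk_lines(example)) + "\n")
```

Because the routing value travels with every action line, the sketch does not need a separate routing field inside the document source; whether one is also wanted for filtering is a separate design choice.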

Am I heading in the right direction? Does the solution look scalable, or
is there something terribly wrong with the design? I would love to hear
suggestions on how to improve it.
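
As a rough illustration of the routing idea in this design: the Elasticsearch documentation describes shard selection, in simplified form, as a hash of the routing value modulo the number of primary shards. The sketch below uses `zlib.crc32` as a stand-in hash (it is not the hash Elasticsearch uses), and the function names, dataset ids and node counts are all hypothetical. It shows that every node of a dataset maps to one shard, while how full each shard gets depends on which datasets happen to share it.

```python
import zlib
from collections import Counter


def pick_shard(routing_value, num_primary_shards):
    """Map a routing value to a shard number.

    Stand-in for Elasticsearch's internal routing hash; crc32 is used only
    because it is deterministic across runs.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def docs_per_shard(dataset_sizes, num_primary_shards):
    """Count how many node documents end up on each shard.

    dataset_sizes maps a dataset id (the routing value) to its node count.
    """
    counts = Counter()
    for dataset_id, node_count in dataset_sizes.items():
        counts[pick_shard(dataset_id, num_primary_shards)] += node_count
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical dataset ids and sizes, for illustration only.
    sizes = {"dataset-a": 1000, "dataset-b": 250000, "dataset-c": 40}
    print(docs_per_shard(sizes, num_primary_shards=5))
```

Under this model a search that supplies the routing value only needs to visit one shard, but shard sizes are not balanced automatically: one very large dataset makes its shard large regardless of how many shards the index has.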

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a177625f-2462-40c4-aad2-514ee3553b64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ideally you want to keep different types in different indexes.
And you definitely don't want everything in one massive index as that won't
scale well.

On 28 December 2014 at 22:41, Mandeep Gulati mandeep.s.gulati@gmail.com
wrote:

[...]

Thanks for the response Mark!

However, I am trying to understand how a massive index can be a problem if
I always know which type to query. Is there an explanation, or a link to
some documentation, regarding this?
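
To make "knowing which type to query" concrete, here is a minimal sketch of a search scoped to one dataset. The helper `build_search_request` and the index name `datasets` are hypothetical, the `label` and `content` fields come from the example document, and the typed `/index/type/_search` path assumes an Elasticsearch release that still supports mapping types. The request is built as a plain dict rather than tied to any client library.

```python
from urllib.parse import quote, urlencode


def build_search_request(dataset_id, text, index_name="datasets"):
    """Describe a search limited to one dataset's type and routed to its shard.

    Hypothetical helper: returns the HTTP method, path and JSON body that
    could be sent to the search endpoint.
    """
    path = "/{}/{}/_search?{}".format(
        quote(index_name, safe=""),
        quote(dataset_id, safe=""),         # the type name is the dataset id
        urlencode({"routing": dataset_id}),  # same value used at index time
    )
    body = {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["label", "content"],
            }
        }
    }
    return {"method": "POST", "path": path, "body": body}


if __name__ == "__main__":
    print(build_search_request("dataset-1", "some text"))
```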

On Tuesday, December 30, 2014 3:42:20 AM UTC+5:30, Mark Walkom wrote:

[...]

You don't put all your data into one massive table in a single database,
do you?
There's data structure, sizing, performance and more that you need to take
into account, irrespective of what data store you use.

On 31 December 2014 at 18:37, Mandeep Gulati mandeep.s.gulati@gmail.com
wrote:

[...]