I am quite new to Elasticsearch. I need to build a search system using data
from MongoDB. Here is a high-level overview of my application:

- There are different users belonging to different organizations.
- A user can upload multiple datasets. Each dataset is stored as a single
document in MongoDB; however, each dataset contains an array of nodes, which
holds the data we are interested in.
- A user can load one dataset at a time into their workspace and view the
entire data for that dataset. At any given time a user can view only one
dataset, so datasets are independent of each other and we never need to
aggregate across multiple datasets.
- A user can search within the dataset loaded in their workspace. The search
should return the matching elements from that dataset's nodes array.

For illustration, here is a single doc in the MongoDB datasets collection.

My plan for indexing this in Elasticsearch:

- Each dataset will be stored as a separate type in the index. The name of
the type will be the ObjectId of the dataset in MongoDB.
- Each element of a dataset's nodes array will become a single document in
the corresponding type in Elasticsearch.
- I will use custom routing to make sure a single dataset resides on one
shard only. For that, I will use the type name (the dataset's ObjectId from
MongoDB) as my routing key. I assume I will have to store it with each
document in Elasticsearch?

Now I need to know whether I am heading in the right direction. Does the
solution look scalable, or is there something terribly wrong with the design?
I would love to hear suggestions on how to improve it.
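To make the plan concrete, here is a rough sketch of what a bulk indexing
request for one dataset might look like. This is only an illustration: the
index name `datasets`, the document-id scheme, and the ObjectId value are all
made up, and it assumes an Elasticsearch 1.x-style bulk API where routing can
be set in each action's metadata.

```python
import json

def build_bulk_actions(dataset_id, nodes):
    """Build Elasticsearch bulk-API action lines that index each element
    of a dataset's nodes array as its own document, routed by the
    dataset's ObjectId so the whole dataset lands on a single shard."""
    lines = []
    for i, node in enumerate(nodes):
        # Action metadata: one type per dataset, with the dataset id
        # doubling as the routing key (illustrative field names).
        lines.append(json.dumps({
            "index": {
                "_index": "datasets",
                "_type": dataset_id,      # type name = MongoDB ObjectId
                "_id": "%s-%d" % (dataset_id, i),
                "_routing": dataset_id,   # custom routing -> one shard
            }
        }))
        lines.append(json.dumps(node))    # the node itself is the document
    return "\n".join(lines) + "\n"        # bulk bodies are newline-delimited

# Hypothetical dataset id and nodes, just to show the shape of the payload.
body = build_bulk_actions("54a1b2c3d4e5f67890123456",
                          [{"name": "node-1"}, {"name": "node-2"}])
```

One consequence of this design worth noting: once documents are routed this
way, the same routing value has to be supplied on every get, delete, and
search for that dataset, or Elasticsearch will look on the wrong shard.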
However, I am trying to understand how a massive index can be a problem if I
always know which type to query. Is there any explanation, or a link to some
documentation regarding this?
On Tuesday, December 30, 2014 3:42:20 AM UTC+5:30, Mark Walkom wrote:
Ideally you want to keep different types in different indexes.
And you definitely don't want everything in one massive index, as that won't
scale well.
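The reason knowing the type doesn't help is that, in Elasticsearch 1.x, a
type is not a separate physical store: all types in an index share the same
shards and Lucene segments, and a typed search is essentially the same search
with an extra filter on the built-in `_type` field. A sketch of that
equivalence (the query and ObjectId are hypothetical, and the `filtered`
query shown is the 1.x form):

```python
def typed_search_body(dataset_id, user_query):
    """A search against /index/{type}/_search behaves roughly like a search
    against /index/_search with a term filter on the internal `_type`
    field; every type's documents still live in the same shards."""
    return {
        "query": {
            "filtered": {                 # ES 1.x filtered query
                "query": user_query,
                "filter": {"term": {"_type": dataset_id}},
            }
        }
    }

body = typed_search_body("54a1b2c3d4e5f67890123456",
                         {"match": {"name": "pump"}})
```

So the filter narrows the results, but each shard still carries, merges, and
caches the data of every dataset, which is where the scaling problem lives.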
You don't put all your data into one massive table in a single database, do
you?
There's data structure, sizing, performance and more that you need to take
into account, irrespective of which data store you use.
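One way to act on this advice is an index per dataset instead of a type per
dataset, created at upload time and dropped when the dataset is deleted. This
is only a sketch under the assumption that each dataset is small enough for a
single primary shard; the `dataset-` name prefix and the settings values are
illustrative, not prescriptive:

```python
def per_dataset_index(dataset_id, shards=1, replicas=1):
    """Derive a per-dataset index name and creation settings. With a
    single primary shard, no custom routing is needed: the whole dataset
    sits on one shard by construction, and deleting the dataset is a
    cheap index deletion rather than a delete-by-query."""
    name = "dataset-" + dataset_id.lower()  # index names must be lowercase
    settings = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return name, settings

# Hypothetical ObjectId; lowercasing guards against uppercase hex digits.
name, settings = per_dataset_index("54A1B2C3D4E5F67890123456")
```

The trade-off to size for is the per-index overhead (mappings, shard files,
cluster-state entries), so this fits best when the number of live datasets is
bounded rather than unbounded.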