I am quite new to Elasticsearch. I need to build a search system using data
from MongoDB. Here is a high-level overview of my application:

- There are different users belonging to different organizations.
- A user can upload multiple datasets. Each dataset is stored as a single
document in MongoDB; however, each dataset contains an array of nodes, which
holds the data we are interested in.
- A user can load one dataset at a time into their workspace and view the
entire data for that dataset. At any given time a user can view only one
dataset, so datasets are independent of each other and we never need to
aggregate across multiple datasets.
- A user can search within the dataset loaded in their workspace. The search
should return the matching elements from that dataset's nodes array.

For illustration, here is a single doc in the MongoDB datasets collection.

My plan for indexing this in Elasticsearch:

- Each dataset will be stored as a separate type in the index. The name of
the type will be the ObjectId of the dataset in MongoDB.
- Each element of a dataset's nodes array will become a single document in
the corresponding type in Elasticsearch.
- I will use custom routing to make sure a single dataset resides on one
shard only. For that, I will use the type name (the dataset's ObjectId from
MongoDB) as my routing key. I assume I will have to store it with each
document in Elasticsearch?

Now I need to know whether I am heading in the right direction. Does the
solution look scalable, or is there something terribly wrong with the design?
I would love to hear suggestions on how to improve it.
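To make the plan concrete, here is a rough sketch of what a bulk indexing
request for one dataset might look like. This is only an illustration: the
index name `datasets`, the document-id scheme, and the ObjectId value are all
made up, and it assumes an Elasticsearch 1.x-style bulk API where routing can
be set in each action's metadata.

```python
import json

def build_bulk_actions(dataset_id, nodes):
    """Build Elasticsearch bulk-API action lines that index each element
    of a dataset's nodes array as its own document, routed by the
    dataset's ObjectId so the whole dataset lands on a single shard."""
    lines = []
    for i, node in enumerate(nodes):
        # Action metadata: one type per dataset, with the dataset id
        # doubling as the routing key (illustrative field names).
        lines.append(json.dumps({
            "index": {
                "_index": "datasets",
                "_type": dataset_id,      # type name = MongoDB ObjectId
                "_id": "%s-%d" % (dataset_id, i),
                "_routing": dataset_id,   # custom routing -> one shard
            }
        }))
        lines.append(json.dumps(node))    # the node itself is the document
    return "\n".join(lines) + "\n"        # bulk bodies are newline-delimited

# Hypothetical dataset id and nodes, just to show the shape of the payload.
body = build_bulk_actions("54a1b2c3d4e5f67890123456",
                          [{"name": "node-1"}, {"name": "node-2"}])
```

One consequence of this design worth noting: once documents are routed this
way, the same routing value has to be supplied on every get, delete, and
search for that dataset, or Elasticsearch will look on the wrong shard.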
However, I am trying to understand how a massive index can be a problem if I
always know which type to query. Is there any explanation, or a link to some
documentation regarding this?
On Tuesday, December 30, 2014 3:42:20 AM UTC+5:30, Mark Walkom wrote:
Ideally you want to keep different types in different indexes.
And you definitely don't want everything in one massive index, as that won't
scale well.
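The reason knowing the type doesn't help is that, in Elasticsearch 1.x, a
type is not a separate physical store: all types in an index share the same
shards and Lucene segments, and a typed search is essentially the same search
with an extra filter on the built-in `_type` field. A sketch of that
equivalence (the query and ObjectId are hypothetical, and the `filtered`
query shown is the 1.x form):

```python
def typed_search_body(dataset_id, user_query):
    """A search against /index/{type}/_search behaves roughly like a search
    against /index/_search with a term filter on the internal `_type`
    field; every type's documents still live in the same shards."""
    return {
        "query": {
            "filtered": {                 # ES 1.x filtered query
                "query": user_query,
                "filter": {"term": {"_type": dataset_id}},
            }
        }
    }

body = typed_search_body("54a1b2c3d4e5f67890123456",
                         {"match": {"name": "pump"}})
```

So the filter narrows the results, but each shard still carries, merges, and
caches the data of every dataset, which is where the scaling problem lives.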
You don't put all your data into one massive table in a single database, do
you?
There's data structure, sizing, performance and more that you need to take
into account, irrespective of which data store you use.
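One way to act on this advice is an index per dataset instead of a type per
dataset, created at upload time and dropped when the dataset is deleted. This
is only a sketch under the assumption that each dataset is small enough for a
single primary shard; the `dataset-` name prefix and the settings values are
illustrative, not prescriptive:

```python
def per_dataset_index(dataset_id, shards=1, replicas=1):
    """Derive a per-dataset index name and creation settings. With a
    single primary shard, no custom routing is needed: the whole dataset
    sits on one shard by construction, and deleting the dataset is a
    cheap index deletion rather than a delete-by-query."""
    name = "dataset-" + dataset_id.lower()  # index names must be lowercase
    settings = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return name, settings

# Hypothetical ObjectId; lowercasing guards against uppercase hex digits.
name, settings = per_dataset_index("54A1B2C3D4E5F67890123456")
```

The trade-off to size for is the per-index overhead (mappings, shard files,
cluster-state entries), so this fits best when the number of live datasets is
bounded rather than unbounded.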