I am new to Elasticsearch and am researching whether it can fit my needs.
Requirement - to be able to log data into a datastore and generate metrics
from it. The data would be large, around 50K records per hour (each record
a few hundred bytes to a few KB).
We are evaluating how we can use Elasticsearch for this. From going over
various threads, it looks like it is possible to use Elasticsearch for this
need and also use it as the datastore; however, I could not clarify a few
items, given below:
Can we use Elasticsearch only for its search capabilities while the
actual data is stored in another database? (i.e., Elasticsearch would
maintain only the 'indexed' data (index in database terminology) that is
searchable, while the actual data would be stored in a different store,
say MongoDB.) So suppose we have 50K records (~5 GB) of data, with 10-20
properties of interest in each record that we expect Elasticsearch to
search/index. Can we make Elasticsearch store only these searchable
properties in its index (whose size we expect to be around a few MB),
while the actual data lives in a separate store, and then link the two
somehow?
(In other words, Elasticsearch holds only the property data that is
searchable (nothing more) and a pointer to the actual data in MongoDB.)
What are the memory requirements of Elasticsearch? (From what I
understood, Elasticsearch loads all relevant indexed data into memory
while searching. So in my earlier example, if I want 20 properties (each
approx. 50 bytes) in each record to be searchable, then to search over a
day's data at 50K records per hour, Elasticsearch would need roughly
1.2 GB of RAM.) Is my understanding correct?
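(My arithmetic, for reference: 20 properties x 50 bytes/property x 50,000
records/hour x 24 hours = 1.2 GB of searchable data per day.)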
You can disable analyzing on the field which holds a pointer to the feed
in your database, and index that field along with the rest.
That way, you can, at any time, go back to the original feed in your
database.
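For example, a mapping along these lines (index, type, and field names are
just placeholders) keeps the pointer field as a single exact, searchable
value:

# Map the pointer field as not_analyzed so it is indexed as one exact term
curl -XPUT 'http://localhost:9200/logs' -d '{
  "mappings": {
    "event": {
      "properties": {
        "mongo_id": { "type": "string", "index": "not_analyzed" },
        "status":   { "type": "string" }
      }
    }
  }
}'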
Thanks
Vineeth
You can set dynamic mapping to false and then explicitly specify only a
handful of fields that will be indexed/searchable. Or, if you don't want
to do this, just send in a smaller JSON document with only the fields you
want searched or indexed.
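Something like this (field names are illustrative) indexes only the listed
fields and leaves everything else in the document unindexed:

# With "dynamic": false, fields not listed under "properties" are kept
# in _source but never indexed or made searchable
curl -XPUT 'http://localhost:9200/logs' -d '{
  "mappings": {
    "event": {
      "dynamic": false,
      "properties": {
        "timestamp": { "type": "date" },
        "user":      { "type": "string", "index": "not_analyzed" },
        "message":   { "type": "string" }
      }
    }
  }
}'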
RAM usage mostly depends on the types of searches you run. If you do a
lot of cached filters, sorting, facets/aggregations, script field access,
or parent-child queries, that's where most of the RAM will be used.
Otherwise, if you're just doing standard full-text searches and maybe
some occasional filtering, you'll probably require less RAM. The best way
to determine this is to test your queries (different query types) on a
single node and monitor RAM usage. You'll want to run the node stats,
checking the filter, fielddata, and id caches for usage.
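For instance, you can poll the node stats API while your test queries run
and watch the cache sizes:

# Check indices.filter_cache, indices.fielddata, and indices.id_cache
# memory_size values in the response as queries execute
curl -XGET 'http://localhost:9200/_nodes/stats/indices?pretty'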