Configure elasticsearch to store only index data


(Sandesh Kumar) #1

hi,

I am new to Elasticsearch and am researching whether it can fit my needs.

Requirement: to be able to log data into a datastore and generate metrics
from it. The volume would be large, around 50K records per hour (each row a
few hundred bytes/KB).
We are looking at how we can use Elasticsearch for this. From reading
various threads, it looks possible to use Elasticsearch for this
need and also as a datastore, but I could not clarify the few items
given below:

  1. Can we use Elasticsearch only for its search capabilities while the
    actual data is stored in another database? (i.e., Elasticsearch would
    maintain only the 'indexed' (index in database terminology) data that is
    searchable, while the actual data would be stored in a different store, say
    MongoDB.) Suppose we have 50K records (~5GB) of data, and each record has
    10-20 properties of interest that we expect Elasticsearch to
    search/index. Can we make Elasticsearch store only these searchable
    properties in its index (which we expect to be around a few MBs in size),
    while the actual data is stored in a separate store, and then link the two
    somehow?
    (In other words, Elasticsearch holds just the searchable properties
    (nothing more) and a pointer to the actual data in MongoDB.)

  2. What are the memory requirements of Elasticsearch? (From what I
    understood, Elasticsearch loads all relevant indexed data into memory while
    searching. So in my earlier example, if I want 20 properties (each approx.
    50 bytes) in each record to be searchable, then at 50K records per hour,
    Elasticsearch would need approximately 1.2GB of RAM to search over a day's
    data.) Is my understanding correct?
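
For reference, the 1.2GB figure in item 2 can be reproduced with the short calculation below. It is only a sketch of the arithmetic, using the numbers stated in this post (20 properties, about 50 bytes each, 50K records per hour). It gives the raw size of the searchable values, not a measurement of how much RAM Elasticsearch would actually use.

```python
# Back-of-the-envelope size of the searchable values only.
# All inputs come from the post; the result is raw bytes, not heap usage.
PROPERTIES_PER_RECORD = 20   # searchable properties per record
BYTES_PER_PROPERTY = 50      # approximate size of one property
RECORDS_PER_HOUR = 50_000    # ingest rate
HOURS_PER_DAY = 24


def searchable_bytes(hours: int) -> int:
    """Raw bytes of searchable values accumulated over the given hours."""
    return PROPERTIES_PER_RECORD * BYTES_PER_PROPERTY * RECORDS_PER_HOUR * hours


per_hour = searchable_bytes(1)
per_day = searchable_bytes(HOURS_PER_DAY)

print(f"per hour: {per_hour / 1e6:.0f} MB")
print(f"per day:  {per_day / 1e9:.1f} GB")
```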

thanks
Sandesh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f82df903-0fb7-4eaa-9fd2-4d990f8685d2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(vineeth mohan-2) #2

Hello Sandesh,

To achieve #1, you can disable the _source field:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html

You can disable analysis on the field that holds the pointer to the record in
your database and index that field along with the others.
That way you can, at any time, go back to the original record in your
database.
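
A minimal sketch of what such a mapping could look like, written as a Python dict that serializes to the JSON body of an index-creation request. It assumes the string-type syntax current at the time of this thread ("index": "not_analyzed"); newer Elasticsearch versions use a different field type for this. The type name "logentry" and the field names "mongo_id", "status", and "message" are made up for illustration. Because _source is disabled, the pointer field is also marked as stored so that it can be returned with search hits.

```python
import json


def build_mapping() -> dict:
    """Mapping with _source disabled and a non-analyzed pointer field.

    "logentry", "mongo_id", "status" and "message" are hypothetical names.
    """
    return {
        "mappings": {
            "logentry": {
                # Do not keep the original JSON document in the index.
                "_source": {"enabled": False},
                "properties": {
                    # Pointer back to the full record in the external database.
                    # Indexed as a single exact term, and stored so it can be
                    # fetched from a hit even though _source is disabled.
                    "mongo_id": {
                        "type": "string",
                        "index": "not_analyzed",
                        "store": True,
                    },
                    # Example searchable properties.
                    "status": {"type": "string", "index": "not_analyzed"},
                    "message": {"type": "string"},
                },
            }
        }
    }


mapping = build_mapping()
print(json.dumps(mapping, indent=2))
```

Using the database key as the Elasticsearch document id is another way to keep the link, since the id is returned with every hit.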

Thanks
Vineeth

(In reply to Sandesh Kumar dev.skg@gmail.com, Sun, Feb 23, 2014 at 11:20 AM.)


(Binh Ly) #3

Some ideas:

  1. You can set dynamic mapping to false and then explicitly map only the
    handful of fields that should be indexed/searchable. Or, if you don't want to
    do this, just send in a smaller JSON document with only the fields you want
    searched or indexed.

  2. RAM usage depends mostly on the types of searches. If you do a lot of
    cached filters, sorting, facets/aggregations, script field access, or
    parent-child queries, that's where most of the RAM will be used. Otherwise, if
    you're just doing standard full-text searches and maybe some occasional
    filtering, you'll probably need less RAM. The best way to determine
    this is to test your queries (different query types) on a single node and
    monitor RAM usage. You'll want to run the node stats API and check the usage
    of the filter, fielddata, and id caches.
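
Both ideas can be sketched roughly as below. This is only an illustration under assumptions: the type name "logentry", the field names, and the sample numbers are made up, the mapping uses the string-type syntax of the Elasticsearch version current at the time of this thread, and the node stats layout (nodes -> node id -> indices -> cache name -> memory_size_in_bytes) is assumed to match what your version returns. No request is sent anywhere; the code only builds and inspects plain dicts.

```python
# Hypothetical whitelist of the properties that should be searchable.
SEARCHABLE_FIELDS = ("host", "status", "timestamp")


def build_strict_mapping() -> dict:
    """Mapping that indexes only the explicitly listed fields.

    With "dynamic" set to false, fields that are not listed under
    "properties" are not added to the mapping and are not searchable.
    """
    return {
        "mappings": {
            "logentry": {
                "dynamic": False,
                "properties": {
                    "host": {"type": "string", "index": "not_analyzed"},
                    "status": {"type": "string", "index": "not_analyzed"},
                    "timestamp": {"type": "date"},
                },
            }
        }
    }


def trim_document(record: dict) -> dict:
    """Alternative to a strict mapping: send only the whitelisted fields."""
    return {key: record[key] for key in SEARCHABLE_FIELDS if key in record}


def cache_bytes_per_node(node_stats: dict) -> dict:
    """Pull filter, fielddata and id cache sizes out of a node stats response.

    Caches missing from the response are reported as 0.
    """
    usage = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        indices = node.get("indices", {})
        usage[node_id] = {
            cache: indices.get(cache, {}).get("memory_size_in_bytes", 0)
            for cache in ("filter_cache", "fielddata", "id_cache")
        }
    return usage


# Made-up sample data, not real measurements.
sample_record = {"host": "web-1", "status": "200", "payload": "x" * 1000}
sample_stats = {
    "nodes": {
        "node-1": {
            "indices": {
                "filter_cache": {"memory_size_in_bytes": 1024},
                "fielddata": {"memory_size_in_bytes": 2048},
                "id_cache": {"memory_size_in_bytes": 0},
            }
        }
    }
}

print(trim_document(sample_record))
print(cache_bytes_per_node(sample_stats))
```

Comparing these cache numbers before and after running each query type on a single test node gives a rough picture of which searches drive RAM usage.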


