ES architecture with time-based data


(paulp) #1

Hi Folks,

I'm new to ES and am currently evaluating it in an EC2 environment.

The data I wish to index is time-based: social media documents that need to be indexed and stored for a couple of weeks. The idea is to keep the indexes for a certain time frame, lock them afterwards, and eventually delete them. The total number of indexed documents should exceed 10 million.
Most of the queries will be for documents from the last 24 hours (or less).
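
To make the idea concrete, here is a rough sketch of the layout I have in mind: one index per day, with queries for the last 24 hours touching only the two newest indexes and anything past the retention window becoming a candidate for deletion. The `social-` prefix, the date format, and the 14-day retention are placeholders of mine, and the snippet only computes index names; it does not talk to ES.

```python
from datetime import datetime, timedelta, timezone
from typing import List

INDEX_PREFIX = "social-"   # hypothetical naming scheme, not an ES default
RETENTION_DAYS = 14        # assumption: "a couple of weeks"
DATE_FORMAT = "%Y.%m.%d"


def index_for(ts: datetime) -> str:
    """Name of the daily index that a document with timestamp ts belongs to."""
    return INDEX_PREFIX + ts.astimezone(timezone.utc).strftime(DATE_FORMAT)


def indices_for_last_24h(now: datetime) -> List[str]:
    """Daily indexes a 'last 24 hours' query would have to touch."""
    names = {index_for(now - timedelta(hours=24)), index_for(now)}
    return sorted(names)


def expired_indices(existing: List[str], now: datetime) -> List[str]:
    """Indexes whose day is older than the retention window."""
    cutoff = (now.astimezone(timezone.utc) - timedelta(days=RETENTION_DAYS)).date()
    expired = []
    for name in existing:
        day = datetime.strptime(name[len(INDEX_PREFIX):], DATE_FORMAT).date()
        if day < cutoff:
            expired.append(name)
    return expired
```

With this layout, almost all query traffic would land on the one or two newest indexes, which is what prompts my questions below.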

My questions are as follows:

  1. Do I need to worry about hot shards (or nodes) in this scenario, resulting in performance degradation?

  2. Since replication and sharding will take place (and we are talking about an EC2 environment), should I be concerned about high latency between nodes?

  3. I would very much like to hear from ES experts about their best practices and use cases for such a scenario.

Best Regards,
Paul

