Index management considerations for ES used as a search engine on top of Cassandra


(Naren Sree) #1

Hi, we are using ES as the search engine for our Cassandra DB. We write our business model data to both Cassandra and Elasticsearch. I want to design an index management strategy for this scenario. How should I go about it?

For example, let's say we have User model data, which stores all the user-related fields (first name, last name, address, phone number, etc.).

  1. Should I create just one index for all users, or create weekly/monthly indices based on when each user is created in our system?
  2. How many shards do I need to allocate if the user data is about 1 GB?
  3. Let's say the scenario changes completely and we decide to add more data to ES, and the data then grows exponentially to 200 GB or so. What is the criterion for allocating more shards in ES, and how do I calculate the shard count?
  4. Since I cannot know ahead of time how my system will grow, suppose I make a mistake in allocating shards (either too few or too many). Is there a way to dynamically shrink or expand them as more data is written to ES?

Thank you very much for your help.


(Mark Walkom) #2
  1. A single index sounds fine.
  2. That depends on how large the data is once indexed in Elasticsearch, which queries you run and at what rate, what response SLAs you need, the underlying infrastructure, etc.
  3. We recommend no more than 50 GB per shard, but the right size depends on the factors in point 2.
  4. You can shrink easily with the _shrink API. To expand, you need to _reindex into a new index with more shards.
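
As a rough illustration of point 3, the 50 GB guideline can be turned into a lower bound on the primary shard count. This is only a sketch: the function name and the default are made up for this example, the input should be the size of the data as indexed in Elasticsearch (not the size in Cassandra), and the query load and SLA factors from point 2 may well call for more shards than this minimum.

```python
import math

# Guideline quoted in the reply above; treat it as an upper bound, not a target.
MAX_SHARD_SIZE_GB = 50


def estimate_primary_shards(index_size_gb, max_shard_size_gb=MAX_SHARD_SIZE_GB):
    """Smallest primary shard count that keeps every shard at or under the size limit.

    Hypothetical helper for illustration; it only looks at data size.
    """
    if index_size_gb < 0 or max_shard_size_gb <= 0:
        raise ValueError("sizes must be non-negative, and the limit must be positive")
    # An index always has at least one primary shard, even when it is tiny.
    return max(1, math.ceil(index_size_gb / max_shard_size_gb))


print(estimate_primary_shards(1))    # 1 GB of user data  -> 1
print(estimate_primary_shards(200))  # 200 GB after growth -> 4
```

By this estimate, 1 GB fits in a single shard, while 200 GB needs at least four.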

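For point 4, the two operations map onto REST calls roughly as sketched below. The code only builds the request descriptions and sends nothing; the index names (users_v1 and so on) and the helper names are hypothetical. Note that _shrink expects the target shard count to be a factor of the source count, and the source index must be made read-only with a copy of every shard on one node before the call.

```python
import json


def shrink_request(source, target, source_shards, target_shards):
    """Describe a _shrink call that copies `source` into `target` with fewer shards."""
    # _shrink only accepts a target count that evenly divides the source count.
    if target_shards < 1 or source_shards % target_shards != 0:
        raise ValueError("target shard count must be a factor of the source shard count")
    body = {"settings": {"index.number_of_shards": target_shards}}
    return ("POST", f"/{source}/_shrink/{target}", body)


def expand_requests(source, target, target_shards):
    """Describe the two calls for expanding: create a larger index, then _reindex into it."""
    create = ("PUT", f"/{target}", {"settings": {"index": {"number_of_shards": target_shards}}})
    copy = ("POST", "/_reindex", {"source": {"index": source}, "dest": {"index": target}})
    return [create, copy]


# Hypothetical index names, used for illustration only.
requests = [shrink_request("users_v1", "users_v2", 4, 2)]
requests += expand_requests("users_v1", "users_v3", 8)
for method, path, body in requests:
    print(method, path, json.dumps(body))
```

If searches go through an index alias rather than the concrete index name, the application can be pointed at the new index after either operation without a code change.
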