Reducing the number of shards on an index without the source


(Emilie Lavigne) #1

Is there a way to merge the shards of an index without the source (i.e.
without reindexing)? More specifically, we would like to turn an index of
5 shards into an index of a single shard once writing is no longer
happening on that index.

What our environment looks like:

  • Our ElasticSearch cluster is used as a search layer atop a Hadoop
    database. The source is therefore not stored in ElasticSearch.
  • Our ElasticSearch cluster consists of 4 nodes, each with 32 cores, for
    a total of 128 cores.
  • We are currently indexing roughly 6 GB of data (18 GB with replicas)
    every day.
  • We create an index per day, each with 5 shards.
  • Many of our searches are being run against all indices.

We figure that multiple shards per index make writing more efficient.
However, we only ever write to the most recent day's index. After that,
having multiple shards per index actually makes our cluster more fragile,
since each search fires off too many threads in order to search all our
data (we currently have 100 days of data, i.e. 500 shards). It only takes
roughly 10 parallel searches to get a ThreadPoolExecutor exception.
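
To make the fan-out arithmetic concrete, here is a rough sketch in Python. It
assumes every search hits every index and touches one copy of each shard; the
function and variable names are made up for illustration and are not part of
any ElasticSearch API.

```python
def shard_requests(days, shards_per_index, parallel_searches):
    """Shard-level search requests in flight when every search hits every index.

    Assumption: one daily index per day, and each search queries exactly
    one copy (primary or replica) of every shard.
    """
    return days * shards_per_index * parallel_searches


# Figures taken from the description above: 100 daily indices, 10 parallel searches.
current = shard_requests(days=100, shards_per_index=5, parallel_searches=10)
merged = shard_requests(days=100, shards_per_index=1, parallel_searches=10)

print(current, merged)
```

With 5 shards per index, 10 parallel searches mean 5000 shard-level requests;
with a single shard per index the same load would be 1000.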

If we can't merge the shards without the source, do you have any other
suggestions?

Thank you,

Emilie


