The question: what if we create dedicated indexing nodes and use local SSDs (10x faster)?
We could probably use 2-3 indexing nodes and then just clone the indexes to the data nodes.
To optimise indexing performance, it is quite common to have different types of data nodes in the cluster. This is often referred to as the Hot/Warm architecture. A subset of nodes equipped with fast, attached SSDs handles all the indexing and much of the querying of recent data, while nodes with a larger amount of slower, spinning disk are used for long-term storage and querying of older data. This can be a very efficient approach and is described in more detail in the blog post I linked to.
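To make this a bit more concrete, here is a minimal Python sketch of how Hot/Warm placement is typically driven through index-level shard allocation filtering. It only builds the settings bodies and picks which indices are old enough to move; it does not talk to a cluster. The attribute name `box_type`, the tier values `hot` and `warm`, the 7-day threshold, and the index names are all assumptions for illustration, so adjust them to whatever attribute your nodes are actually tagged with.

```python
import json
from datetime import date, timedelta

# Assumed custom node attribute; nodes would be tagged in their config
# with something like `node.attr.box_type: hot` or `node.attr.box_type: warm`.
ATTR = "box_type"


def allocation_settings(tier):
    """Build the index settings body that pins an index to one tier."""
    return {f"index.routing.allocation.require.{ATTR}": tier}


def indices_to_demote(index_dates, today, hot_days=7):
    """Return names of indices older than `hot_days`, sorted by name.

    `index_dates` maps an index name to the day its data covers.
    """
    cutoff = today - timedelta(days=hot_days)
    return sorted(name for name, day in index_dates.items() if day < cutoff)


if __name__ == "__main__":
    # Hypothetical daily indices, purely for illustration.
    indices = {
        "logs-2024.01.01": date(2024, 1, 1),
        "logs-2024.01.09": date(2024, 1, 9),
    }

    # New indices start on the hot (SSD) nodes.
    print("create with:", json.dumps(allocation_settings("hot")))

    # Older indices get their setting flipped, and the cluster then
    # relocates their shards to the warm (spinning disk) nodes.
    for name in indices_to_demote(indices, today=date(2024, 1, 10)):
        print(f"PUT /{name}/_settings", json.dumps(allocation_settings("warm")))
```

The point of this design is that nothing has to be cloned by hand: changing the allocation setting on an existing index is enough for the cluster to move its shards from the indexing nodes to the storage nodes.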