Currently 50 million documents (within 3 months) are stored, and I want to
keep the data for at least 10 years. In addition, the number of servers is
growing as well.
Now my question:
Do you think splitting the type "stats" into multiple types would improve
Elasticsearch's search time? For example, if a type were created for each
server:
Or does it make no difference whether the data is stored in a single type
or in multiple types?
Types are really just for mapping separation. You won't really
see much difference in raw search performance, but if you create a lot
of types it can make the cluster state large, which increases the
communication between the nodes in the cluster.
In this case, if the docs will have the same general structure,
use a consistent type and include a field in each doc to store the
server name. It's likely a better design anyway.
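A minimal sketch of that layout (field names other than the type are illustrative): every document keeps the same type and carries a `server` field, so a per-server search becomes a simple term filter instead of a separate type:

```json
{
  "_index": "statistics2013.08",
  "_type": "stats",
  "_source": {
    "server": "web-01",
    "timestamp": "2013-08-15T12:00:00",
    "cpu_load": 0.42
  }
}
```

Restricting a search to one server is then just a `term` filter on the `server` field, e.g. `{"term": {"server": "web-01"}}`, which works the same across all monthly indices.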
The only disadvantage is that the index buffer size is divided among the
shards. Older indices and their shards are nearly static, because server
statistics are only written to the latest index.
How could I handle this? Is there a way to set the maximum memory size for a
specific index? For example, the index statistics2013.08 is the latest one,
while the indices statistics2013.05 - 2013.07 are only used to search
statistics.
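Only the latest monthly index receives writes; the older ones are read-only. A small sketch, assuming the statisticsYYYY.MM naming used above, of how a client might compute which monthly indices a date-range search has to touch (the helper names are hypothetical):

```python
from datetime import date

def monthly_index(d, prefix="statistics"):
    # Map a date to its monthly index name, e.g. 2013-08 -> "statistics2013.08".
    return f"{prefix}{d.year}.{d.month:02d}"

def indices_between(start, end, prefix="statistics"):
    # List every monthly index from start to end, inclusive,
    # so a search can be limited to just the relevant indices.
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"{prefix}{y}.{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return names

print(monthly_index(date(2013, 8, 1)))
# statistics2013.08
print(indices_between(date(2013, 5, 1), date(2013, 7, 1)))
# ['statistics2013.05', 'statistics2013.06', 'statistics2013.07']
```

Searching only the indices that cover the requested time range keeps the static older indices out of queries that don't need them.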