I want to store my log files in Elasticsearch for further analysis. But as far as I understand, there is a limit of about 2 billion documents per shard.
My application generates about 1 billion log lines per day, so this limit will be reached quickly.
What is the solution for this? (I think a large number of log lines is a rather common situation.) I want to store at least 1 month of logs (30 billion lines), or even more (6 months or even 1 year).
What really matters is that you keep the physical shard size at or below the size of the process memory at all costs, and that this memory is pinned to the ES Java process so that no swapping has to be carried out.
Also make sure that you give only 50% of your physical memory to the ES process. The rest should stay available to the OS, so that Lucene can use it through the OS file system cache.
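As a rough sketch of the 50% rule above: the helper below derives a heap size from the physical RAM and prints matching JVM flags. The function names are hypothetical, and the 31 GB cap is my own assumption (a commonly cited ceiling to keep compressed object pointers), not something stated in this thread. Pinning the memory is usually done separately, with `bootstrap.memory_lock: true` in `elasticsearch.yml`.

```python
def suggested_heap_gb(physical_ram_gb: float, heap_cap_gb: float = 31.0) -> float:
    """Return half of the physical RAM, capped at heap_cap_gb.

    heap_cap_gb is an assumed ceiling, not a value from the discussion.
    """
    if physical_ram_gb <= 0:
        raise ValueError("physical_ram_gb must be positive")
    return min(physical_ram_gb / 2, heap_cap_gb)


def jvm_heap_flags(physical_ram_gb: float) -> list[str]:
    """Build -Xms/-Xmx flags with identical values, so the heap never resizes."""
    heap_mb = int(suggested_heap_gb(physical_ram_gb) * 1024)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    for ram_gb in (16, 64, 128):
        print(ram_gb, "GB RAM ->", " ".join(jvm_heap_flags(ram_gb)))
```

Whatever is left after the heap stays with the OS, which is the part Lucene benefits from.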
Apart from this, you can query across several indices without any problems.
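To make that concrete, here is a minimal sketch of the arithmetic, assuming one index per day with a handful of primary shards. The `logs-` prefix, the date format, and the shard count of 5 are illustrative assumptions; only the 1 billion lines per day comes from the question. The per-shard limit is Lucene's hard cap of 2,147,483,519 documents.

```python
from datetime import date, timedelta

# Lucene's hard limit on documents in a single shard (Integer.MAX_VALUE - 128).
LUCENE_MAX_DOCS_PER_SHARD = 2_147_483_519


def daily_index_names(start: date, days: int, prefix: str = "logs-") -> list[str]:
    """Return one index name per day, e.g. logs-2024.01.30 (naming is an assumption)."""
    return [
        f"{prefix}{(start + timedelta(days=i)).strftime('%Y.%m.%d')}"
        for i in range(days)
    ]


def docs_per_shard(docs_per_day: int, primary_shards: int) -> int:
    """Documents landing in each primary shard of a daily index (rounded up)."""
    if primary_shards <= 0:
        raise ValueError("primary_shards must be positive")
    return -(-docs_per_day // primary_shards)


def search_path(index_pattern: str) -> str:
    """Path of a search request that spans every index matching the pattern."""
    return f"/{index_pattern}/_search"


if __name__ == "__main__":
    per_shard = docs_per_shard(1_000_000_000, primary_shards=5)
    print("docs per shard per day:", per_shard)
    print("below Lucene limit:", per_shard < LUCENE_MAX_DOCS_PER_SHARD)
    print("indices:", daily_index_names(date(2024, 1, 30), 3))
    print("query one month with:", search_path("logs-2024.01.*"))
```

Because the limit applies per shard and not per index or per cluster, 30 daily indices hold 30 billion lines without any single shard coming close to it, and a wildcard pattern lets you search them as if they were one index.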
What I'm saying is that you can easily write much more to a single node, but when you want to read it, you will run into trouble if you need to read multiple full shards into memory.
If you know what you are looking for and know your data, then it's usually not a problem, but it will be if you give a large group of data scientists access to Kibana. If you can manage with pre-configured dashboards, then you are also safe.
Honestly, I have never seen this to be an issue to the extent you seem to be hinting at.
ES can handle terabytes of data per node, with very fast response times.
Elasticsearch can be tuned in a number of ways for different use cases, and determining the ideal shard size and the number of shards a node can handle is no different. Keeping the data set per node small enough that all data can be cached by the OS (which seems to be what you are recommending, if I read your post correctly) may be applicable for search use cases with very high query rates and low required latencies, but in my opinion it rarely makes sense for the vast majority of logging use cases.