I'm working on a project that uses Elasticsearch (ES), Kibana and Logstash. I would like to set up a log data visualization system with them. The log data comes from another system, so I need to keep the original log data in a stable data source. I also need to fetch new data every day. I have some questions about this.
1. People say that ES is not stable, so it shouldn't be used as the sole data source. What is the reason? I think it is because nodes can die suddenly, but I'm not sure and I would like to know the correct reason.
1-1. If ES is not stable, how do I know whether my data has gone missing from ES?
1-2. Is it necessary to backfill data from the DB to ES?
2. In my system, the data flow is as shown below. My processor loads and stores log data through it every day. Is this a reasonable structure? If you have any recommendations for the system structure, please let me know.
External Log Data > RDB > (Logstash) > Elasticsearch > Kibana
This is very important for my project. Please answer my questions.
Thanks a lot.
Jiyoon
Is there a source for this statement? Elasticsearch is stable, there are many stable releases.
Do you mean Elasticsearch loses data on acknowledged writes? That's a different story but unfortunately true under certain conditions. See also Elasticsearch Resiliency Status | Elastic
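One way to notice that kind of loss is to reconcile ES against the original data source. Below is a minimal sketch, assuming one index per day and that the per-day counts have already been collected (for example with a `SELECT COUNT(*)` on the database side and `GET /<index>/_count` on the ES side). The function name and the sample numbers are made up for illustration.

```python
def find_days_to_backfill(source_counts, es_counts):
    """Return {day: missing_docs} for days where ES holds fewer
    documents than the source of truth."""
    missing = {}
    for day, expected in sorted(source_counts.items()):
        # A day with no index at all counts as 0 documents in ES.
        actual = es_counts.get(day, 0)
        if actual < expected:
            missing[day] = expected - actual
    return missing


# Hypothetical counts: rows per day in the RDB vs. docs per daily index.
source_counts = {"2016-07-01": 1000, "2016-07-02": 1200, "2016-07-03": 900}
es_counts = {"2016-07-01": 1000, "2016-07-02": 1150}

print(find_days_to_backfill(source_counts, es_counts))
# {'2016-07-02': 50, '2016-07-03': 900}
```

Any day that shows up in the result is a candidate for re-indexing from the database.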
Thank you for answering my question!
I cannot find the source, but I remember reading that data backfilling is needed.
My ES version is 1.7.1. Is this version stable? I cannot change the ES version because that is outside my control.
Let's imagine that each index has one shard with one replica. I also have lots of indices because I create an index periodically (for example, daily). I have 5 nodes for the ES cluster. If the two nodes that hold the shard and the replica for index 1 both die, the data for index 1 has to be backfilled. How should I handle this situation?
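To put a number on that scenario, here is a small sketch that enumerates every way two of the five nodes can fail at the same moment and counts how often both copies of index 1 are lost. The node names and the shard placement are assumptions for illustration. It only models simultaneous failures: when nodes fail one after another with enough time in between, Elasticsearch normally promotes the surviving copy and rebuilds the replica on another node.

```python
from itertools import combinations


def count_losing_failures(nodes, copy_holders, failed_count):
    """Count the failure combinations that take out every copy of an index.

    Returns (losing_combinations, total_combinations).
    """
    holders = set(copy_holders)
    failures = list(combinations(nodes, failed_count))
    # The index is lost only if all nodes holding a copy are in the failed set.
    losing = [f for f in failures if holders <= set(f)]
    return len(losing), len(failures)


nodes = ["node1", "node2", "node3", "node4", "node5"]
# Assumed placement: primary of index 1 on node1, replica on node2.
losing, total = count_losing_failures(nodes, ["node1", "node2"], 2)
print(losing, total)  # 1 10
```

So only one of the ten possible two-node failures loses index 1 outright, and that is the case that needs a backfill or a restore.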
I don't have much experience with 1.7.1. I am using 2.3.3, which is stable. Most of a cluster's stability depends on your hardware and ES configuration.
[quote="wldbs0508, post:3, topic:56017"]
I have 5 nodes for ES cluster.
[/quote]
If you have five nodes, think your cluster is not stable, and your data is very precious, maintain more replicas. ES also provides a snapshot feature, which is very useful in this scenario.
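As a rough sketch of the snapshot workflow, the code below builds the two REST calls involved: registering a shared-filesystem repository and taking a snapshot into it. The host, the repository name `my_backup`, the path `/mount/backups/my_backup` and the snapshot name `snapshot_1` are placeholders, not values from this thread. It also assumes the location is reachable from every node and is allowed in the nodes' `path.repo` setting. Nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_snapshot_requests(base_url, repo, location, snapshot):
    """Build (register_repo, take_snapshot) requests without sending them."""
    # PUT /_snapshot/<repo> registers a repository of type "fs".
    register = urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo}",
        data=json.dumps(
            {"type": "fs", "settings": {"location": location}}
        ).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    # PUT /_snapshot/<repo>/<snapshot> starts a snapshot of the cluster;
    # wait_for_completion makes the call block until it has finished.
    take = urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )
    return register, take


register, take = build_snapshot_requests(
    "http://localhost:9200",      # placeholder host
    "my_backup",                  # placeholder repository name
    "/mount/backups/my_backup",   # placeholder shared path
    "snapshot_1",                 # placeholder snapshot name
)
print(register.get_method(), register.full_url)
print(take.get_method(), take.full_url)
# To actually run them: urllib.request.urlopen(register), then urlopen(take)
```

A snapshot taken on a schedule gives you a restore point inside ES itself, in addition to the copy in the original database.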