Hi There,
I currently have an ELK setup with Redis as the message queue.
My question: if I receive 10 log files from 10 different servers, how far back in time can I go? And if I do go back in time, do I also need some storage to hold those metrics?
Also, if I am reading logs from a database, how fast or slow will it be compared to reading from a log file on a remote server?
You can go back as far as you have the data, the storage, and the ability to process that data.
To clarify, are you talking about data in Elasticsearch or Redis?
If you're talking about Elasticsearch, my basic rule of thumb is: log volume * (replication level + 1).
So if you log 10GB of data a day to a file and Elasticsearch is at its default replication level (1), you would need 20GB of disk space a day. There is some overhead, plus any additional fields you create from the data, so this is only a rough estimate.
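To make that rule of thumb concrete, here is a minimal sketch in Python. The function name and the `retention_days` and `overhead` parameters are my own additions for illustration, not anything Elasticsearch provides; leave `overhead` at 0 to get the plain rule above.

```python
def estimate_disk_gb(daily_log_gb, replicas=1, retention_days=1, overhead=0.0):
    """Rough disk estimate: log volume * (replicas + 1) per day.

    overhead is a hypothetical fudge factor (e.g. 0.1 for 10%) to cover
    index overhead and extra fields; it is an assumption, not a measured value.
    """
    copies = replicas + 1  # one primary plus each replica
    return daily_log_gb * copies * retention_days * (1.0 + overhead)


print(estimate_disk_gb(10))                     # 20.0 GB for one day
print(estimate_disk_gb(10, retention_days=30))  # 600.0 GB to keep 30 days
```

Treat the result as a starting point for sizing and check it against the real size of your indices once data is flowing.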
I am not sure what you mean by "database". If you do a select * from table, you will be able to read that fast. Or do you mean Elasticsearch? Either way, it depends on the hardware (CPU, memory), how complex your query is, and how much is cached. But think of it this way.
Grepping a log file vs Elasticsearch
If you have 1 server and you grep a log file of 10GB, you're limited by the disk speed and 1 CPU of the system you're on.
If you have 10 Elasticsearch data nodes holding that same 10GB of data in 1 index with 10 shards, you are now using 10 CPUs and 10 disks, each searching only 1GB of data.
In simple theory it will be 10 times faster. Of course that is only theoretical; the real speedup will probably be less, but it will definitely be faster than grep. Your mileage will vary based on the data, the architecture, and the tuning. The only way to be sure is to benchmark it yourself.
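The back-of-the-envelope math behind that comparison can be sketched like this. It is a toy model only: it assumes the shards are spread evenly over the nodes and that every node scans at the same made-up rate (`gb_per_sec_per_node` is a hypothetical number). It also ignores the fact that Elasticsearch searches an index rather than scanning raw text, so it says nothing about real query times.

```python
def ideal_scan_seconds(total_gb, shards, nodes, gb_per_sec_per_node):
    """Idealized time to scan total_gb when the work is split across nodes.

    Only as many nodes as there are shards can work in parallel.
    """
    parallel_workers = min(shards, nodes)
    return total_gb / (parallel_workers * gb_per_sec_per_node)


# Hypothetical scan rate of 0.5 GB/s per machine.
grep_time = ideal_scan_seconds(10, shards=1, nodes=1, gb_per_sec_per_node=0.5)
es_time = ideal_scan_seconds(10, shards=10, nodes=10, gb_per_sec_per_node=0.5)
print(grep_time, es_time, grep_time / es_time)  # 20.0 2.0 10.0
```

Note that adding nodes beyond the shard count does not help in this model, which is one reason the shard count matters when you plan the index.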
For example, I index 1TB of logs a day on 9 servers (90GB RAM, 16 CPUs, and EMC storage for disk space). When I search one day of data, it takes less than a minute to return a result. A week takes less than 4 minutes. Of course I could make it faster by tuning the environment better, but it's good enough for my department.
Elasticsearch will hold it as long as you don't delete it or corrupt your disk. There is no automated cleanup process built into Elasticsearch; you have to use something like Curator or a roll-your-own script to delete old data.
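If you go the roll-your-own route, the core of such a script is just picking out the daily indices that are past your retention window. Here is a minimal sketch, assuming your indices use the Logstash default daily naming (`logstash-YYYY.MM.DD`). The function name and the `keep_days` parameter are hypothetical, and the code only selects names; it does not delete anything.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices whose date is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our log indices, leave it alone
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, skip it
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logstash-2015.01.01", "logstash-2015.01.20", ".kibana"]
print(expired_indices(names, date(2015, 1, 31), keep_days=14))
# ['logstash-2015.01.01']
```

Each name it returns can then be removed with an HTTP DELETE request to that index on your cluster (for example `DELETE http://<your-host>:9200/logstash-2015.01.01`, where the host is whatever your cluster uses). Test it against a non-production cluster first, since a deleted index cannot be recovered without a backup.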
It is stored wherever your data directory is set. By default that is the data directory of Elasticsearch, but that is easily changed with the path.data setting in your elasticsearch.yml.