I have a question about raw documents. For example, if we send syslogs, NetFlow data, firewall logs and Windows event logs, we can send them either via Logstash or directly to Elasticsearch. If we send them through Logstash, it parses the data and sends it on to Elasticsearch. Either way, all the documents are stored in JSON format in Elasticsearch.
For example, if there is a security audit and the auditors want to see the raw data, is it possible to retrieve the raw documents, or can only the JSON documents be retrieved?
If it's possible, how?
If not, what are the other possible ways to achieve it?
What happens to the original log files? It's these that any auditors are expecting you to keep, as they are the 'original' copies of the data concerned. Sending the contents of these files to Elasticsearch is great for monitoring and searching them, but if it's a requirement, you should also be keeping the original files somewhere for the required amount of time. Every document in Elasticsearch should contain a field holding the name of the file the event came from, so that you can pinpoint the original file should you need to find it.
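To illustrate the last point, here is a minimal Python sketch of tagging each event with the file it came from before it is indexed. The field names `message`, `source_file` and `line_number` are my own assumptions for illustration, not something Elasticsearch requires, and the function only builds the documents; shipping them is left to whatever you already use.

```python
from pathlib import Path


def events_from_file(path):
    """Yield one dict per log line, keeping the raw line and its origin."""
    path = Path(path)
    # Assumes UTF-8 text with Unix line endings; adjust for your sources.
    with path.open("r", encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            yield {
                "message": line.rstrip("\n"),   # unparsed log line
                "source_file": str(path),       # hypothetical field name
                "line_number": line_number,     # hypothetical field name
            }
```

With a field like `source_file` on every document, a search hit can be traced back to the archived original file and the exact line within it.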
Thank you, Steve, for the info and your input. So does that mean that once the data is indexed into Elasticsearch, we can't retrieve the raw documents? Is that right?
As @thiago said, you can store a copy of the raw log entry in Elasticsearch and then have a process in place to rebuild a new log file from those entries should you need to. In our case, though, we have to keep the original files for a time. For example, if we need to produce the log files for a criminal investigation, rebuilding the files from the data stored in Elasticsearch might not be sufficient, and we would have to supply the original files.
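As a rough sketch of what such a rebuild process could look like, assume every document keeps the unparsed line in a `message` field together with `source_file` and `line_number` fields. Those names are hypothetical, and the documents are shown here as plain dicts that you have already fetched from Elasticsearch by whatever means you normally use.

```python
from collections import defaultdict


def rebuild_logs(docs):
    """Group documents by source file and reassemble each file's text."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["source_file"]].append(doc)

    rebuilt = {}
    for name, entries in grouped.items():
        # Restore the original order; search results are not guaranteed
        # to come back in file order.
        entries.sort(key=lambda d: d["line_number"])
        rebuilt[name] = "".join(d["message"] + "\n" for d in entries)
    return rebuilt


# Hypothetical documents, deliberately out of order.
sample = [
    {"message": "allow udp 10.0.0.2", "source_file": "/var/log/fw.log", "line_number": 2},
    {"message": "deny tcp 10.0.0.1", "source_file": "/var/log/fw.log", "line_number": 1},
]
print(rebuild_logs(sample)["/var/log/fw.log"], end="")
```

This only gives you back the content of the lines that were indexed. It cannot restore the file's timestamps or ownership, or any lines that were dropped before indexing, which is part of why a rebuilt file may not satisfy an auditor.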
Elasticsearch itself doesn't do anything with the original log files. Which files are ingested, and what you do with them afterwards, is entirely out of scope for Elasticsearch.