Is there a way to re-parse log entries in Elasticsearch? The reason I ask is that I've been importing all of our old logs into Elasticsearch. I thought my grok patterns handled the majority of entries; however, out of the 3 years of logs, I ran into a few months where the log entries differed slightly, just enough to break my grok filters. Is there a way to search for the grok parse failures in Elasticsearch and re-parse the entries that failed?
If you have saved the original log message as a field, it would be easy to recreate the original logs that caused the grok problems, push them through Logstash once again, and afterwards delete the _grokparsefailure messages. There's no turn-key solution for this, though.
If you don't have the full original message then perhaps you can stitch it together using various fields in the message or write a custom Logstash configuration to fix the extracted logs.
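For the first case, where the original line was kept in the message field, the extraction step could be sketched roughly as follows in Python. This is only an illustration: it assumes a search response that has already been parsed from JSON and has the usual hits.hits[]._source shape, and the function names and output path are hypothetical.

```python
def extract_messages(response, field="message"):
    """Collect the original log lines from a parsed Elasticsearch search response.

    `response` is assumed to be the decoded JSON body of a search request.
    Hits that lack the requested field are skipped.
    """
    lines = []
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("_source", {}).get(field)
        if value is not None:
            lines.append(value)
    return lines


def write_lines(lines, path):
    """Write one log line per row so Logstash can read the file again."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line.rstrip("\n") + "\n")
```

The resulting file can then be fed to Logstash with the corrected grok patterns, for example through a file or stdin input.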
Now that I have a search that shows all entries with grok parse failures, how would I go about deleting only those entries from Elasticsearch? Also, there seem to be two fields for the original message, _source and message. Which one would be better to use for outputting to a file?
_source is the whole document, i.e. all fields. Depending on how your filters are (and were) set up you might get away with dumping just the message field, but that's impossible for me to say. It should be easy for you since you're familiar with your logging format and your Logstash filters.
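On the deletion question, one possible approach is to turn the hits from the failure search into delete actions for the bulk API, which takes newline-delimited JSON. The sketch below only builds the request body and sends nothing; the function name and the sample index name are hypothetical, and whether _type is needed depends on the Elasticsearch version in use. A delete-by-query style API is another option, but its form differs between versions, so check the documentation for yours.

```python
import json


def bulk_delete_body(hits):
    """Build a bulk API body that deletes every document in `hits`.

    Each hit is assumed to carry the _index and _id metadata returned by a
    search; _type is copied only when present (older versions use it).
    """
    lines = []
    for hit in hits:
        meta = {"_index": hit["_index"], "_id": hit["_id"]}
        if "_type" in hit:
            meta["_type"] = hit["_type"]
        lines.append(json.dumps({"delete": meta}, sort_keys=True))
    # The bulk API expects one JSON object per line, ending with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical hit, shaped like an entry in hits.hits[] of a search response.
sample_hits = [{"_index": "logstash-2015.01.01", "_type": "apache", "_id": "1"}]
print(bulk_delete_body(sample_hits), end="")
```

Delete only after the re-parsed entries have been indexed successfully, so nothing is lost if the new grok patterns still miss some lines.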
Based on what you said, I've been experimenting with the API. I'm still having a few issues, though. I think this command will return messages from all indexes of type apache that have the tag _grokparsefailure, and the pretty parameter should print the result in a readable format. However, I keep getting an exception when I use it. I'm also not quite sure how to get it to print only the message field. I saw there is a way to control what is returned from the source with the _source_include and _source_exclude parameters, but I'm still not sure how to return just the message.
Please always post exact error messages instead of saying "I'm getting an exception".
If this is the exact command you're using, the problem is that some of the double quotes are typographic (curly) quotation marks rather than the straight ones used in programming. Compare "query" and “term”.
Use fields to select which document fields to include in the response (although you can always extract the message from the full source document in _source).
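One way to avoid quoting problems altogether is to generate the request body from code instead of typing it by hand. The sketch below is an assumption-laden illustration: it builds a term query on the tags field and limits the returned source to the message field with the _source key. Whether _source filtering or fields is the right selector depends on the Elasticsearch version, and the function name and defaults are hypothetical.

```python
import json


def build_search_body(tag="_grokparsefailure", field="message", size=10):
    """Return a search body that matches one tag and returns one field."""
    return {
        "size": size,
        # Source filtering: return only the chosen field from each document.
        "_source": [field],
        "query": {"term": {"tags": tag}},
    }


# json.dumps always emits plain ASCII double quotes, never typographic ones.
print(json.dumps(build_search_body(), indent=2))
```

The printed JSON can be pasted into a curl command or saved to a file and sent as the request body.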
Sorry, I forgot the error in my last message. Based on your last input I modified what I was using: I got rid of the incorrect quotes and added the fields parameter. All of my indexes are named logstash-[date], as below, and they are of type "apache". Once I think I have the command correct, I'll switch the specific index to _all and change the size to a much larger number. Anyway, this is what I'm using: