I see that my Elasticsearch index only contains 19,502 records.
I've tried looking in hardball.log, but it's 200 MB.
Any recommendations on what to search for in the log file to see what might have happened to the missing 100+ records? Any other debugging tips with Logstash?
As you are explicitly setting the ID of the documents, is it possible that your file contains duplicates? Do you have any records that for some reason have failed parsing and have been indexed with the string "%{baseballPersonID}" as the key?
Thanks again, Christian. I went back and checked the IDs that made it to ES against a full list of IDs that should have made it, and found that I had some parsing failures in my grok filter.
Basically, I used Linux commands to produce a list of IDs from my source file, and a query against ES to produce a list of the IDs that were actually indexed.
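The ES side was something like this (a rough sketch rather than my exact command; I'm assuming a local ES on port 9200, an index named "baseball", and jq installed):

```bash
# Page through every document with the scroll API and collect the _id values.
# Index name "baseball" and localhost:9200 are assumptions; adjust to your setup.
resp=$(curl -s 'localhost:9200/baseball/_search?scroll=1m' \
  -H 'Content-Type: application/json' \
  -d '{"size": 10000, "_source": false}')
echo "$resp" | jq -r '.hits.hits[]._id' > es_ids.txt
scroll_id=$(echo "$resp" | jq -r '._scroll_id')

# Keep fetching scroll pages until one comes back empty.
while :; do
  resp=$(curl -s 'localhost:9200/_search/scroll' \
    -H 'Content-Type: application/json' \
    -d "{\"scroll\": \"1m\", \"scroll_id\": \"$scroll_id\"}")
  [ "$(echo "$resp" | jq '.hits.hits | length')" -eq 0 ] && break
  echo "$resp" | jq -r '.hits.hits[]._id' >> es_ids.txt
done
sort es_ids.txt -o es_ids.txt
```

With both lists sorted, `comm -23 source_ids.txt es_ids.txt` prints the IDs that are in the source file but never made it into the index.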
You could add a separate output, e.g. to a daily file, and write any records tagged with _grokparsefailure there. That would let you easily identify the records with issues.
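A sketch of what that output section could look like (the elasticsearch settings here are assumptions pieced together from this thread, so adjust hosts, index, and paths to your setup):

```
output {
  if "_grokparsefailure" in [tags] {
    # Records that failed the grok filter go to a daily file for inspection.
    file {
      path => "/var/log/logstash/grok_failures_%{+YYYY.MM.dd}.log"
    }
  } else {
    # Everything that parsed cleanly is indexed as before.
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "baseball"
      document_id => "%{baseballPersonID}"
    }
  }
}
```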
Thanks. I now see that I can grep for "_grokparsefailure" in my log file and identify which lines from my source file weren't parsed correctly. That's a big help!
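Concretely, something like this (failed_events.log is just a name I picked):

```bash
# Collect the failed events from the Logstash log for closer inspection.
grep '_grokparsefailure' hardball.log > failed_events.log
wc -l < failed_events.log   # how many records failed to parse
```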