We are running Elasticsearch 7.17.x and have set up reporting for a search query. The team needs a weekly dump, and the number of results in the export does not match the number shown in the search query results.
For example, 7 days of data shows about 850,000 results, but exporting the same search to CSV gives about 100k fewer rows.
The difference grows as the count grows; smaller result sets seem to match.
Any idea what might be causing this? This data goes to our auditors, and the count in the screenshot needs to match the CSV.
It is actually the same query that shows different results when viewed in the Kibana UI (stored search query) and when exported to CSV using the Share feature.
The number in the CSV output is lower than the number shown in the UI.
Essentially, I have added a few important fields to the search and need to filter out two fields.
This variance seems to increase with the result count: the 7-day and 14-day outputs are missing far more rows in the CSV than 3-day or shorter ranges.
For the example shown above, the count in the UI is 1,175,137, but the CSV export has 795,837 rows.
Not sure if this is related, but I am seeing errors in the Kibana logs while the report is running:
{"type":"log","@timestamp":"2024-03-12T03:25:44-04:00","tags":["error","plugins","taskManager"],"pid":3839139,"message":"[Task Poller Monitor]: Observable Monitor: Hung Observable restarted after 33000ms of inactivity"}
{"type":"log","@timestamp":"2024-03-12T03:28:01-04:00","tags":["error","plugins","taskManager"],"pid":3839139,"message":"[Task Poller Monitor]: Observable Monitor: Hung Observable restarted after 33000ms of inactivity"}
{"type":"log","@timestamp":"2024-03-12T03:30:39-04:00","tags":["error","plugins","taskManager"],"pid":3839139,"message":"Failed to poll for work: Error: work has timed out"}
{"type":"log","@timestamp":"2024-03-12T03:33:27-04:00","tags":["error","plugins","taskManager"],"pid":3839139,"message":"Failed to poll for work: Error: work has timed out"}
{"type":"log","@timestamp":"2024-03-12T03:36:47-04:00","tags":["error","plugins","taskManager"],"pid":3839139,"message":"[Task Poller Monitor]: Observable Monitor: Hung Observable restarted after 33000ms of inactivity"}
Can you set a fixed time interval, e.g. yesterday, and see if the numbers match? It looks like you are using a relative timeframe ("Last 7 days"), which changes continuously. If the numbers still do not match, can you check the range of timestamps in the exported data? Kibana adjusts timestamps based on the local timezone, and I am wondering whether the two methods apply different time periods behind the scenes. I do not use this functionality in Kibana, so I do not know for sure whether this could be the issue.
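To make the "fixed interval" idea concrete: one way to get a number that does not drift with "now" is to pin the window to absolute UTC bounds and ask Elasticsearch for a count over exactly that range. This is only a minimal sketch. It assumes the time field is `@timestamp` (adjust to whatever your index pattern uses), and the helper name `fixed_day_count_body` is made up for illustration.

```python
from datetime import date, datetime, timedelta, timezone

def fixed_day_count_body(day, field="@timestamp"):
    """Build a request body for the _count API covering one UTC day: [day, day + 1)."""
    # Absolute bounds with an explicit UTC offset, so neither the browser
    # timezone nor the moment the query runs can shift the window.
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.isoformat(),  # inclusive lower bound
                    "lt": end.isoformat(),     # exclusive upper bound
                }
            }
        }
    }
```

Calling `fixed_day_count_body(date(2024, 3, 11))` gives a body that can be sent to the `_count` endpoint of the index from Dev Tools. The search's own filters would still need to be added to the query before the count is comparable with the UI or the CSV.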
I would recommend picking an interval of a single day so the amount of data is smaller. Then look through the generated CSV file and verify that the timestamps are all within the expected interval so you know whether it may be a timezone issue/difference.
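If it helps, the CSV check can be scripted rather than done by eye. The sketch below counts the data rows and reports the earliest and latest timestamp, plus how many rows fall outside the window you expected. The column name and the date format are assumptions: `@timestamp` and `"%b %d, %Y @ %H:%M:%S.%f"` (e.g. `Mar 12, 2024 @ 03:25:44.000`) are only guesses at what your export contains, so check the header and the first row of your file and adjust both. The function name is made up for illustration.

```python
import csv
import io
from datetime import datetime

def summarize_csv_timestamps(csv_file, column, fmt, start, end):
    """Return (row_count, earliest, latest, rows_outside) for an exported CSV.

    rows_outside counts rows whose timestamp is not within [start, end).
    csv_file is any open text file object; column and fmt are assumptions
    about the export and must match the actual header and date format.
    """
    rows = 0
    outside = 0
    earliest = None
    latest = None
    for record in csv.DictReader(csv_file):
        ts = datetime.strptime(record[column], fmt)
        rows += 1
        if earliest is None or ts < earliest:
            earliest = ts
        if latest is None or ts > latest:
            latest = ts
        if not (start <= ts < end):
            outside += 1
    return rows, earliest, latest, outside
```

For a real export, open the file with `open(path, newline="", encoding="utf-8-sig")` and pass the file object in. If the earliest or latest value sits a few hours outside the day you selected, that points toward a timezone difference; if the range is right but the row count is short, the cause is something else.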