We currently have a pipeline set up in our environment to ship nginx logs to Kibana. On the web server, a Filebeat configuration pushes the nginx logs to a Logstash server for processing. On the Logstash server, a Logstash conf file parses the nginx logs and pushes them to Elasticsearch. We are facing an issue with the parsed data.
When we view the index data in Kibana via Discover, not all of the data is available. Each parsed message has around 30 fields, but when viewing the data in Discover, only 9 fields have data.
The data gets updated over a period of time. After some time (about 4 hours), I'm able to view all 30 fields. I need help identifying the cause of this issue: why is there a time delay in displaying the entire data? We are now observing that the delay before the entire data is displayed seems to be greater than 12 hours.
Analysis done
We have checked the Elasticsearch data nodes and the status of the cluster. The cluster is green and there are no space-related issues. The data node is only 75% full.
Need some pointers on how to fix this issue.
Any help on the above is much appreciated.
Hi, can you share your logstash configuration file?
Is this happening for single documents? Or do you mean that newer documents have fields that older documents do not?
Can you check the Elasticsearch index settings when you see missing fields, check again when more fields are available, and see if the mappings are changing over time?
@tsullivan Thanks a lot for your response. The issue is happening for all the documents in this index.
To reiterate clearly: when I refresh the index in the Discover tab for the last 15 minutes, I see data flowing in. The specified index has around 40 fields, but only 10 are visible and populated with data. Also, under the "Filter by type" link, "Hide missing fields" is enabled.
I checked the settings of the index after viewing the data in Discover, using the command below:
GET /ingress-nginx-2021.07.24/_settings
This issue is happening for all the documents in the specific index. Currently, not all fields are being populated. But after a lag of, say, 4 to 5 hours, the missing fields get updated automatically. The lag time also varies. I have also observed that at some points all 40 fields are populated, but that is very random.
The Logstash script that we are using for parsing this is below
The issue is that the data indexed in Elasticsearch does not have mappings defined for the fields. Elasticsearch gives the index a default (dynamic) mapping, and only for the fields that are actually present in the data as it comes in. Every time new data contains fields that weren't previously in the mapping, the mapping changes to add a default mapping for the new fields.
Keep in mind, the API to look at here is _mapping, not _settings.
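One way to confirm that the mapping is growing over time is to save the response of `GET /ingress-nginx-2021.07.24/_mapping` once while fields are missing and once after they appear, then compare the two. Below is a rough Python sketch of that comparison. It assumes the typeless response shape used by Elasticsearch 7.x and later (`{index: {"mappings": {"properties": {...}}}}`). The helper names and the sample field names (`status`, `geoip.country_name`) are made up for illustration and are not taken from your index.

```python
def flatten_fields(properties, prefix=""):
    """Return {dotted_field_name: type} for a mapping 'properties' dict."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            fields.update(flatten_fields(spec["properties"], prefix=f"{path}."))
        else:
            # Leaf field: record its mapped type.
            fields[path] = spec.get("type", "object")
    return fields


def mapped_fields(mapping_response, index):
    """Extract the flattened field list for one index from a _mapping response."""
    props = mapping_response[index]["mappings"].get("properties", {})
    return flatten_fields(props)


def new_fields(earlier, later, index):
    """Fields present in the later mapping snapshot but not in the earlier one."""
    before = mapped_fields(earlier, index)
    after = mapped_fields(later, index)
    return {name: after[name] for name in sorted(after.keys() - before.keys())}


# Hypothetical snapshots; in practice, load the two saved _mapping responses
# with json.load() instead of writing them inline.
INDEX = "ingress-nginx-2021.07.24"
EARLIER = {INDEX: {"mappings": {"properties": {"message": {"type": "text"}}}}}
LATER = {INDEX: {"mappings": {"properties": {
    "message": {"type": "text"},
    "status": {"type": "long"},
    "geoip": {"properties": {"country_name": {"type": "text"}}},
}}}}

if __name__ == "__main__":
    for field, field_type in new_fields(EARLIER, LATER, INDEX).items():
        print(f"{field}: {field_type}")
```

If the second snapshot lists fields that the first one lacks, the mapping was updated between the two requests, which is consistent with the dynamic mapping behaviour described above.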