I'm facing a strange issue right now. In my Logstash pipeline, the first filter is a grok filter applied to the raw message, which generates a field named responseBody.
In the second filter, I use a JSON filter to parse responseBody. This parse produces an array field named “data”, whose value looks like data: [{…},{…}].
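For context, the first two filters look roughly like this (the grok pattern here is just a stand-in for my real one):

filter {
  grok {
    # placeholder pattern; the real one extracts responseBody from the raw message
    match => { "message" => "%{GREEDYDATA:responseBody}" }
  }
  json {
    # parse the JSON string in responseBody into top-level fields,
    # which is where the [data] array comes from
    source => "responseBody"
  }
}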
The third filter is a split filter, wrapped in an if condition that checks the field exists and is actually an array. After this filter, each document has a field named data.endpoint:
if [data] and [data][0] {
  split {
    # turn each element of the [data] array into a separate event
    field => "data"
  }
}
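So for an event with data: [{"endpoint":"a"},{"endpoint":"b"}] (values are just an example), the split produces two events, one with data.endpoint = a and the other with data.endpoint = b.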
The problem is that in production the value of data.endpoint is the same on every document, even though each element of the data array has its own endpoint.
When I tested in my own environment, using the same filters in the same order, I got the correct output: all the data.endpoint values were different.
In case you think that condition is the problem: no, it's not. I tried changing the target to dawa to see whether the filter was being applied to the log at all, and the field name stayed the same (data.endpoint). That means the last JSON filter with that if condition never touches or modifies the log.
You would need to provide more context: what is the pipeline that is working and what is the pipeline that is not working? You mention that the only difference is the grok pattern, but what is that difference?
Also, the grok filter you shared is unnecessary: you are just matching everything from one field and storing it in another, so a simple rename or copy would work.
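For example, if the pattern is just %{GREEDYDATA:responseBody}, something like this would do the same thing without the grok overhead:

filter {
  mutate {
    # copy the raw message into responseBody directly
    copy => { "message" => "responseBody" }
  }
}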
Can you share a sample of your message as well, so this can be replicated?
Yes; since it's in DEV, I was trying to get straight to the point. Fortunately, I found the core issue: the document_id in the output section. The pipeline has a split filter in it, which splits the array elements into separate documents. The problem is that I configured the pipeline to use a custom_id created by a Filebeat processor, and after the split filter that custom_id stays the same on every resulting document. That's why in production I got the same [data][endpoint] for all documents: they kept getting replaced by the other documents coming out of the split filter.
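To illustrate, the output looked roughly like this (host and index are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my-index"
    # custom_id is set once per original event by a Filebeat processor,
    # so every document produced by the split filter carries the SAME id
    # and each one overwrites the previous on indexing
    document_id => "%{[custom_id]}"
  }
}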
In the end, I chose to use a fingerprint filter to modify the custom_id value so each split document gets its own id.
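Roughly like this (the method and source fields are just an example):

filter {
  fingerprint {
    # mix the per-element endpoint into the id so each split document is unique
    source => ["custom_id", "[data][endpoint]"]
    concatenate_sources => true
    method => "SHA256"
    target => "custom_id"
  }
}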