Message Created Twice in Elasticsearch via Logstash

When I ingest documents via Logstash, each document is created twice.

Setup:
Filebeat -> Logstash -> Elasticsearch
Result:
The same messages are ingested into Elasticsearch with different _id values.

So I ran the following tests:

Test 1: Ingest documents directly from Filebeat
i.e. Filebeat -> Elasticsearch
Result: 1 document is created
Finding: the issue is likely related to Logstash or Elasticsearch.

After reading the article https://www.elastic.co/blog/logstash-lessons-handling-duplicates, I am thinking of providing a unique document ID to see whether that prevents duplicated messages in Elasticsearch.
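The idea behind a unique document ID can be sketched in Python. This is not Logstash code: in Logstash, the `fingerprint` filter computes a hash of the event and the `elasticsearch` output's `document_id` option references it. Here a plain dict stands in for the index (an assumption for illustration), and the helper names `fingerprint` and `index_events` are hypothetical. Because the ID is derived from the message content, a resent copy of the same event overwrites the existing document instead of creating a second one.

```python
import hashlib


def fingerprint(message: str) -> str:
    """Hypothetical helper: deterministic ID = SHA-1 hex digest of the message bytes."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def index_events(messages, index):
    """Hypothetical helper: store messages in a dict keyed by their derived ID.

    The dict is only a stand-in for an Elasticsearch index with explicit _id values.
    """
    for msg in messages:
        doc_id = fingerprint(msg)
        index[doc_id] = msg  # same _id -> overwrite, not a new document
    return index


index = {}
line = '127.0.0.1 - - "GET / HTTP/1.1" 200'
index_events([line], index)
index_events([line], index)  # simulate the same event arriving a second time
print(len(index))  # 1
```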

For this test, I also enabled debug logging to confirm how many messages Logstash sends to Elasticsearch.

Test 2: Change the Elasticsearch output in Logstash to provide a unique document ID
Result: 2 documents are created (this still does not resolve my issue).

Document 1: the document ID is my generated ID.
Document 2: the document ID is the unsubstituted literal string (i.e. %{[@metadata][fingerprint]}).
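The literal `%{[@metadata][fingerprint]}` in Document 2 is consistent with how Logstash field references behave: when the referenced field does not exist on an event, the `%{...}` reference is left in the string unchanged. One plausible reading is therefore that the second copy went through an output whose `document_id` referenced a field that event never received. The Python sketch below imitates that substitution rule; `get_field` and `sprintf` are hypothetical helpers, not Logstash's implementation, and they handle only simple `[a][b]` references.

```python
import re


def get_field(event: dict, ref: str):
    """Hypothetical helper: resolve '[@metadata][fingerprint]' or 'host' in a nested dict."""
    keys = re.findall(r"\[([^\]]+)\]", ref) or [ref]
    value = event
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None  # field is missing
        value = value[key]
    return value


def sprintf(template: str, event: dict) -> str:
    """Hypothetical helper: replace %{...} references, leaving missing ones as literal text."""
    def repl(match):
        value = get_field(event, match.group(1))
        return match.group(0) if value is None else str(value)

    return re.sub(r"%\{([^}]+)\}", repl, template)


with_fp = {"message": "hello", "@metadata": {"fingerprint": "abc123"}}
without_fp = {"message": "hello"}
print(sprintf("%{[@metadata][fingerprint]}", with_fp))     # abc123
print(sprintf("%{[@metadata][fingerprint]}", without_fp))  # %{[@metadata][fingerprint]}
```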

In the Logstash log, I found that only one message was sent to Elasticsearch.

I suspect the problem lies in Elasticsearch or its settings.

Has anyone here experienced this problem before? Thanks.

Are you running Logstash as a service? Do you have more than one logstash config file in the config directory? Be aware that Logstash concatenates all config files it finds in the directory into a single logical pipeline, so if you have more than one file containing the same Elasticsearch output, all data will go to all outputs. If you therefore update one to use the fingerprint but not the other, you will see exactly what you are describing.
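A toy model makes the concatenation effect concrete. In this Python sketch, the `Pipeline` class is hypothetical and only models outputs: adding a config file appends its outputs to one shared list, and every event is handed to every output in that list. A Python list stands in for an Elasticsearch index.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class Pipeline:
    """Hypothetical toy model of one Logstash pipeline built from several config files."""

    def __init__(self) -> None:
        self.outputs: List[Callable[[Event], None]] = []

    def add_config(self, outputs: List[Callable[[Event], None]]) -> None:
        # Concatenating a config file just appends its outputs to the shared list.
        self.outputs.extend(outputs)

    def process(self, event: Event) -> None:
        # With no conditionals, every event goes to every output.
        for output in self.outputs:
            output(event)


indexed: List[Event] = []
es_output = indexed.append  # stand-in for an elasticsearch output

pipeline = Pipeline()
pipeline.add_config([es_output])  # e.g. config file 1
pipeline.add_config([es_output])  # e.g. config file 2
pipeline.process({"message": "one log line"})
print(len(indexed))  # 2
```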

If you want to use multiple configuration files and keep them separate, you need to either control the flow through conditionals or configure them as multiple separate pipelines.
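The conditional approach can be sketched in Python: each event carries a marker set by the input it arrived on, and each output only accepts events whose marker matches, much like wrapping an output in `if [type] == "web" { elasticsearch { ... } }` in a Logstash config. The field name `type`, its values `web` and `folder`, and the helper `make_pipeline` are all hypothetical; lists stand in for indices.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Condition = Callable[[Event], bool]


def make_pipeline(routes: List[Tuple[Condition, List[Event]]]) -> Callable[[Event], None]:
    """Hypothetical helper: send each event only to sinks whose condition matches."""
    def process(event: Event) -> None:
        for condition, sink in routes:
            if condition(event):
                sink.append(event)
    return process


web_index: List[Event] = []
folder_index: List[Event] = []

process = make_pipeline([
    (lambda e: e.get("type") == "web", web_index),        # hypothetical marker from input 1
    (lambda e: e.get("type") == "folder", folder_index),  # hypothetical marker from input 2
])

process({"type": "web", "message": "GET /"})
process({"type": "folder", "message": "file line"})
print(len(web_index), len(folder_index))  # 1 1
```

Separate pipelines achieve the same isolation without conditionals, because each pipeline then has its own inputs and outputs.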

Thanks for your reply.

1. Logstash is running as a service. I installed it via yum and just upgraded it to version 6.3.2.
2. When I stop Logstash, the listening port is closed. When I start Logstash, the dedicated TCP port is opened and listening. As far as I know, only one process can listen on a given TCP port.
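The port reasoning in point 2 can be checked with a small Python sketch: bind and listen on a free port, then try to bind a second socket to the same port. The helper name `second_bind_fails` is hypothetical, and the result assumes a typical Linux setup with no `SO_REUSEADDR`/`SO_REUSEPORT` options set. Note that this only rules out a second process on the same port; it says nothing about a single process sending each event to two outputs.

```python
import errno
import socket


def second_bind_fails() -> bool:
    """Hypothetical helper: return True if a second bind to a listening port is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


print(second_bind_fails())  # True on Linux
```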

Regarding whether more than one process is running, here are my thoughts:

Case 1. More than one Filebeat process is running
I configured only one filebeat.yml on this machine, and in Logstash I can see that one message is sent to Logstash.

Case 2. More than one Logstash process is running
I doubt this is possible, because a second process should fail to start while the TCP port is already held by another process. So I think only one Logstash process handles the messages sent to that TCP port.

What is the full contents of your conf directory? Do you have more than one file there?

I have two conf files under /etc/logstash/conf.d for two purposes:

Logstash Config 1

  • Opens and listens on TCP 5044
  • Filebeat sends logs from nginx and apache to Logstash:5044.

Logstash Config 2

  • Opens and listens on TCP 5055
  • Filebeat watches a folder and sends its logs to Logstash:5055.

I tried disabling config 1 (i.e. leaving just one config file in /etc/logstash/conf.d), and Logstash ingested each message only once (i.e. no duplicated messages).

If I enable both config files, the duplicated messages appear again.

Note:
I searched the machine for logstash.yml and found only one: /etc/logstash/logstash.yml.

Exactly. If you have two files there, each with an Elasticsearch output, these will be concatenated into one single pipeline and all events will go to all outputs. As mentioned earlier you need to use either conditionals or multiple pipelines to control this.

Thanks for your clarification, Christian.

Let me double-check the Logstash log to see whether it shows the "concatenated" pipeline or warns me that the same output appears twice, so that a single message is sent to the same destination more than once.

That is how it works and it will not warn you. If you have X-Pack monitoring installed you should be able to see the resulting pipeline using the pipeline viewer and be able to verify that it contains 2 separate elasticsearch outputs.
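Without X-Pack monitoring, a rough check is to count the `elasticsearch { ... }` blocks across the files Logstash concatenates. The Python helper `count_es_outputs` is hypothetical and deliberately naive: it assumes the pipeline loads `/etc/logstash/conf.d/*.conf` (the usual package default), strips `#` comments line by line, and counts every `elasticsearch {` it finds, so an `elasticsearch` input or filter block would be counted as well. The demo builds a throwaway directory rather than reading the real one.

```python
import re
import tempfile
from pathlib import Path


def count_es_outputs(conf_dir: str) -> int:
    """Hypothetical helper: naive count of 'elasticsearch {' blocks in conf_dir/*.conf.

    Not a Logstash config parser: '#' comments are stripped per line, then every
    'elasticsearch {' is counted, including any input or filter blocks of that name.
    """
    pattern = re.compile(r"\belasticsearch\s*\{")
    total = 0
    for path in sorted(Path(conf_dir).glob("*.conf")):
        text = path.read_text(encoding="utf-8")
        # Drop '#' comments so commented-out outputs are not counted.
        text = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
        total += len(pattern.findall(text))
    return total


with tempfile.TemporaryDirectory() as demo_dir:
    sample = 'output {\n  elasticsearch { hosts => ["localhost:9200"] }\n}\n'
    Path(demo_dir, "web.conf").write_text(sample, encoding="utf-8")
    Path(demo_dir, "folder.conf").write_text(sample + "# elasticsearch { }\n", encoding="utf-8")
    demo_count = count_es_outputs(demo_dir)

print(demo_count)  # 2
```

On the machine described above, `count_es_outputs("/etc/logstash/conf.d")` returning 2 would point to the same duplicated-output situation.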
