I'm parsing a lot of old log files. All the logs are gzipped, so I have to uncompress them and then move them to a folder watched by Filebeat.
Out of about 30 million entries, I get about 1300 failures in the Logstash logs. I'm logging the messages, so I can see that Logstash received a partial line; the line is truncated at a random position.
I double-checked that my logs don't contain any special characters or anything similar. So why is Filebeat sending partial lines?
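For context, the ingestion step described above can be sketched roughly like this (the directory layout and an atomic-rename step are my assumptions, not details from the thread):

```shell
# Sketch: decompress archived .gz logs and move the plain-text result into
# the directory Filebeat watches. Directory names here are placeholders.
ingest_logs() {
  archive_dir=$1
  watch_dir=$2
  for gz in "$archive_dir"/*.gz; do
    [ -e "$gz" ] || continue                  # skip when the glob matches nothing
    out="$watch_dir/$(basename "$gz" .gz)"
    gunzip -c "$gz" > "$out.tmp"              # decompress to a temporary name first
    mv "$out.tmp" "$out"                      # rename within the same filesystem, so
  done                                        # Filebeat never picks up a half-written file
}
```

Writing to a temporary name and renaming at the end avoids one classic source of partial lines: Filebeat reading a file while it is still being written.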
Thanks for the data. Could you provide an example of a message that was truncated? I was also hoping to see the Filebeat logs. Do you see anything special in there?
Is the volume you read the logs from a shared drive that is somehow mounted, or a local disk?
If you write the output to a file instead of Logstash, do you still see it happening?
I'm now on 5.3 for the whole stack except Filebeat, which is still on 5.2.2.
I'm running Filebeat on a single node as a Docker container, but this issue already existed when Filebeat was running directly on the host.
The disk is not an SSD but a SATA RAID 5 array.
Filebeat is streaming to 3 Logstash nodes (a container on the same node and 2 remote ones).
I will try some tests writing to disk directly today or tomorrow.
Could it be that your partial lines actually come from another file because the inode is being reused? We had a similar case here: https://github.com/elastic/beats/issues/714#issuecomment-295329605 If that is the case, I recommend first moving the files to another place so the registry gets cleaned up, and then removing the files later. This prevents the inode reuse.
Hi, sorry for the late answer; I was waiting to be sure that the issue was resolved. I'm now moving finished logs to a tmp directory instead of deleting them, and I think that solved my problem.
Instead of the inode, wouldn't it be possible to use the file path? Or make it configurable for users like me who don't parse logs in real time?
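The workaround adopted here can be sketched as follows (the holding-directory name and the cleanup comment are assumptions for illustration):

```shell
# Sketch: once Filebeat has finished reading a file, move it to a holding
# directory instead of deleting it right away. The file keeps its inode, so
# the filesystem cannot hand that inode to a new log file while Filebeat's
# registry still references it.
retire_log() {
  file=$1
  hold_dir=$2
  mkdir -p "$hold_dir"
  mv "$file" "$hold_dir/"   # inode stays allocated under the new path
  # Actually delete the held files later (e.g. from a cron job), once the
  # registry entry for the old path has been cleaned up.
}
```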
About using the path as the identifier instead of the inode: agreed. This should be a configurable option, or even a separate prospector type, for example `file`, where it is assumed that files are never renamed and data is never appended. Feel free to open a feature request for this on GitHub.