I'm not sure whether this problem is on Filebeat's side or whether it comes from Logstash/Elasticsearch.
I have a setup of Filebeat, Logstash and Elasticsearch. Filebeat is the input for Logstash, and Elasticsearch is the output. Filebeat is configured to read a text file I have. My filebeat.yml looks like this:
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - c:\Users\00\Documents\text\*.txt
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 3
setup.kibana:
output.logstash:
  hosts: ["localhost:5044"]
Meanwhile my Logstash.conf looks like this:
input {
  beats {
    port => "5044"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => [ "localhost:9200" ]
    index => "extxt"
    workers => 1
  }
}
The document I'm trying to index is structured like this, but with a lot of filler text:
Title
Paragraph 1
Paragraph 2
Paragraph 3
etc
The problem is that not all of the paragraphs from the document are indexed. A few of them are, while the rest aren't. Instead, Elasticsearch seems to index a few documents with an "empty" message field.
Some of them look like this:
{
  "_index": "extxt",
  "_type": "doc",
  "_id": "jZwHumMBvqM7ycgjlzFj",
  "_score": 1,
  "_source": {
    "@timestamp": "2018-06-01T06:27:22.841Z",
    "offset": 483,
    "@version": "1",
    "host": "DESKTOP-AUVKQK5",
    "source": """c:\Users\00\Documents\text\example.txt""",
    "prospector": {
      "type": "log"
    },
    "beat": {
      "version": "6.2.4",
      "name": "DESKTOP-AUVKQK5",
      "hostname": "DESKTOP-AUVKQK5"
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "message": "Tittel teksten. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. "
  }
},
While others look like this:
{
  "_index": "extxt",
  "_type": "doc",
  "_id": "ipwHumMBvqM7ycgjljEb",
  "_score": 1,
  "_source": {
    "@timestamp": "2018-06-01T06:27:22.841Z",
    "offset": 979,
    "@version": "1",
    "host": "DESKTOP-AUVKQK5",
    "source": """c:\Users\00\Documents\text\example.txt""",
    "prospector": {
      "type": "log"
    },
    "beat": {
      "version": "6.2.4",
      "name": "DESKTOP-AUVKQK5",
      "hostname": "DESKTOP-AUVKQK5"
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "message": " "
  }
},
As you can see, not everything from the document was included, and the message field is blank in some of the indexed documents.
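To make my current guess concrete, here is a minimal Python sketch of how I assume a line-oriented reader turns a file into events. This is not Filebeat's actual code: the function name `split_into_events`, the sample text, and the rule that an unterminated last line is held back are all assumptions of mine, and the offsets are only illustrative.

```python
def split_into_events(data: bytes) -> list:
    """Return one event per newline-terminated line (illustrative only).

    Assumption: text after the last newline is held back, because a
    line-oriented reader cannot yet know that the line is complete.
    """
    events = []
    offset = 0
    while True:
        end = data.find(b"\n", offset)
        if end == -1:
            break  # unterminated tail: no event is produced for it
        line = data[offset:end]
        offset = end + 1  # byte position just past this line
        events.append({
            "offset": offset,
            "message": line.rstrip(b"\r").decode("utf-8"),
        })
    return events


# Hypothetical sample: paragraphs separated by lines holding a single space,
# and a last paragraph without a trailing newline.
sample = b"Title\r\nParagraph 1\r\n \r\nParagraph 2\r\n \r\nParagraph 3"
events = split_into_events(sample)
messages = [event["message"] for event in events]
```

If the real behaviour is anything like this, each separator line that contains only a space would become its own document with a blank message, and a final paragraph with no newline after it would not show up at all.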
Why is this happening and how could I fix it? I can provide additional information if required.
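For example, I could run something like the following against the text file and post the numbers. It is only a rough diagnostic sketch: `summarize_lines` and `summarize_file` are names I made up, and it simply counts newline-terminated lines, whitespace-only lines, and whether the file ends with a newline, so the counts can be compared with the number of documents in the index.

```python
def summarize_lines(data: bytes) -> dict:
    """Count line types in raw file content (diagnostic sketch only)."""
    lines = data.split(b"\n")
    # Whatever follows the last newline; empty if the file ends with one.
    tail = lines.pop()
    # bytes.strip() removes spaces, tabs and "\r", so this also catches
    # lines that contain only whitespace before a Windows line ending.
    blank = sum(1 for line in lines if not line.strip())
    return {
        "terminated_lines": len(lines),
        "whitespace_only_lines": blank,
        "content_lines": len(lines) - blank,
        "ends_with_newline": data.endswith(b"\n"),
        "unterminated_tail_bytes": len(tail),
    }


def summarize_file(path: str) -> dict:
    """Read a file in binary mode and summarize its lines."""
    with open(path, "rb") as handle:
        return summarize_lines(handle.read())
```

Calling `summarize_file(r"c:\Users\00\Documents\text\example.txt")` would then show how many whitespace-only lines the file contains and whether the last paragraph is followed by a newline.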