Some paragraphs of document missing in ES index


#1

I don't know for sure whether this is a problem on Filebeat's side or whether it comes from Logstash/Elasticsearch.

I have a setup of Filebeat, Logstash and Elasticsearch. Filebeat is the input for Logstash, and Elasticsearch is the output. Filebeat is set to read a text file I have. My filebeat.yml looks like this:

filebeat.prospectors:

- type: log
  enabled: true
  paths:
    - c:\Users\00\Documents\text\*.txt
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 3
setup.kibana:
output.logstash:
  hosts: ["localhost:5044"]

Meanwhile my Logstash.conf looks like this:

input {
    beats {
        port => "5044"
    }
}
output {
    stdout { codec => rubydebug }
    elasticsearch {
        action => "index"
        hosts => [ "localhost:9200" ]
        index => "extxt"
        workers => 1
    }
}

The document I'm trying to index is built up like this but with a lot of filler text:

Title

Paragraph 1

Paragraph 2

Paragraph 3

etc

The problem is that not all the paragraphs from the document are indexed. A few of them are, while the rest aren't. Instead, Elasticsearch seems to index a few documents whose "message" field is effectively empty.

Some of them look like this:

  {
    "_index": "extxt",
    "_type": "doc",
    "_id": "jZwHumMBvqM7ycgjlzFj",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-06-01T06:27:22.841Z",
      "offset": 483,
      "@version": "1",
      "host": "DESKTOP-AUVKQK5",
      "source": """c:\Users\00\Documents\text\example.txt""",
      "prospector": {
        "type": "log"
      },
      "beat": {
        "version": "6.2.4",
        "name": "DESKTOP-AUVKQK5",
        "hostname": "DESKTOP-AUVKQK5"
      },
      "tags": [
        "beats_input_codec_plain_applied"
      ],
      "message": "Tittel teksten. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. "
    }
  },

While others look like this:

  {
    "_index": "extxt",
    "_type": "doc",
    "_id": "ipwHumMBvqM7ycgjljEb",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-06-01T06:27:22.841Z",
      "offset": 979,
      "@version": "1",
      "host": "DESKTOP-AUVKQK5",
      "source": """c:\Users\00\Documents\text\example.txt""",
      "prospector": {
        "type": "log"
      },
      "beat": {
        "version": "6.2.4",
        "name": "DESKTOP-AUVKQK5",
        "hostname": "DESKTOP-AUVKQK5"
      },
      "tags": [
        "beats_input_codec_plain_applied"
      ],
      "message": " "
    }
  },

As you can see, not everything from the document was included, and some of the message fields are blank.

Why is this happening and how could I fix it? I can provide additional information if required.


(Noémi Ványi) #2

Could you please share example logs you are sending? Also, could you share the debug logs of Filebeat? (./filebeat -e -d "*")


(Noémi Ványi) #3

Is it possible that the empty messages are the empty line between paragraphs?
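
To illustrate the idea, here is a minimal Python sketch of a line-oriented reader. It is only a simplified model and not Filebeat's actual implementation; the function names `split_events` and `drop_blank` and the sample text are made up for the example. It shows how whitespace-only separator lines between paragraphs end up as events of their own, and how they could be filtered out:

```python
def split_events(text):
    """Return one event per newline-terminated line, blank or not."""
    # Text after the final "\n" is not a complete line, so it is left out.
    complete_lines = text.split("\n")[:-1]
    # Strip a trailing "\r" so Windows line endings (\r\n) are handled too.
    return [line.rstrip("\r") for line in complete_lines]


def drop_blank(events):
    """Keep only events that contain something other than whitespace."""
    return [event for event in events if event.strip()]


# Hypothetical sample: paragraphs separated by lines holding a single space.
sample = "Title\n \nParagraph 1\n \nParagraph 2\n"
events = split_events(sample)
print(events)              # includes the " " separator lines
print(drop_blank(events))  # separator lines removed
```

In Filebeat itself, the `exclude_lines` option of the log prospector takes a list of regular expressions and may be a way to drop such lines before they are sent.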


#4

That could be, and it seems likely. I still don't understand why not all the paragraphs are present, though. I ran it another time, and this time it's only the last paragraph that is missing (earlier, two of them were missing).

I'm not sure how the debug logs work. Is this what you're looking for?

2018-06-01T12:55:41.653+0200 INFO instance/beat.go:468 Home path: [C:\Program Files\Filebeat] Config path: [C:\Program Files\Filebeat] Data path: [C:\Program Files\Filebeat\data] Logs path: [C:\Program Files\Filebeat\logs]
2018-06-01T12:55:41.824+0200 DEBUG [beat] instance/beat.go:495 Beat metadata path: C:\Program Files\Filebeat\data\meta.json
2018-06-01T12:55:41.829+0200 INFO instance/beat.go:475 Beat UUID: 057d6a16-268c-45e0-863d-025229cc0564
2018-06-01T12:55:41.830+0200 INFO instance/beat.go:213 Setup Beat: filebeat; Version: 6.2.4
2018-06-01T12:55:41.831+0200 DEBUG [beat] instance/beat.go:230 Initializing output plugins
2018-06-01T12:55:41.831+0200 DEBUG [processors] processors/processor.go:49 Processors:
2018-06-01T12:55:41.832+0200 INFO pipeline/module.go:76 Beat name: DESKTOP-AUVKQK5
2018-06-01T12:55:41.837+0200 INFO instance/beat.go:301 filebeat start running.
2018-06-01T12:55:41.837+0200 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2018-06-01T12:55:41.837+0200 DEBUG [service] service/service_windows.go:51 Windows is interactive: true
2018-06-01T12:55:41.846+0200 DEBUG [registrar] registrar/registrar.go:90 Registry file set to: C:\Program Files\Filebeat\data\registry
2018-06-01T12:55:41.847+0200 INFO registrar/registrar.go:110 Loading registrar data from C:\Program Files\Filebeat\data\registry
2018-06-01T12:55:41.848+0200 INFO registrar/registrar.go:121 States Loaded from registrar: 1
2018-06-01T12:55:41.848+0200 WARN beater/filebeat.go:261 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2018-06-01T12:55:41.849+0200 INFO crawler/crawler.go:48 Loading Prospectors: 1
2018-06-01T12:55:41.848+0200 DEBUG [registrar] registrar/registrar.go:152 Starting Registrar
2018-06-01T12:55:41.849+0200 DEBUG [processors] processors/processor.go:49 Processors:
2018-06-01T12:55:41.850+0200 DEBUG [prospector] log/config.go:178 recursive glob enabled
2018-06-01T12:55:41.851+0200 DEBUG [prospector] log/prospector.go:120 exclude_files: []. Number of stats: 1
2018-06-01T12:55:41.852+0200 DEBUG [prospector] file/states.go:51 New state added for c:\Users\00\Documents\test\example.txt
2018-06-01T12:55:41.852+0200 DEBUG [prospector] log/prospector.go:141 Prospector with previous states loaded: 1
2018-06-01T12:55:41.852+0200 DEBUG [registrar] registrar/registrar.go:228 Processing 1 events
2018-06-01T12:55:41.852+0200 INFO log/prospector.go:111 Configured paths: [c:\Users\00\Documents\test*.txt]
2018-06-01T12:55:41.852+0200 DEBUG [registrar] registrar/registrar.go:200 Registrar state updates processed. Count: 1
2018-06-01T12:55:41.853+0200 DEBUG [prospector] prospector/prospector.go:87 Starting prospector of type: log; ID: 4595844107640979923
2018-06-01T12:55:41.853+0200 DEBUG [registrar] registrar/registrar.go:218 Registrar states cleaned up. Before: 1, After: 1, Pending: 0
2018-06-01T12:55:41.853+0200 DEBUG [cfgfile] cfgfile/reload.go:95 Checking module configs from: C:\Program Files\Filebeat/modules.d/*.yml
2018-06-01T12:55:41.853+0200 DEBUG [prospector] log/prospector.go:147 Start next scan
2018-06-01T12:55:41.853+0200 DEBUG [registrar] registrar/registrar.go:259 Write registry file: C:\Program Files\Filebeat\data\registry
2018-06-01T12:55:41.870+0200 DEBUG [cfgfile] cfgfile/reload.go:109 Number of module configs found: 0
2018-06-01T12:55:41.871+0200 DEBUG [prospector] log/prospector.go:362 Check file for harvesting: c:\Users\00\Documents\test\example.txt
2018-06-01T12:55:41.871+0200 INFO crawler/crawler.go:82 Loading and starting Prospectors completed. Enabled prospectors: 1
2018-06-01T12:55:41.871+0200 INFO cfgfile/reload.go:127 Config reloader started
2018-06-01T12:55:41.873+0200 DEBUG [prospector] log/prospector.go:448 Update existing file for harvesting: c:\Users\00\Documents\test\example.txt, offset: 2422
2018-06-01T12:55:41.873+0200 DEBUG [cfgfile] cfgfile/reload.go:151 Scan for new config files
2018-06-01T12:55:41.873+0200 DEBUG [prospector] log/prospector.go:457 Resuming harvesting of file: c:\Users\00\Documents\test\example.txt, offset: 2422, new size: 2885
2018-06-01T12:55:41.875+0200 DEBUG [cfgfile] cfgfile/reload.go:170 Number of module configs found: 0
2018-06-01T12:55:41.875+0200 INFO cfgfile/reload.go:219 Loading of config files completed.
2018-06-01T12:55:41.876+0200 DEBUG [harvester] log/harvester.go:442 Set previous offset for file: c:\Users\00\Documents\test\example.txt. Offset: 2422
2018-06-01T12:55:41.876+0200 DEBUG [harvester] log/harvester.go:433 Setting offset for file: c:\Users\00\Documents\test\example.txt. Offset: 2422
2018-06-01T12:55:41.877+0200 DEBUG [harvester] log/harvester.go:348 Update state: c:\Users\00\Documents\test\example.txt, offset: 2422
2018-06-01T12:55:41.891+0200 DEBUG [prospector] log/prospector.go:168 Prospector states cleaned up. Before: 1, After: 1, Pending: 0
2018-06-01T12:55:41.891+0200 INFO log/harvester.go:216 Harvester started for file: c:\Users\00\Documents\test\example.txt
2018-06-01T12:55:41.892+0200 DEBUG [harvester] log/log.go:85 End of file reached: c:\Users\00\Documents\test\example.txt; Backoff now.

And by example log you mean the document? The example.txt? I can't post it in this message because of the text limit but I have a pastebin of its content here.


(Noémi Ványi) #5

There is no newline at the end of the file. Filebeat cannot create an event for the last line because it waits for a newline character; that is how it knows a line has ended. If you add a newline at the end of your last line, the line should be flushed and sent to ES.
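
If you want to check this outside of an editor, a small Python sketch like the following could do it. The helper names `ends_with_newline` and `ensure_trailing_newline` are made up for this example, and it assumes the file can be opened for appending while nothing else is writing to it:

```python
import os


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # nothing in the file, so no unterminated line
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def ensure_trailing_newline(path):
    """Append a newline if the last line is unterminated.

    Returns True if the file was modified, False otherwise.
    """
    if ends_with_newline(path):
        return False
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

For example, calling `ensure_trailing_newline(r"c:\Users\00\Documents\text\example.txt")` would append a single newline only when the last paragraph is unterminated, and would leave the file untouched otherwise.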

