Retrieve log line given offset

hding · June 29, 2017, 3:05pm

Because our log files are huge, I chose not to store the original log lines. Instead, I store the file path and the offset of each log line so that I can retrieve the lines later.

It says here that "(The exported field offset) is the file offset the reported line starts at." However, when I try to retrieve the log line by seeking to that offset, I actually get the next log line. It seems that the offset points at the end of the log line instead of the beginning.

Do I need to read the log line backwards from offset? Is there a better solution?

steffens · June 30, 2017, 11:05am

Which filebeat version are you using?

Looks like a very unfortunate bug to me. You can follow the issue on github here: https://github.com/elastic/beats/issues/4587

As a workaround: the string (in JSON) should be UTF-8 encoded. Computing offset - (byte size of the string) - 1 (for the newline character) should give you the line's start offset. If you do some event processing in Logstash or an Elasticsearch ingest pipeline, you can adjust the offset there.
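The arithmetic of this workaround can be sketched as follows. The function name `lineStart` is hypothetical, and the sketch assumes the file itself is UTF-8 encoded, each line ends in a single `\n` byte (not `\r\n`), and the message was not altered before shipping (no multiline joining or trimming). If any of these does not hold, the result will be off.

```go
package main

import "fmt"

// lineStart (hypothetical helper) applies the workaround: the reported
// offset points just past the line's newline, so subtract the message's
// byte length and one more byte for the newline itself.
// In Go, len() on a string counts bytes, which matches the UTF-8 byte size.
func lineStart(reportedOffset int64, message string) int64 {
	return reportedOffset - int64(len(message)) - 1
}

func main() {
	// For a file containing "first line\nsecond line\n", the events would
	// carry offsets 11 and 23 under the behaviour described in this thread.
	fmt.Println(lineStart(11, "first line"))  // 0
	fmt.Println(lineStart(23, "second line")) // 11
}
```

The same subtraction could be done in whatever processing step sits between Filebeat and the index, as long as that step sees the unmodified message.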

hding · June 30, 2017, 12:42pm

Hi Steffen,

I'm using Filebeat 5.4.1.

In the source code, these are lines 239-277:

    // Get copy of state to work on
    // This is important in case sending is not successful so on shutdown
    // the old offset is reported
    state := h.getState()
    state.Offset += int64(message.Bytes)

    // Create state event
    data := util.NewData()
    if h.source.HasState() {
    	data.SetState(state)
    }

    text := string(message.Content)

    // Check if data should be added to event. Only export non empty events.
    if !message.IsEmpty() && h.shouldExportLine(text) {

    	data.Event = common.MapStr{
    		"@timestamp": common.Time(message.Ts),
    		"source":     state.Source,
    		"offset":     state.Offset, // Offset here is the offset before the starting char.
    	}
    	data.Event.DeepUpdate(message.Fields)

    	// Check if json fields exist
    	var jsonFields common.MapStr
    	if fields, ok := data.Event["json"]; ok {
    		jsonFields = fields.(common.MapStr)
    	}

    	if h.config.JSON != nil && len(jsonFields) > 0 {
    		reader.MergeJSONFields(data.Event, jsonFields, &text, *h.config.JSON)
    	} else if &text != nil {
    		if data.Event == nil {
    			data.Event = common.MapStr{}
    		}
    		data.Event["message"] = text
    	}
    }

Line 243 first adds the current message length to state.Offset:

    state.Offset += int64(message.Bytes)

Then line 259 sets the exported field "offset" to state.Offset, which by that point has already been advanced past the current line:

    "offset":     state.Offset, // Offset here is the offset before the starting char.

If we change the order of these two sections of code, so that the event is built before the offset is advanced, would that solve the issue?
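To make the effect of the ordering concrete, here is a small self-contained sketch. It is not Filebeat code: the `message` type and both functions are hypothetical stand-ins, and it only shows which offsets each ordering would export. Whether swapping the order is the right fix inside Filebeat is not confirmed in this thread; presumably the state that is persisted still needs the advanced offset, so only the exported field would change.

```go
package main

import "fmt"

// message is a hypothetical stand-in for the reader's message type.
type message struct {
	Content string
	Bytes   int // bytes consumed from the file, including the newline
}

// offsetsAdvanceFirst mimics the quoted order: advance the offset,
// then export it. Each exported value is the END of the line.
func offsetsAdvanceFirst(msgs []message) []int64 {
	var offset int64
	out := make([]int64, 0, len(msgs))
	for _, m := range msgs {
		offset += int64(m.Bytes)
		out = append(out, offset)
	}
	return out
}

// offsetsExportFirst uses the proposed order: export the offset,
// then advance it. Each exported value is the START of the line.
func offsetsExportFirst(msgs []message) []int64 {
	var offset int64
	out := make([]int64, 0, len(msgs))
	for _, m := range msgs {
		out = append(out, offset)
		offset += int64(m.Bytes)
	}
	return out
}

func main() {
	msgs := []message{
		{Content: "first line", Bytes: 11},
		{Content: "second line", Bytes: 12},
	}
	fmt.Println(offsetsAdvanceFirst(msgs)) // [11 23]
	fmt.Println(offsetsExportFirst(msgs))  // [0 11]
}
```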

steffens · June 30, 2017, 8:49pm

Thanks. We're still wondering when this bug was introduced.

Feel free to open a PR: https://github.com/elastic/beats/pulls
Also see the contribution guide if you want to provide a PR with a fix and a system test.
