Apache log import only importing a small number of log entries?

Hi everyone,

I'm using Logstash to import around 464 MB of logs. You can have a look at my logstash.conf here: https://gist.github.com/mensoh/932968e45395420a0e28aa1b7a13c0e0

As you can see, the output is also sent to stdout, and there I see thousands of entries flashing by, but when I look in Kibana I only see around 164 entries (with the time range set to the past 5 years).

The ES logs show no errors, and the Logstash logs show no errors either. I would expect to see thousands of entries, one for each line in the log files. What am I missing?

Thanks a lot,
Menso

My guess would be that your document ID is not unique, or that the field it's based on doesn't exist, causing lots of documents to get the same ID and be updated instead of inserted.
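
For example (I haven't seen your exact config, so this is just a guess with a made-up field name), an elasticsearch output along these lines would cause it:

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    # If "some_field" doesn't exist on the event, this stays the literal
    # string "%{some_field}", so every event gets the same ID and each new
    # event overwrites (updates) the previous document instead of adding one.
    document_id => "%{some_field}"
  }
}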

Thanks for your reply! How would I check this and/or remedy it? I'm using the standard Apache log parser, so I assumed it would just work with Apache logs :thinking:

If this is the problem you should see updates taking place when you look at the cat indices API. If that is the issue, send the events to stdout or a file and check which fields are present to find the error.
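
For example, you could temporarily add a stdout or file output next to the elasticsearch one (the file path here is just a placeholder), and compare what you see there with the output of the cat indices API, e.g. curl 'localhost:9200/_cat/indices?v':

output {
  elasticsearch { hosts => ["localhost:9200"] }
  # Print each event with all of its fields to the console:
  stdout { codec => rubydebug }
  # Or write the events to a file for easier inspection:
  file { path => "/tmp/logstash-events-debug.log" }
}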

Not sure about updates, but docs.deleted is quite high? See:

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana             MPxywALaTHmN8j5v1fZ68A   1   0          2            0     12.6kb             12.6kb
yellow open   logstash-2017.12.16 40M-4BFHS4iUXuAjyDH7sA   5   1          1         2175    755.7kb        755.7kb
yellow open   logstash-2017.11.12 lGQD5y3uSiipb9DyrGlb2g   5   1          1         1461    530.5kb        530.5kb
yellow open   logstash-2018.02.10 zdEixWK0RqCuULTmPUaFCA   5   1          1         2598    924.1kb        924.1kb
yellow open   logstash-2017.11.07 9wLi8LMqRJyKcJHoVkh09w   5   1          1         1305      504kb          504kb
yellow open   logstash-2017.12.05 8gxkxww7TBmgCmUr-SJvmA   5   1          1         2154    665.6kb        665.6kb

and so on and so on... Is this caused by the documents being updated, and if so, how would I prevent this from happening?

Thanks for your patience =)

Yes, that indicates updates or deletes taking place. To fix this you need to look at your data and fix the document ID field. Another way is to not specify an ID and let Elasticsearch assign one.
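
Roughly speaking, if your elasticsearch output currently sets document_id from some field, dropping that line should be enough, e.g.:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    # No document_id here: Elasticsearch assigns a unique ID per event,
    # so every log line becomes its own document instead of overwriting
    # an existing one.
  }
}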

Thanks for your reply again!

I don't think I'm sending document ID fields to ES, you can see an example of my logstash output as rendered to stdout here: https://gist.github.com/mensoh/bf50b0e1ea8b3655e5b0b3a156d4a69b

The logstash.conf I'm using is here https://gist.github.com/mensoh/932968e45395420a0e28aa1b7a13c0e0 and also doesn't mention any ID fields.

So it seems I might be setting an ID without knowing it, or ES isn't generating document IDs when it should?

Thanks again

You are setting 'document_id' in your Elasticsearch output. I suspect the field used there is what's causing the problem.

Doh, I feel like a huge idiot for not spotting that. Thanks a lot, it's all working now! :+1:
