Some Apache log fields appear as "pairs" in duplicated records

Hi,

I have a setup with Logstash forwarding Apache access logs to Elasticsearch for analysis in Kibana.

As this is a shared server (WHM/cPanel), I forward the COMBINEDVHOST log and grok it with the " %{IPORHOST:vhost}:%{POSINT:port} %{COMBINEDAPACHELOG}" pattern to extract the virtual host name.
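To make clear which fields that pattern is expected to produce from a single line, here is a rough Python approximation of it. This is a simplified regex stand-in, not grok itself: the real grok sub-patterns (IPORHOST, HTTPDATE, QS and so on) are stricter. It assumes each line starts with the host:port prefix the pattern expects, and the sample line is made up.

```python
import re
from typing import Dict, Optional

# Simplified, illustrative stand-in for the grok pattern
# "%{IPORHOST:vhost}:%{POSINT:port} %{COMBINEDAPACHELOG}".
# The real grok sub-patterns are stricter than these character classes.
VHOST_COMBINED = re.compile(
    r'^(?P<vhost>[^\s:]+):(?P<port>[1-9][0-9]*) '
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>[^"]*))" '
    r'(?P<response>\d+) (?:(?P<bytes>\d+)|-) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")'
)


def parse(line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if the line does not match."""
    m = VHOST_COMBINED.match(line)
    if m is None:
        return None
    # Drop optional groups that did not take part in the match.
    return {k: v for k, v in m.groupdict().items() if v is not None}


# Hypothetical sample line in the vhost:port + combined format.
sample = ('example.com:80 203.0.113.5 - - [29/Jul/2015:14:02:23 +0200] '
          '"GET /missing.html HTTP/1.1" 404 14202 "-" "Mozilla/5.0"')
print(parse(sample))
```

Each field listed further down (vhost, port, clientip, ident, timestamp, verb, request, httpversion, response, bytes, referrer, agent) is a single capture here, so one parse of one line yields one value per field.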

Everything works fine, except that some entries seem to generate three separate records in Kibana, one of which shows some fields as "pairs". An example from the Kibana Discover panel makes this clearer (I masked the client IP, but it is the same in every record):

Time                            response        bytes               clientip
July 29th 2015, 14:02:23.000    ["404","404"]   ["14202","14202"]   ["x.x.x.x","x.x.x.x"]
July 29th 2015, 14:02:23.000    404             14202               x.x.x.x
July 29th 2015, 14:02:23.000    404             14202               x.x.x.x

So in these three records every field is exactly the same, except that one record shows some fields as "pairs", whose values are nevertheless identical to those in the other two records. This happens only for these fields:

  • agent
  • bytes
  • clientip
  • geoip.coordinates
  • geoip.location
  • httpversion
  • ident
  • port
  • referrer
  • request
  • response
  • timestamp
  • verb
  • vhost

What I don't understand is why this happens only for certain entries and not all of them; I cannot find any common pattern. My concern is how it will affect results when counting hits or summing bytes.
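To illustrate that concern, here is a small Python sketch (not Kibana's or Elasticsearch's own aggregation logic) that counts the three records from the table above with and without deduplication. It assumes two things that are choices made for this example: a "pair" whose elements are all equal can be collapsed to one value, and the hypothetical KEY_FIELDS tuple is enough to recognise the same request. Genuine repeated hits with identical values in the same second would also be collapsed, so treat this as a diagnostic, not a fix for the ingestion.

```python
from typing import Any, Dict, List

# Hypothetical choice of fields that identify "the same request".
KEY_FIELDS = ("vhost", "timestamp", "clientip", "verb", "request", "response", "bytes")


def collapse_pair(value: Any) -> Any:
    """Turn a list of identical values such as ["404", "404"] into "404"."""
    if isinstance(value, list) and value and all(v == value[0] for v in value):
        return value[0]
    return value


def dedupe(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse "pair" fields, then keep only the first record per key."""
    seen = set()
    unique = []
    for rec in records:
        norm = {k: collapse_pair(v) for k, v in rec.items()}
        # str() keeps the key hashable even if a field is still a list.
        key = tuple(str(norm.get(f)) for f in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(norm)
    return unique


# The three records from the Discover table (client IP masked).
ts = "July 29th 2015, 14:02:23.000"
records = [
    {"timestamp": ts, "response": ["404", "404"],
     "bytes": ["14202", "14202"], "clientip": ["x.x.x.x", "x.x.x.x"]},
    {"timestamp": ts, "response": "404", "bytes": "14202", "clientip": "x.x.x.x"},
    {"timestamp": ts, "response": "404", "bytes": "14202", "clientip": "x.x.x.x"},
]

naive_hits = len(records)
naive_bytes = sum(int(collapse_pair(r["bytes"])) for r in records)
unique = dedupe(records)
unique_bytes = sum(int(r["bytes"]) for r in unique)
print(naive_hits, naive_bytes)   # one request counted three times
print(len(unique), unique_bytes)
```

In this example a single request is counted three times, so hit counts and byte totals come out three times too high until the duplication is removed at the source.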

Has anyone experienced something similar?

Thanks.

This likely isn't a Kibana (KB) problem but an ingestion one.

Can you paste or link to the document in question?

Thanks for your answer. It made me look further into my Apache configuration, and part of the error was there: it sends the logs three times. This was difficult to spot because the server generates so many log lines that the triplets were usually far apart. I will correct this first (the server is in production, so I must wait for a maintenance window) and then see whether it fixes the problem.
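Because the triplets are far apart, scanning the log by eye is hard. If the duplicated entries end up in the file that Logstash reads, counting identical lines over the whole file finds them regardless of distance. The Python sketch below is a heuristic under that assumption: two genuine requests can produce identical lines, so an occasional repeat is expected, but if most lines occur exactly three times, that points at the logging configuration. The function names and the log path are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable


def repeated_lines(lines: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Return every non-empty line that occurs at least min_count times,
    wherever it appears in the input."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return {line: n for line, n in counts.items() if n >= min_count}


def multiplicity_histogram(lines: Iterable[str]) -> Counter:
    """Map "times a line occurs" to "number of distinct lines with that count"."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return Counter(counts.values())


# Hypothetical usage; replace the placeholder path with the real log file:
# with open("/path/to/access_log") as fh:
#     print(multiplicity_histogram(fh))
```

A histogram dominated by the key 3 after the fix is applied would mean the duplication is still happening; a histogram dominated by 1 means each entry is logged once.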