Data disparity between Kibana and database

(Karlisson) #1


I'm using Packetbeat with pf_ring to capture MySQL INSERTs, index them in Elasticsearch (ES), and display them in Kibana. pf_ring runs on a machine that handles 20k packets per second, about 280 GB per day in total. The Elasticsearch cluster has 3 m3.3xlarge nodes.

The problem is that the data shown in Kibana differs from the data in MySQL.
Below you can see this difference (Index/Database) for one day, per hour (0h-23h):

What could be causing this?
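A per-hour comparison is easier to read when each row states the MySQL count, the Elasticsearch count, and the gap explicitly. The following is a minimal Python sketch, assuming the hourly counts from both sides have already been exported into dicts keyed by hour (0-23). The function name `hourly_gap` and the input format are hypothetical:

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, int, int, float]  # (hour, db_rows, es_docs, missing, missing_pct)


def hourly_gap(db_counts: Dict[int, int], es_counts: Dict[int, int]) -> List[Row]:
    """Compare hourly MySQL row counts with hourly Elasticsearch document counts.

    Hours absent from either dict are treated as zero. A positive `missing`
    value means MySQL has more rows than Elasticsearch has documents.
    """
    rows: List[Row] = []
    for hour in range(24):
        db = db_counts.get(hour, 0)
        es = es_counts.get(hour, 0)
        missing = db - es
        # Avoid division by zero for hours with no rows in MySQL.
        pct = 100.0 * missing / db if db else 0.0
        rows.append((hour, db, es, missing, pct))
    return rows
```

Expressing the missing count as a percentage of the MySQL count shows whether losses track the busiest hours, which may point at back-pressure, or whether they are spread evenly across the day.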

(Tudor Golubenco) #2

I'm not sure I understand this. What are the numbers in the diff column? Numbers of rows/documents?

(Sandro Lourenço) #3

Hi @tudor!

In fact, the issue is that we have many servers sending data from Packetbeat straight to this 3-node Elasticsearch cluster, which sits behind a load balancer that receives the writes.
This box sees something like 20k TCP packets/s.
The others see about 10k packets/s, across 42 servers in total.
All together this amounts to about 300 GB of data and 260M documents per day.
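Dividing the daily totals by the 86,400 seconds in a day gives the average sustained load the cluster has to absorb. The following small Python sketch shows that arithmetic. It assumes decimal units (1 GB = 10^9 bytes) and an even spread over the day; real hourly peaks will be higher than these averages:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rates(docs_per_day: float, bytes_per_day: float) -> tuple:
    """Return (documents per second, bytes per second) averaged over a full day."""
    return docs_per_day / SECONDS_PER_DAY, bytes_per_day / SECONDS_PER_DAY


# Daily totals quoted above: 260M documents and ~300 GB per day.
docs_s, bytes_s = sustained_rates(260e6, 300e9)
print(f"{docs_s:,.0f} docs/s, {bytes_s / 1e6:.2f} MB/s")  # 3,009 docs/s, 3.47 MB/s
```

Roughly 3,000 documents per second is only a daily average. If traffic is concentrated in a few busy hours, the indexing rate at peak can be several times higher, and those hours are where rejected or timed-out bulk requests would be most likely to show up.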

The issue is that we are missing data from the servers. Those numbers only show the amount missed in a single day, and only for MySQL packets.
We don't know whether the problem is Packetbeat dropping data, Elasticsearch timing out on writes, or something in the pf_ring integration. Since pf_ring should handle 50k packets/s, I don't think pf_ring is the problem.
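A related failure mode that is easy to check on the Elasticsearch side is rejection by the bulk thread pool. Elasticsearch rejects bulk requests when that pool's queue is full, and it keeps a per-node counter of these rejections. The following Python sketch uses only the standard library. It rests on these assumptions:

* The cluster is an Elasticsearch 1.x/2.x-era cluster.
* It is reachable at `http://localhost:9200`, which is a placeholder.
* `_cat/thread_pool` accepts the `bulk.rejected` column.

Newer versions report this per pool (for example `_cat/thread_pool/bulk`, or `_cat/thread_pool/write` in later releases, with a `rejected` column), so check the `_cat` documentation for your version. The helper names are hypothetical:

```python
import urllib.request
from typing import Dict, List


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Parse `_cat` output requested with `?v` (a header row plus whitespace-separated columns).

    Assumes no cell is empty or contains spaces, which holds for the columns requested below.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in body]


def total_rejected(rows: List[Dict[str, str]], column: str = "bulk.rejected") -> int:
    """Sum a rejection counter across nodes; the counters are cumulative since each node started."""
    return sum(int(row.get(column, "0")) for row in rows)


def fetch_thread_pool(base_url: str = "http://localhost:9200") -> str:
    """Fetch bulk thread pool stats (column names assume an ES 1.x/2.x-era `_cat` API)."""
    url = base_url + "/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Calling `total_rejected(parse_cat_table(fetch_thread_pool()))` before and after a busy hour shows whether the counter grew during that window. A non-zero increase would mean some bulk requests, possibly including Packetbeat batches, were turned away at least once.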

  • How can I debug the integration between pf_ring and Packetbeat?
  • How can I debug Packetbeat's parsing?
  • How can I debug the data flow from Packetbeat to Elasticsearch?
  • And last but not least, how can I check whether Elasticsearch is dropping writes or failing to index documents?
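For the Packetbeat-to-Elasticsearch side, a useful first step is to count the indexed MySQL INSERT events per hour. Lining these up against MySQL's own hourly counts shows when the gap occurs. The following Python sketch rests on these assumptions:

* Packetbeat writes to its default `packetbeat-*` indices.
* MySQL transactions carry `type: mysql` and `method: INSERT`.
* Those fields are indexed as exact (not analyzed) values.

`hourly_insert_query`, `buckets_to_hours`, and `hourly_counts` are hypothetical helper names. The date_histogram option is `interval` in older Elasticsearch versions and `calendar_interval` in 7.2+, which is why it is a parameter:

```python
import json
import urllib.request
from datetime import date, datetime, timedelta, timezone
from typing import Dict


def hourly_insert_query(day: str, interval_key: str = "interval") -> dict:
    """Search body counting MySQL INSERT events per hour for one UTC day given as YYYY-MM-DD.

    Uses a bool/filter query (Elasticsearch 2.0+). Pass interval_key="calendar_interval" on 7.2+.
    """
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    return {
        "size": 0,  # only the aggregation is needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": "mysql"}},
                    {"term": {"method": "INSERT"}},
                    {"range": {"@timestamp": {"gte": f"{start}T00:00:00Z",
                                              "lt": f"{end}T00:00:00Z"}}},
                ]
            }
        },
        "aggs": {
            "per_hour": {
                "date_histogram": {"field": "@timestamp", interval_key: "hour", "min_doc_count": 0}
            }
        },
    }


def buckets_to_hours(response: dict) -> Dict[int, int]:
    """Map UTC hour (0-23) to document count from a date_histogram response."""
    counts: Dict[int, int] = {}
    for bucket in response["aggregations"]["per_hour"]["buckets"]:
        # Bucket keys are epoch milliseconds.
        hour = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc).hour
        counts[hour] = bucket["doc_count"]
    return counts


def hourly_counts(base_url: str, index: str, day: str) -> Dict[int, int]:
    """POST the query to <base_url>/<index>/_search and return hourly counts."""
    body = json.dumps(hourly_insert_query(day)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return buckets_to_hours(json.load(response))
```

A call such as `hourly_counts("http://localhost:9200", "packetbeat-*", "2016-01-15")` uses placeholder values for the URL and day. date_histogram buckets are in UTC unless `time_zone` is set. If the MySQL side is grouped by local time, the two series will be shifted by the UTC offset, which can itself look like a per-hour disparity.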

We are planning to build a HUGE cluster with 5 nodes, 100 TB of data, and 100 billion documents.

Many thanks.

P.S. We are using Packetbeat 1.0.1, compiled from source for pf_ring support.
