Performance Tuning for related docs

joaociocca · March 31, 2020, 2:55am

Hey, it's me again =D So, in my quest to crush those damn RRAS logs I've found myself in a pickle. I had read somewhere that if I wanted my records to be read as a linear stream, I had to set up the pipeline to use a single worker only.

But I don't think that's enough here. RADIUS outputs some packet types almost at the same time, and the logs I have to deal with have no better than one-second precision. Besides, some packet types don't have all the information, for example hostname and IP.

BUT they all have a username and an account session ID to tie them together. So, an Elasticsearch filter lookup using those two parameters returns them nicely. Except that doesn't seem to play nice with a very fast pipeline? I mean, when I manually input docs through the HTTP input plugin, everything is beautiful, the lookup works just fine and all the fields are there. When I clean up and turn on the test file input, NONE of the docs ingested show the lookup fields.

Example:

Not only are those docs completely out of order (because, down to the second, they actually did happen at the same time) but they also lack the lookup field SourceIP for packet type 2 (Inicio de Conexão com Sucesso, "Connection Started Successfully"), which should have been filled in by looking up the SourceIP field from packet type 1 (Solicitação de Conexão, "Connection Request").

When done manually with HTTP plugin, this is what happens:

See how nicely done it is? So, I understand I probably need to improve Elasticsearch's indexing rate and/or slow down the pipeline's throughput, is that correct? And how could I achieve this?
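To make the suspected timing issue concrete, here is a toy model in Python. It is only a sketch under assumptions, not the real Elasticsearch client or Logstash behavior: `ToyIndex`, `ingest`, the `user` and `session` field names, and the sample IP are all hypothetical. It assumes that indexed documents only become searchable after a periodic refresh (Elasticsearch's default refresh interval is about one second), so a lookup issued within the same refresh window as the document it depends on may come back empty. Whether this is the actual cause here is not confirmed.

```python
class ToyIndex:
    """Hypothetical stand-in for an index with near-real-time search."""

    def __init__(self, refresh_interval=1.0):
        self.refresh_interval = refresh_interval  # assumed seconds between refreshes
        self.pending = []       # indexed, but not yet visible to searches
        self.searchable = []    # visible to searches
        self.last_refresh = 0.0

    def _refresh_if_due(self, now):
        # Pending docs become searchable only once the interval has elapsed.
        if now - self.last_refresh >= self.refresh_interval:
            self.searchable.extend(self.pending)
            self.pending.clear()
            self.last_refresh = now

    def index(self, doc, now):
        self._refresh_if_due(now)
        self.pending.append(doc)

    def lookup(self, user, session, now):
        # Return SourceIP from an earlier searchable doc of the same session.
        self._refresh_if_due(now)
        for doc in self.searchable:
            if (doc["user"] == user and doc["session"] == session
                    and "SourceIP" in doc):
                return doc["SourceIP"]
        return None


def ingest(events, gap):
    """Process events `gap` seconds apart, enriching docs that lack SourceIP."""
    idx = ToyIndex()
    now = 0.0
    out = []
    for ev in events:
        now += gap
        ev = dict(ev)  # copy so the input list is left untouched
        if "SourceIP" not in ev:
            ip = idx.lookup(ev["user"], ev["session"], now)
            if ip is not None:
                ev["SourceIP"] = ip
        idx.index(ev, now)
        out.append(ev)
    return out


# Hypothetical sample data: packet type 1 carries the IP, type 2 does not.
events = [
    {"type": 1, "user": "joao", "session": "42", "SourceIP": "10.0.0.5"},
    {"type": 2, "user": "joao", "session": "42"},
]

fast = ingest(events, gap=0.001)  # file input: events milliseconds apart
slow = ingest(events, gap=5.0)    # manual HTTP input: events seconds apart

print("SourceIP" in fast[1])  # False: lookup ran before the refresh
print(slow[1]["SourceIP"])    # 10.0.0.5
```

In this model the fast run misses the lookup because the type 1 doc is still pending when the type 2 doc asks for it, while the slow run finds it. If the real pipeline behaves similarly, that would be consistent with the manual HTTP test working and the file input test failing.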
