Hi,
Sorry for the long text, but I tried to describe my problem in as much detail as possible to make it clear.
I am part of a medical devices team. We run hundreds of tests on the devices to ensure quality, and we save the results in a .txt file.
In that file, each line represents one test campaign. It contains information about the campaign, the tests, the result of each test, and the comment for each test.
To give an idea, it looks something like this (each ";" separates two fields):
* Line 1 : Name_campaign; Timestamp; PC_host; OS; IP; PixDyn_version; Test1; Result1; Comment1; Test2; Result2; Comment2
* Line 2 : Name_campaign; Timestamp; PC_host; OS; IP; PixDyn_version; Test1; Result1; Comment1; Test2; Result2; Comment2; Test3; Result3; Comment3; Test4; Result4; Comment4
As you can see, the number of tests can vary from line to line. Hoping to make my life easier, I wrote a .py script to modify the file. The script finds the line with the maximum number of fields (X) and pads the other lines with empty strings ('') up to X. That way, I can use a single grok filter to match every line instead of one filter per line length.
My grok filter is a bizarre thing that looks like this:
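For reference, the padding step can be sketched roughly like this in Python. This is a minimal sketch and not my exact script: the function name `pad_lines`, the in-memory list of lines, and the sample values are assumptions made for illustration.

```python
def pad_lines(lines, sep=";"):
    """Pad every record with empty fields up to the widest record."""
    # Split each line into its fields, dropping the trailing newline.
    rows = [line.rstrip("\n").split(sep) for line in lines]
    # X = the maximum number of fields found on any line.
    width = max((len(row) for row in rows), default=0)
    # Append '' to shorter rows until they also have X fields.
    return [sep.join(row + [""] * (width - len(row))) for row in rows]


# Hypothetical sample data using the simplified layout shown above.
sample = [
    "camp1;ts1;pc1;os1;ip1;v1;Test1;OK;c1",
    "camp2;ts2;pc2;os2;ip2;v2;Test1;OK;c1;Test2;KO;c2",
]
padded = pad_lines(sample)
```

After this step every line has the same number of ";"-separated fields, which is what lets a single pattern cover the whole file.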
grok {
  patterns_dir => ["/patterns"]
  match => { "message" => [ "^%{NUMBER:VersionTableauStat};(?<SessionName>%{YEAR}\_%{MONTHNUM}\_%{MONTHDAY}\__%{HOUR}\_%{MINUTE}\_%{SECOND});%{HOSTNAME:PCName};%{CISCO_REASON:OS};%{IP:Host_DLL};%{IP:NIOS};%{IP:FPGA1};%{IP:FPGA2};%{IP:PULL_DLL};%{WORD:SN_PU};%{WORD:SN_Détecteur};%{WORD:SignOn};(?<PULB_PN>%{AD_TYPE:AD};%{WORD:Test_name};%{RESULT_TEST:Result};%{COMMENT_TEST:Comment})" ] }
}
That is the short version. In one project I can have 150 tests, for example, which means the patterns %{WORD:Test_name};%{RESULT_TEST:Result};%{COMMENT_TEST:Comment} are repeated 150 times. Before anyone asks: RESULT_TEST and COMMENT_TEST are custom patterns for which I wrote the regexes, and they work (they are not the problem).
With that grok I was expecting the fields "Test_name", "Result" and "Comment" to contain, respectively, the names of all the tests I ran in one campaign, their results, and their comments. That way I could use Kibana or Grafana to see which tests failed in a given campaign and to monitor the campaigns in real time.
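To make the expected result concrete, here is a small Python sketch of the structure I would like to end up with for each line. It uses the simplified field layout from the example above, not the real field list from my grok. The function name `split_campaign`, the `HEADER` list, and the sample values are assumptions made for illustration only.

```python
# Fixed campaign fields, taken from the simplified example layout.
HEADER = ["Name_campaign", "Timestamp", "PC_host", "OS", "IP", "PixDyn_version"]


def split_campaign(line, sep=";"):
    """Turn one campaign line into fixed fields plus three parallel lists."""
    fields = [f.strip() for f in line.rstrip("\n").split(sep)]
    event = dict(zip(HEADER, fields[:len(HEADER)]))
    rest = fields[len(HEADER):]
    # Group the remaining fields as (test name, result, comment) triplets,
    # ignoring an incomplete group at the end of the line.
    triplets = [rest[i:i + 3] for i in range(0, len(rest) - len(rest) % 3, 3)]
    # Skip the padding triplets, whose test name is an empty string.
    triplets = [t for t in triplets if t[0]]
    event["Test_name"] = [t[0] for t in triplets]
    event["Result"] = [t[1] for t in triplets]
    event["Comment"] = [t[2] for t in triplets]
    return event


# Hypothetical padded line with two real tests and one padding triplet.
event = split_campaign("camp1; ts1; pc1; Win10; 10.0.0.1; 1.2; Test1; OK; ; Test2; KO; broken;;;")
```

Here `event["Test_name"]`, `event["Result"]` and `event["Comment"]` are lists of the same length, with one entry per test that was actually run.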
The grok works as long as I do not have a large number of tests. When I run it on a campaign that has 20 tests, for example, the grok matches. But when I have a huge number of tests, like 150, the event is tagged "_groktimeout". To avoid that, I set "timeout_millis => 0", and the file has now been processing for 2.5 hours with no results yet.
My question is: is there a way to make the process faster, to optimize the filter, or an easier solution altogether?
Thank you for reading.