Multiline, Multi Field input question - newbie here


#1

I am putting together bits and pieces from examples to create my first custom Filebeat input.

I have tens of thousands of these files that I would like to read into ES.

cdv_nrings=8
cdv_phone=16188835888
cdv_informat=NONE
cdv_tries=1
cdv_callTime=0
cdv_newApp=arcVXML2
cdv_retryInterval=0
cdv_initialScript=http://10.30.30.17:8080/pre/vui/aOut/1176825
cdv_applicationData=15740
# 2015/09/18 17:03:16 
##Fri Sep 18 17:03:59 2015
#OutboundRetCode:603 VXML Event: error.com.arc.tel_initiatecall.tel_failure

My idea was to define named groups in a regex that match my unique field names.
My regex doesn't work in the online testers. I can't seem to describe the newline properly, but maybe that's not my problem?

multiline.pattern: '=(?P<cdv_nrings>re\w+$)\n=(?P<cdv_phone>re\w+$)\n=(?P<cdv_informat>re\w+$)\n=(?P<cdv_tries>re\w+$)\n=(?P<cdv_callTime>re\w+$)\n=(?P<cdv_newApp>re\w+$)\n=(?P<cdv_retryInterval>re\w+$)\n=(?P<cdv_initialScript>re\w+$)\n=(?P<cdv_applicationData>re\w+$)\n#(?P<date>re\w+$)\n##(?P<daydate>re\w+$)\n#(?P<OutboundRetCode>re\w+$)'

Then in my filebeat.yml file I would match the group name to the field name:

- type: log
  enabled: true
  close_eof: true
  paths:
    - C:\OCS\work\0.CDF*
  fields:
    log_type: work_active
    cdv_nrings: cdv_nrings
    cdv_phone: cdv_phone
    cdv_informat: cdv_informat
    cdv_tries: cdv_tries
    cdv_callTime: cdv_callTime
    cdv_newApp: cdv_newApp
    cdv_retryInterval: cdv_retryInterval
    cdv_initialScript: cdv_initialScript
    cdv_applicationData: cdv_applicationData
    date: date
    daydate: daydate
    OutboundRetCode: OutboundRetCode

  multiline.pattern: '=(?P<cdv_nrings>re\w+$)\n=(?P<cdv_phone>re\w+$)\n=(?P<cdv_informat>re\w+$)\n=(?P<cdv_tries>re\w+$)\n=(?P<cdv_callTime>re\w+$)\n=(?P<cdv_newApp>re\w+$)\n=(?P<cdv_retryInterval>re\w+$)\n=(?P<cdv_initialScript>re\w+$)\n=(?P<cdv_applicationData>re\w+$)\n#(?P<date>re\w+$)\n##(?P<daydate>re\w+$)\n#(?P<OutboundRetCode>re\w+$)'
  multiline.negate: false
  multiline.match: before

I tested my config and that passed:

C:\filebeat-6.4.3-windows-x86_64>filebeat test config filebeat.yml
Config OK

After that I would set up a Kibana template, but I have not got to that part yet.

Am I on the right track here?
How does my regex look?

Thanks

Update:
I tried this regex and got closer, but it is still not perfect:

^(.+)=(.+)(\r\n\s+(.+))|^#\s(.+)(\r\n\s+(.+))|^##(.+)(\r\n\s+(.+))*|^#(.+):(.+)(\r\n\s+(.+))


(Steffen Siering) #3

Have you got multiple consecutive events? Also, please use a ---- separator (or some other marker) to show exactly where you want to split the multiline. More samples would help in seeing and understanding the pattern.


#4

Thanks

The group of 12 lines at the top of the case equals one file.
We generate 100,000+ files a day all with the same layout.

Thanks, I hope this helps.


#5

I was reading up on how grok works. I don't have a grok statement yet, but I get the idea: instead of identifying every field with named groups in one big regex, create one regex per field and add those statements to the yml file.

I'll try this tonight and update the case.


#6

I updated the yml to use kv, deleted the registry, and then ran filebeat.exe again.
It started, but nothing was loaded.

Thoughts please?

- type: log
  enabled: true
  close_eof: true
  paths:
    - C:\OCS\work\0.CDF*

     filter {
       kv {
         source => "message"
         field_split => "="
         include_keys => ["cdv_nrings", "cdv_phone", "cdv_informat", "cdv_tries", "cdv_callTime", "cdv_newApp", "cdv_retryInterval", "cdv_initialScript", "cdv_applicationData", "date", "daydate", "OutboundRetCode"]
         trim => "<>[],"
         trimkey => "<>[]," 
         }
      }

(Steffen Siering) #7

So one file == 1 event? In this case you don't need a complicated regex. Just try to capture everything, no matter the contents.
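
A minimal sketch of what that could look like in filebeat.yml, assuming every file starts with a cdv_nrings= line as in the sample above. The multiline pattern is only used to decide where an event starts and which lines are appended to it; named groups in the pattern are not turned into fields. The max_lines and timeout values are illustrative, not tuned.

```yaml
filebeat.inputs:
- type: log
  enabled: true
  close_eof: true
  paths:
    - C:\OCS\work\0.CDF*
  fields:
    # Static label added to every event; "fields" cannot extract values from the file.
    log_type: work_active
  # Assumption: the first line of each file starts with "cdv_nrings=".
  # With negate: true and match: after, every line that does NOT match
  # the pattern is appended to the preceding line that did match,
  # so the 12 lines of one file end up in a single event.
  multiline.pattern: '^cdv_nrings='
  multiline.negate: true
  multiline.match: after
  # Illustrative limits: keep at most 50 lines per event and flush
  # a pending event after 5 seconds without new lines.
  multiline.max_lines: 50
  multiline.timeout: 5s
```

With this, the whole file content arrives in the message field, one line per newline, and the splitting into individual fields happens later in Logstash or Ingest Node.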

The grok and kv filters are not part of Filebeat; they belong to Logstash or Ingest Node. The filter config you used is Logstash configuration. If you want to use it like this, publish the events to Logstash. If you want to turn your work into a Filebeat module, better start with Ingest Node (configure the pipeline in the elasticsearch output).
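
As a rough sketch, the two routes would be selected in filebeat.yml along these lines. The host addresses and the pipeline name cdv_files are placeholders, not values from this thread, and only one output can be enabled at a time.

```yaml
# Route A: send events to Logstash, where a kv or grok filter can run.
# Assumption: Logstash listens with a beats input on localhost:5044.
output.logstash:
  hosts: ["localhost:5044"]

# Route B: send events straight to Elasticsearch and let an ingest
# pipeline do the parsing. "cdv_files" is a hypothetical pipeline name;
# the pipeline itself has to be created in Elasticsearch separately.
#output.elasticsearch:
#  hosts: ["localhost:9200"]
#  pipeline: "cdv_files"
```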


#8

Thanks

Is it best practice to keep the whole line as the message, e.g. "cdv_informat=NONE", or should I create a field called "cdv_informat" with the value "NONE" in this example?

Thanks


(Steffen Siering) #9

Better to create a field, so you can query, search, and visualise specific fields in Kibana.