How to dissect/script/grok a long number into different fields using regex in an ingest pipeline?

Patr123 · March 25, 2022, 2:01am

Hello,
I have a number that needs to be split into different fields using regex. I think it needs to be regex because the first group of digits varies in length.
I have been struggling to find a similar example anywhere, so can anyone help me with this?

My number field is like:

123400000123000056781

This needs to be separated into different fields like:

field1: 1234
field2: 00000
field3: 123
field4: 0000
field5: 5678
field6: 1

field1 can have 1-4 digits. field2 will always be 5 zeros, field3 will always have 3 digits, field4 will always be 4 zeros, field5 will always have 4 digits, and field6 will always be the digit 1, which marks the end of the string.

Thank you.

angelo · March 26, 2022, 10:05pm

I'm going to assume you don't need fields 2, 4 and 6 since you know that they will always be what you noted (00000, 0000, 1) and could set them directly if needed ... so that leaves just the other 3 fields.

For dissect: %{field1}00000%{field3}0000%{field5}1
For grok: (?<field1>(?>\d){1,4})00000(?<field3>(?>\d){3})0000(?<field5>(?>\d){4})1

NOTE: I didn't actually try these out in Logstash, I only quickly tested these in the online dissect tester and grok debugger.

Patr123 · March 27, 2022, 3:33am

Hello Angelo,
Thank you so much. Yes, you are absolutely right, I don't need fields 2, 4 and 6.
I tried the dissect filter and it works great only when field1 does not end with 0. If it does end with 0, that 0 gets added to field3, which is not what we want.
The grok pattern works in the grok debugger and Dev Tools, but I am not able to add the same pattern in the ingest pipeline; it gives me an "Invalid JSON String" error message.
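
One likely cause of the "Invalid JSON String" error is the backslash in `\d`: inside a JSON string a backslash has to be escaped itself, so `\d` must be written as `\\d`. Here is a minimal Python sketch of that, not a tested pipeline. The source field name `number` and the description text are made up for illustration; only the grok processor's `field` and `patterns` options are taken from Elasticsearch.

```python
import json

# Grok pattern as typed in the grok debugger (single backslashes).
pattern = r"(?<field1>\d{1,4})00000(?<field3>\d{3})0000(?<field5>\d{4})1"

# Hypothetical pipeline body; "number" is an assumed source field name.
pipeline = {
    "description": "split a long number into fields",
    "processors": [{"grok": {"field": "number", "patterns": [pattern]}}],
}

# json.dumps escapes the backslashes, so the output contains \\d.
body = json.dumps(pipeline, indent=2)
print(body)

# Pasting the raw pattern straight into JSON fails: \d is not a valid JSON escape.
raw_json = '{"patterns": ["' + pattern + '"]}'
try:
    json.loads(raw_json)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False
print("raw pattern is valid JSON:", raw_is_valid)
```

A pattern written with `[0-9]` instead of `\d` contains no backslashes at all, so it can be pasted into the pipeline JSON unchanged.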

But your grok pattern helped me create my own regex in the following format:

(?<field1>([0-9]{1,4}))(?<field2>([0-9]{5}))(?<field3>([0-9]{3}))(?<field4>([0-9]{4}))(?<field5>([0-9]{4}))(?<field6>([0-9]{1}))

and I was able to get the result I needed.

Now that it works, I can also use this stricter regex, which pins field2 and field4 to zeros and field6 to 1:

(?<field1>([0-9]{1,4}))(?<field2>([0]{5}))(?<field3>([0-9]{3}))(?<field4>([0]{4}))(?<field5>([0-9]{4}))(?<field6>([1]{1}))
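
For anyone who wants to sanity-check the pattern outside Elasticsearch, here is a rough Python sketch. It is an approximation, not the grok engine: Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, and the helper names `split_number` and `split_on_first_delimiter` are made up for this example. The second helper only imitates cutting at the first run of zeros, which seems to be what goes wrong with the delimiter-based approach when field1 ends in 0.

```python
import re

# Python spelling of the final pattern; grok accepts (?<name>...) instead.
PATTERN = re.compile(
    r"(?P<field1>[0-9]{1,4})"
    r"(?P<field2>0{5})"
    r"(?P<field3>[0-9]{3})"
    r"(?P<field4>0{4})"
    r"(?P<field5>[0-9]{4})"
    r"(?P<field6>1)"
)

def split_number(value):
    """Return the six fields as a dict, or None if the value does not fit."""
    match = PATTERN.fullmatch(value)
    return match.groupdict() if match else None

def split_on_first_delimiter(value):
    """Rough stand-in for delimiter splitting: cut at the first run of zeros."""
    field1, _, rest = value.partition("00000")
    field3, _, rest = rest.partition("0000")
    return field1, field3, rest

print(split_number("123400000123000056781"))             # sample from the question
print(split_number("12000000123000056781"))              # field1 = 120, ends in 0
print(split_on_first_delimiter("12000000123000056781"))  # the 0 leaks into field3
```

Because the regex fixes the width of every field after field1, it backtracks until field1 is the right length, so the trailing 0 stays in field1. The first-delimiter split has no such constraint and cuts too early.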

system · April 24, 2022, 3:33am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
