Parse a comma-separated value that contains a comma in it

d-ring · March 23, 2021, 12:29am

Evening All,

I am trying to parse a log entry that is comma separated, but when the SOMEUSER field is present it contains a comma of its own, which should not be treated as a separator.

The fields that may be present are SOMEUSER, SOMENETWORK, and SOMECOMPUTER.

Examples of the format of the logs are:

"Smith, John A. - Some Business Title (Smi,SOMENETWORK,SOMECOMPUTER"
"Smith, John A. - Some Business Title (Smi,SOMECOMPUTER"
"SOMECOMPUTER,SOMENETWORK"
"SOMECOMPUTER"
"SOMENETWORK"

I do have an identity_type field that tells me which kinds of identities are in the CSV list I need to split, but I am not sure how to skip the first comma when the AD User identity field is present.

Formats I have seen in the logs:
SOMEUSER,SOMECOMPUTER
SOMEUSER,SOMENETWORK,SOMECOMPUTER
SOMECOMPUTER,SOMENETWORK
SOMEUSER,SOMENETWORK
SOMENETWORK
SOMECOMPUTER
SOMEUSER

I am looking to see if there is a way to skip the first comma if the SOMEUSER value exists.

My thought was to use an if statement to process each entry according to the identity types listed in the appropriate field. That works for everything except entries where the SOMEUSER value, with its embedded comma, throws off the split.

Supermathie · March 23, 2021, 1:18am

Instead of trying to write the parsing code yourself, perhaps the csv filter plugin (Logstash Reference 7.11) would be a good fit for your needs?

d-ring · March 23, 2021, 2:30pm

That was the first thing I had used and it does not skip the first comma when the username is there. That is what I am trying to figure out how to handle.

Smith, John would be taken as the first and second fields instead of just the first field.

Supermathie · March 23, 2021, 3:37pm

Oh - the CSV being parsed is badly formed. Can you have the upstream source fix that?

If not, you could probably do the equivalent of an rsplit with a split limit of 3, except for e.g. this case that messes that idea up:

Smith, John A. - Some Business Title (Smi,SOMECOMPUTER

You need a way you can programmatically differentiate that from:

SOMEUSER,SOMENETWORK,SOMECOMPUTER
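
To make the ambiguity concrete, here is a minimal Python sketch of the right-hand split idea (Python's str.rsplit stands in for whatever split the pipeline would use). The helper name split_from_right and the expected_fields argument are hypothetical; the sketch assumes the number of fields is known from somewhere else, for example from the identity_type field mentioned earlier, and that the user value always comes first, as it does in the formats listed above.

```python
def split_from_right(value, expected_fields):
    """Split on commas from the right so any extra comma stays in the first field.

    expected_fields is a hypothetical input: how many identities the line
    is known to contain (e.g. derived from identity_type).
    """
    return value.rsplit(",", expected_fields - 1)


line = "Smith, John A. - Some Business Title (Smi,SOMECOMPUTER"

# Assuming three fields splits the user name in two:
print(split_from_right(line, 3))
# ['Smith', ' John A. - Some Business Title (Smi', 'SOMECOMPUTER']

# Knowing there are only two fields keeps the user name whole:
print(split_from_right(line, 2))
# ['Smith, John A. - Some Business Title (Smi', 'SOMECOMPUTER']
```

Without the field count, the split alone cannot tell the two cases apart.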

If you have rules such as:

user fields may contain spaces
computers only contain uppercase letters
networks may contain lowercase letters

then you're in luck, you could do something like this:

grok rules:

SOMEUSER %{DATA}
SOMECOMPUTER [A-Z]+
SOMENETWORK [a-z]+

grok capture:

^%{SOMEUSER:someuser}(?:,%{SOMECOMPUTER:somecomputer})?(?:,%{SOMENETWORK:somenetwork})?$

would give you, for example:

{
  "somecomputer": "SOMECOMPUTER",
  "someuser": "Smith, John A. - Some Business Title (Smi"
}
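
For trying the pattern outside Logstash, the following is a rough Python rendering of the grok rules above, not the grok filter itself. It assumes %{DATA} behaves like a lazy .*? and uses Python named groups in place of grok captures; the names PATTERN and parse are made up for this sketch.

```python
import re

# Assumed translation of the grok rules:
#   SOMEUSER     -> .*?      (lazy, like grok's DATA)
#   SOMECOMPUTER -> [A-Z]+
#   SOMENETWORK  -> [a-z]+
PATTERN = re.compile(
    r"^(?P<someuser>.*?)"
    r"(?:,(?P<somecomputer>[A-Z]+))?"
    r"(?:,(?P<somenetwork>[a-z]+))?$"
)


def parse(line):
    """Return only the groups that matched, mirroring the grok output shape."""
    match = PATTERN.match(line)
    if match is None:
        return {}
    return {name: value for name, value in match.groupdict().items() if value is not None}


print(parse("Smith, John A. - Some Business Title (Smi,SOMECOMPUTER"))
# {'someuser': 'Smith, John A. - Some Business Title (Smi', 'somecomputer': 'SOMECOMPUTER'}
```

One thing this rendering makes visible: because the someuser group is not optional, a line containing only SOMECOMPUTER is captured as someuser, so lines without a user would need separate handling.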

d-ring · March 23, 2021, 3:56pm

I have a feature request in to fix the upstream data, but I am not holding my breath.

User names are mixed character with spaces.
Computer names are in uppercase, but also contain numbers. Something like a Dell Service Tag.
Network names have a mix of upper and lower case and contain spaces.

I'll see if I can figure it out with grok, was just hoping for some awesome command I had not been able to find that says skip the first comma.

Supermathie · March 23, 2021, 4:52pm

Instead of skipping the first comma, include it as a separate field (user part 1, user part 2) and then combine it?

Or, if it's always there, use:

SOMEUSER [^,]+, [^,]+

If not, perhaps:

SOMEUSER [^,]+(?:, [^,]+)?

I'd suggest making a corpus of your log lines and test cases for your regexes.
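
A small corpus check along those lines might look like this in Python. It is only a sketch: the corpus reuses the sample lines from this thread, the names CORPUS, LEADING_FIELD and first_field are invented here, and the regex tests just the optional-comma SOMEUSER rule as the leading field, not a full parse of the line.

```python
import re

# The optional-comma rule from above: a name with at most one ", " inside it.
SOMEUSER = r"[^,]+(?:, [^,]+)?"

# Capture the leading field, then allow anything after the next comma.
LEADING_FIELD = re.compile(rf"^(?P<first>{SOMEUSER})(?:,.*)?$")

# (log line, expected leading field), taken from the examples in this thread.
CORPUS = [
    ("Smith, John A. - Some Business Title (Smi,SOMENETWORK,SOMECOMPUTER",
     "Smith, John A. - Some Business Title (Smi"),
    ("Smith, John A. - Some Business Title (Smi,SOMECOMPUTER",
     "Smith, John A. - Some Business Title (Smi"),
    ("SOMECOMPUTER,SOMENETWORK", "SOMECOMPUTER"),
    ("SOMENETWORK", "SOMENETWORK"),
]


def first_field(line):
    """Return the leading field, or None if the line does not match."""
    match = LEADING_FIELD.match(line)
    return match.group("first") if match else None


failures = [(line, first_field(line)) for line, want in CORPUS
            if first_field(line) != want]
print(failures)  # [] when every line in the corpus parses as expected
```

The rule relies on the comma inside a user name being followed by a space while the field separators are not; real log lines that break that assumption should go into the corpus as failing cases first.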

system · April 20, 2021, 4:53pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
