Grok patterns with quotation marks

Good evening, I'm trying to extract some fields from the following log message:

{
"message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta",
"level": "Error",
"logType": "Default",
"timeStamp": "2021-01-01T01:00:38.706009+01:00",
"fingerprint": "bf929041-9964-496c-8eee-e1ba7ce4e42a",
"windowsIdentity": "GANIT\RVDI001",
"machineName": "CL-W10RBT-003",
"processName": "CorrettaAssunzioneANIA_Worker_Win10Produzione",
"processVersion": "1.0.86",
"jobId": "389178ee-a728-44be-8ef1-511060465a5e",
"robotName": "001-VDI-Produzione",
"machineId": 22,
"fileName": "3.Emissione_Nuova_Proposta",
"transactionId": "81152118-8b17-4a3f-aa96-a3e92de402f5",
"queueName": "CorrettaAssunzioneANIA_PROD_INPUT"
}

Because of the quotation marks, in the grok expression I need to use this syntax for the patterns:

"%{GREEDYDATA:message}"

"%{UUID:finger}"
...

and so on.

Anyway, when I try to insert these patterns sequentially in my grok debugger, like:

"%{GREEDYDATA:message}" "%{UUID:finger}"

I receive an error. The error does not occur with a log without quotation marks, so I think the problem is the quotation marks themselves. How can I insert all these patterns sequentially without getting an error? Is there a code to put "a space" between the patterns?

There is also another problem:

Using "%{GREEDYDATA:message}" i get the following field:

"message": "message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta"

as you can see, I obtain not only the content of the message but also the string "message":

How can I avoid this problem?

Thank you very much!

Hey,

Have you tried to put the grok expressions inside quotes, as follows:

"%{PATTERN01} %{PATTERN02}"

Hi!

Yes, I tried, but it didn't work :frowning:

Hi,

If you are trying to get a full line like:
"message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta"

Then you have to update your pattern like this:

In the grok debugger : "%{GREEDYDATA:message}": "%{UUID:finger}" # ':' is missing
In the logstash file : "\"%{GREEDYDATA:message}\": \"%{UUID:finger}\"" # add backslash in front of quotes
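
Wrapped in an actual grok filter, that escaped form would sit like this (just a sketch showing where the pattern goes):

filter {
  grok {
    # the quotes are backslash-escaped because the whole
    # pattern is itself a double-quoted string
    match => { "message" => "\"%{GREEDYDATA:message}\": \"%{UUID:finger}\"" }
  }
}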

But this is not optimal due to the use of GREEDYDATA, so I recommend:

In the grok debugger : %{QUOTEDSTRING:key}":%{SPACE}%{QUOTEDSTRING:value} 
# use QUOTEDSTRING instead of GREEDYDATA
# use %{SPACE} to match any kind of whitespace (including none) between the key and the value
In the logstash file : "%{QUOTEDSTRING:key}:%{SPACE}%{QUOTEDSTRING:value}"
# the quotes can then be removed with mutate -> gsub
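
For example, a mutate/gsub along these lines (a sketch; the field names key and value match the pattern above):

filter {
  mutate {
    # strip the double quotes captured by QUOTEDSTRING
    gsub => [
      "key", "\"", "",
      "value", "\"", ""
    ]
  }
}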

More about grok syntax here

But I think in your case grok is not the best choice, because your data seems to be written in JSON, so you can use the JSON filter.

And even if the data were not written in JSON, there is the key-value (kv) filter, which can be more efficient in your case.
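
For reference, a minimal json filter block looks something like this (assuming the whole JSON object reaches Logstash as a single event in the message field):

filter {
  json {
    # parse the JSON text held in the message field
    # and turn each key into its own event field
    source => "message"
  }
}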

Cad

Hi, thank you for your answer.

I tried your solution, but in the grok debugger I obtained the following result:

{
"finger": "bf929041-9964-496c-8eee-e1ba7ce4e42a",
"message": "fingerprint"
}

so the GREEDYDATA pattern didn't read the message, but instead it read "fingerprint".
The version containing QUOTEDSTRING results in an error: "Provided Grok patterns do not match data in the input".

I'll study the JSON filter; in the meantime, can you suggest an entire grok sequence to parse all the contents of my log, both in Logstash and in the grok debugger? I'm new to Logstash and I'm having a lot of trouble learning how to deal with log parsing.

About the message: yes, I want to keep the whole line but not "message":, because it is just the name of the field and should not appear in the string.

See you soon.

Yes, because the only line that contains a UUID is the fingerprint one.

I made a mistake in the QUOTEDSTRING version for the debugger.
The code I wrote:

In the grok debugger : %{QUOTEDSTRING:key}":%{SPACE}%{QUOTEDSTRING:value} 

Must be :

In the grok debugger : %{QUOTEDSTRING:key}:%{SPACE}%{QUOTEDSTRING:value} 
# I removed the '"' after the first QUOTEDSTRING.

The QUOTEDSTRING pattern already does that job in the Logstash file (the debugger returns only the first match). In the Logstash file, the input is read line by line: the first quoted value goes into the field key and the second one into the field value.

You can do that with a condition before the grok filter, or with a condition in the output part of the Logstash file.
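
For instance, a condition in the filter section could look like this (a sketch; the UUID regex is only illustrative):

filter {
  # only run the grok on lines that actually contain a UUID
  if [message] =~ /[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-/ {
    grok {
      match => { "message" => "\"%{UUID:finger}\"" }
    }
  }
}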

Cad.

Hi Cad, thank you :slightly_smiling_face:

Now I'm obtaining the following result:

"message": ""Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta"",
"key": ""message""

So message correctly contains the string :slightly_smiling_face:. I don't know why it also contains \" around the text (a backslash and too many quotation marks). Is it possible to avoid this?

Next, I would like to insert the rest of the patterns in my grok debugger as well, but when they are put together it again gives me back an error. How can I insert multiple patterns in the correct way? For example, I'm trying to put:

%{QUOTEDSTRING:key}:%{SPACE}%{QUOTEDSTRING:message}: (-|"(%{LOGLEVEL:level})"): (-|"(%{TIMESTAMP_ISO8601:logtime})"): "%{UUID:finger}"

You can verify that it doesn't run...

The following image shows the result obtained for the message (value):

This is because QUOTEDSTRING captures the string including the quotes, so when the debugger prints the string held in the message field, it places a backslash before each quote.

"message": "this is a quote:\", three quotes in a raw:\"\"\""
# first and last quotes is here to show that the value is a string
# each \" is to show that the string also contains quotes

As I said in the last response

The debugger only returns the first match, but all the values are matched if you run it in Logstash. Try removing the line "message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta", in the debugger and you will see that the second line matches the pattern.

So, with your values and the QUOTEDSTRING pattern in the Logstash file, it gives you:

{
"message": "\"Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta\"",
"key": "\"message\""
} # output 1
{
"message": "\"Error\"",
"key": "\"level\""
} # output 2
{
"message": "\"Default\"",
"key": "\"logType\""
} # output 3
etc...

If you want to create a custom field name for each line and get all the values at one time, you have to create multiple patterns separated by line breaks and write them all in the Logstash file.
Your Logstash config file will have a configuration like this:

filter {
  grok {
    match => [
      "message",
      # First possibility: line breaks written as \n
      "%{QUOTEDSTRING:key}:%{SPACE}%{QUOTEDSTRING:message},[\n]\"level\": \"%{LOGLEVEL:level}\",[\n]\"logType\": \"%{WORD:type_log}\",[\n]\"timeStamp\": \"%{TIMESTAMP_ISO8601:logtime}\", etc...",
      # Second possibility: line breaks written as real line breaks ^^
      "%{QUOTEDSTRING:key}:%{SPACE}%{QUOTEDSTRING:message},
\"level\": \"%{LOGLEVEL:level}\",
\"logType\": \"%{WORD:type_log}\",
\"timeStamp\": \"%{TIMESTAMP_ISO8601:logtime}\", etc..."
    ]
  }
  # + add a gsub on each field to remove the quotes
}

With these two patterns, the output is going to look exactly like the input taken by Logstash.

{
  "message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta",
  "level": "Error",
  "type_loh": "Default",
  "timeStamp": "2021-01-01T01:00:38.706009+01:00"
  etc ...
} # all in one output

I stand by my suggestion to use the JSON filter or the KV filter instead of grok patterns.
In your case the configuration of the kv filter should be:

filter {
  kv {
    source => "message"
    field_split => ","        # the key-value pairs are separated by ','
    value_split => ":"        # a key and a value are separated by ':'
    remove_char_key => "\""   # remove the quotes from the keys
    remove_char_value => "\"" # remove the quotes from the values
  }
}
# With that, each value on the left of a ':' becomes a field
# and the value it contains is the string on the right of the ':'.
# I haven't tried it, so it may need some adjustments to work.

Cad


Good morning Cad. Thank you again for the complete response.

I spent the weekend studying the JSON filter and the KV filter, and I must say you're right: they seem to be the best choice in my case. I'll still try the grok way as well, in order to learn it better.

About your suggestion: I created my conf file as follows:

input {
  file {
    start_position => "beginning"
    path => "C:/Users/Valerio/Desktop/JASON/logs.txt"
    sincedb_path => "nul"
  }
}
filter {
  kv {
    source => "message"
    field_split => ","
    value_split => ":"
    remove_char_key => " \" "
    remove_char_value => " \" "
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "logskv"
  }
  stdout {}
}

Anyway, Logstash seems to create only the field named "message", in which each row simply contains the key and the value of the timeStamp field of my log concatenated together. No other fields are created. I also tried the JSON filter way, using this config file:

input {
  file {
    start_position => "beginning"
    path => "C:/Users/Valerio/Desktop/JASON/logs.txt"
    sincedb_path => "nul"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "logs"
  }
  stdout {}
}

But I obtained the same result: no fields are created, except for the message field, which in this case contains all the fields concatenated together (as shown in the image). Of course I'm searching online for a solution, but the content of the config file seems right for my case. Do you have an idea? I show you the import obtained in Kibana in both cases:

Regards,

qttv

Hi

Your input data comes from a file. According to the file input documentation, "By default, each event is assumed to be one line and a line is taken to be the text before a newline character. If you would like to join multiple log lines into one event, you’ll want to use the multiline codec."

And if you take the data line by line, the KV and JSON filters will not be effective. So I think you have to implement the multiline codec to get all the values into one event.

Cad

It seems odd. I followed tutorials like this one, How to use Logstash to parse and import JSON data into Elasticsearch - YouTube, and the procedure should work for my log. Using the JSON filter, Logstash should recognize the fields and generate them. In my case this is not happening... it just creates the message field, putting all the rows inside it. I'm going out of my mind... :persevere:

His JSON is not the same as yours.
In his case, one line = one complete JSON.
In your case, one line = one element of the complete JSON.

So you have to:
Remove all line breaks so that the whole JSON is on one line.
Or keep the JSON on multiple lines, but with the multiline codec in Logstash.

Cad.


Well, I modified my conf file like this:

input {
  file {
    codec => multiline {
      pattern => "{"
      negate => true
      what => "previous"
    }
    start_position => "beginning"
    path => "C:/Users/Valerio/Desktop/JasLog/logs.txt"
    sincedb_path => "nul"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "logs"
  }
  stdout {}
}

It gives me no errors, but when I launch Logstash it says "Pipeline running" and sends no data to Elasticsearch. This problem disappears if I remove the multiline codec. Do you see an error in my conf file? I can't find a way to fix it at the moment. Maybe the pattern chosen to separate the multilines is wrong?

What does the file logs.txt contain?
Only one JSON? Multiple JSONs? Text + JSON?

Cad.

The JSON is the same as above. Actually, I'm working on a .txt file with just one log event, in order to simplify the problem:

{
"message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta",
"level": "Error",
"logType": "Default",
"timeStamp": "2021-01-01T01:00:38.706009+01:00",
"fingerprint": "bf929041-9964-496c-8eee-e1ba7ce4e42a",
"windowsIdentity": "GANIT\RVDI001",
"machineName": "CL-W10RBT-003",
"processName": "CorrettaAssunzioneANIA_Worker_Win10Produzione",
"processVersion": "1.0.86",
"jobId": "389178ee-a728-44be-8ef1-511060465a5e",
"robotName": "001-VDI-Produzione",
"machineId": 22,
"fileName": "3.Emissione_Nuova_Proposta",
"transactionId": "81152118-8b17-4a3f-aa96-a3e92de402f5",
"queueName": "CorrettaAssunzioneANIA_PROD_INPUT"
}

I want to import this JSON and create the 15 fields that you can read to the left of the ":".

Then you can find a multiline codec to read a complete file here.
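
The usual recipe is a pattern that can never match, plus an auto_flush_interval so the codec emits the buffered event instead of waiting forever for a next line. Something like this (a sketch; the never-matching pattern string is arbitrary):

input {
  file {
    path => "C:/Users/Valerio/Desktop/JasLog/logs.txt"
    start_position => "beginning"
    sincedb_path => "nul"
    codec => multiline {
      pattern => "^THIS_LINE_NEVER_MATCHES"  # matches no line, so...
      negate => true                         # ...every line is appended...
      what => "previous"                     # ...to the previous one
      auto_flush_interval => 1               # flush the event after 1s of silence
    }
  }
}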

Cad.

Good morning Cad. I finally imported my csv containing JSON logs using this code:

input {
  file {
    codec => multiline {
      pattern => ";"
      negate => true
      what => "previous"
    }
    start_position => "beginning"
    path => "C:/Users/Valerio/Desktop/JASON/Logs.csv"
    sincedb_path => "nul"
  }
}
filter {
  kv {
    source => "message"
    field_split => ","
    value_split => ":"
    remove_char_key => "\""
    remove_char_value => "\""
  }
  mutate {
    remove_field => [ "message", "path", "host", "@timestamp", "@version", "tags" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "logs"
  }
  stdout {}
}

As you can see, I used the kv plugin in order to create my 15 fields from the content of the JSON logs.

As the pattern separator I used ";", because my csv file is separated by it. In this way I parsed all the logs correctly, but the last one is omitted. In fact, during the index creation in Elasticsearch I can choose between two different timestamps: the first relative to the first 499 rows of my file, the second relative to the last row (row no. 500). The JSON structure is the same in all the rows of the csv, so I don't understand why the last one is handled differently. Can you suggest something?

Secondly, my original csv contains not only JSON logs but also ordinary columns to import. How can I import a csv that contains multiline logs alongside other, ordinary data?

Hi,

In my first response I recommended using KV if the data is not in JSON format. But yours is JSON, so why not just implement the JSON filter?

I don't understand your problem with the timeStamp; do you have an example?

Same question about the "ordinary columns": can you give me an example?

Cad.
