KV filter and field separator

Good morning. I'm using a kv filter in Logstash to parse the content of a JSON log. I have to use a comma "," as the separator between fields, but some values contain strings with commas in the text, so those strings get split because of the chosen separator. How can I fix this? The JSON file looks like this:

{
"message": "Saldo in Saldo Rielaborato In43,93",
"level": "Trace",
"logType": "User",
"timeStamp": "2021-01-01T05:31:40.6407221+01:00",
"fingerprint": "35a67567-dfa3-4b0c-9a1f-7b2413a4b9fa",
"windowsIdentity": "GANIT\RVDI004",
"machineName": "CL-W10RBT-005",
"processName": "SollecitiSaldiCassa3_Win10Produzione",
"processVersion": "1.0.7656.19248",
"jobId": "716d530c-39f3-49d8-b090-519258418fa0",
"robotName": "004-VDI-Produzione",
"machineId": 23,
"fileName": "03.Rileva_Tabella_Nexus"
}

As you can see, the first field is "message" and it contains a comma...

Thanks

Why are you using a kv filter instead of a json filter?
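For reference, the json filter parses the whole document as JSON, so commas inside quoted values are never split the way they are with a separator-based kv filter. A minimal sketch, assuming the JSON text arrives in the event's message field:

filter {
  json {
    source => "message"
  }
}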

I'm also using the json filter. My config file is the following:

input {
  file {
    codec => multiline {
      pattern => ";;"
      negate => true
      what => previous
    }
    start_position => "beginning"
    path => "C:/Users/Valerio/Desktop/JASON/Logs.csv"
    sincedb_path => "nul"
  }
}
filter {
  json {
    source => "message"
  }
  kv {
    source => "message"
    field_split => ","
    value_split => ":"
    remove_char_key => "\""
    remove_char_value => "\""
  }
  mutate {
    remove_field => [ "message", "path", "host", "@timestamp", "@version", "tags" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "logs"
  }
  stdout {}
}

If you are using a json filter then why use a kv filter?

Yes, I can remove the json filter (done), but I need the kv filter to import my logs as shown in the following:

I'll also show you what the CSV file I'm working on looks like:

"{
""message"": ""Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta"",
""level"": ""Error"",
""logType"": ""Default"",
""timeStamp"": ""2021-01-01T01:00:38.706009+01:00"",
""fingerprint"": ""bf929041-9964-496c-8eee-e1ba7ce4e42a"",
""windowsIdentity"": ""GANIT\RVDI001"",
""machineName"": ""CL-W10RBT-003"",
""processName"": ""CorrettaAssunzioneANIA_Worker_Win10Produzione"",
""processVersion"": ""1.0.86"",
""jobId"": ""389178ee-a728-44be-8ef1-511060465a5e"",
""robotName"": ""001-VDI-Produzione"",
""machineId"": 22,
""fileName"": ""3.Emissione_Nuova_Proposta"",
""transactionId"": ""81152118-8b17-4a3f-aa96-a3e92de402f5"",
""queueName"": ""CorrettaAssunzioneANIA_PROD_INPUT""
}";;

I have to fix three problems:

  • Some "message" fields contain strings with commas, but the comma is my separator in the kv filter.

  • I have 500 logs in my CSV file and Logstash imports just 499 of them. The remaining one is associated with a separate "timestamp" in my Elasticsearch and I don't know why.

  • As you can see from the image, the message field is named " };; { message " and I'd like to clean up the };; {

Please post the first 3 lines of your CSV; perhaps then we can help, otherwise we are just guessing.


Here we go:

"{
  ""message"": ""Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta"",
  ""level"": ""Error"",
  ""logType"": ""Default"",
  ""timeStamp"": ""2021-01-01T01:00:38.706009+01:00"",
  ""fingerprint"": ""bf929041-9964-496c-8eee-e1ba7ce4e42a"",
  ""windowsIdentity"": ""GANIT\\RVDI001"",
  ""machineName"": ""CL-W10RBT-003"",
  ""processName"": ""CorrettaAssunzioneANIA_Worker_Win10Produzione"",
  ""processVersion"": ""1.0.86"",
  ""jobId"": ""389178ee-a728-44be-8ef1-511060465a5e"",
  ""robotName"": ""001-VDI-Produzione"",
  ""machineId"": 22,
  ""fileName"": ""3.Emissione_Nuova_Proposta"",
  ""transactionId"": ""81152118-8b17-4a3f-aa96-a3e92de402f5"",
  ""queueName"": ""CorrettaAssunzioneANIA_PROD_INPUT""
}";;
"{
  ""message"": ""Eccezione di business"",
  ""level"": ""Warning"",
  ""logType"": ""User"",
  ""timeStamp"": ""2021-01-01T01:00:38.7216107+01:00"",
  ""fingerprint"": ""96d59d87-deb0-406b-9db2-b44e4aca554a"",
  ""windowsIdentity"": ""GANIT\\RVDI001"",
  ""machineName"": ""CL-W10RBT-003"",
  ""processName"": ""CorrettaAssunzioneANIA_Worker_Win10Produzione"",
  ""processVersion"": ""1.0.86"",
  ""jobId"": ""389178ee-a728-44be-8ef1-511060465a5e"",
  ""robotName"": ""001-VDI-Produzione"",
  ""machineId"": 22,
  ""fileName"": ""Corretta_Assunzione_Worker"",
  ""transactionId"": ""81152118-8b17-4a3f-aa96-a3e92de402f5"",
  ""queueName"": ""CorrettaAssunzioneANIA_PROD_INPUT""
}";;
"{
  ""message"": ""Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta"",
  ""level"": ""Error"",
  ""logType"": ""Default"",
  ""timeStamp"": ""2021-01-01T01:03:35.1546269+01:00"",
  ""fingerprint"": ""5f7b1c28-8f81-4cd2-afff-37a6f893ea4a"",
  ""windowsIdentity"": ""GANIT\\RVDI001"",
  ""machineName"": ""CL-W10RBT-003"",
  ""processName"": ""CorrettaAssunzioneANIA_Worker_Win10Produzione"",
  ""processVersion"": ""1.0.86"",
  ""jobId"": ""389178ee-a728-44be-8ef1-511060465a5e"",
  ""robotName"": ""001-VDI-Produzione"",
  ""machineId"": 22,
  ""fileName"": ""3.Emissione_Nuova_Proposta"",
  ""transactionId"": ""1fad05cb-9588-484b-b5de-a52c5e9fd29c"",
  ""queueName"": ""CorrettaAssunzioneANIA_PROD_INPUT""
}";;

That is not a CSV file... Is that the raw file, or is it after you have processed it?
Also, I notice "" around the fields; is that correct?

That is how the CSV appears when opened as a text file. Posting directly from the CSV, you find:

First Row Content:

{
  "message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta",
  "level": "Error",
  "logType": "Default",
  "timeStamp": "2021-01-01T01:00:38.706009+01:00",
  "fingerprint": "bf929041-9964-496c-8eee-e1ba7ce4e42a",
  "windowsIdentity": "GANIT\\RVDI001",
  "machineName": "CL-W10RBT-003",
  "processName": "CorrettaAssunzioneANIA_Worker_Win10Produzione",
  "processVersion": "1.0.86",
  "jobId": "389178ee-a728-44be-8ef1-511060465a5e",
  "robotName": "001-VDI-Produzione",
  "machineId": 22,
  "fileName": "3.Emissione_Nuova_Proposta",
  "transactionId": "81152118-8b17-4a3f-aa96-a3e92de402f5",
  "queueName": "CorrettaAssunzioneANIA_PROD_INPUT"
}

Second Row Content:

{
  "message": "Eccezione di business",
  "level": "Warning",
  "logType": "User",
  "timeStamp": "2021-01-01T01:00:38.7216107+01:00",
  "fingerprint": "96d59d87-deb0-406b-9db2-b44e4aca554a",
  "windowsIdentity": "GANIT\\RVDI001",
  "machineName": "CL-W10RBT-003",
  "processName": "CorrettaAssunzioneANIA_Worker_Win10Produzione",
  "processVersion": "1.0.86",
  "jobId": "389178ee-a728-44be-8ef1-511060465a5e",
  "robotName": "001-VDI-Produzione",
  "machineId": 22,
  "fileName": "Corretta_Assunzione_Worker",
  "transactionId": "81152118-8b17-4a3f-aa96-a3e92de402f5",
  "queueName": "CorrettaAssunzioneANIA_PROD_INPUT"
}

Third Row Content:

{
  "message": "Throw: Il prodotto estratto non è presente tra quelli del menu a tendina in fase di emissione della proposta",
  "level": "Error",
  "logType": "Default",
  "timeStamp": "2021-01-01T01:03:35.1546269+01:00",
  "fingerprint": "5f7b1c28-8f81-4cd2-afff-37a6f893ea4a",
  "windowsIdentity": "GANIT\\RVDI001",
  "machineName": "CL-W10RBT-003",
  "processName": "CorrettaAssunzioneANIA_Worker_Win10Produzione",
  "processVersion": "1.0.86",
  "jobId": "389178ee-a728-44be-8ef1-511060465a5e",
  "robotName": "001-VDI-Produzione",
  "machineId": 22,
  "fileName": "3.Emissione_Nuova_Proposta",
  "transactionId": "1fad05cb-9588-484b-b5de-a52c5e9fd29c",
  "queueName": "CorrettaAssunzioneANIA_PROD_INPUT"
}

OK, sorry, but the 2 samples are different.

To be clear: the RAW file is the first one here, with the }";; and the "" around the fields. Is that correct?

Yes, that is how the CSV file appears when opened with Notepad: it has double quotation marks around each entire log and a double " ;; " at the end (I don't know why), where " ; " is the separator between CSV lines. Anyway, the CSV file referenced in my config is the last one posted.

About the origin... these logs were located in an Excel file which I converted to CSV.

OK try this...

input {
  file {
    codec => multiline {
      pattern => ";;"
      negate => true
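      # with negate => true and what => next, lines that do NOT contain ;;
      # are joined to the following line, so each record becomes one event
      # that ends at its closing }";; line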
      what => next
    }
    start_position => "beginning"
    path => "/Users/sbrown/workspace/sample-data/misc/quoted-json.txt"
    sincedb_path => "/dev/null" # fix this back to "nul" for Windows
  }
}


filter {
  # Get rid of leading and trailing characters
  grok {
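    # the escaped quotes match the literal wrapper: everything between the
    # leading " and the trailing ";; is captured into message_detail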
    match => { "message" => "\"%{DATA:message_detail}\";;" }
  }

  # Clean Up the message_detail
  mutate {
    gsub => [
      "message_detail", '\n', '',
      "message_detail", '""', '"'
    ]
  }

  json {
    source => "message_detail"
  }

}

output {
  # elasticsearch {
  #   hosts => "http://localhost:9200"
  #   index => "logs"
  # }
  stdout { codec => rubydebug }
}

Output: I left message_detail in so you can see what I did.

{
              "level" => "Warning",
           "@version" => "1",
        "fingerprint" => "96d59d87-deb0-406b-9db2-b44e4aca554a",
               "host" => "ceres",
     "message_detail" => "{  \"message\": \"Eccezione di business\",  \"level\": \"Warning\",  \"logType\": \"User\",  \"timeStamp\": \"2021-01-01T01:00:38.7216107+01:00\",  \"fingerprint\": \"96d59d87-deb0-406b-9db2-b44e4aca554a\",  \"windowsIdentity\": \"GANIT\\\\RVDI001\",  \"machineName\": \"CL-W10RBT-003\",  \"processName\": \"CorrettaAssunzioneANIA_Worker_Win10Produzione\",  \"processVersion\": \"1.0.86\",  \"jobId\": \"389178ee-a728-44be-8ef1-511060465a5e\",  \"robotName\": \"001-VDI-Produzione\",  \"machineId\": 22,  \"fileName\": \"Corretta_Assunzione_Worker\",  \"transactionId\": \"81152118-8b17-4a3f-aa96-a3e92de402f5\",  \"queueName\": \"CorrettaAssunzioneANIA_PROD_INPUT\"}",
               "tags" => [
        [0] "multiline"
    ],
    "windowsIdentity" => "GANIT\\RVDI001",
        "machineName" => "CL-W10RBT-003",
     "processVersion" => "1.0.86",
              "jobId" => "389178ee-a728-44be-8ef1-511060465a5e",
        "processName" => "CorrettaAssunzioneANIA_Worker_Win10Produzione",
         "@timestamp" => 2021-05-18T15:08:43.509Z,
           "fileName" => "Corretta_Assunzione_Worker",
            "logType" => "User",
          "robotName" => "001-VDI-Produzione",
          "queueName" => "CorrettaAssunzioneANIA_PROD_INPUT",
            "message" => "Eccezione di business",
          "machineId" => 22,
               "path" => "/Users/sbrown/workspace/sample-data/misc/quoted-json.txt",
          "timeStamp" => "2021-01-01T01:00:38.7216107+01:00",
      "transactionId" => "81152118-8b17-4a3f-aa96-a3e92de402f5"
}

This is my first pass; I suspect @badger could perhaps make it better.

When launched it seems to work fine at the prompt, but I can't find anything in Elasticsearch. Is there an explanation?

The elasticsearch output is commented out...
Plus, you probably need to clean up any old 'logs' index you already have.

Sorry, I understood your answer late. I'll fix the commented-out portion and let you know :slight_smile:

Please don't post images... They cannot be viewed by some, are not searchable, etc.

Did you uncomment

  # elasticsearch { 
  #  hosts => "http://localhost:9200"
  #  index => "logs"
  # } 

I just ingested it and it was fine... Also, I notice that you have \r in your data, so you may need to clean that up in the gsub.
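For example, the gsub could be extended like this (a sketch; keep the field name you already use):

  mutate {
    gsub => [
      # strip carriage returns and newlines, then collapse doubled quotes
      "message_detail", '\r', '',
      "message_detail", '\n', '',
      "message_detail", '""', '"'
    ]
  }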

Can Logstash reach Elasticsearch? Do you see Logstash connect to Elasticsearch in the Logstash startup logs?

You can figure this out...

Well, it works perfectly Stephen, the logs are wonderfully imported. Thank you so much!! Can you explain the usefulness of your modifications to my config file in more detail? I'd like to understand it better :heart:

You're welcome... I would read the docs on grok, gsub, etc. if you want to learn.

My first approach to the import was exactly grok; I worked a lot in the grok debugger trying to import my log contents, but I didn't find a way. It surprises me that you used grok. Did it remove the extra quotation marks and commas? Even json seemed to be no good without grok, so I used the kv filter.

I also have difficulty understanding this part:

 "message_detail", '\n', '',
 "message_detail", '\"\"', '"

What is the logic of this syntax?

I used grok to pull off the leading and trailing characters I did not want.

Then I used gsub to clean up the characters in the JSON.

Then I used the json filter.
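To illustrate with a shortened, made-up record:

  raw event:   "{ ""level"": ""Error"" }";;
  after grok:  message_detail = { ""level"": ""Error"" }
  after gsub:  message_detail = { "level": "Error" }
  after json:  a level field with the value Error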

gsub = search and replace

  mutate {
    gsub => [
      "message_detail", '\n', '',
      "message_detail", '""', '"'
    ]
  }

In message_detail, replace all \n with nothing.

In message_detail, replace all "" with just a ".
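More generally, the gsub array is read in groups of three: the field to change, the regex to search for, and the replacement. A sketch with made-up field names and patterns:

  mutate {
    gsub => [
      # field,        search pattern, replacement
      "some_field",   "foo",          "bar",
      "other_field",  "-",            "_"
    ]
  }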

I recommend looking at the Logstash docs for all the cool filters, etc. ...

Good Luck on the Parsing!