JSON of varying length and multiline

I'm very new to everything Elastic, Logstash, and JSON. I was able to stand everything up, ingest one set of logs, and query them in Kibana. My second data set is a JSON file whose fields vary from message to message. There are a lot of fields, about 176, but a single (multiline) record may only have 30 of those fields present. My first question: will a record ingest into the Elastic Stack if only some of the fields defined in the index are present, or do I need to add the missing fields as nulls? So far I haven't been able to index any docs from this JSON source.

The second question is how to represent the multiline pattern in my config file. I have read many of the multiline posts, but I just can't seem to generalize their solutions to my case. I put message in the filter thinking it is some generic value that denotes a message. Does this value have to be a field that is actually present in the JSON? Here is my current config file:

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    codec => multiline {
      pattern => "}}"
      negate => "false"
      what => "previous"
    }
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
    json {source => "message"}
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://localhost:9200"
     index => "gt"
  }
}

I appreciate any help with my issues.

To answer your first question, no, you do not need to add null values for fields that are missing from some events.
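As a minimal sketch of that behaviour (hypothetical field names, Kibana Dev Tools syntax), both of these documents index into the same index even though each carries a different subset of fields:

PUT gt/_doc/1
{ "FIELD_A": "x", "FIELD_B": "y" }

PUT gt/_doc/2
{ "FIELD_A": "x", "FIELD_C": "z" }

A field that is absent from a document simply does not exist for that document; queries against it just will not match that document.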

Please edit your post, select the configuration, and click on </> in the toolbar above the edit panel. You should see the text appear in the preview panel on the right

    like 
this
    with indentation preserved.

Thirdly, can you show us what the JSON looks like? It's impossible to suggest a multiline pattern if we do not know what the data looks like. You can sanitize all the values if you want.

Firstly, thank you for the insight regarding missing fields in a record. I didn't mention before, but I am using Elasticsearch 7.2 and Logstash 7.2. I have adjusted my post. Here is a sample JSON event:

  {
	"THRUSHPLUS": {
		"BILL_OF_LADING": "GBHTY7755563",
		"CARGO_DESCRIPTION": "33 PALLET, CEREALS|",
		"CARRIER": "ACL",
		"CONTAINER_NUMBER": "TOLU9813600",
		"DESTINATION_LRT_CD": "UNECE",
		"DESTINATION_RAW_INFO": "GBLIV",
		"DISCHARGE_PORT_ETA": "2019-07-03 00:00:00",
		"DISCHARGE_PORT_LRT_CD": "UNECE",
		"DISCHARGE_PORT_RAW_INFO": "GBLIV",
		"EQUIPMENT_STATUS": "IMPORT",
		"FCL_LCL": "FCL",
		"FULL_EMPTY": "FULL",
		"ISO_CONTAINER_CODE": "20GP",
		"LLOYDS_NO": "9670573",
		"LOAD_PORT_LRT_CD": "UNECE",
		"LOAD_PORT_RAW_INFO": "NLRTM",
		"MANIFEST_QUANTITY": "1",
		"MANIFEST_QUANTITY_UOM": "LT",
		"MESSAGE_TYPE": "DISCHARGE",
		"RAW_FILE_NAME": "1790180309",
		"SEALING_PARTY": "CA",
		"SEAL_NUMBER": "43790217",
		"SHIPPER_CODE": "WT782",
		"DATA_REF": "OPEN",
		"TIME_FRAME": "20180903",
		"VESSEL_CALL_SIGN": "2ITA4",
		"VESSEL_NAME": "ATLANTIC STAR",
		"VOYAGE_REF": "762E"
	}
}

If your JSON is pretty-printed like that then you can make your pattern look for the non-indented } that ends the object:

codec => multiline { pattern => "^}" negate => true what => next auto_flush_interval => 1 }
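In context, that codec sits inside the file input you already have, roughly like this (paths copied from your config; auto_flush_interval makes the codec emit the last buffered event even when no further line arrives):

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      # buffer lines until one starts with a non-indented }, which closes the object
      pattern => "^}"
      negate => true
      what => "next"
      auto_flush_interval => 1
    }
  }
}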

I haven't been able to load any data. I thought that maybe my mapping was not correct. I did several things this afternoon:

  1. I simplified the JSON event so it is no longer pretty-printed and no longer nested under the THRUSHPLUS field.
  2. I altered my ES index mapping to conform to the new format.
  3. I altered my config file to suit the new flat, single-line format.

I am unsure of how to use the filter now. It was so easy with the CSV data that I already have in ES. I really don't know what to make of the json { source => XXXXX } filter; there is no single field under which the message data lies. I have embedded the new message format and conf file below.

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    }
}
filter {
    json {source => "message"}
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://localhost:9200"
     index => "gt"
  }
}

{"BARGE_CODE":"PAU2752","CARGO_DESCRIPTION":"LIRIODENDRON TULIPIFERA KD LUMBER LIRIODENDRON TULIPIFERA KD LUMBER HS CODE: 4407.97|.","CARRIER":"QRT","CHECK_DIGIT_CALCULATED":"0","CHECK_DIGIT_VALID":"TRUE","VESSEL_NAME":"MV BRIGADOON","VOYAGE_REF":"92W","WEIGHT_1":"21850"}

If that JSON is a single line from the file then you will get an event that looks like

           "VESSEL_NAME" => "MV BRIGADOON",
"CHECK_DIGIT_CALCULATED" => "0",
            "VOYAGE_REF" => "92W",
              "WEIGHT_1" => "21850",
     "CARGO_DESCRIPTION" => "LIRIODENDRON TULIPIFERA KD LUMBER LIRIODENDRON TULIPIFERA KD LUMBER HS CODE: 4407.97|.",
     "CHECK_DIGIT_VALID" => "TRUE",
               "CARRIER" => "QRT",
            "BARGE_CODE" => "PAU2752"

What do you not like about that?

I like it, but it doesn't seem to go into ES. I do a doc count and it shows 0.

So is my config OK? FYI, I don't have a field called message. Is this a problem?

If you are using a file input like that then each event will have a field called message and it will contain a single line from the file.
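So the json filter normally points at that field. A minimal sketch (remove_field is optional and only applies when parsing succeeds):

filter {
  json {
    source => "message"            # the raw line produced by the file input
    remove_field => ["message"]    # optional: drop the raw line once it has been parsed
  }
}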

OK, so if I want all the data from the single-line JSON in ES as a single event with all the fields, should I change the JSON format to an array and the ES mapping as well?

What I am trying to do is have the input file shown above ingest as a single event, or doc, in ES (that is, if event and doc are equivalent). I control the structure of the input JSON, so I can adjust it if need be.

With the configuration you posted, each line from the file will be a single event in logstash and a single document in elasticsearch.
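If you want to see those events before they reach Elasticsearch, a temporary stdout output next to the elasticsearch output is the usual quick check, for example:

output {
  stdout { codec => rubydebug }    # temporary: prints each event so you can confirm it is being produced
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "gt"
  }
}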

I now don't believe that I want an array, but rather a grouping of objects (that single JSON record above) as a single doc in ES.

I still can't get data to go into ES.

I have made some changes:

  1. The event file now looks like this:

{"THRUSHPLUS": {"AGENT_ADDRESS_1": "","BARGE_CODE": "","BILL_OF_LADING": "","BL_ISSUING_AGENT": "","BOL_TYPE": "","CARGO_DESCRIPTION": "","CARGO_VOLUME": "1","CARRIER": "","CONTAINER_NUMBER": "","CONTAINER_TYPE": "","DISCHARGE_PORT_ATA": "2019-07-22 23:23:23","DISCHARGE_PORT_CITY": "","DISCHARGE_PORT_COUNTRY": "","DISCHARGE_PORT_ETA": "2019-07-22 23:23:23","DISCHARGE_PORT_LRT_CD": "","DISCHARGE_PORT_RAW_INFO": "","EQUIPMENT_STATUS": "","FCL_LCL": "","HAZMAT_CLASS": "","HAZMAT_CODE": "","HAZ_DESCRIPTION": "","HAZ_FLAG": "","ISO_CONTAINER_CODE": "","LOAD_PORT_ATD": "2019-07-22 23:23:23","LOAD_PORT_CITY": "","LOAD_PORT_COUNTRY": "","LOAD_PORT_ETD": "2019-07-22 23:23:23","LOAD_PORT_LRT_CD": "","LOAD_PORT_RAW_INFO": "","TARE_WEIGHT": "1","TEMP_CONTROLLED": "","TO_COUNTRY": "","TO_UNECE": "","TRAIN_CODE": "","TRUCK_CODE": "","VESSEL_NAME": "","VOYAGE_REF": "","WEIGHT_1": "1","WEIGHT_2": "1"}}

  2. My Logstash config now looks like this:

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    }
}
filter {
    json {source => "THRUSHPLUS"}
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://localhost:9200"
     index => "gt"
  }
}

Though I have also tried substituting "message" for THRUSHPLUS.

This is my index mapping for ES:

PUT gt
{
	"settings" : {
		"number_of_shards" : 1
	},
	"mappings" : {
		"properties" : {
			"CARGOLINKPLUS" : {
				"properties" : {
					"BARGE_CODE": { "type" : "text"},
					"BOL_TYPE": { "type" : "text"},
					"CARGO_DESCRIPTION": { "type" : "text"},
					"CARGO_VOLUME": { "type" : "float"},
					"CARRIER": { "type" : "text"},
					"CONTAINER_NUMBER": { "type" : "text"},
					"CONTAINER_TYPE": { "type" : "text"},
					"DISCHARGE_PORT_ATA": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"DISCHARGE_PORT_CITY": { "type" : "text"},
					"DISCHARGE_PORT_COUNTRY": { "type" : "text"},
					"DISCHARGE_PORT_ETA": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"DISCHARGE_PORT_LOC_ID": { "type" : "text"},
					"DISCHARGE_PORT_LRT_CD": { "type" : "text"},
					"DISCHARGE_PORT_RAW_INFO": { "type" : "text"},
					"EQUIPMENT_STATUS": { "type" : "text"},
					"FCL_LCL": { "type" : "text"},
					"HAZMAT_CLASS": { "type" : "text"},
					"HAZMAT_CODE": { "type" : "text"},
					"HAZ_DESCRIPTION": { "type" : "text"},
					"HAZ_FLAG": { "type" : "text"},
					"ISO_CONTAINER_CODE": { "type" : "text"},
					"LOAD_PORT_ATD": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"LOAD_PORT_CITY": { "type" : "text"},
					"LOAD_PORT_COUNTRY": { "type" : "text"},
					"LOAD_PORT_ETD": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"LOAD_PORT_LOC_ID": { "type" : "text"},
					"LOAD_PORT_LRT_CD": { "type" : "text"},
					"LOAD_PORT_RAW_INFO": { "type" : "text"},
					"TARE_WEIGHT": { "type" : "integer"},
					"TEMP_CONTROLLED": { "type" : "text"},
					"TRAIN_CODE": { "type" : "text"},
					"TRUCK_CODE": { "type" : "text"},
					"VESSEL_NAME": { "type" : "text"},
					"VOYAGE_REF": { "type" : "text"},
					"WEIGHT_1": { "type" : "float"},
					"WEIGHT_2": { "type" : "float"}
				}
			}
		}
	}
}

I just get no data going in.

What happens if you delete the mapping?
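In 7.x a mapping cannot be removed from an existing index, so in practice "delete the mapping" means deleting the index and letting Logstash recreate it with dynamic mapping (assuming you can afford to drop whatever is already in it), e.g. in the Dev Tools console:

DELETE gt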

Sorry for the delay. The stack crashed. Rebuilt it and will respond in the AM.

Sorry for the delay. Deleting the mapping enabled ES to index the data. That meant the mapping and the data weren't matched, so I re-evaluated the mapping and found some extra spaces in it. I have redone the mapping to ensure the desired data types are defined, recreated the index, and data is now indexing. Thank you.
