JSON of varying length and multiline

I'm very new to everything Elastic, Logstash, and JSON. I was able to stand everything up, ingest one set of logs, and query them in Kibana. My second data set is a JSON file whose fields vary from message to message. There are a lot of fields, about 176, but a single (multiline) record may only have 30 of those fields present. My first question: will a record ingest into the Elastic Stack if only some of the fields defined in the index are present, or do I need to add the missing fields as nulls? So far I haven't been able to index any docs from this JSON source.

The second question is how to represent the multiline pattern in my config file. I have read many of the multiline posts, but I just can't seem to generalize their solutions to my case. I put message in the filter thinking it is some generic value that denotes a message. Does this value have to be a field that is actually present in the JSON? Here is my current config file:

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    codec => multiline {
      pattern => "}}"
      negate => "false"
      what => "previous"
    }
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
    json {source => "message"}
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://localhost:9200"
     index => "gt"
  }
}

I appreciate any help with my issues.

To answer your first question, no, you do not need to add null values for fields that are missing from some events.
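As a minimal sketch of that behaviour (hypothetical field names, Kibana Dev Tools syntax), both of these documents index into the same index even though each carries a different subset of fields:

PUT gt/_doc/1
{ "FIELD_A": "x", "FIELD_B": "y" }

PUT gt/_doc/2
{ "FIELD_A": "x", "FIELD_C": "z" }

A field that is absent from a document simply does not exist for that document; queries against it just will not match that document.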

Please edit your post, select the configuration, and click on </> in the toolbar above the edit panel. You should see the text appear in the preview panel on the right

    like 
this
    with indentation preserved.

Thirdly, can you show us what the JSON looks like? It's impossible to suggest a multiline pattern if we do not know what the data looks like. You can sanitize all the values if you want.

Firstly, thank you for the insight regarding missing fields in a record. I didn't mention before, but I am using Elasticsearch 7.2 and Logstash 7.2. I have adjusted my post. Here is a sample JSON event:

  {
	"THRUSHPLUS": {
		"BILL_OF_LADING": "GBHTY7755563",
		"CARGO_DESCRIPTION": "33 PALLET, CEREALS|",
		"CARRIER": "ACL",
		"CONTAINER_NUMBER": "TOLU9813600",
		"DESTINATION_LRT_CD": "UNECE",
		"DESTINATION_RAW_INFO": "GBLIV",
		"DISCHARGE_PORT_ETA": "2019-07-03 00:00:00",
		"DISCHARGE_PORT_LRT_CD": "UNECE",
		"DISCHARGE_PORT_RAW_INFO": "GBLIV",
		"EQUIPMENT_STATUS": "IMPORT",
		"FCL_LCL": "FCL",
		"FULL_EMPTY": "FULL",
		"ISO_CONTAINER_CODE": "20GP",
		"LLOYDS_NO": "9670573",
		"LOAD_PORT_LRT_CD": "UNECE",
		"LOAD_PORT_RAW_INFO": "NLRTM",
		"MANIFEST_QUANTITY": "1",
		"MANIFEST_QUANTITY_UOM": "LT",
		"MESSAGE_TYPE": "DISCHARGE",
		"RAW_FILE_NAME": "1790180309",
		"SEALING_PARTY": "CA",
		"SEAL_NUMBER": "43790217",
		"SHIPPER_CODE": "WT782",
		"DATA_REF": "OPEN",
		"TIME_FRAME": "20180903",
		"VESSEL_CALL_SIGN": "2ITA4",
		"VESSEL_NAME": "ATLANTIC STAR",
		"VOYAGE_REF": "762E"
	}
}

If your JSON is pretty-printed like that then you can make your pattern look for the non-indented } that ends the object:

codec => multiline { pattern => "^}" negate => true what => next auto_flush_interval => 1 }
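In context, that codec sits inside the file input you already have, roughly like this (paths copied from your config; auto_flush_interval makes the codec emit the last buffered event even when no further line arrives):

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      # buffer lines until one starts with a non-indented }, which closes the object
      pattern => "^}"
      negate => true
      what => "next"
      auto_flush_interval => 1
    }
  }
}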

I haven't been able to load any data. I thought that maybe my mapping was not correct. I did several things this afternoon:

  1. I simplified the JSON event so it is no longer pretty-printed and no longer nested under the THRUSHPLUS field.
  2. I altered my ES index mapping to conform to the new format.
  3. I altered my config file to suit the new flat, single-line format.

I am unsure of how to use the filter now. It was so easy with the CSV data that I already have in ES. I really don't know what to make of the json { source => XXXXX } filter; there is no single field under which the message data lies. I have embedded the new message format and conf file below.

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    }
}
filter {
    json {source => "message"}
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://localhost:9200"
     index => "gt"
  }
}

{"BARGE_CODE":"PAU2752","CARGO_DESCRIPTION":"LIRIODENDRON TULIPIFERA KD LUMBER LIRIODENDRON TULIPIFERA KD LUMBER HS CODE: 4407.97|.","CARRIER":"QRT","CHECK_DIGIT_CALCULATED":"0","CHECK_DIGIT_VALID":"TRUE","VESSEL_NAME":"MV BRIGADOON","VOYAGE_REF":"92W","WEIGHT_1":"21850"}

If that JSON is a single line from the file then you will get an event that looks like

           "VESSEL_NAME" => "MV BRIGADOON",
"CHECK_DIGIT_CALCULATED" => "0",
            "VOYAGE_REF" => "92W",
              "WEIGHT_1" => "21850",
     "CARGO_DESCRIPTION" => "LIRIODENDRON TULIPIFERA KD LUMBER LIRIODENDRON TULIPIFERA KD LUMBER HS CODE: 4407.97|.",
     "CHECK_DIGIT_VALID" => "TRUE",
               "CARRIER" => "QRT",
            "BARGE_CODE" => "PAU2752"

What do you not like about that?

I like it, but it doesn't seem to go into ES. I do a doc count and it shows 0.

So is my config OK? FYI, I don't have a field called message. Is this a problem?

If you are using a file input like that then each event will have a field called message and it will contain a single line from the file.
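So the json filter normally points at that field. A minimal sketch (remove_field is optional and only applies when parsing succeeds):

filter {
  json {
    source => "message"            # the raw line produced by the file input
    remove_field => ["message"]    # optional: drop the raw line once it has been parsed
  }
}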

OK, so if I want all the data from the single-line JSON in ES as a single event with all the fields, should I change the JSON format to an array and the ES mapping as well?

What I am trying to do is have the input file shown above ingest as a single event, or doc, in ES (that is, if event and doc are equivalent). I control the structure of the input JSON, so I can adjust it if need be.

With the configuration you posted, each line from the file will be a single event in logstash and a single document in elasticsearch.
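If you want to see those events before they reach Elasticsearch, a temporary stdout output next to the elasticsearch output is the usual quick check, for example:

output {
  stdout { codec => rubydebug }    # temporary: prints each event so you can confirm it is being produced
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "gt"
  }
}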

I now don't believe that I want an array, but rather a grouping of objects (that single JSON record above) as a single doc in ES.

I still can't get data to go into ES.

I have made some changes:

  1. The event file now looks like this:

{"THRUSHPLUS": {"AGENT_ADDRESS_1": "","BARGE_CODE": "","BILL_OF_LADING": "","BL_ISSUING_AGENT": "","BOL_TYPE": "","CARGO_DESCRIPTION": "","CARGO_VOLUME": "1","CARRIER": "","CONTAINER_NUMBER": "","CONTAINER_TYPE": "","DISCHARGE_PORT_ATA": "2019-07-22 23:23:23","DISCHARGE_PORT_CITY": "","DISCHARGE_PORT_COUNTRY": "","DISCHARGE_PORT_ETA": "2019-07-22 23:23:23","DISCHARGE_PORT_LRT_CD": "","DISCHARGE_PORT_RAW_INFO": "","EQUIPMENT_STATUS": "","FCL_LCL": "","HAZMAT_CLASS": "","HAZMAT_CODE": "","HAZ_DESCRIPTION": "","HAZ_FLAG": "","ISO_CONTAINER_CODE": "","LOAD_PORT_ATD": "2019-07-22 23:23:23","LOAD_PORT_CITY": "","LOAD_PORT_COUNTRY": "","LOAD_PORT_ETD": "2019-07-22 23:23:23","LOAD_PORT_LRT_CD": "","LOAD_PORT_RAW_INFO": "","TARE_WEIGHT": "1","TEMP_CONTROLLED": "","TO_COUNTRY": "","TO_UNECE": "","TRAIN_CODE": "","TRUCK_CODE": "","VESSEL_NAME": "","VOYAGE_REF": "","WEIGHT_1": "1","WEIGHT_2": "1"}}

  2. My Logstash config now looks like this:

input {
  file {
    type => "json"
    path => "C:/ESTK/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    }
}
filter {
    json {source => "THRUSHPLUS"}
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://localhost:9200"
     index => "gt"
  }
}

Though I have also tried substituting "message" for THRUSHPLUS.

This is my index mapping for ES:

PUT gt
{
	"settings" : {
		"number_of_shards" : 1
	},
	"mappings" : {
		"properties" : {
			"CARGOLINKPLUS" : {
				"properties" : {
					"BARGE_CODE": { "type" : "text"},
					"BOL_TYPE": { "type" : "text"},
					"CARGO_DESCRIPTION": { "type" : "text"},
					"CARGO_VOLUME": { "type" : "float"},
					"CARRIER": { "type" : "text"},
					"CONTAINER_NUMBER": { "type" : "text"},
					"CONTAINER_TYPE": { "type" : "text"},
					"DISCHARGE_PORT_ATA": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"DISCHARGE_PORT_CITY": { "type" : "text"},
					"DISCHARGE_PORT_COUNTRY": { "type" : "text"},
					"DISCHARGE_PORT_ETA": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"DISCHARGE_PORT_LOC_ID": { "type" : "text"},
					"DISCHARGE_PORT_LRT_CD": { "type" : "text"},
					"DISCHARGE_PORT_RAW_INFO": { "type" : "text"},
					"EQUIPMENT_STATUS": { "type" : "text"},
					"FCL_LCL": { "type" : "text"},
					"HAZMAT_CLASS": { "type" : "text"},
					"HAZMAT_CODE": { "type" : "text"},
					"HAZ_DESCRIPTION": { "type" : "text"},
					"HAZ_FLAG": { "type" : "text"},
					"ISO_CONTAINER_CODE": { "type" : "text"},
					"LOAD_PORT_ATD": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"LOAD_PORT_CITY": { "type" : "text"},
					"LOAD_PORT_COUNTRY": { "type" : "text"},
					"LOAD_PORT_ETD": { "type" : "date","format": "yyyy-MM-dd HH:mm:ss"},
					"LOAD_PORT_LOC_ID": { "type" : "text"},
					"LOAD_PORT_LRT_CD": { "type" : "text"},
					"LOAD_PORT_RAW_INFO": { "type" : "text"},
					"TARE_WEIGHT": { "type" : "integer"},
					"TEMP_CONTROLLED": { "type" : "text"},
					"TRAIN_CODE": { "type" : "text"},
					"TRUCK_CODE": { "type" : "text"},
					"VESSEL_NAME": { "type" : "text"},
					"VOYAGE_REF": { "type" : "text"},
					"WEIGHT_1": { "type" : "float"},
					"WEIGHT_2": { "type" : "float"}
				}
			}
		}
	}
}

I just get no data going in.

What happens if you delete the mapping?
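In 7.x a mapping cannot be removed from an existing index, so in practice "delete the mapping" means deleting the index and letting Logstash recreate it with dynamic mapping (assuming you can afford to drop whatever is already in it), e.g. in the Dev Tools console:

DELETE gt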

Sorry for the delay. The stack crashed. Rebuilt it and will respond in the AM.

Sorry for the delay. Deleting the mapping enabled ES to index the data. That meant the mapping and the data weren't matched, so I re-evaluated the mapping and found some extra spaces in it. I have redone the mapping to ensure the desired data types are defined, recreated the index, and data is now indexing. Thank you.
