Import JSON with nested fields

Hello all,

I would like your support in defining a proper filter for importing the following JSON object:

    {
    	"status": "ok",
    	"message-type": "work-list",
    	"message-version": "1.0.0",
    	"message": {
    		"facets": {},
    		"total-results": 801,
    		"items": [
    			{
    				"volume": "69",
    				"issue": "3",
    				"title": [
    					"Non-Hermitian physics"
    				],
    				"URL": "http://dx.doi.org/10.1080/00018732.2021.1876991"
    			},
    			{
    				"volume": "69",
    				"issue": "2",
    				"title": [
    					"Correction"
    				],
    				"URL": "http://dx.doi.org/10.1080/00018732.2020.1859069"
    			}
    		],
    		"items-per-page": 5,
    		"query": {
    			"start-index": 0,
    			"search-terms": null
    		}
    	}
    }

The intention is to import the "message" field and split "items" into separate documents/searchable fields.

Reading through the forums and other helpful posts, I arrived at the following filter file (I added the json filter twice, with source => "message", since my object also contains a message field).

    input {
      file {
        # type => "json"
        start_position => "beginning"
        path => "D:/elastic/articles-file.json"
        sincedb_path => "NUL"
      }
    }

    filter {
      json {
        source => "message"
      }

      json {
        source => "message"
      }

      split {
        field => "items"
      }

      mutate {
        remove_field => ["message", "@timestamp", "path", "host", "@version"]
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "articles"
      }

      stdout {}
    }

The Logstash pipeline seems to run indefinitely with the above filter.

I'm new to the Elastic Stack and would appreciate your support.

Thank you,
Omran

If your json file is pretty-printed like that then you will need to use a multiline codec to consume the file as a single event. This is an example of how to do that.
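A minimal sketch of that idea (the pattern `^Spalanzani` is an arbitrary string chosen so it never matches, which folds every line into the previous event; the path here is only an assumption based on your config):

    input {
      file {
        path => "D:/elastic/articles-file.json"
        start_position => "beginning"
        sincedb_path => "NUL"
        codec => multiline {
          pattern => "^Spalanzani"    # never matches, so all lines join into one event
          negate => true
          what => previous
          auto_flush_interval => 1    # flush the pending event after 1 second of silence
          multiline_tag => ""
        }
      }
    }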

You do not need two json filters. The first one will parse all the nested fields.

The split should be

    split { field => "[message][items]" }

You may then want to move some of the fields to the root, which can be done using ruby, like this.
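As a sketch of that (assuming the parsed object still lives under [message] after the split; the exact field layout depends on your pipeline):

    filter {
      ruby {
        code => '
          m = event.get("message")
          if m.is_a?(Hash)
            # copy each nested key to the root of the event, then drop the container
            m.each { |k, v| event.set(k, v) }
            event.remove("message")
          end
        '
      }
    }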

@Badger I appreciate you taking the time to respond to the question above.

Applying the codec as mentioned doesn't seem to make Logstash able to read the JSON file. I would like to note that the object was pretty-printed in the original post for demonstration purposes only.

This is an object that we receive directly from another system, and manipulating it would be an expensive task (we are talking about more than 100 GB of JSON objects!).

The object is as below:

{"status":"ok","message-type":"work-list","message-version":"1.0.0","message":{"facets":{},"total-results":801,"items":[{"volume":"69","issue":"3","title":["Non-Hermitian physics"],"URL":"http:\/\/dx.doi.org\/10.1080\/00018732.2021.1876991"},{"volume":"69","issue":"2","title":["Correction"],"URL":"http:\/\/dx.doi.org\/10.1080\/00018732.2020.1859069"},{"volume":"69","issue":"2","title":["Classical dynamical density functional theory: from fundamentals to applications"],"URL":"http:\/\/dx.doi.org\/10.1080\/00018732.2020.1854965"},{"volume":"69","issue":"1","title":["Molecular quantum materials: electronic phases and charge dynamics in two-dimensional organic solids"],"URL":"http:\/\/dx.doi.org\/10.1080\/00018732.2020.1837833"},{"volume":"68","issue":"4","title":["Light-matter interactions within the Ehrenfest\u2013Maxwell\u2013Pauli\u2013Kohn\u2013Sham framework: fundamentals, implementation, and nano-optical applications"],"URL":"http:\/\/dx.doi.org\/10.1080\/00018732.2019.1695875"}],"items-per-page":5,"query":{"start-index":0,"search-terms":null}}}

configuration file:

    input {
      file {
        start_position => "beginning"
        path => "D:/elastic/fulljson.json"
        codec => multiline { pattern => "^Spalanzani" negate => true what => previous auto_flush_interval => 1 multiline_tag => "" }
        sincedb_path => "NUL"
      }
    }

    filter {
      json {
        source => "message"
      }

      split {
        field => "[message][items]"
      }

      mutate {
        remove_field => ["message", "@timestamp", "path", "host", "@version"]
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "articlescodec"
      }

      stdout {}
    }

There is something we are not getting quite right here. We are working on this project for our university, and we would appreciate your support and/or pointers to the right learning resources to find out more about the input Logstash expects based on these filters.

Thank you,
Omran

If it is not pretty-printed you do not need the multiline codec.

By default the json filter writes the parsed fields to the root of the event, so if your JSON contains a field called message, the parsed object will overwrite the [message] field. Do not then use mutate+remove_field to delete all of the JSON that you have just parsed.
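One way to sidestep that collision (a sketch, not a verbatim recommendation from this thread; `parsed` is a name chosen here for illustration) is to give the json filter a target, so the parsed object never touches [message]:

    filter {
      json {
        source => "message"
        target => "parsed"    # parsed JSON goes under [parsed], leaving [message] intact
      }

      split {
        field => "[parsed][items]"
      }

      mutate {
        # safe now: [message] only holds the raw line, not the parsed data
        remove_field => ["message", "@timestamp", "path", "host", "@version"]
      }
    }

With a target set, removing the message field only drops the raw input line, not the data you just parsed.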

Thank you @Badger much appreciate your support.

We were able to ingest the JSON correctly once we added another object to the same file. For a single test object, Logstash seems to wait indefinitely.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.