Ingest a JSON file containing several JSON objects (one per line) sent from Java (over HTTP) to Logstash to Elasticsearch

Hi there,

I'm very very new to ELK and I am facing an issue.

I am trying to send a JSON file containing several JSON objects (one per line) from a Java application to Logstash.

Here is the file I send in an HTTP POST request to Logstash:

{"ISO8601 TIME":"2024-07-16T03:00:03.591148+02:00","name":"bob","age":20,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:04.591148+02:00","name":"patrick","age":19,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:05.591148+02:00","name":"carlo","age":40,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:06.591148+02:00","name":"captain krabs","age":54,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:07.591148+02:00","name":"garry","age":8,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:08.591148+02:00","name":"sandy","age":22,"index_":"test-log"}

Here is the Java code portion where I send the file:

            HttpClient client = HttpClient.newBuilder()
                    .followRedirects(HttpClient.Redirect.ALWAYS)
                    .build();

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(PropertiesManager.getConfig().getProperty("endpoint")))
                    .timeout(Duration.ofMinutes(2))
                    .header("Content-Type", "application/json")
                    .method("POST", BodyPublishers.ofFile(jsonFile.toPath()))
                    .build();

            HttpResponse<String> response = client.send(request, BodyHandlers.ofString());

Here is my logstash.conf file:

input {
  http {
    port => 5044
    codec => json_lines
  }
}

output {
  elasticsearch {
    hosts => "https://localhost:9200"
    ssl_certificate_authorities => ["C:\Users\xxx\Elastic\logstash-8.14.1\config\certs\http_ca.crt"]
    user => "xxx" # replaced
    password => "xxx" # replaced
    index => "%{[index_]}-%{+YYYY.MM.dd}"
  }
}

My JSON file is correctly sent to Logstash, but when the data reaches Elasticsearch, only one document is created, containing only the first JSON line of the file.
The strangest part is that the whole file is stored in that document, in the "event.original" field:

{
  "@timestamp": [
    "2024-07-20T14:34:12.714Z"
  ],
  "@version": [
    "1"
  ],
  "@version.keyword": [
    "1"
  ],
  "age": [
    20
  ],
  "event.original": [
    "{\"ISO8601 TIME\":\"2024-07-16T03:00:03.591148+02:00\",\"name\":\"bob\",\"age\":20,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:04.591148+02:00\",\"name\":\"patrick\",\"age\":19,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:05.591148+02:00\",\"name\":\"carlo\",\"age\":40,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:06.591148+02:00\",\"name\":\"captain krabs\",\"age\":54,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:07.591148+02:00\",\"name\":\"garry\",\"age\":8,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:08.591148+02:00\",\"name\":\"sandy\",\"age\":22,\"index_\":\"test-log\"}\r\n"
  ],
  "event.original.keyword": [
    "{\"ISO8601 TIME\":\"2024-07-16T03:00:03.591148+02:00\",\"name\":\"bob\",\"age\":20,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:04.591148+02:00\",\"name\":\"patrick\",\"age\":19,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:05.591148+02:00\",\"name\":\"carlo\",\"age\":40,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:06.591148+02:00\",\"name\":\"captain krabs\",\"age\":54,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:07.591148+02:00\",\"name\":\"garry\",\"age\":8,\"index_\":\"test-log\"}\r\n{\"ISO8601 TIME\":\"2024-07-16T03:00:08.591148+02:00\",\"name\":\"sandy\",\"age\":22,\"index_\":\"test-log\"}\r\n"
  ],
  "host.ip": [
    "127.0.0.1"
  ],
  "host.ip.keyword": [
    "127.0.0.1"
  ],
  "http.method": [
    "POST"
  ],
  "http.method.keyword": [
    "POST"
  ],
  "http.request.body.bytes": [
    "589"
  ],
  "http.request.body.bytes.keyword": [
    "589"
  ],
  "http.request.mime_type": [
    "application/json"
  ],
  "http.request.mime_type.keyword": [
    "application/json"
  ],
  "http.version": [
    "HTTP/1.1"
  ],
  "http.version.keyword": [
    "HTTP/1.1"
  ],
  "index_": [
    "test-log"
  ],
  "index_.keyword": [
    "test-log"
  ],
  "ISO8601 TIME": [
    "2024-07-16T01:00:03.591Z"
  ],
  "name": [
    "bob"
  ],
  "name.keyword": [
    "bob"
  ],
  "url.domain": [
    "127.0.0.1"
  ],
  "url.domain.keyword": [
    "127.0.0.1"
  ],
  "url.path": [
    "/"
  ],
  "url.path.keyword": [
    "/"
  ],
  "url.port": [
    5044
  ],
  "user_agent.original": [
    "Java-http-client/17.0.11"
  ],
  "user_agent.original.keyword": [
    "Java-http-client/17.0.11"
  ],
  "_id": "l9eQ0JABEpRCJdcZj2Hq",
  "_index": "test-log-2024.07.20",
  "_score": null
}

What am I doing wrong?
I have tried a lot of different options in the Logstash config file, but unfortunately none of them solved my problem.

Any help would be appreciated.

Thanks

Welcome!

Not answering your question, but why not send the data directly to Elasticsearch?

I think the json_lines codec is ignored because the content type is application/json. Try:

additional_codecs => {}
codec => json_lines

Similar to this.
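
For context, a minimal sketch of what the full input block would look like with that workaround applied (port and codec taken from your config; the empty additional_codecs hash is the workaround):

input {
  http {
    port => 5044
    # no content-type-specific codecs registered, so nothing overrides the default
    additional_codecs => {}
    # the default codec, now applied even to application/json requests
    codec => json_lines
  }
}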

Besides what @Badger mentioned, I think you can just use the json codec instead of the json_lines codec.

What does your payload look like?

If it is something like this:

[{json_document_1},{json_document_2},{json_document_3}]

Then the json codec would work. I would also use a target field:

additional_codecs => {}
codec => json { target => "[document]" }
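
With a target, the parsed fields end up nested under [document], so each event would look roughly like this (a sketch, not real output):

{
  "document": {
    "ISO8601 TIME": "2024-07-16T03:00:03.591148+02:00",
    "name": "bob",
    "age": 20,
    "index_": "test-log"
  }
}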

Many thanks, this solved the issue.
If I may, what does this additional_codecs option actually do?
Because it really is strange behavior.

Are you referring to the Elasticsearch Java API Client?
I have seen it, and it looks very easy to set up.
I will definitely look into it!

The reason I used Logstash in the first place was to explore the whole stack. If it is not necessary, I will try to avoid using it.

My first approach was to develop a very light Java app that just formats data and sends it as a JSON file as quickly as possible, and sending the file to Logstash seemed to be the easiest way to do that.

The Java app will send 5M JSON objects/day to Elasticsearch (100K/15min). Which approach would you suggest?

Many thanks

It is used to apply different codecs for different content types.

Apply specific codecs for specific content types. The default codec will be applied only after this list is checked and no codec for the request’s content-type is found

But I think the issue here is that you are sending the content type in your request, so the default codec option is ignored.

This workaround forces the input to use the codec specified in the codec option.

I'm using this as well without any issue.

Yeah, there are some things like that in Logstash, as it hasn't received the same improvements and attention that other Elastic tools have in recent years.


Thank you for the clarification.

As to your previous question (about using the json codec instead of json_lines), this is the input file I'm using:

{"ISO8601 TIME":"2024-07-16T03:00:03.591148+02:00","name":"bob","age":20,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:04.591148+02:00","name":"patrick","age":19,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:05.591148+02:00","name":"carlo","age":40,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:06.591148+02:00","name":"captain krabs","age":54,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:07.591148+02:00","name":"garry","age":8,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:08.591148+02:00","name":"sandy","age":22,"index_":"test-log"}

There is only one JSON object per line, and they are not in an array.

Yeah, but how is the payload being built by your application?

If you are sending a batch of lines in one request, it is probably an array of JSON objects.

I do not use Java, but this is how I normally see people doing it in Python, for example.

Oh sorry for the confusion.

I just create a file from Java where I build JSON objects, convert each one to a compact JSON string (with the Jackson library), and append it to the file, one by one, in a loop.
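
For illustration, a minimal sketch of that loop (the MyRecord type and its accessors are placeholders for my actual data; the field names match the file below):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.io.BufferedWriter;
import java.nio.file.Files;

ObjectMapper mapper = new ObjectMapper();
try (BufferedWriter writer = Files.newBufferedWriter(jsonFile.toPath())) {
    for (MyRecord r : records) {
        ObjectNode node = mapper.createObjectNode();   // build the JSON object
        node.put("ISO8601 TIME", r.timestamp());
        node.put("name", r.name());
        node.put("age", r.age());
        node.put("index_", "test-log");
        writer.write(mapper.writeValueAsString(node)); // compact, single-line JSON
        writer.newLine();                              // one object per line
    }
}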

The content of the file is as follows:

{"ISO8601 TIME":"2024-07-16T03:00:03.591148+02:00","name":"bob","age":20,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:04.591148+02:00","name":"patrick","age":19,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:05.591148+02:00","name":"carlo","age":40,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:06.591148+02:00","name":"captain krabs","age":54,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:07.591148+02:00","name":"garry","age":8,"index_":"test-log"}
{"ISO8601 TIME":"2024-07-16T03:00:08.591148+02:00","name":"sandy","age":22,"index_":"test-log"}

The above is the whole content of the file I'm sending to Logstash.

The http input needs to be able to use different codecs for different content-types. These are configured using the additional_codecs option.

If additional_codecs does not include the content-type of the POST, then the input needs a default, which is provided by the codec option. So the codec option does not work the way it does on every other input.

I think if the options had been named differently (e.g. codecs and fallback_codec) there might be less confusion.
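
For example, a hypothetical input that maps two content types explicitly and falls back to a default for everything else (the mappings shown are illustrative, not the plugin's defaults):

input {
  http {
    port => 5044
    # requests with these content types are decoded by the matching codec...
    additional_codecs => {
      "application/json" => "json"
      "application/x-ndjson" => "json_lines"
    }
    # ...any other content type falls back to this codec
    codec => plain
  }
}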

Yes

Indeed, it's better not to use it when you don't need it.

Have a look at this: Switching from the Java High Level Rest Client to the new Java API Client | Elastic Blog

It's easy and will do the job.
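
For 100K documents every 15 minutes, you will want to batch them. A minimal sketch of sending one batch with the Java API Client's bulk request (the host, index name, and Person record are placeholders, and TLS/authentication are omitted for brevity):

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.BulkRequest;
import co.elastic.clients.elasticsearch.core.BulkResponse;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import java.util.List;

public class BulkExample {
    // placeholder document type; Jackson serializes it to JSON automatically
    public record Person(String name, int age) {}

    public static void main(String[] args) throws Exception {
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200)).build();
        ElasticsearchClient client = new ElasticsearchClient(
                new RestClientTransport(restClient, new JacksonJsonpMapper()));

        List<Person> people = List.of(new Person("bob", 20), new Person("patrick", 19));

        // one bulk request per batch instead of one HTTP call per document
        BulkRequest.Builder br = new BulkRequest.Builder();
        for (Person p : people) {
            br.operations(op -> op.index(idx -> idx.index("test-log").document(p)));
        }
        BulkResponse result = client.bulk(br.build());
        System.out.println("errors: " + result.errors());

        restClient.close();
    }
}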