Wrong log combinations

Initially, I wrote a Logstash configuration file to pull Balena information as JSON and parse it with the Logstash ruby filter plugin, which produced data that looks like this:

{
      "@version" => "1",
        "values" => [
        [ 0] {
            "machineId" => 1415733,
              "release" => "031d61fb13fb362afefb59143e5ae5d2",
            "timestamp" => "2019-03-22T22:12:50.761Z"
        },
        [ 1] {
            "machineId" => 1415733,
              "release" => "0a67dbf3644916bdbb4e36710131720a",
            "timestamp" => "2019-05-09T01:06:25.489Z"
        },
        [ 2] {
            "machineId" => 1415733,
              "release" => "0e3641dffad6fb901227270c60a9639d",
            "timestamp" => "2019-03-29T00:10:59.828Z"
        },
        [ 3] {
            "machineId" => 1415733,
              "release" => "178dddaa4a3dd66bd9844f11b9016949",
            "timestamp" => "2019-04-04T16:28:55.552Z"
        },
        [ 4] {
            "machineId" => 1415733,
              "release" => "2200c40d0972f4fe25bba661e7945112",
            "timestamp" => "2019-06-12T18:22:53.281Z"
        },
        [ 5] {
            "machineId" => 1415733,
              "release" => "228245f20d67ac801f5b4f4f111caa10",
            "timestamp" => "2019-04-09T22:52:52.100Z"
        },
        [ 6] {
            "machineId" => 1415733,
              "release" => "29bdf4d57ea97a9f1b4f6b7357beb1bc",
            "timestamp" => "2019-06-20T18:13:27.244Z"
        },
        [ 7] {
            "machineId" => 1415733,
              "release" => "2be83890252b2599f5c16bbc773a89d8",
            "timestamp" => "2019-04-29T22:58:58.475Z"
        },
        [ 8] {
            "machineId" => 1415733,
              "release" => "40905efa747878c5ab5d7f238fd8d048",
            "timestamp" => "2019-06-20T21:05:55.330Z"
        },
        [ 9] {
            "machineId" => 1415733,
              "release" => "47bccb6aa9f65bd044d14c213894ea10",
            "timestamp" => "2019-05-08T22:42:02.767Z"
        },
        [10] {
            "machineId" => 1415733,
              "release" => "488dceed662b85dc604161ac834e97dc",
            "timestamp" => "2019-05-03T18:18:44.272Z"
        },
        [11] {
            "machineId" => 1415733,
              "release" => "4fae8f7aa5f01425f1c8c882350ee488",
            "timestamp" => "2019-04-22T23:34:26.274Z"
        },
        [12] {
            "machineId" => 1415733,
              "release" => "57441b6fbb6382b980f704bb27bdcca0",
            "timestamp" => "2019-03-25T19:01:28.316Z"
        },
        [13] {
            "machineId" => 1415733,
              "release" => "65cafefe43a8faa082388e3d6a6c76c0",
            "timestamp" => "2019-04-05T22:09:08.675Z"
        },
        [14] {
            "machineId" => 1415733,
              "release" => "6cb260fe85023253659a9955ba56bf3b",
            "timestamp" => "2019-05-13T23:24:50.640Z"
        },
        [15] {
            "machineId" => 1415733,
              "release" => "798bd260cc74601363cb4653774a1003",
            "timestamp" => "2019-03-29T16:48:59.544Z"
        }
       *** deleted some of them ***
    ],
    "@timestamp" => 2019-06-21T15:21:28.145Z
}

Then I sent the data to Elasticsearch for indexing, and I wanted to show a table in Kibana with the timestamp, machine id, and release values. However, if you look at the picture below, you can see that the same release number is repeated for every timestamp, which is wrong. Every timestamp should have its own release value (each release is unique in the JSON data above). Is there any way to fix this in Logstash, Elasticsearch, or Kibana?

Here is my Logstash configuration for reference:

input{
  http_poller {
    urls => {      
      authentication => {
        method => get
        user => "myEmailAddress"
        password => "myPassword"
        url => "https://api.balena-cloud.com/v4/release?$filter=belongs_to__application%20eq%20<APP ID>"
        headers => {
          "Content-Type" => "application/json"
          "Authorization" => "Bearer <AUTH_TOKEN>"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "5s"}
    codec => "json"
  }
} 

filter{
  if ["event"] != "" {
    # ***** filters json data ***** #
    ruby {
      code => '
        a = []
        i = 0
        event.get("d").each { |x|
          h = {}
          h["release"] = x["commit"]
          h["timestamp"] = x["created_at"]
          h["machineId"] = x["belongs_to__application"]["__id"]
          a[i] = h
          i += 1
        }
        event.set("values", a)
      '
      remove_field => ["d"]
    }

    date {
      match => ["log-datestamp", "YYYY-MM-dd HH:mm:ss,SSS"]
      target =>  "@timestamp"
      timezone => "UTC"
    }       
    date {
      match => ["log-datestamp", "YY-MM-dd HH:mm:ss,SSS"]
      target =>  "@timestamp"
      timezone => "UTC"
    }    
    date {
      match => ["log-datestamp", "ISO8601"]
      target =>  "@timestamp"
      timezone => "UTC"
    }    
    date {
      match => ["log-epoch", "UNIX"]
      target =>  "@timestamp"
      timezone => "UTC"
    }    
    date {
      match => ["log-epoch", "UNIX_MS"]
      target =>  "@timestamp"
      timezone => "UTC"
    }
  }
}

output{
  stdout { 
    codec => rubydebug 
  }
}

I appreciate any help or suggestions.

Here are my Kibana configurations for the Table visualization:

Also, I updated the original photo in my question.

Your table configuration looks correct to me. Perhaps the data has been ingested into ES incorrectly. Could you do a search for one of those exact timestamps in Discover and see which individual documents come back?

Hi @Bargs, thanks for your suggestion. I searched my Elasticsearch index and found that I only have 1 record (1 hit) in it, which I believe is why this error is occurring.
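For reference, a query along these lines reproduces that check (a sketch only; newbalenaindex is the index name shown in the response below, and the timestamp is one of the values from the data above):

GET newbalenaindex/_search
{
  "query": {
    "match": {
      "values.timestamp": "2019-03-22T22:12:50.761Z"
    }
  }
}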

Could you please tell me how I can make each JSON object count as its own hit?

  • Personally, I have thought of using a separate field for each JSON object instead of an array, but that makes it hard to build my visualization and dashboard, because I would have to enter each release, timestamp, and machine id one by one, which is a lot of work.

Elasticsearch search response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "newbalenaindex",
        "_type" : "_doc",
        "_id" : "1aqMd2sBLWZVc3oBFxi6",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2019-06-21T00:59:57.743Z",
          "values" : [
            {
              "release" : "031d61fb13fb362afefb59143e5ae5d2",
              "timestamp" : "2019-03-22T22:12:50.761Z",
              "machineId" : 1415733
            },
            {
              "release" : "0a67dbf3644916bdbb4e36710131720a",
              "timestamp" : "2019-05-09T01:06:25.489Z",
              "machineId" : 1415733
            },
            {
              "release" : "0e3641dffad6fb901227270c60a9639d",
              "timestamp" : "2019-03-29T00:10:59.828Z",
              "machineId" : 1415733
            },
            {
              "release" : "178dddaa4a3dd66bd9844f11b9016949",
              "timestamp" : "2019-04-04T16:28:55.552Z",
              "machineId" : 1415733
            },
            {
              "release" : "2200c40d0972f4fe25bba661e7945112",
              "timestamp" : "2019-06-12T18:22:53.281Z",
              "machineId" : 1415733
            },
            {
              "release" : "228245f20d67ac801f5b4f4f111caa10",
              "timestamp" : "2019-04-09T22:52:52.100Z",
              "machineId" : 1415733
            },
            {
              "release" : "29bdf4d57ea97a9f1b4f6b7357beb1bc",
              "timestamp" : "2019-06-20T18:13:27.244Z",
              "machineId" : 1415733
            },
            {
              "release" : "2be83890252b2599f5c16bbc773a89d8",
              "timestamp" : "2019-04-29T22:58:58.475Z",
              "machineId" : 1415733
            },
            {
              "release" : "40905efa747878c5ab5d7f238fd8d048",
              "timestamp" : "2019-06-20T21:05:55.330Z",
              "machineId" : 1415733
            },
            {
              "release" : "47bccb6aa9f65bd044d14c213894ea10",
              "timestamp" : "2019-05-08T22:42:02.767Z",
              "machineId" : 1415733
            },
            {
              "release" : "488dceed662b85dc604161ac834e97dc",
              "timestamp" : "2019-05-03T18:18:44.272Z",
              "machineId" : 1415733
            },
            {
              "release" : "4fae8f7aa5f01425f1c8c882350ee488",
              "timestamp" : "2019-04-22T23:34:26.274Z",
              "machineId" : 1415733
            },
            {
              "release" : "57441b6fbb6382b980f704bb27bdcca0",
              "timestamp" : "2019-03-25T19:01:28.316Z",
              "machineId" : 1415733
            },
            {
              "release" : "65cafefe43a8faa082388e3d6a6c76c0",
              "timestamp" : "2019-04-05T22:09:08.675Z",
              "machineId" : 1415733
            },
            {
              "release" : "6cb260fe85023253659a9955ba56bf3b",
              "timestamp" : "2019-05-13T23:24:50.640Z",
              "machineId" : 1415733
            },
            {
              "release" : "798bd260cc74601363cb4653774a1003",
              "timestamp" : "2019-03-29T16:48:59.544Z",
              "machineId" : 1415733
            },
            {
              "release" : "e8b8ede03a1b9082f8005a0221dd9507",
              "timestamp" : "2019-05-09T16:57:25.041Z",
              "machineId" : 1415733
            },
            {
              "release" : "ead32058b64f8282e195a7111e38af67",
              "timestamp" : "2019-04-03T23:35:30.663Z",
              "machineId" : 1415733
            },
            {
              "release" : "f24bc0a36b0bfa70dd4a6f7a51795f45",
              "timestamp" : "2019-04-22T23:27:27.764Z",
              "machineId" : 1415733
            },
            {
              "release" : "f8b01b60151713d2d0817f09241af0e7",
              "timestamp" : "2019-06-20T19:27:22.403Z",
              "machineId" : 1415733
            },
            {
              "release" : "fe8fcbedd2893f6f5a40523cd0c1843b",
              "timestamp" : "2019-04-05T21:31:58.174Z",
              "machineId" : 1415733
            }
          ],
          "@version" : "1"
        }
      }
    ]
  }
}

Note: I deleted some of the JSON objects inside the array because of the 7000-character limit.

It depends on how you want to search and aggregate on your data, but it looks to me like each of those objects in the array should be its own document inside Elasticsearch.

Yes, I agree, each one of those JSON { ... } objects has to be in its own _source, but as we can see, they are all under one. How can I separate them? Any idea how to do this in Elasticsearch or Logstash? Preferably Logstash.

I'm not a wiz with logstash, so I've moved this thread over to the Logstash forum where someone with more ingest experience can take a look.

In Logstash,

split { field => "values" }

will create a new event for every entry in the [values] array.
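Each resulting event then carries a single entry of the array in [values], so (sketching from the rubydebug output above) one of them would look roughly like this:

{
      "@version" => "1",
        "values" => {
        "machineId" => 1415733,
          "release" => "031d61fb13fb362afefb59143e5ae5d2",
        "timestamp" => "2019-03-22T22:12:50.761Z"
    },
    "@timestamp" => 2019-06-21T15:21:28.145Z
}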


@Badger, thank you hero!

This fixed my issue 🙂

Solution:

filter {
  ruby {
    code => '
      # Build one {release, timestamp, machineId} hash per entry in the
      # "d" array returned by the Balena API.
      a = []
      event.get("d").each { |x|
        h = {}
        h["release"] = x["commit"]
        h["timestamp"] = x["created_at"]
        h["machineId"] = x["belongs_to__application"]["__id"]
        a << h
      }
      event.set("message", a)
    '
    remove_field => ["d"]
  }

  # Emit one event per entry of the "message" array.
  split {
    field => "message"
  }
}
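
One way to complete the pipeline (a sketch only; the hosts value is an assumption to adjust for your cluster, and the index name is the one from my search response above) is to promote each event's own [message][timestamp] to @timestamp and index every event as its own document:

filter {
  # Use the per-release timestamp as the event's @timestamp.
  date {
    match => ["[message][timestamp]", "ISO8601"]
    target => "@timestamp"
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed host, adjust as needed
    index => "newbalenaindex"
  }
}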

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.