Wrong log combinations

Initially, I wrote a Logstash configuration file to pull Balena information as JSON and parse it with the Logstash ruby filter plugin, which produced data that looks like this:

{
      "@version" => "1",
        "values" => [
        [ 0] {
            "machineId" => 1415733,
              "release" => "031d61fb13fb362afefb59143e5ae5d2",
            "timestamp" => "2019-03-22T22:12:50.761Z"
        },
        [ 1] {
            "machineId" => 1415733,
              "release" => "0a67dbf3644916bdbb4e36710131720a",
            "timestamp" => "2019-05-09T01:06:25.489Z"
        },
        [ 2] {
            "machineId" => 1415733,
              "release" => "0e3641dffad6fb901227270c60a9639d",
            "timestamp" => "2019-03-29T00:10:59.828Z"
        },
        [ 3] {
            "machineId" => 1415733,
              "release" => "178dddaa4a3dd66bd9844f11b9016949",
            "timestamp" => "2019-04-04T16:28:55.552Z"
        },
        [ 4] {
            "machineId" => 1415733,
              "release" => "2200c40d0972f4fe25bba661e7945112",
            "timestamp" => "2019-06-12T18:22:53.281Z"
        },
        [ 5] {
            "machineId" => 1415733,
              "release" => "228245f20d67ac801f5b4f4f111caa10",
            "timestamp" => "2019-04-09T22:52:52.100Z"
        },
        [ 6] {
            "machineId" => 1415733,
              "release" => "29bdf4d57ea97a9f1b4f6b7357beb1bc",
            "timestamp" => "2019-06-20T18:13:27.244Z"
        },
        [ 7] {
            "machineId" => 1415733,
              "release" => "2be83890252b2599f5c16bbc773a89d8",
            "timestamp" => "2019-04-29T22:58:58.475Z"
        },
        [ 8] {
            "machineId" => 1415733,
              "release" => "40905efa747878c5ab5d7f238fd8d048",
            "timestamp" => "2019-06-20T21:05:55.330Z"
        },
        [ 9] {
            "machineId" => 1415733,
              "release" => "47bccb6aa9f65bd044d14c213894ea10",
            "timestamp" => "2019-05-08T22:42:02.767Z"
        },
        [10] {
            "machineId" => 1415733,
              "release" => "488dceed662b85dc604161ac834e97dc",
            "timestamp" => "2019-05-03T18:18:44.272Z"
        },
        [11] {
            "machineId" => 1415733,
              "release" => "4fae8f7aa5f01425f1c8c882350ee488",
            "timestamp" => "2019-04-22T23:34:26.274Z"
        },
        [12] {
            "machineId" => 1415733,
              "release" => "57441b6fbb6382b980f704bb27bdcca0",
            "timestamp" => "2019-03-25T19:01:28.316Z"
        },
        [13] {
            "machineId" => 1415733,
              "release" => "65cafefe43a8faa082388e3d6a6c76c0",
            "timestamp" => "2019-04-05T22:09:08.675Z"
        },
        [14] {
            "machineId" => 1415733,
              "release" => "6cb260fe85023253659a9955ba56bf3b",
            "timestamp" => "2019-05-13T23:24:50.640Z"
        },
        [15] {
            "machineId" => 1415733,
              "release" => "798bd260cc74601363cb4653774a1003",
            "timestamp" => "2019-03-29T16:48:59.544Z"
        }
       *** deleted some of them ***
    ],
    "@timestamp" => 2019-06-21T15:21:28.145Z
}

Then I sent the data to Elasticsearch for indexing, and I wanted to show a table in Kibana with the timestamp, machine id, and release values. However, if you look at the picture below, you can see that the same release number is repeated for every timestamp, which is wrong. Every timestamp should have its own release value (each release is unique in the JSON data above). Is there any way to fix this in Logstash, Elasticsearch, or Kibana?

Here is my Logstash configuration for reference:

input{
  http_poller {
    urls => {      
      authentication => {
        method => get
        user => "myEmailAddress"
        password => "myPassword"
        url => "https://api.balena-cloud.com/v4/release?$filter=belongs_to__application%20eq%20<APP ID>"
        headers => {
          "Content-Type" => "application/json"
          "Authorization" => "Bearer <AUTH_TOKEN>"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "5s"}
    codec => "json"
  }
} 

filter{
  if ["event"] != "" {
    # ***** filters json data ***** #
    ruby {
      code => '
        a = []
        i = 0
        event.get("d").each { |x|
          h = {}
          h["release"] = x["commit"]
          h["timestamp"] = x["created_at"]
          h["machineId"] = x["belongs_to__application"]["__id"]
          a[i] = h
          i += 1
        }
        event.set("values", a)
      '
      remove_field => ["d"]
    }

    date {
      match => ["log-datestamp", "YYYY-MM-dd HH:mm:ss,SSS"]
      target =>  "@timestamp"
      timezone => "UTC"
    }       
    date {
      match => ["log-datestamp", "YY-MM-dd HH:mm:ss,SSS"]
      target =>  "@timestamp"
      timezone => "UTC"
    }    
    date {
      match => ["log-datestamp", "ISO8601"]
      target =>  "@timestamp"
      timezone => "UTC"
    }    
    date {
      match => ["log-epoch", "UNIX"]
      target =>  "@timestamp"
      timezone => "UTC"
    }    
    date {
      match => ["log-epoch", "UNIX_MS"]
      target =>  "@timestamp"
      timezone => "UTC"
    }
  }
}

output{
  stdout { 
    codec => rubydebug 
  }
}

I appreciate any help or suggestions.

Here are my Kibana configurations for the Table visualization:

Also, I updated the original photo in my question.

Your table configuration looks correct to me. Perhaps the data has been ingested into ES incorrectly. Could you do a search for one of those exact timestamps in Discover and see which individual documents come back?

Hi @Bargs, thanks for your suggestion. I searched my Elasticsearch index and found that I only have 1 record (1 hit) in it, which I believe is why this error is occurring.
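For reference, a query along these lines reproduces that check (a sketch only; newbalenaindex is the index name shown in the response below, and the timestamp is one of the values from the data above):

GET newbalenaindex/_search
{
  "query": {
    "match": {
      "values.timestamp": "2019-03-22T22:12:50.761Z"
    }
  }
}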

Could you please tell me how I can make each JSON object count as its own hit?

  • Personally, I have thought of using a separate field for each JSON object instead of an array, but that makes it hard to build my visualization and dashboard, because I would have to enter each release, timestamp, and machine id one by one, which is a lot of work.

Elasticsearch search response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "newbalenaindex",
        "_type" : "_doc",
        "_id" : "1aqMd2sBLWZVc3oBFxi6",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2019-06-21T00:59:57.743Z",
          "values" : [
            {
              "release" : "031d61fb13fb362afefb59143e5ae5d2",
              "timestamp" : "2019-03-22T22:12:50.761Z",
              "machineId" : 1415733
            },
            {
              "release" : "0a67dbf3644916bdbb4e36710131720a",
              "timestamp" : "2019-05-09T01:06:25.489Z",
              "machineId" : 1415733
            },
            {
              "release" : "0e3641dffad6fb901227270c60a9639d",
              "timestamp" : "2019-03-29T00:10:59.828Z",
              "machineId" : 1415733
            },
            {
              "release" : "178dddaa4a3dd66bd9844f11b9016949",
              "timestamp" : "2019-04-04T16:28:55.552Z",
              "machineId" : 1415733
            },
            {
              "release" : "2200c40d0972f4fe25bba661e7945112",
              "timestamp" : "2019-06-12T18:22:53.281Z",
              "machineId" : 1415733
            },
            {
              "release" : "228245f20d67ac801f5b4f4f111caa10",
              "timestamp" : "2019-04-09T22:52:52.100Z",
              "machineId" : 1415733
            },
            {
              "release" : "29bdf4d57ea97a9f1b4f6b7357beb1bc",
              "timestamp" : "2019-06-20T18:13:27.244Z",
              "machineId" : 1415733
            },
            {
              "release" : "2be83890252b2599f5c16bbc773a89d8",
              "timestamp" : "2019-04-29T22:58:58.475Z",
              "machineId" : 1415733
            },
            {
              "release" : "40905efa747878c5ab5d7f238fd8d048",
              "timestamp" : "2019-06-20T21:05:55.330Z",
              "machineId" : 1415733
            },
            {
              "release" : "47bccb6aa9f65bd044d14c213894ea10",
              "timestamp" : "2019-05-08T22:42:02.767Z",
              "machineId" : 1415733
            },
            {
              "release" : "488dceed662b85dc604161ac834e97dc",
              "timestamp" : "2019-05-03T18:18:44.272Z",
              "machineId" : 1415733
            },
            {
              "release" : "4fae8f7aa5f01425f1c8c882350ee488",
              "timestamp" : "2019-04-22T23:34:26.274Z",
              "machineId" : 1415733
            },
            {
              "release" : "57441b6fbb6382b980f704bb27bdcca0",
              "timestamp" : "2019-03-25T19:01:28.316Z",
              "machineId" : 1415733
            },
            {
              "release" : "65cafefe43a8faa082388e3d6a6c76c0",
              "timestamp" : "2019-04-05T22:09:08.675Z",
              "machineId" : 1415733
            },
            {
              "release" : "6cb260fe85023253659a9955ba56bf3b",
              "timestamp" : "2019-05-13T23:24:50.640Z",
              "machineId" : 1415733
            },
            {
              "release" : "798bd260cc74601363cb4653774a1003",
              "timestamp" : "2019-03-29T16:48:59.544Z",
              "machineId" : 1415733
            },
            {
              "release" : "e8b8ede03a1b9082f8005a0221dd9507",
              "timestamp" : "2019-05-09T16:57:25.041Z",
              "machineId" : 1415733
            },
            {
              "release" : "ead32058b64f8282e195a7111e38af67",
              "timestamp" : "2019-04-03T23:35:30.663Z",
              "machineId" : 1415733
            },
            {
              "release" : "f24bc0a36b0bfa70dd4a6f7a51795f45",
              "timestamp" : "2019-04-22T23:27:27.764Z",
              "machineId" : 1415733
            },
            {
              "release" : "f8b01b60151713d2d0817f09241af0e7",
              "timestamp" : "2019-06-20T19:27:22.403Z",
              "machineId" : 1415733
            },
            {
              "release" : "fe8fcbedd2893f6f5a40523cd0c1843b",
              "timestamp" : "2019-04-05T21:31:58.174Z",
              "machineId" : 1415733
            }
          ],
          "@version" : "1"
        }
      }
    ]
  }
}

Note: I deleted some of the JSON objects inside the array because of the 7000-character limit.

It depends on how you want to search and aggregate on your data, but it looks to me like each of those objects in the array should be its own document inside Elasticsearch.

Yes, I agree, each one of those JSON { ... } objects has to be in its own _source, but as we can see, they are all under one. How can I separate them? Any idea how to do this in Elasticsearch or Logstash? Preferably Logstash.

I'm not a wiz with logstash, so I've moved this thread over to the Logstash forum where someone with more ingest experience can take a look.

In Logstash,

split { field => "values" }

will create a new event for every entry in the [values] array.
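Each resulting event then carries a single entry of the array in [values], so (sketching from the rubydebug output above) one of them would look roughly like this:

{
      "@version" => "1",
        "values" => {
        "machineId" => 1415733,
          "release" => "031d61fb13fb362afefb59143e5ae5d2",
        "timestamp" => "2019-03-22T22:12:50.761Z"
    },
    "@timestamp" => 2019-06-21T15:21:28.145Z
}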


@Badger, thank you hero!

This fixed my issue 🙂

Solution:

filter {
  ruby {
    code => '
      # Build one {release, timestamp, machineId} hash per entry in the
      # "d" array returned by the Balena API.
      a = []
      event.get("d").each { |x|
        h = {}
        h["release"] = x["commit"]
        h["timestamp"] = x["created_at"]
        h["machineId"] = x["belongs_to__application"]["__id"]
        a << h
      }
      event.set("message", a)
    '
    remove_field => ["d"]
  }

  # Emit one event per entry of the "message" array.
  split {
    field => "message"
  }
}
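
One way to complete the pipeline (a sketch only; the hosts value is an assumption to adjust for your cluster, and the index name is the one from my search response above) is to promote each event's own [message][timestamp] to @timestamp and index every event as its own document:

filter {
  # Use the per-release timestamp as the event's @timestamp.
  date {
    match => ["[message][timestamp]", "ISO8601"]
    target => "@timestamp"
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed host, adjust as needed
    index => "newbalenaindex"
  }
}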

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.