Separate documents from API request

Hi,

We're using Logstash with the HTTP_input plugin to index events from a REST API. The problem is that every API request returns events we have already indexed, so we end up indexing duplicates. If we run a GET request, the output looks something like this:

{
  "_index": "randomindex",
  "_type": "_doc",
  "_id": "XTqUyG8BFEo1Fcv79Ns-",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2020-01-21T14:50:02.087Z",
    "@version": "1",
    "requests": [
      {
        "internalIp": "127.0.0.1",
        "categories": [
          "Ecommerce/Shopping",
          "Business Services",
          "Financial Institutions",
          "Phishing",
          "Infrastructure",
          "Application"
        ],
        "datetime": "2020-01-21T12:26:36.000Z",
        "originType": "Roaming Computers",
        "originId": 1231231,
        "tags": [],
        "externalIp": "127.0.0.1"
      },
      {
        "internalIp": "127.0.0.1",
        "categories": [
          "Ecommerce/Shopping",
          "Business Services",
          "Financial Institutions",
          "Phishing",
          "Infrastructure",
          "Application"
        ],
        "datetime": "2020-01-21T06:02:42.000Z",
        "originType": "Roaming Computers",
        "originId": 1231231,
        "tags": [],
        "externalIp": "127.0.0.1"
      },
      {
        "internalIp": "127.0.0.1",
        "categories": [
          "Sports",
          "Malware"
        ],
        "datetime": "2020-01-21T10:52:54.000Z",
        "originType": "Roaming Computers",
        "originId": 123123,
        "tags": []
      },

The next time we GET from the API, we get the same logs back and index them again as duplicates.

Could we use some field in the event to detect and drop duplicates, for example the "datetime" field?
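
To illustrate the idea, here is a minimal Python sketch (not Logstash configuration) of what we have in mind: derive a stable ID from a few fields of each entry in "requests" and skip entries whose ID has already been seen. The choice of "datetime", "originId" and "internalIp" as the identifying fields is an assumption on our part, since "datetime" alone may not be unique if two requests fall in the same second. The names `KEY_FIELDS`, `fingerprint` and `deduplicate` are hypothetical helpers made up for this example.

```python
import hashlib
import json

# Assumption: these fields together identify one request uniquely.
KEY_FIELDS = ("datetime", "originId", "internalIp")


def fingerprint(request, key_fields=KEY_FIELDS):
    """Return a stable SHA-256 hex digest built from the chosen fields."""
    key = {name: request.get(name) for name in key_fields}
    # Sorted keys and fixed separators keep the serialization deterministic.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(requests, seen=None):
    """Yield (doc_id, request) pairs, skipping fingerprints already seen."""
    seen = set() if seen is None else seen
    for request in requests:
        doc_id = fingerprint(request)
        if doc_id not in seen:
            seen.add(doc_id)
            yield doc_id, request


if __name__ == "__main__":
    poll = [
        {"datetime": "2020-01-21T12:26:36.000Z", "originId": 1231231,
         "internalIp": "127.0.0.1", "tags": []},
        {"datetime": "2020-01-21T06:02:42.000Z", "originId": 1231231,
         "internalIp": "127.0.0.1", "tags": []},
    ]
    seen_ids = set()
    first = list(deduplicate(poll, seen_ids))
    second = list(deduplicate(poll, seen_ids))  # same data polled again
    print(len(first), len(second))
```

If the same digest were used as the Elasticsearch document "_id", re-indexing an entry would overwrite the existing document instead of creating a new one, so no in-memory set would be needed. We are not sure how best to express this in a Logstash pipeline.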
