Double documents?

Hi,

For some reason, all documents end up twice in my indexes. and I do not have the slightest idea why... I mean... Even the doc_id seems the same.
Consider these 2:


Both exactly the same. When I log this on the appserver with filebeat, I see only one record send to logstash.

These are the first few fields of both documents. I see no difference...


But then... why do I have 2? The least I expect is that elasticsearch overwrites the one with the other? (Although I'ld rather prefer fincing out why the are double in the first place).

Will place some configs in the next posts.

Active settings in logstash.yml:

path.data: /var/lib/logstash
pipeline.batch.size: 200
path.config: /etc/logstash/conf.d
config.support_escapes: true
queue.type: persisted



path.logs: /var/log/logstash

pipelines.yml:

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html

- pipeline.id: agl
  path.config: "/etc/logstash/conf.d/agl.*.conf"
- pipeline.id: apicache
  path.config: "/etc/logstash/conf.d/apicache.*.conf"
- pipeline.id: authms
  path.config: "/etc/logstash/conf.d/authms.*.conf"
- pipeline.id: rest
  path.config: "/etc/logstash/conf.d/rest.*.conf"

The logstash pipeline config (input,filter and output are 3 seperate files, hence the agl.*.conf setting above):

input {
        beats {
        port => 5044
        }
}
filter {
        if "avs6" in [tags] and "api-log" in [tags] and "apigateway" in [tags] {
                mutate {
                strip => ["message"]
                }

                grok {
                        pattern_definitions => {
                                "TIMESTAMP" => "\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}"
                                "DELIM" => "\s*\|\s*|"
                                "STRING" => "[^|]+"
                                "GUID" => "[a-zA-Z0-9\-]+"
                        }

                        keep_empty_captures => false
                        named_captures_only => true
                        match => { "message" => "^ts:\s*%{TIMESTAMP:ts}%{DELIM}logLevel:\s*%{WORD:log-level}%{DELIM}appId:\s*%{STRING:app-id}%{DELIM}thread:\s*(%{INT:thread-id})?%{DELIM}SID:\s*(%{GUID:session-id})?%{DELIM}TN:\s*(%{GUID:transaction-id})?%{DELIM}clientIp:\s*(%{STRING:client-ip})?%{DELIM}userId:\s*%{USERNAME:user-id}%{DELIM}apiType:\s*%{STRING:api-type}%{DELIM}api:\s*(%{STRING:api-url})?%{DELIM}platform:\s*(%{WORD:platform})?%{DELIM}eventType:\s*%{WORD:event-type}%{GREEDYDATA:additional-data}" }
                }

                mutate {
                        strip => ["app-id", "client-ip", "api-type", "api-url"]
                }

                if [event-type] == "API_CALL" {
                        grok {
                                pattern_definitions => {
                                        "STRING" => "[^|]+"
                                }

                                keep_empty_captures => false
                                named_captures_only => true
                                match => { "additional-data" => "message:\s*%{STRING:ms-url}\|?(%{STRING:ms-result-code})?\|?(%{STRING:ms-result})?\|?(%{INT:execution-time})?" }
                        }

                        mutate {
                                strip => ["ms-url", "ms-result-code", "ms-result"]
                        }
                }
                else if [event-type] == "NONE" {
                        grok {
                                keep_empty_captures => false
                                named_captures_only => true
                                match => { "additional-data" => "message:\s*%{GREEDYDATA:log-message}" }
                        }

                        mutate {
                                strip => ["log-message"]
                        }
                }
                else if [event-type] == "APIEND" {
                        grok {
                                pattern_definitions => {
                                        "DELIM" => "\s*\|\s*|"
                                        "STRING" => "[^|]+"
                                }

                                keep_empty_captures => false
                                named_captures_only => true
                                match => {
                                        "additional-data" => [
                                                "RC:\s*%{STRING:ms-result}%{DELIM}executionTime\(ms\):\s*%{INT:execution-time}",
                                                "RC:\s*%{STRING:ms-result}%{DELIM}qs:\s*%{STRING:api-query-string}%{DELIM}executionTime\(ms\):\s*%{INT:execution-time}"
                                        ]
                                }
                        }

                        mutate {
                                strip => ["ms-result", "api-query-string"]
                        }
                }

                if [ms-url] and [ms-url] != "" {
                        grok { match => { "ms-url" => [ "[htpsHTPS]+://(?<ms-host>[a-zA-Z0-9_\-\.]+)" ] } }
                }

                if [api-url] and [api-url] != "" {
                        mutate { strip => ["api-url"] }

                        grok {
                                tag_on_failure => ["_apiparsefailure"]
                                match => {
                                        "api-url" => [ "%{WORD:http-verb}\s*/%{INT:api-tenant}/(?<api-version>[\d\.]+)/[a-zA-Z]+/[a-zA-Z]+/%{WORD:api-platform}/(?<api-brand>[a-zA-Z0-9]+)/(?<api-nanoservice>[a-zA-Z/]+)" ]
                                }
                        }

                        mutate { gsub => [ "api-nanoservice", "/$", "" ] }
                        mutate { uppercase => [ "http-verb", "api-platform", "api-brand", "api-nanoservice" ] }
                }

                if [api-nanoservice] =~ /(?i)USER\/DEVICES\// {
                        mutate { replace => { "api-nanoservice" => "USER/DEVICES" } }
                }
                else if [api-nanoservice] =~ /(?i)Page\// {
                        mutate { replace => { "api-nanoservice" => "PAGE" } }
                }

                date {
                        match => ["ts", "yyyy-MM-dd HH:mm:ss.SSS"]
                        timezone => "CET"
                        target => "@timestamp"
                        remove_field => [ "ts" ]
                }

                fingerprint {
                        concatenate_sources => true
                        source => ["message","agent.hostname"]
                        target => "[@metadata][fingerprint]"
                        method => "SHA1"
                        key => "deduplication-key"
                }


                mutate {
                        add_field => { "[@metadata][indexbasename]" => "agl-api" }
                }

                if [client-ip] and [client-ip] == "" {
                        mutate {
                                remove_field => [ "client-ip" ]
                        }
                }

                if [execution-time] and ([execution-time] == "" or [execution-time] == "-") {
                        mutate {
                                remove_field => [ "execution-time" ]
                        }
                }

                if [thread-id] and [thread-id] == "" {
                        mutate {
                                remove_field => [ "thread-id" ]
                        }
                }

                if [ms-result] and [ms-result] == "" {
                        mutate {
                                remove_field => [ "ms-result" ]
                        }
                }

                if [session-id] == "undefined" {
                        mutate {
                                remove_field => [ "session-id" ]
                        }
                }

                if [transaction-id] == "undefined" {
                        mutate {
                                remove_field => [ "transaction-id" ]
                        }
                }

                if [platform] == "undefined" {
                        mutate {
                                remove_field => [ "transaction-id" ]
                        }
                }

                if [api-url] == "undefined" {
                        mutate {
                                remove_field => [ "transaction-id" ]
                        }
                }

                if [client-ip] == "undefined" {
                        mutate {
                                remove_field => [ "client-ip" ]
                        }
                }

                if [additional-data] {
                        mutate {
                                remove_field => [ "additional-data" ]
                        }
                }
        }
}

output {
        if "api-log" in [tags] or "access-log" in [tags] or "tls-proxy" in [tags] {
                elasticsearch {
                        hosts => ["pdbs301.grn.prd.itv.local:9200","pdbs302.grn.prd.itv.local:9200","pdbs303.grn.prd.itv.local:9200","pdbs304.grn.prd.itv.local:9200","pdbs305.grn.prd.itv.local:9200","pdbs306.grn.prd.itv.local:9200"]
                        document_id => "%{[@metadata][fingerprint]}"
                        index => "%{[@metadata][indexbasename]}-%{[@metadata][version]}-%{+YYYY.MM.dd}-00001"
                        sniffing => false
                }
        }

}

filebeat.yml:

filebeat.inputs:
- type: log
  enabled: true

  paths:
    - /product/AGL/agl-core/logs/agl.log

  exclude_files: ['\.gz$']

  multiline.pattern: '^ts:'
  multiline.negate: true
  multiline.match: after

  tags: ["avs6", "api-log", "apigateway", "rt"]

  ignore_older: 48h
  close_inactive: 5m
  close_removed: true
  clean_removed: true
  scan_frequency: 10s
  harvester_limit: 4444

filebeat.config.modules:
  enabled: false
  # path: ${path.config}/modules.d/*.yml
  # reload.enabled: false

processors:
  - drop_fields:
      fields: ["host"]

fields:
  environment: production
  # environment: docker

queue.mem:
  events: 4096
  #flush.min_events: 1024
  #flush.timeout: 10s

output.logstash:
  enabled: true
  hosts: ["papps1479.ora.prd.itv.local:5044"]
  #hosts: ["papps1479.ora.prd.itv.local:5044","papps1480.ora.prd.itv.local:5044","papps1481.ora.prd.itv.local:5044"]
  loadbalance: true
  timeout: 1m
  #bulk_max_size: 2048
  slow_start: true

logging:
  level: info
  metrics:
    enabled: false

Index template:

{
  "order": 0,
  "index_patterns": [
    "agl-api-*"
  ],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "AVS-logs",
        "rollover_alias": "agl-*"
      },
      "refresh_interval": "10s",
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "aliases": {
    "agl": {}
  },
  "mappings": {
    "_doc": {
      "_source": {},
      "_meta": {},
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "mapping": {
              "norms": false,
              "type": "text"
            },
            "match_mapping_type": "string"
          }
        },
        {
          "string_fields": {
            "mapping": {
              "norms": false,
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "norms": false,
          "type": "keyword"
        },
        "additional-data": {
          "norms": false,
          "type": "text"
        },
        "agent": {
          "dynamic": true,
          "type": "object",
          "enabled": true,
          "properties": {
            "hostname": {
              "norms": false,
              "type": "keyword"
            },
            "name": {
              "norms": false,
              "type": "text"
            },
            "version": {
              "norms": false,
              "type": "text"
            }
          }
        },
        "api-brand": {
          "norms": false,
          "type": "keyword"
        },
        "api-nanoservice": {
          "norms": false,
          "type": "keyword"
        },
        "api-platform": {
          "norms": false,
          "type": "keyword"
        },
        "api-query-string": {
          "norms": false,
          "type": "text"
        },
        "api-tenant": {
          "norms": false,
          "type": "keyword"
        },
        "api-type": {
          "norms": false,
          "type": "keyword"
        },
        "api-url": {
          "norms": false,
          "type": "text"
        },
        "api-version": {
          "norms": false,
          "type": "keyword"
        },
        "app-id": {
          "norms": false,
          "type": "keyword"
        },
        "client-ip": {
          "type": "ip"
        },
        "event-type": {
          "norms": false,
          "type": "keyword"
        },
        "execution-time": {
          "type": "integer"
        },
        "fields": {
          "type": "object",
          "properties": {
            "environment": {
              "norms": false,
              "type": "keyword"
            }
          }
        },
        "host": {
          "norms": false,
          "type": "keyword"
        },
        "http-verb": {
          "norms": false,
          "type": "keyword"
        },
        "log-level": {
          "norms": false,
          "type": "keyword"
        },
        "log-message": {
          "norms": false,
          "type": "text"
        },
        "message": {
          "norms": false,
          "type": "text"
        },
        "ms-error-message": {
          "norms": false,
          "type": "text"
        },
        "ms-host": {
          "norms": false,
          "type": "keyword"
        },
        "ms-request-body": {
          "norms": false,
          "type": "text"
        },
        "ms-result": {
          "norms": false,
          "type": "keyword"
        },
        "ms-result-code": {
          "norms": false,
          "type": "text"
        },
        "ms-url": {
          "norms": false,
          "type": "text"
        },
        "offset": {
          "type": "long"
        },
        "platform": {
          "norms": false,
          "type": "keyword"
        },
        "prospector": {
          "type": "object",
          "properties": {
            "type": {
              "norms": false,
              "type": "text"
            }
          }
        },
        "session-id": {
          "norms": false,
          "type": "text"
        },
        "source": {
          "norms": false,
          "type": "text"
        },
        "tags": {
          "norms": false,
          "type": "text"
        },
        "thread-id": {
          "type": "integer"
        },
        "transaction-id": {
          "norms": false,
          "type": "text"
        },
        "user-id": {
          "norms": false,
          "type": "text"
        },
        "username": {
          "norms": false,
          "type": "text"
        }
      }
    }
  }
}

That is weird. Can you search for the _id in that single index and ensure that it returns two documents? Can you also add the explain: true parameter to that search and paste the full response?

Thanks!

Uhm.... Now you are asking me to do a search in that creepy dev tools window with all the mumbo jumbo, right?

Hope I'm doing the right thing.

I entered:

GET /agl-api-7.9.1-2020.09.17-00001/_search?explain=true&q=_id:e3dc4ecca17cd580cea813db38f509e3b1a07ca2

Strange enough I got just 1 hit.... Where the **** does Kibana find 2? And not just Kibana, Grafana as well?

	e3dc4ecca17cd580cea813db38f509e3b1a07ca2
	agl-api-7.9.1-2020.09.17-00001
	
	{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_shard" : "[agl-api-7.9.1-2020.09.17-00001][1]",
        "_node" : "qSsrrcisQXyULhf2oVrLgg",
        "_index" : "agl-api-7.9.1-2020.09.17-00001",
        "_type" : "_doc",
        "_id" : "e3dc4ecca17cd580cea813db38f509e3b1a07ca2",
        "_score" : 1.0,
        "_source" : {
          "tags" : [
            "avs6",
            "api-log",
            "apigateway",
            "rt",
            "beats_input_codec_plain_applied"
          ],
          "ms-result" : "OK",
          "transaction-id" : "876892d1-f462-0e2a-0372-82b4a9a2d6b3",
          "api-type" : "NANO",
          "ms-host" : "avs-npvr-ms-agl-service",
          "api-brand" : "KPN",
          "client-ip" : "****",
          "user-id" : "****",
          "session-id" : "b9393c24-bd43-e27a-2f50-f7c197eebb9b",
          "event-type" : "API_CALL",
          "input" : {
            "type" : "log"
          },
          "ecs" : {
            "version" : "1.5.0"
          },
          "execution-time" : "100",
          "platform" : "stb",
          "fields" : {
            "environment" : "production"
          },
          "api-url" : "GET /100/1.2.0/A/nld/stb/kpn/TRAY/USER/RECORDING/EVENT",
          "message" : "ts: 2020-09-17 18:46:48.772 | logLevel: INFO | appId: AGL | thread: 128906 | SID: b9393c24-bd43-e27a-2f50-f7c197eebb9b | TN: 876892d1-f462-0e2a-0372-82b4a9a2d6b3 | clientIp: **** | userId: **** | apiType: NANO | api: GET /100/1.2.0/A/nld/stb/kpn/TRAY/USER/RECORDING/EVENT | platform: stb | eventType: API_CALL | message:  http://avs-npvr-ms-agl-service:8080/avsbe-npvr-ms/besc?channel=stb&lang=nld&recordingType=ALL&status=finished&isAscOrderEnabled=Y&startIndex=0&noOfRecordings=1000&action=GetRecordingList|20X|OK|100",
          "thread-id" : "128906",
          "api-platform" : "STB",
          "api-nanoservice" : "TRAY/USER/RECORDING/EVENT",
          "ms-url" : "http://avs-npvr-ms-agl-service:8080/avsbe-npvr-ms/besc?channel=stb&lang=nld&recordingType=ALL&status=finished&isAscOrderEnabled=Y&startIndex=0&noOfRecordings=1000&action=GetRecordingList",
          "app-id" : "AGL",
          "agent" : {
            "id" : "17a147f3-9c3f-4bc9-a21d-3cabfbd55c8f",
            "name" : "papps1639.prdl.itv.local",
            "version" : "7.9.1",
            "type" : "filebeat",
            "ephemeral_id" : "c36f6706-abd0-43c1-997c-f9a6858c04f7",
            "hostname" : "papps1639.prdl.itv.local"
          },
          "log" : {
            "file" : {
              "path" : "/product/AGL/agl-core/logs/agl.log"
            },
            "offset" : 2116966192
          },
          "@timestamp" : "2020-09-17T16:46:48.772Z",
          "ms-result-code" : "20X",
          "http-verb" : "GET",
          "api-tenant" : "100",
          "@version" : "1",
          "api-version" : "1.2.0",
          "log-level" : "INFO"
        },
        "_explanation" : {
          "value" : 1.0,
          "description" : "ConstantScore(_id:[7b 77 5c e1 e7 1c 6b 5e dc 77 9f 34 71 e6 bc d7 77 5b df c7 f9 d3 d7 b7 6f 56 b4 ed c6 b6])",
          "details" : [ ]
        }
      }
    ]
  }
}

OK,

Found out that 2 indices are crated for some reason.
For example, today I see agl-api-7.9.1-2020.09.18 and agl-api-7.9.1-2020.09.18-00001

Bot give 1 hit as result.

Now see why those 2 are created?

Hi

I am not sure if you found out why two indexes are created. Couple of points to be noted here:

  • You are seeing single hit in the dev tools result because you are referencing single specific index name, i.e. "GET /agl-api-7.9.1-2020.09.17-00001/". If you run the same query like following you'd see the second index returned in the result: "GET /agl-api-7.9.1-2020.09.17*"

  • I assume you are using a single ingest pipeline (via Logstash) and not the default ingest pipeline of Elasticsearch for Filebeat.

  • I think two indexes are being created (with almost same naming) because the Output section of your Logstash pipeline contains the index naming convention which could be in conflict with the index lifecycle settings in the template. As a general rule, you should not add the "-00001" at end of your index naming if you are using index lifecycle because that's a prerogative of the lifecyle controller itself. (Index lifecycle & rollover settings are itself a tricky thing but I am not delving into that for now.) Try remove the "-00001" from the Output section and then see if it creates single index or dual.

Hope it helps!

I assume that agl-api-7.9.1-2020.09.17 should be an alias, but somehow became an index...

Ayad's analysis looks good to me!

Thnx for the reply.

Indeed,changing the output of logstash changed it. Still don't know why the old one was there as well though.

I am trying to setup lifecycle management and read somewhere my index should start with 000001 so ElasticSearch LCM could raise that number upon rollover.

That LCM is a very, very, very (did I say very?) difficult monster to catch.

Hey,

indeed ILM has some quirks, we are currently working on. Another concept named datastreams https://www.elastic.co/guide/en/elasticsearch/reference/7.9/data-streams.html will come in handy there, is you do not need to do the initialization you mentioned above anymore.

If you have the time, I'd be curious, what your biggest pain points with ILM are, which might to help improve us the handling around ILM.

Thanks a ton for the feedback so far!

--Alex

Well,

That is a bit difficult to say since at the moment my entire ELK stack is driving me totally crazy,

Had it successfully running for over 2 years on version 6.2 and 6.3 and later on upgraded to 6.7 and 6.8. Hover when load started to increase I figured is was time to rebuild and upscale the platform to be able to handle bigger loads. So I build a new stack based on 7.9.1 currently and it...just...does...not...work... whatever I do, the platform behaves irrational.

So for now I just switched off ILM to elimante one confusing thing.

OK,Back to ILM, that was what you question was:
a few remarks:

  • I can configure an index alias and an index rollover alias. These 2 are not clear. It i also unneccasary. have the user define an alias for the index to use in queries. Do the ILM part under the hood. Just connect to the template any tme and let the user just be able to chose the stages and measures. No stages and measures? Fine then you still have ILM connected,however it does nothing. If measures are defined, create a rollover alias and numbering under the hood. It should not matter how I define my indices. Your ILM should just rotate it, no matter how I designed it to be called and filled. I wanted to have 1 index (agl-api) and have the Ilm cut it in max 50GB pieces and remove the peaces that are 14 days old. And I just can't get it to work. So I have the indices now created with a date in it to at least have 1 index a day (They typically grow to 300 or 400 GB, hence the need to cut them in pieces). But I cannot create an alias with the date since now variables are allowed in the lcm alias name I read somewhere. So my biggest point is: it is not clear and it seems cluttered allover the place. This latter thing might be my mistake, but then this adds to the first one :wink:

I will have a look at that datastreamsthingy. Thnx for the suggestion. However,my first job is now to find out why my logstashesconstantly lose and restore connections to the db's and the db's even lose and restore connections on the transportlevel. As said in the beginning, I'm driving nuts.

Grtx

Hey Tuckson

I am glad to know that it helped. :slight_smile:

Regarding your frustration with Elastic stack, I know how it feels. Been there, done that. If you want to deal with Elastic stack you have to understand one primary principle. Engineers approach Elastic stack taking it as a Physics problem. And this is why/where they fail to grapple with it. They need to approach Elastic as a Chemistry problem. Every rule in Elastic has a Chemistry of its own - like balancing the equations, and knowing how two materials interact with other. So what works fine with less data and single node, blows out of hands when you scale up to tons of data with multiple nodes. But there is nothing that cannot be worked around in Elastic.

As for your struggles with ILM, totally felt it. I struggled with it for several days and multiple tries before I figured it out. And then it works like clockwork. If you can list down your requirements of indexing, and lifecycle requirements, in cleanly listed bullet points then probably I can walk you through the steps to succesfully implement ILM :slight_smile:

Cheers!

Hi,

Just another note. I went the datastreamroad, and though I still do not see exactly why things work, they work. ILm seems to work more or less (not exactly, because I set the rollover limit to 40GB and yet I see indices from 80GB) but I am already happy I am getting rid of those 1TB indices. Also this seems to have positive impact on the performance issues I mentioned in another topic.

I still am puzzled a bit because logstash complains about double documents sometimes since the output plugin needs create as action and not 'index' (I do not know where that 'double docuemntscome from),, but the vast majority of docs are appearing in the datastream now.

Another positive fact is that I had to recreate the template so that is a 7.* template now instead of a legacy one.

Still have tons of questions, but we are getting there, one step at the time.