Logstash clone filter relies on deprecated _type field

I just went through what appears to be a rite of passage for Logstash users: discovering that the "clone" filter won't actually do anything unless the "clones" array has at least one element.

The result is one cloned event per element of the clones array, with that element's string in the _type field. This isn't very clearly documented, by the way; the docs could use an update and an example. I'll submit one shortly.
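To illustrate the mechanism (the clone names here are hypothetical, not from my config): a filter like this emits one copy of each event per element of clones, typed with that element's string.

filter {
  clone {
    # Emits one cloned event per element; each clone's type
    # field is set to the element's value ("audit" or "archive").
    clones => ["audit", "archive"]
  }
}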

I am now left wondering what will happen when the _type field is removed in Logstash 7.0. Will the clone filter start using an @metadata field (which I would prefer, because it doesn't add content to the document sent to Elasticsearch), a tag, a new field, or something else? Just curious.
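For what it's worth, routing on a metadata field already works fine on the output side; this is roughly the shape of what I do (the hosts are placeholders, not my real clusters):

output {
  # [@metadata] fields never appear in the document sent to
  # Elasticsearch, which is why I'd prefer the clone filter use one.
  if [@metadata][clone] == "true" {
    elasticsearch { hosts => ["clone-cluster:9200"] }    # placeholder
  } else {
    elasticsearch { hosts => ["primary-cluster:9200"] }  # placeholder
  }
}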

Are you able to show the field in question?

Sure. Some values have been removed for privacy. Here is a JSON document from the cluster where the cloned data are routed.

{
  "_index": "logstash-2018.11.14",
  "_type": "vger_logs",
  "_id": "F236CE3FBA7C45FE9CF181F7A8C90B83",
  "_version": 1,
  "_score": null,
  "_source": {
    "msg": "proslic: P_TH_HVIC 2011mW VBAT_IRQ_TH 33V",
    "process": "kernel",
    "type": "vger_logs",
    "svr": "err",
    "message_size": 43,
    "rsyshost": "rsyslog-proxy-5",
    "@timestamp": "2018-11-14T14:05:16.484Z",
    "port": 38852,
    "host": <VALUE REMOVED>,
    "@version": "1",
    "tag": "kernel",
    "original_message_size": 98
  },
  "fields": {
    "@timestamp": [
      1542204316484
    ]
  },
  "sort": [
    1542204316484
  ]
}

For comparison, here's a JSON doc from an uncloned event:

{
  "_index": "logstash-2018.11.14",
  "_type": "logs",
  "_id": "646529A051614473A1FE5FBADFCD2D39",
  "_version": 1,
  "_score": null,
  "_source": {
    "msg": "proslic: P_TH_HVIC 2011mW VBAT_IRQ_TH 4V",
    "rsyshost": "rsyslog-proxy-15",
    "process": "kernel",
    "@timestamp": "2018-11-14T14:09:33.346Z",
    "port": 37334,
    "host": <VALUE REMOVED>,
    "@version": "1",
    "tag": "kernel",
    "original_message_size": 97,
    "svr": "err",
    "message_size": 42
  },
  "fields": {
    "@timestamp": [
      1542204573346
    ]
  },
  "sort": [
    1542204573346
  ]
}

And here is the relevant Logstash clone filter:

filter {
  # Guard against re-cloning events that are themselves clones.
  if [@metadata][clone] != "true" {
    clone {
      clones => ["vger_logs"]
      # Mark the cloned event so the conditional above skips it.
      add_field => { "[@metadata][clone]" => "true" }
      id => "cortana-relay-clone"
    }
  }
}

The version of Logstash is 6.4.2. The Elasticsearch cluster receiving the original event is 5.4.0, and the cluster receiving the clone is 5.5.2. Both will be upgraded to 6.4.2 in the near future, but I doubt the Elasticsearch version makes much difference. Then again, maybe the most recent version strips the type and _type fields during ingestion? I don't know.

Also, I suppose that since the type field gets added, I could use that field's value in the conditional around the clone filter rather than the metadata field I add to clones, but I wasn't expecting the type field to exist at all.
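Something like this, presumably (untested, and it assumes only the clone filter ever sets type on these events):

filter {
  # Test the type field the clone filter sets on clones,
  # instead of the metadata flag I was adding.
  if [type] != "vger_logs" {
    clone {
      clones => ["vger_logs"]
      id => "cortana-relay-clone"
    }
  }
}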

The clone filter was recently modified to warn users if the clones option is empty or not specified: https://github.com/logstash-plugins/logstash-filter-clone/pull/15/files#diff-09d6f4c683c7d2b6c256bd830fd0cabdR20

In Logstash 7.0 we can change this to raise an error instead.

As for the document_type used when sending to Elasticsearch, this is how the ES output currently behaves (a config sketch follows the list):

  • if document_type is set in the ES output, that value is used (it's not set by default)
  • if it's not set, then:
    • when connecting to ES < 6.0, if the event contains a type field, the value of that field is used; otherwise it's set to "doc"
    • when connecting to ES 6.x, it's set to "doc"
    • when connecting to ES >= 7.0, it's set to "_doc"
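If you'd rather not depend on that resolution order, you can pin the type explicitly in the output. A minimal sketch (the host is a placeholder):

output {
  elasticsearch {
    hosts => ["localhost:9200"]  # placeholder
    # Takes precedence over the event's type field and the
    # version-based defaults described above.
    document_type => "doc"
  }
}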
