Replacing values of fields by regex

Hi there,

I'm fairly new to the whole ES world and currently evaluating an ELK setup. I parsed several log messages via Logstash into Elasticsearch. So far so good. However, some of these log messages contain a field with usernames which I would like to replace, to anonymise it, as is required by law after 30 days.
This is what the shortened JSON looks like:

{
  "_index": "logstash-2017.08.15",
  "_type": "network",
  "_id": "AV9UTBKxt7g8Eg6MIG6R",
  "_version": 1,
  "_score": 2,
  "_source": {
    "offset": 1597078,
    "input_type": "log",
    "logmessage": "configured by this_is_a_username"
  },
  "fields": {
    "@timestamp": [
      1502756404000
    ]
  }
}

The crux is that I want to keep the logs as they are for that timespan.

Since documents are immutable, I guess I have to reindex them in order to do that. I've read about the
Pattern Replace Char Filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html) and thought about creating a new index that replaces the username, then reindexing the old index into the new one, but I haven't been able to replace anything.

This is what I tried:

PUT clearedlogstash-2017.08.15
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "this_is_a_username|this_is_another_username",
          "replacement": "ANON"
        }
      }
    }
  }
}

followed by:

POST _reindex
{
  "source": {
    "index": "logstash-2017.08.15"
  },
  "dest": {
    "index": "clearedlogstash-2017.08.15"
  }
}

Is this even the right way, or is there another way to accomplish the replacement? Your help is kindly appreciated.

Regards

Found something that works for me. For everyone having a similar problem, have a look at Painless.
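
In hindsight, the char filter approach couldn't have worked here: a pattern_replace char filter only changes how the text is analysed for the inverted index, while the stored _source stays untouched. A Painless script inside _reindex rewrites _source itself. A minimal sketch of what worked for me, assuming the field is called logmessage and the usernames are fixed strings (on 5.x releases before 5.6 the script body key is "inline" instead of "source"):

POST _reindex
{
  "source": {
    "index": "logstash-2017.08.15"
  },
  "dest": {
    "index": "clearedlogstash-2017.08.15"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.logmessage != null) { ctx._source.logmessage = ctx._source.logmessage.replace('this_is_a_username', 'ANON').replace('this_is_another_username', 'ANON') }"
  }
}

The null check guards against documents that don't carry the logmessage field at all, which would otherwise abort the reindex with a script error.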

Okay, last edit:

You can even do this at the Logstash level by simply cloning the event and creating a second index for anonymised documents. That way I can just delete the non-anonymised documents once the 30 days are up.

input {
  # inputs omitted
}

filter {
  # Duplicate every event; the clone gets the "ANON" tag.
  clone {
    clones => ["clone"]
    add_tag => ["ANON"]
  }

  # Anonymise only the tagged clones.
  if "ANON" in [tags] {
    mutate {
      gsub => [
        "logmessage", "this_is_a_username", "ANON"
      ]
    }
  }
}

output {
  # Write the anonymised clones to their own daily index.
  if "ANON" in [tags] {
    elasticsearch {
      hosts => ["localhost"]
      index => "anonlogstash-%{+YYYY.MM.dd}"
    }
  }
}
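
Once the retention period is over, the original (non-anonymised) daily index can simply be dropped. A minimal sketch, using the index name from the example above (in practice you'd script this over all indices older than 30 days, or use Curator):

DELETE logstash-2017.08.15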
