Hi there,
I'm fairly new to the whole ES world and am currently evaluating an ELK setup. I parsed several log messages via Logstash into Elasticsearch. So far so good. However, some of these log messages contain a field with usernames which I would like to replace, to anonymise them, as this is required by law after 30 days.
This is what the shortened JSON looks like:
{
  "_index": "logstash-2017.08.15",
  "_type": "network",
  "_id": "AV9UTBKxt7g8Eg6MIG6R",
  "_version": 1,
  "_score": 2,
  "_source": {
    "offset": 1597078,
    "input_type": "log",
    "logmessage": "configured by this_is_a_username"
  },
  "fields": {
    "@timestamp": [
      1502756404000
    ]
  }
}
The crux is that I want to keep the logs as they are for this timespan.
Since documents are immutable, I guess I have to reindex them in order to do that. I've read about the Pattern Replace Char Filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html) and thought about creating a new index that replaces the username, then reindexing the old index into the new one, but I haven't been able to replace anything.
This is what I tried:
PUT clearedlogstash-2017.08.15
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "this_is_a_username|this_is_another_username",
          "replacement": "ANON"
        }
      }
    }
  }
}
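For reference, the char filter itself can be sanity-checked with the _analyze API against the new index. This is just an illustrative call, using the analyzer defined above and the sample logmessage from the document; the tokens it returns show whether the pattern is actually replaced during analysis:
POST clearedlogstash-2017.08.15/_analyze
{
  "analyzer": "my_analyzer",
  "text": "configured by this_is_a_username"
}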
followed by:
POST _reindex
{
  "source": {
    "index": "logstash-2017.08.15"
  },
  "dest": {
    "index": "clearedlogstash-2017.08.15"
  }
}
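For completeness, this is the kind of request one could use afterwards to check the result in the new index (just an illustrative lookup of the sample document above by its _type and _id):
GET clearedlogstash-2017.08.15/network/AV9UTBKxt7g8Eg6MIG6R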
Is this even the right way, or is there another way to accomplish the replacement? Your help is kindly appreciated.
Regards