How to replace a string with another string in a field while re-indexing an old index

Hi,

I'm trying to re-index an old index, and in one field I have to replace a string with another. Here is the old index data.

From this I want to replace "Mr.Rakesh Kumar" with "RK" in the new index.
I have tried:

input {
    elasticsearch {
        host => [ "localhost" ]
        index => "student_info_idx"
    }
}

filter {
    mutate {
        gsub => ["patientName","Mr%","X"]
    }
}

output {
    elasticsearch_http {
        host => "localhost"
        index => "student_idx_new"
        index_type => "student_idx_new"
    }
    stdout {
        codec => "json"
    }
}

But nothing has changed in the new index. Please suggest how I can do this.

Thanks and Regards,
Sanjay Reddy

I invite you to try:

filter {
    mutate {
        gsub => ["patientName","Mr.(\w)\w* (\w)\w*","\1\2"]
    }
}
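
For anyone who wants to check the pattern outside Logstash, here is a minimal sketch in Python. It assumes Python's `re` module handles these basic constructs the same way as the regex engine Logstash uses, which should hold for this pattern. It also suggests why the original `Mr%` attempt left the data unchanged: `gsub` takes a regular expression, and in a regular expression `%` is a literal character, not a wildcard.

```python
import re

name = "Mr.Rakesh Kumar"

# "%" is not a wildcard in a regular expression, so "Mr%" only matches
# the literal text "Mr%", which never occurs in the name.
unchanged = re.sub(r"Mr%", "X", name)

# Capture the first letter of each of the two words and keep only those.
initials = re.sub(r"Mr.(\w)\w* (\w)\w*", r"\1\2", name)

print(unchanged)  # Mr.Rakesh Kumar
print(initials)   # RK
```

Note that the unescaped `.` after `Mr` matches any single character, so the same pattern also accepts "Mr Rakesh Kumar" written with a space instead of a dot.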

This works fine. Thanks @fbaligand

Is there a way to use the value of a field as the pattern in gsub?
Something like:

filter {
mutate {
gsub => ["Summary","value("patientName")","\1\2"]
}
}

Here I want to replace the value of the "patientName" field, wherever it appears in "Summary", with some other string.
Is it possible? If yes, can you please guide me through this?

You can do it using two mutate filters:

filter {
    mutate {
        add_field => {"Summary" => "%{patientName}"}
    }
}

filter {
    mutate {
        gsub => ["Summary","Mr.(\w)\w* (\w)\w*","\1\2"]
    }
}

Thanks for the suggestion @fbaligand but this is not working for me :frowning:

Let me be clear from my end. I have two fields, "patientName" and "Summary".
The "Summary" field contains the value of "patientName". I want to remove the name from the "Summary" field. Please help.

OK, to be clearer, could you give an example with an input document and the expected output document?

Something like this:

INPUT DOC : 
{
  Summary => "..."
  patientName => "..."
}

OUTPUT DOC : 
{
  Summary => "..."
  patientName => "..."
}

Here I'm giving you the sample data.

In this, whatever is present in the "patientName" field should not appear in the "Summary" field after reindexing. This is the requirement I have.

OK, I understand your need.
Note there is a little problem: the content of the "patientName" field is not exactly the same as the name contained in the "Summary" field:
Mr Rakesh Kumar <=> Mr. Rakesh Kumar <=> Mr.Rakesh

Yeah, that problem exists. But if there is a way to take the content of the "patientName" field, then we may be able to use some regex to solve it. I am stuck at the initial stage itself.

OK.
So I invite you to try this Logstash configuration:

    mutate {
        add_field => {"[@metadata][initials]" => "%{patientName}"}
    }

    mutate {
        gsub => ["[@metadata][initials]", "Mr.(\w)\w* (\w)\w*", "\1\2"]
    }

    mutate {
        gsub => ["Summary", "%{patientName}", "%{[@metadata][initials]}"]
    }

It works fine with this input data:

            "patientName" => "Mr.Rakesh Kumar"
            "Summary" => "Student: Mr.Rakesh Kumar is not performing well in the class. Please take care of Mr.Rakesh Kumar to get better result."
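
The three steps can be sketched in Python to see what the result should look like for this input. This is only an illustration of the logic, not Logstash itself: the `event` dictionary is a hypothetical stand-in for a Logstash event, and `re.escape` is an addition here so that the `.` in the name is matched literally.

```python
import re

# Hypothetical stand-in for a Logstash event with the sample data above.
event = {
    "patientName": "Mr.Rakesh Kumar",
    "Summary": "Student: Mr.Rakesh Kumar is not performing well in the class. "
               "Please take care of Mr.Rakesh Kumar to get better result.",
}

# Steps 1 and 2: take a copy of the name and reduce it to its initials.
initials = re.sub(r"Mr.(\w)\w* (\w)\w*", r"\1\2", event["patientName"])

# Step 3: use the field value itself as the search pattern in Summary.
event["Summary"] = re.sub(re.escape(event["patientName"]), initials, event["Summary"])

print(event["Summary"])
# Student: RK is not performing well in the class. Please take care of RK to get better result.
```

As the sketch makes visible, this approach only replaces the name where it appears in "Summary" written the same way as in "patientName".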

Thanks for the clue @fbaligand

Based on your suggestion I have tried this.

mutate {
    split => ["Name" , " "]
    add_field => ["FName", "%{[Name][0]}" ]
    add_field => ["LName", "%{[Name][1]}" ]
}
mutate {
    gsub => ["history", "(?i)(?:[%{LName}])", "***"]
}
mutate {
    gsub => ["history", "(?i)(?:[%{FName}])", "***"]
}

Now this matches each letter of the name individually. How can I make it match the entire word?
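
A likely cause is the square brackets: in a regular expression `[...]` is a character class, so `[%{LName}]` matches any single letter of the name rather than the name as a whole. The difference can be seen in a minimal Python sketch, where the sample text is made up for illustration and Python's `re` module stands in for Logstash's regex engine:

```python
import re

name = "Rakesh"
text = "Rakesh ok"  # hypothetical sample text

# With square brackets the name becomes a character class:
# every single r, a, k, e, s or h (in any case) is replaced on its own.
per_letter = re.sub(r"(?i)(?:[" + name + "])", "***", text)

# Without the brackets the name is matched as one sequence of characters.
whole_word = re.sub(r"(?i)(?:" + re.escape(name) + ")", "***", text)

print(per_letter)  # ****************** o***
print(whole_word)  # *** ok
```

Even the "k" in "ok" is replaced in the first case, because it is one of the letters in the class.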

I'm not sure I understand the result you expect.
Can you give an example with an input document (with "Name" and "history" fields) and the expected output document?

Sorry @fbaligand, my bad. It's not "history", it is "summary".

The data in the field "StudentName" is "Mr. Rakesh Kumar". I want to replace whatever is present in "StudentName" with "***" in the entire index. I have taken the "summary" field for testing.

As the name in "StudentName" is not written the same way in the remaining fields of the index (it might be "Rakesh" or "Mr Rakesh" or "rakesh kumar", I'm not sure), I have split the "StudentName" field into "FName" and "LName".

So I thought that if I match "FName" and "LName" case-insensitively and replace them with "***" using gsub, the task will be done.

Let's say "summary" = "rakesh is a good student. Rakesh kumar have to undergo some training. Mr rakesh Kumar will be provided a certificate."

After applying the filters, the final result should be
"summary" = "*** is a good student. *** *** have to undergo some training. Mr *** *** will be provided a certificate."

Hope you got the point. Please let me know if I should explain more!!!

OK, I understand now :slight_smile:

With StudentName="Mr. Rakesh Kumar", here's the right Logstash configuration:

    grok {
      match => { "StudentName" => "%{NOTSPACE:Courtesy} %{NOTSPACE:FName} %{NOTSPACE:LName}" }
    }

    mutate {
      gsub => ["summary", "(?i:%{FName})", "***"]
    }
    mutate {
      gsub => ["summary", "(?i:%{LName})", "***"]
    }
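
To check the expected result against the sample "summary" given earlier, the same logic can be sketched in Python. This is an approximation under stated assumptions: a plain regex with three non-space groups stands in for the grok pattern, and `re.escape` is added so that any special characters in a name would be matched literally.

```python
import re

student_name = "Mr. Rakesh Kumar"
summary = ("rakesh is a good student. Rakesh kumar have to undergo some training. "
           "Mr rakesh Kumar will be provided a certificate.")

# Rough equivalent of the grok step: three space-separated tokens.
courtesy, first_name, last_name = re.match(r"(\S+) (\S+) (\S+)", student_name).groups()

# Case-insensitive replacement of each name part, like the two gsub filters.
for part in (first_name, last_name):
    summary = re.sub(f"(?i:{re.escape(part)})", "***", summary)

print(summary)
# *** is a good student. *** *** have to undergo some training. Mr *** *** will be provided a certificate.
```

The courtesy title is captured but left alone, which is why "Mr" remains in the output.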

@fbaligand

Thank you sooo much :slight_smile: It worked!!!
I have been trying to do this for quite a long time.
Thanks a ton :slight_smile:

You're welcome :slight_smile:

@fbaligand

I need help with one more thing.
Can you please look into this issue? I'm not able to index this.

Please help me.