Write logs into 2 Elasticsearch clusters in real time

I would like to use Logstash to write pseudonymized data into Elasticsearch, in such a way that it cannot be linked to a single subject without the use of additional data.

This additional data should also be kept separate (in another Elasticsearch cluster) from the pseudonymized data.

An example of logs:

Full log:
{First name:John |Last name:Doe | Gender:Male | Job:Manager | Company:ABC}

Into first elasticsearch I want to write this log:
{First name:12345 |Last name:67890 | Gender:Male | Job:Manager | Company:5555}

Into second elasticsearch I want to write this log:
{12345:John|67890:Doe | 5555:ABC}

In the Logstash output it seems that I must write the entire log, not just a portion of it.

You will want to use grok to restructure your messages, once for each output.

Can you give an example? I don't understand.

You would use pipeline-to-pipeline communication with a forked path pattern.

In the pipeline that forks the data I would parse the fields and generate the hashes. Something like:

input {
    file {
        path => "/home/user/foo.txt"
        sincedb_path => "/dev/null"
        start_position => beginning
    }
}
filter {
    mutate { gsub => [ "message", "^{", "", "message", "}$", "" ] }
    kv { field_split => "|" value_split => ":" trim_key => " " trim_value => " " remove_field => "message" }
    fingerprint { source => "First name" target => "First name Hash" method => "SHA256" }
    fingerprint { source => "Last name" target => "Last name Hash" method => "SHA256" }
    fingerprint { source => "Company" target => "Company Hash" method => "SHA256" }
}
output { pipeline { send_to => ["pipe1", "pipe2"] } }
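The effect of the kv and fingerprint filters above can be sketched in plain Ruby (the language of Logstash's ruby filter). The field names mirror the config; this is an illustration of what the filters do, not the plugin code itself:

```ruby
require 'digest'

# Sample line, with the surrounding braces already stripped (the mutate/gsub step)
message = 'First name:John |Last name:Doe | Gender:Male | Job:Manager | Company:ABC'

# Emulate the kv filter: split on "|", then on ":", trimming whitespace
event = message.split('|').map { |pair|
  key, value = pair.split(':', 2)
  [key.strip, value.strip]
}.to_h

# Emulate the fingerprint filter with method => "SHA256" (no key, so a plain digest)
['First name', 'Last name', 'Company'].each do |field|
  event["#{field} Hash"] = Digest::SHA256.hexdigest(event[field])
end

puts event['Company Hash']
# "b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78"
```

The hash printed for "ABC" matches the one in the example events below, which is how you can verify the fingerprint filter is doing a plain SHA256 of the field value.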

SHA256 creates long hashes, like "fd53ef835b15485572a6e82cf470dcb41fd218ae5751ab7531c956a2a6bcd3c7". You could use something shorter, for example generating a 32-bit checksum in a ruby filter, but that increases your risk of collisions.
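For comparison, here is what the two options look like side by side in plain Ruby; Zlib's CRC32 stands in for the 32-bit checksum (an illustrative choice, not a recommendation):

```ruby
require 'digest'
require 'zlib'

value = 'Doe'

sha = Digest::SHA256.hexdigest(value)  # 256 bits, 64 hex characters
crc = Zlib.crc32(value).to_s(16)       # 32 bits, at most 8 hex characters

puts sha.length  # 64
puts crc.length  # 8 or fewer
```

With only 32 bits, the birthday bound gives you roughly even odds of at least one collision after about 2**16 (around 65,000) distinct values, which is why the shorter checksum is riskier.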

Then in one pipeline, you replace the fields with the hashes:

input { pipeline { address => "pipe1" } }

filter {
    mutate {
        rename => {
            "First name Hash" => "First name"
            "Last name Hash" => "Last name"
            "Company Hash" => "Company"
        }
    }
}

and in the other, save the hashes and the data items they map to:

input { pipeline { address => "pipe2" } }

filter {
    ruby {
        code => '
            event.set(event.get("First name Hash"), event.get("First name"))
            event.set(event.get("Last name Hash"), event.get("Last name"))
            event.set(event.get("Company Hash"), event.get("Company"))
        '
    }
    mutate { remove_field => [ "First name Hash", "First name", "Last name Hash", "Last name",
                               "Company Hash", "Company", "Job", "Gender" ] }
}
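The ruby filter's effect can be reproduced standalone. Assuming the fields arriving from the forking pipeline, it keys each original value by its own hash; the mutate/remove_field step then leaves only those hash-to-value pairs:

```ruby
require 'digest'

# Fields as they arrive from the forking pipeline (values taken from the example line)
event = {
  'First name' => 'John', 'First name Hash' => Digest::SHA256.hexdigest('John'),
  'Last name'  => 'Doe',  'Last name Hash'  => Digest::SHA256.hexdigest('Doe'),
  'Company'    => 'ABC',  'Company Hash'    => Digest::SHA256.hexdigest('ABC'),
  'Gender'     => 'Male', 'Job'             => 'Manager'
}

# Emulate the ruby filter: key each original value by its hash
mapping = {}
['First name', 'Last name', 'Company'].each do |field|
  mapping[event["#{field} Hash"]] = event[field]
end

# After remove_field, only the three hash => value pairs remain
puts mapping.size  # 3
```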

For a line like

{First name:John |Last name:Doe | Gender:Male | Job:Manager | Company:ABC}

this will generate two events. One like

"fd53ef835b15485572a6e82cf470dcb41fd218ae5751ab7531c956a2a6bcd3c7" => "Doe",
"a8cfcd74832004951b4408cdb0a5dbcd8c7e52d43f7fe244bf720582e05241da" => "John",
"b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78" => "ABC",

and the other like

    "Gender" => "Male",
       "Job" => "Manager",
 "Last name" => "fd53ef835b15485572a6e82cf470dcb41fd218ae5751ab7531c956a2a6bcd3c7",
   "Company" => "b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78",
"First name" => "a8cfcd74832004951b4408cdb0a5dbcd8c7e52d43f7fe244bf720582e05241da",

Thanks a lot!!
