Logstash CSV mutate, split and filter

Hello,

I hope my message finds the Elastic community safe and healthy.

I am trying to import CSV files and, while importing, create a new field, tld, using data from one of the columns being imported.

Source Column name: domain_name
The data uses "." as a separator, for example: bbc.co.uk or, simpler, bbcnews.com
New field name: tld. Using these examples, tld should hold .co.uk or .com respectively.

In both cases I want to create a new field "tld" from domain_name holding the data after the first ".", reading from left to right. I wrote the following configuration, but literal static text got added to "tld" instead.

filter {
        csv {
                skip_header => "true"
                columns => ["num","domain_name","query_time","create_date","update_date","expiry_date","domain_registrar_id","domain_registrar_name","domain_registrar_whois","domain_registrar_url","registrant_name","registrant_company","registrant_address","registrant_city","registrant_state","registrant_zip","registrant_country","registrant_email","registrant_phone","registrant_fax","administrative_name","administrative_company","administrative_address","administrative_city","administrative_state","administrative_zip","administrative_country","administrative_email","administrative_phone","administrative_fax","technical_name","technical_company","technical_address","technical_city","technical_state","technical_zip","technical_country","technical_email","technical_phone","technical_fax","billing_name","billing_company","billing_address","billing_city","billing_state","billing_zip","billing_country","billing_email","billing_phone","billing_fax","name_server_1","name_server_2","name_server_3","name_server_4","domain_status_1","domain_status_2","domain_status_3","domain_status_4"]
                remove_field => ["num"]
                }
        mutate {
                add_field => { "tld1" => "%{domain_name}" }
                split => { "tld1" => "." }
                add_field => { "tld" => "%{[tld1][1]}" }
                remove_field => ["tld1"]
                }
        }

My current configuration returns the literal value "%{[tld1][1]}" for tld in all the entries being imported. I am not sure, but is my field reference being treated as a plain string?

I followed the example here: Mutate filter plugin | Logstash Reference [8.1] | Elastic

I am currently running version 7.17.1 of the stack.

You need to use different mutate blocks; you can't add a field in one mutate and then perform another operation on that same field within the same mutate.

There is a note in the documentation about it:

Each mutation must be in its own code block if the sequence of operations needs to be preserved.

This is your case, since the operations need to follow a sequence.

Try the following:

mutate {
    add_field => { "tld1" => "%{[domain_name]}" }
}
mutate {
    split => { "tld1" => "." }
}
mutate {
    add_field => { "tld" => "%{[tld1][1]}" }
}
mutate {
    remove_field => ["tld1"]
}

Very sorry for that. I feel ashamed being in the TL;DR group :). I'll pay more attention henceforth :). Thank you very much.

I just answered a similar question here. A general solution to this problem is ridiculously complicated.
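
For the specific rule described above (keep everything after the first "."), a copy plus gsub can replace the split/index approach and also covers multi-dot suffixes like co.uk. This is only a sketch of that first-dot rule, not a general public-suffix solution, and it assumes the csv filter has already produced domain_name:

filter {
        mutate {
                copy => { "domain_name" => "tld" }
        }
        mutate {
                # strip the first label and its trailing dot:
                # bbc.co.uk -> co.uk, bbcnews.com -> com
                gsub => [ "tld", "^[^.]+\.", "" ]
        }
}

Note that gsub leaves the field unchanged when there is no dot at all, so a bare hostname would end up as its own tld value.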


Hello,

I made the changes as suggested (I am still working out how to manage TLDs containing multiple dots, like .co.uk).

Here is my current configuration:

filter {
        csv {
                skip_header => "true"
                columns => ["num","domain_name","query_time","create_date","update_date","expiry_date","domain_registrar_id","domain_registrar_name","domain_registrar_whois","domain_registrar_url","registrant_name","registrant_company","registrant_address","registrant_city","registrant_state","registrant_zip","registrant_country","registrant_email","registrant_phone","registrant_fax","administrative_name","administrative_company","administrative_address","administrative_city","administrative_state","administrative_zip","administrative_country","administrative_email","administrative_phone","administrative_fax","technical_name","technical_company","technical_address","technical_city","technical_state","technical_zip","technical_country","technical_email","technical_phone","technical_fax","billing_name","billing_company","billing_address","billing_city","billing_state","billing_zip","billing_country","billing_email","billing_phone","billing_fax","name_server_1","name_server_2","name_server_3","name_server_4","domain_status_1","domain_status_2","domain_status_3","domain_status_4"]
                remove_field => ["num"]
            }
        mutate {
                add_field => { "tld1" => "%{[domain_name]}" }
        }
        mutate {
                split => { "tld1" => "." }
        }
        mutate {
                add_field => { "tld" => "%{[tld1][1]}"
        }
        mutate {
                remove_field => ["tld1"]
        }
}

Here is the error I get when running the configuration check:

Reason: Expected one of [ \t\r\n], "#", "=>" at line 23, column 9 (byte 1389) after filter {

[FATAL] 2022-03-31 10:51:43.603 [LogStash::Runner] runner - The given configuration is invalid. Reason: Expected one of [ \t\r\n], "#", "=>" at line 23, column 9 (byte 1389) after filter {
        csv {
                skip_header => "true"
                columns => ["num","domain_name","query_time","create_date","update_date","expiry_date","domain_registrar_id","domain_registrar_name","domain_registrar_whois","domain_registrar_url","registrant_name","registrant_company","registrant_address","registrant_city","registrant_state","registrant_zip","registrant_country","registrant_email","registrant_phone","registrant_fax","administrative_name","administrative_company","administrative_address","administrative_city","administrative_state","administrative_zip","administrative_country","administrative_email","administrative_phone","administrative_fax","technical_name","technical_company","technical_address","technical_city","technical_state","technical_zip","technical_country","technical_email","technical_phone","technical_fax","billing_name","billing_company","billing_address","billing_city","billing_state","billing_zip","billing_country","billing_email","billing_phone","billing_fax","name_server_1","name_server_2","name_server_3","name_server_4","domain_status_1","domain_status_2","domain_status_3","domain_status_4"]
                remove_field => ["num"]
                  }
        mutate {
                add_field => { "tld1" => "%{[domain_name]}" }
                }
        mutate {
                split => { "tld1" => "." }
                }
        mutate {
                add_field => { "tld" => "%{[tld1][1]}"
                }
        mutate
[FATAL] 2022-03-31 10:51:43.612 [LogStash::Runner] Logstash - Logstash stopped processing because of an error: (SystemExit) exit
org.jruby.exceptions.SystemExit: (SystemExit) exit
        at org.jruby.RubyKernel.exit(org/jruby/RubyKernel.java:747) ~[jruby-complete-9.2.20.1.jar:?]
        at org.jruby.RubyKernel.exit(org/jruby/RubyKernel.java:710) ~[jruby-complete-9.2.20.1.jar:?]
        at usr.share.logstash.lib.bootstrap.environment.<main>(/usr/share/logstash/lib/bootstrap/environment.rb:94) ~[?:?]

Thank you very much, both the solutions work perfectly. :)

I did have it easy in that I don't have subdomains as part of my dataset. :)
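
If subdomains ever do appear in the data, a ruby filter gives full control over the extraction logic. A minimal sketch of the same first-dot rule as a starting point (still not public-suffix-aware, so news.bbc.co.uk would yield bbc.co.uk rather than co.uk):

filter {
        ruby {
                code => '
                        d = event.get("domain_name")
                        # partition splits on the first "."; last is everything after it
                        # (yields an empty string when there is no dot at all)
                        event.set("tld", d.partition(".").last) unless d.nil?
                '
        }
}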

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.