How to add a country name field based on the Mobile_Number field using Logstash

Hi,

I have logs in a txt file that contain the fields below. I want to extract the country code from the mobile number and translate it into a new field called country.

I looked at the translate filter, but I can't work out how to extract the country code digits from the mobile number field so that I can use them in the translate filter.

20 is the country code for Egypt and 971 is the country code for the United Arab Emirates. I want to do this for all country codes.

Please let me know whether this is doable for at least one country code; since all country codes are unique, I can then extend it to the others.

log-

Mobile_Number: 201123123123User_id<....>
Mobile_Number: 971123123123User_id<....>

logstash filter-

filter {
  grok {
    break_on_match => false
    match => {
      "message" => [
        "Mobile_Number: (?<Mobile_Number>[0-9]+)",
        "User_id(?<User_id>[0-9]+)"
      ]
    }
  }
}

Thank you in advance.

See here. You would need to rewrite libphonenumber in Ruby, and then use a translate filter to map the country code to a country name.
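
The prefix lookup that this answer describes (pull the country code off the front of the number, then map it to a name, as a translate filter dictionary would) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CALLING_CODES` and `lookup_country` are hypothetical names, the table is deliberately incomplete, and the sketch ignores shared codes (for example, `1` covers several countries), which is the part a library such as libphonenumber handles.

```python
# Hypothetical, deliberately incomplete table of calling codes; a real
# deployment would need the full list (or a phone number library).
CALLING_CODES = {
    "1": "United States / Canada",
    "20": "Egypt",
    "44": "United Kingdom",
    "91": "India",
    "971": "United Arab Emirates",
}


def lookup_country(mobile_number, table=CALLING_CODES):
    """Return (code, country) for a number in international format, or None."""
    digits = mobile_number.lstrip("+")
    # Calling codes are one to three digits long and no code is a prefix of
    # another, so at most one of these three prefixes can be in the table.
    for length in (1, 2, 3):
        prefix = digits[:length]
        if prefix in table:
            return prefix, table[prefix]
    return None


print(lookup_country("201123123123"))  # ('20', 'Egypt')
print(lookup_country("971123123123"))  # ('971', 'United Arab Emirates')
```

Because the codes are prefix-free, the order in which the three prefix lengths are tried does not matter; the same idea would carry over to a Logstash ruby filter that writes the matched prefix into a field for the translate filter to map.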

Hi @Badger,

Thanks for your suggestion.

I took a different approach: since I am not well versed in Ruby, I used a Python script (with the phonenumbers library) to enrich the index with country code and country name fields.

py script

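import phonenumbers
from elasticsearch import Elasticsearch
from phonenumbers import geocoder

# es_host, es_port, es_uname and es_pwd are connection settings defined elsewhere (not shown here).
def makeConnection():
    elastic = Elasticsearch(
        [{'host': es_host, 'port': es_port}],
        http_auth=(es_uname, es_pwd), scheme="https", verify_certs=False)
    return elastic

elastic = makeConnection()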
def enrich_data():
    # Select documents that do not have a Country_Name field yet.
    get_data = elastic.search(index="type1", scroll='30m', body={
        "size": 10000,
        "sort": {"event_timestamp": "asc"},
        "query": {
            "bool": {
                "must_not": [
                    {"exists": {"field": "Country_Name"}}
                ]
            }
        }
    })
    # Fetch one more page through the scroll API and append it to the first page.
    scroll = elastic.scroll(scroll_id=get_data['_scroll_id'])
    get_data['hits']['hits'] += scroll['hits']['hits']
    if get_data['hits']['total']['value'] != 0:
        for data in get_data['hits']['hits']:
            Mobile_Number = data['_source']['Mobile_Number']
            print(Mobile_Number)
            # phonenumbers expects international format with a leading "+".
            phone_number = "+" + Mobile_Number
            try:
                phone_number_query = phonenumbers.parse(phone_number, None)
            except phonenumbers.NumberParseException:
                # Skip numbers that cannot be parsed instead of aborting the run.
                continue
            data['_source']['Country_Name'] = geocoder.description_for_number(phone_number_query, "en")
            data['_source']['Country_Code'] = phone_number_query.country_code
            # Write the enriched document back, one request per document.
            elastic.index(index="telebu-logs-smscountry", id=data['_id'], doc_type=data['_type'], body=data['_source'])
            elastic.indices.refresh(index=data['_index'])
    return

if __name__ == '__main__':
    enrich_data()

Pros-

  • The country code and country name fields are added to the index successfully, with about 98% accuracy.

Cons-

  • The enrichment process is slow compared with the rate at which docs arrive: it enriches about 400 docs/min, while around 1m docs are added to the index every hour.

The speed is slow because only a single doc is checked and updated at a time. I tried updating the docs through the bulk API, using the es helpers together with the scroll API, but could not get that to work in the script above.
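
For reference, one way the bulk variant could be structured is sketched below. It is a hedged sketch, not a tested fix for the script above: it assumes the elasticsearch-py 7.x `helpers.scan` and `helpers.bulk` functions and a typeless index (on a cluster that still uses mapping types, each action would also need a `_type` key). `build_update_actions`, `enrich_in_bulk` and the `lookup` callable are hypothetical names; `lookup` stands in for the phonenumbers-based code and returns `(country_code, country_name)` or `None`.

```python
def build_update_actions(hits, lookup):
    """Yield one partial-update bulk action per hit that can be enriched.

    `lookup` is a hypothetical callable that takes the Mobile_Number string
    and returns (country_code, country_name), or None if it cannot resolve it.
    """
    for hit in hits:
        number = hit["_source"].get("Mobile_Number")
        result = lookup(number) if number else None
        if result is None:
            continue  # leave documents that cannot be resolved untouched
        code, name = result
        yield {
            "_op_type": "update",  # partial update: only the new fields are sent
            "_index": hit["_index"],
            "_id": hit["_id"],
            "doc": {"Country_Code": code, "Country_Name": name},
        }


def enrich_in_bulk(elastic, index, lookup, chunk_size=1000):
    """Scroll over documents without Country_Name and update them in batches."""
    # Imported here so the action builder above can be tried without the client installed.
    from elasticsearch import helpers

    query = {"query": {"bool": {"must_not": [{"exists": {"field": "Country_Name"}}]}}}
    hits = helpers.scan(elastic, index=index, query=query, scroll="30m")
    # helpers.bulk sends the actions in chunks of chunk_size and returns
    # (number of successful actions, list of errors).
    return helpers.bulk(elastic, build_update_actions(hits, lookup),
                        chunk_size=chunk_size, raise_on_error=False)
```

Two design choices matter for throughput here: `helpers.scan` keeps following the scroll until every matching doc has been read, and the per-document `indices.refresh` call is dropped, so each round trip to the cluster carries up to `chunk_size` updates instead of one.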

Any suggestion regarding the above will be highly helpful.
