Translate filter with yml file

I am trying to use the translate filter with a yml file to add a geo_point. This is the yml file configuration:
JP.558-0045:[135.4949, 34.6134]

Yet when I run the pipeline with this config:

filter {
  translate {
    field => "GeoPostalEnrich"          # The field to match
    destination => "GeoLocation"        # Destination field for the result
    dictionary_path => "/path/to/file"
    fallback => "Unknown"               # What to return if no match is found
  }
}

All I get is Unknown.

I know that there are matches, and I made sure they are case-sensitive matches. I have gotten the translate filter to work with a csv file for simple stuff, but I can't seem to get yml or json files to work. Is my formatting off? What should the correct formatting be to get a geo_point out of it? How do I fix the no-match issue?

With that filter, and a dictionary that contains

JP.558-0045: [135.4949, 34.6134]

(with a space after the colon) I get

    "GeoLocation" => [
    [0] 135.4949,
    [1] 34.6134
],
"GeoPostalEnrich" => "JP.558-0045",

I think my file was missing the spacing. It had JP.558-0045:[135.4949, 34.6134] instead of JP.558-0045: [135.4949, 34.6134], and that is why it was not matching. Thanks.

Also, what is the correct format for geo_points so that Elasticsearch indexes it correctly? This just gives me a number field.

Seems like that worked.

You would need to create an index template in elasticsearch to tell it that the field should be created as a geo_point.
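
For example, something along these lines should work (the template name and index pattern here are placeholders, adjust them to your index name):

PUT _index_template/geolocation_template
{
  "index_patterns": ["your-index-*"],
  "template": {
    "mappings": {
      "properties": {
        "GeoLocation": {
          "type": "geo_point"
        }
      }
    }
  }
}

With that mapping in place, the [lon, lat] array produced by the translate filter will be indexed as a geo_point instead of plain numbers.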

I did and it worked, thanks! Is there a recommended dictionary file size, and can I break it up into multiple files to help lower memory usage?

I cannot think of a way of breaking up a dictionary that would reduce memory usage.

If I split a file into 10 files and had 10 translate filters, would that take up the same amount of memory as just the one file with one translate filter?

There will be per-filter overhead so in total it will use slightly more memory if you do that.

Thanks for the heads up.

How big is your yml dictionary? Is it impacting logstash in any way?

It has about 1,500,000 entries and it is causing memory usage problems.

I might have been too ambitious

Yeah, this is too big.

The documentation mentions that the translate filter is internally tested with very large dictionaries of about 100,000 values; yours has more than ten times that many.

One alternative may be to use the memcached filter to enrich your data, but this can also lead to some issues because of the added latency.

It will probably work well if you run memcached on the same machine as your Logstash and connect to it using a unix socket, but you would need to test it to know for sure.
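
As a rough sketch, assuming the postal-code-to-coordinates pairs were already loaded into memcached under keys like JP.558-0045, the filter could look something like this (the host and field names are just examples):

filter {
  memcached {
    hosts => ["localhost:11211"]
    # Look up the value stored under the postal code key and
    # write whatever is found into the GeoLocation field
    get => {
      "%{GeoPostalEnrich}" => "[GeoLocation]"
    }
  }
}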

I tried to use an enrich pipeline, but I am using 2 pipelines to combine two databases into 1 index, and the enrich pipeline does not work with

doc_as_upsert => true
action => "update"

You need to provide more context of what you are trying to do and what is not working.

By 2 pipelines you mean 2 different logstash pipelines? Pipelines in logstash are independent from each other; if you are trying to update the same document in both of them, it will not work by design, as each pipeline will try to update the documents at its own time.

I am joining 2 separate databases using matching id fields. I was able to create an enrich policy/index/pipeline that matched postal codes to geo points so I could add geo location info to my index (for maps). Both databases have some of the same info and some info that the other leaves out. This way I could see all the information about a specific id in 1 database instead of 2, to make data analysis easier. I also named the fields database_name_fieldname so that each field would be unique and the fields would not overlap or be overridden. I was under the assumption that

doc_as_upsert => true
inserted missing documents and that
action => "update"
only updated fields that match and did not erase the fields that did not match. This way, even though the queries are run at different times, they are only updating the doc with fields that don't already exist and leaving the ones that are already there. Is this not correct? This seems to be the way it is currently working. So I am trying to find an efficient way to add the geo location data. Is it even possible to create a runtime field that uses the postal code field to add the geo point for my enrich policy as an alternative?

With 2 databases you mean 2 indices in Elasticsearch? There are no databases in Elasticsearch; the data is stored in indices, so I'm assuming you are talking about indices.

Also, you mentioned 2 pipelines, but you didn't share any of the configurations; it is really hard to understand what you are trying to do and how without seeing the configuration.

If possible share the logic you are using in your pipelines.

When using doc_as_upsert => true and action => "update" in combination with a custom document_id, logstash will update a document if the document id exists; if the document id does not exist, it will create a new document with it.
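
For illustration, that combination looks roughly like this (the index name and id field are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "your-index"
    document_id => "%{some_id_field}"   # custom document id
    action => "update"                  # update the document with this id...
    doc_as_upsert => true               # ...or create it if it does not exist yet
  }
}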

Is that what you want to do?

Not sure, it is not clear what exactly you are trying to do, but runtime fields can be very expensive to run.

Sorry, I am trying to join 2 MySQL databases into 1 Elasticsearch index. This is the basic config:


input {
  jdbc {
    jdbc_driver_library => "/path/to/file"
    jdbc_driver_class => "your.jdbc.DriverClass"
    jdbc_connection_string => "jdbc:your_database_url"
    jdbc_user => "your_db_username"
    jdbc_password => "your_db_password"

    jdbc_paging_enabled => true
    jdbc_paging_mode => "explicit"
    jdbc_page_size => 150000

    tracking_column => "updated_at"
    use_column_value => true
    tracking_column_type => "timestamp"

    last_run_metadata_path => "/path/to/file"
    schedule => "1-59/2 * * * *"
    statement_filepath => "/path/to/file"
  }
}


filter {
  # Conditionally transform postcode2 field if country_id is "US"
  if [country_id] == "US" {
    mutate {
      # Remove '-' and everything after it
      gsub => [
        "postcode2", "-.*", ""  # Matches '-' and any following characters, replaces with an empty string
      ]
    }
  }

  mutate {
    add_field => {
      "GeoPostalEnrich" => "%{country_id}.%{postcode2}"
    }
  }

  mutate {
    rename => {
      "additional_data" => "database1_additional_data"
      "website" => "database1_website"
    }
  }

  # Remove postcode2 field
  mutate {
    remove_field => ["postcode2"]
  }
}




output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  
    index => "%{index_name}"            
    document_id => "%{order_id}"     
    doc_as_upsert => true               
    action => "update"                  
    manage_template => false            
    user => "your_username"             
    password => "your_password"         
    }
  stdout {
    codec => rubydebug                  
  }
}

Both pipelines are the same except one pipeline does not have the postcode field.

And I changed this filter to:

 mutate {
    rename => {
      "additional_data" => "database2_additional_data"
      "website" => "database2_website"
    }
  }

This way I can tell which MySQL database the data is coming from.

Yes, but am I correct in assuming that this config would not allow my pipeline config to use an enrich pipeline from Elasticsearch?

I am trying to use the "GeoPostalEnrich" field to match and add the geo location data.

Is there any other information you might need?