Memcached in logstash

I have a huge file that I am using in Logstash with the translate filter. But I think it is causing some issues, as it misses matches for some events.

Looking around, I discovered that memcached may be the answer to that.

But I can't work out how to use that filter.

    memcached {
        hosts => ["mem_host1"]
        get => {
            "%{project}" => "[site]"
        }
        add_tag => ["from_cache"]
    }

I have memcached running on mem_host1, and I can see the key exists, but Logstash is not producing a field called site. The project field does exist in the Logstash output.

What am I doing wrong?

Can you share a sample of your data and the rest of your configuration pipeline?

Also share the output that Logstash is generating.

By the way, I am following your blog post for this:

[logstash: using the memcached filter | @leandrojmp]

Here is the input; I am trying to replace the translate filter.

    # Add site from translate filter -- this is working
    translate {
        source => "[project]"
        target => "[site_translate]"
        dictionary_path => "/s1/logstash/csv_files/project_center.csv"
        fallback => "na"
        refresh_interval => 36000
    }

   mutate { add_field => { "site" => "sachin........................." } }

    # use memcached filter -- this is not working
    memcached {
        hosts => ["10.29.249.111"]
        namespace => "convert_mm"
        get => {
            "%{project}" => "[site1]"
        }
        add_tag => ["from_cache"]
    }

Here is the output:

{
    "site_translate" => "crawley",
               "job" => 13495137,
           "project" => "3dsymra",
              "site" => "sachin........................."
}

Here is the memcached result from the memcached server. I preloaded the key:value pairs using Python.

# cat memcached_get.py
#!/usr/bin/python3

import memcache
mc_client = memcache.Client(['10.29.249.111:11211'])

# Retrieve the value for the key 
value = mc_client.get('3dsymra')

# Print the value
print(value)

Executing the code gives me the value:

# ./memcached_get.py
crawley

Oh, you are using a namespace in the memcached filter, but in your Python example your keys do not have a namespace.

When you use a namespace, the filter will look for a key named namespace:key as explained in the documentation.

So your memcached filter is looking for a key named convert_mm:3dsymra, but it seems that the key name in your memcached is just 3dsymra.
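To make that concrete, here is a quick sketch using the same python-memcache client from your script (the key and value are the ones from your example): with the namespace option set, the key would need to be stored with the prefix for the lookup to match.

    #!/usr/bin/python3

    import memcache
    mc_client = memcache.Client(['10.29.249.111:11211'])

    # With namespace => "convert_mm", the filter looks up "convert_mm:3dsymra",
    # so the key would have to be stored with that prefix for the get to match.
    mc_client.set('convert_mm:3dsymra', 'crawley')
    print(mc_client.get('convert_mm:3dsymra'))  # prints: crawley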

Can you remove the namespace option from your filter and test again?


Yes, it works.

Question: I would use namespaces if I am loading more than a few other key:value datasets, like:

project - site
project - user
user - address

Right? Then I can use three different namespaces on the same memcached server to pull info, correct?

Yeah, you can use the same memcached server with different datasets if you use namespaces.

Basically you need to prefix all your keys with some name for your namespace, then you can use this same name in your memcached filter.
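For example, something like this (the dataset names and values here are made up, just to illustrate the prefixing):

    #!/usr/bin/python3

    import memcache
    mc_client = memcache.Client(['10.29.249.111:11211'])

    # One memcached server, three logical datasets separated by key prefixes.
    # "proj_site", "proj_user" and "user_addr" are example namespace names.
    mc_client.set_multi({
        'proj_site:3dsymra': 'crawley',         # project -> site
        'proj_user:3dsymra': 'some_user',       # project -> user
        'user_addr:some_user': 'some_address',  # user -> address
    })

Then a memcached filter with namespace => "proj_site" will only ever read the first dataset, one with namespace => "proj_user" the second, and so on.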

Excellent, this is the final version. I will do large-scale testing now.

Python 3 to load into memcached:

#!/usr/bin/python3

import memcache

mc_client = memcache.Client(['10.29.249.111:11211'])

my_dict = {}
with open('/home/sachin/project_center.csv', 'rb') as f:
    for line in f:
        # decode as it is a byte string, strip the trailing newline,
        # then split into a key:value pair to insert into memcached
        line = line.decode().strip()
        (project, site) = line.split(',')
        # prefix the key so it acts as a namespace
        project = "proj_site_" + project
        my_dict[project] = site
mc_client.set_multi(my_dict)

This is to retrieve it in Logstash:

    memcached {
        hosts => ["10.29.249.111"]
        get => {
            "proj_site_%{project}" => "[site]"
        }
    }

The problem with this is that if the memcached server is down, then Logstash fails.

If I'm not wrong, Logstash won't start if any configuration has a memcached filter and the memcached server is down.

But if the memcached server goes down after Logstash has already started, then Logstash will keep running, but the memcached filter will stop working.

You are not wrong :smile: If it cannot get a connection to the memcached server, then the plugin's register function will raise a RuntimeError, and that prevents the pipeline from starting.

If the pipeline loses the connection it will keep trying to reconnect and just tag the failure on the event if it cannot. I would expect the reconnect attempts to significantly reduce throughput.

What about if I have multiple memcached servers and place them in hosts? What if one fails, will it go to the second one?

My reading of the documentation is that it will.

:failover - if a server is down, look for and store values on another server in the ring. Default: true

It is on by default and the plugin does not change the default.
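So a multi-server setup would just be a matter of listing the servers in hosts, something like this (the second address is a made-up example):

    memcached {
        hosts => ["10.29.249.111", "10.29.249.112"]
        get => {
            "proj_site_%{project}" => "[site]"
        }
    }

If one of the servers goes down, the underlying client should look for the keys on the other server in the ring.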
