Regarding a big dictionary (2 GB) in the translate filter plug-in

Hi all,

Is a 2 GB dictionary considered too big for the translate filter plug-in?

Is there any alternative option (such as accessing the dictionary from Elasticsearch)?

Thanks.

Is it too big? That depends. I run logstash with a 512 MB heap, and it would definitely be too big for that. If I had a 2 TB heap (or even an 8 GB heap) then it might not be too big. Does it have two 1 GB entries or 2 billion single-bit entries? That could also determine whether it is "too big".
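For reference, the heap is set in Logstash's config/jvm.options file; a minimal sketch (pick sizes that fit your machine):

# config/jvm.options
-Xms8g   # initial heap size
-Xmx8g   # maximum heap size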

There are other filters that might provide this functionality, such as elasticsearch, jdbc_streaming, http, or memcached.
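For example, a lookup with the memcached filter might look something like this (a minimal sketch; it assumes the dictionary has already been loaded into memcached under keys like severity/<number>, which is a hypothetical naming scheme):

filter {
  memcached {
    hosts => ["localhost:11211"]
    # fetch the value stored under the key "severity/<severity>" into severity_level
    get => { "severity/%{severity}" => "severity_level" }
  }
}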

Hi Badger,

Thanks for the information. I will try to increase the heap and see what happens.

Is there any material that I can look at on using elasticsearch as a dictionary?

Hi Badger,

I tried to use the elasticsearch filter plug-in.

Here is what I have:

input {
  file {
      path => "/data/threat_event/data/2018-01-01/all_1204346130001631.csv"
      start_position => "beginning"
      sincedb_path => "/dev/null"
      max_open_files => 65535
  }
}

filter{

    csv {
      autodetect_column_names => "true"
      autogenerate_column_names => "true"
      skip_header => "true"
      separator => ","
    }

    elasticsearch {
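        # look up the document in severity-mapping whose number matches this event's severity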
        hosts => ["localhost:9200"]
        index => "severity-mapping"
        query => "number.keyword:%{[severity]}"
        result_size => 1
        fields => {"level" => "severity_level"}
    }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "es_dictionary_test"
  }
}

It creates the field, but the field contains nothing.

We also tried many variations of the query.

Any ideas? Thanks.

severity-mapping:

number,level
1,Low
2,Medium
3,High
4,Critical

I do not have an elasticsearch instance running, so I cannot help. I suggest you ask a new question in the logstash forum about how to do a simple lookup using an elasticsearch filter, and mention that the dataset is too large to use translate. I would include a sample of a couple of documents from the severity-mapping index.

Are you certain that the thing you want to translate is in [severity]?
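One quick way to check is to print the whole event with a rubydebug stdout output and look at what [severity] actually contains:

output {
  stdout { codec => rubydebug }
}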

It seems a little odd that you are using both autodetect_column_names and autogenerate_column_names. There are use cases where you need both, but they are unusual.
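If you know the header in advance, a sketch with explicit column names needs neither option (the names here are hypothetical; use the ones from your header row):

csv {
  separator => ","
  skip_header => "true"
  # name the columns explicitly instead of detecting or generating them
  columns => ["timestamp", "severity", "source_ip"]
}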

Hi Badger,

Thanks for the suggestion.

We are trying to use JSON instead, because my friend got it to work with JSON.

So there might be some problem with using CSV this way.
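For what it's worth, a minimal sketch of reading the same data as JSON instead (the path is hypothetical, and it assumes one JSON object per line):

input {
  file {
    path => "/data/threat_event/data/2018-01-01/events.json"   # hypothetical path
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json"   # decode each line as a JSON object
  }
}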

I'll open a new post if anything comes up. Thanks a lot.
