Regarding big dictionary (2GB) in filter translate plug-in

leeyu · June 25, 2019, 7:38pm

Hi all,

Is a 2 GB dictionary considered too big for the translate filter plug-in?

Is there an alternative, such as looking the dictionary up in Elasticsearch?

Thanks.

Badger · June 25, 2019, 8:15pm

Is it too big? That depends. I run logstash with a 512 MB heap, and it would definitely be too big for that. If I had a 2 TB heap (or even an 8 GB heap) then it might not be too big. Does it have two 1 GB entries or 2 billion single-bit entries? That could also determine whether it is "too big".

There are other filters that might provide this functionality such as elasticsearch, jdbc_streaming, http, or memcached.
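
As a minimal sketch of the memcached option (not tested against this dataset): it assumes a memcached server on localhost:11211 that has already been loaded with one key per dictionary entry, and an event field [severity] holding the value to look up. The host, the key layout, and the target field name are all assumptions.

```conf
filter {
  memcached {
    # Assumed address of a memcached server that already holds the dictionary.
    hosts => ["localhost:11211"]
    # Look up the key built from [severity]; store the value in [severity_level].
    get => { "%{[severity]}" => "[severity_level]" }
  }
}
```

With something like this the dictionary lives outside the logstash heap, at the cost of a lookup against memcached for each event.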

leeyu · June 25, 2019, 9:03pm

Hi Badger,

Thanks for the information. I will try increasing the heap and see what happens.

Is there any material I can look at on using elasticsearch as a dictionary?

leeyu · June 26, 2019, 11:04pm

Hi Badger,

I tried using the elasticsearch filter plug-in.

Here is my configuration:

input{
  file{
      path => "/data/threat_event/data/2018-01-01/all_1204346130001631.csv"
      start_position => "beginning"
      sincedb_path => "/dev/null"
      max_open_files => 65535
  }
}

filter{

    csv {
      autodetect_column_names => "true"
      autogenerate_column_names => "true"
      skip_header => "true"
      separator => ","
    }

    elasticsearch{
        hosts => ["localhost:9200"]
        index => "severity-mapping"
        query => "number.keyword:%{[severity]}"
        result_size => 1
        fields => {"level" => "severity_level"}
    }
}

output
{
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "es_dictionary_test"
  }
}

It creates the field, but the field is empty.

We also tried many variations of the query.

Any ideas? Thanks.

severity-mapping

number,level
1,Low
2,Medium
3,High
4,Critical

Badger · June 27, 2019, 12:30am

I do not have an elasticsearch instance running, so I cannot help. I suggest you ask a new question in the logstash forum, about how to do a simple lookup using an elasticsearch filter, and mention that the dataset is too large to use translate. I would include a sample of a couple of documents from the severity-mapping index.

Are you certain that the thing you want to translate is in [severity]?
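
One way to check, sketched here as a temporary debugging step: add a stdout output with the rubydebug codec and see whether the printed events contain a severity field with the value you expect.

```conf
output {
  # Print every event, with all of its fields, to the console.
  stdout { codec => rubydebug }
}
```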

It seems a little odd that you are using both autodetect_column_names and autogenerate_column_names. There are use cases where you need both but they are unusual.
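
For a file with a single header row, a sketch of the more usual setup would set only autodetect_column_names. This assumes the first line of the CSV holds the column names and that one of them is severity, which I cannot confirm from what you posted.

```conf
filter {
  csv {
    # Take the column names from the header line of the file.
    autodetect_column_names => true
    # Drop header rows instead of indexing them as events.
    skip_header => true
    separator => ","
  }
}
```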

leeyu · June 27, 2019, 12:51am

Hi Badger,

Thanks for the suggestion.

We are going to try JSON instead, because my friend got it to work with JSON.

So there might be some problem with using CSV this way.

I'll open a new post if anything comes up. Thanks a lot.

system · July 25, 2019, 12:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
