Hi all,
I have some data that I want to upload via Logstash.
The data is about 100 MB a day and is stored locally.
I want to do the following:
- Upload the old data from the past 3 years (about 100 GB).
- Upload the new data to Elasticsearch daily (one index per day).
Any suggestions on how to tackle these?
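For the daily part, I was picturing something roughly like the output below (the index name is just a placeholder; the %{+YYYY.MM.dd} pattern should give one index per day):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # placeholder index name; the date pattern in the name creates a new index each day
    index => "mydata-%{+YYYY.MM.dd}"
  }
}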
Thanks.
----------------updated: 2019/06/19----------------
Thanks to Kiran, I think I got the daily update part.
Let me be a bit more specific about the old data.
Here is the Logstash config file we have right now (the repetitive parts of the filter have been removed):
input {
  file {
    path => "/data/threat_event/201801/*/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    max_open_files => 65535
  }
}

filter {
  # Pull attack, month, and file type out of the file path,
  # then tag the event with the file type.
  dissect {
    mapping => {
      "path" => "/%{}/%{attack}/%{month}/%{}/%{type}_%{}"
    }
    add_tag => ["%{type}"]
  }

  if "all" in [tags] {
    # Parse the CSV rows, taking column names from the header row.
    csv {
      autodetect_column_names => true
      autogenerate_column_names => true
      skip_header => true
      separator => ","
    }
    # Map defender_id to a human-readable name via the mapping table.
    translate {
      iterate_on => "defender_id"
      field => "defender_id"
      destination => "defender_id_name"
      override => false
      dictionary_path => "/data/mapping-table/defender-mapping.csv"
    }
    # Parse the timestamp column into @timestamp.
    date {
      locale => "en"
      match => [ "timestamp", "EEE dd MMM yyyy HH:mm:ss z" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cv_%{[attack]}_%{[type]}_%{[month]}"
  }
}
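For the old data (the ~100 GB backfill), my current thinking is to just widen the path glob in the same input so a single run picks up every month, assuming every month follows the same /data/threat_event/<YYYYMM>/<subdir>/<type>_*.csv layout:

input {
  file {
    # wildcard over year-month and subdirectory instead of a single month
    path => "/data/threat_event/*/*/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    max_open_files => 65535
  }
}

The dissect pattern above would still pull month and type out of each path, so the monthly indices should come out the same way.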
The index template is like this:
PUT /_template/template_all
{
  "index_patterns": ["cv_threat_event_all_*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "event_count": { "type": "long" }
    },
    "date_detection": true,
    "dynamic_date_formats": ["EEE dd MMM yyyy HH:mm:ss z"]
  }
}
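To double-check that the template was actually stored, it can be fetched back the same way:

GET /_template/template_all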
Since creating one index per day makes too many shards, we decided to go with one index per month.
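Once a few months are loaded, something like the cat API should show how many monthly indices and shards this actually produces (the column list is just what I would look at):

GET _cat/indices/cv_threat_event_all_*?v&h=index,pri,rep,docs.count,store.size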
I know the config might be a bit rough, but we are still figuring things out.
Feel free to make any comments or suggestions.
Thank you.