Hi there,
I'm probably having some infrastructure problem, but I could sure use some help figuring out what I need to look at and fix.
I'm trying to enrich data from one index with data from another index using Logstash. My indexing speed is less than 1 document per second, which I think should be much better.
I'm running Logstash in a Docker container on a local MacBook (which is probably one reason for the low performance, but I reckon that even in this setup, 1 document per second is way too slow and can be improved).
I'm using a cloud trial:
- 2 * AWS.data.highio.i3, 4GB
- 1 * AWS.master.R4 1GB tiebreaker
Indexing from a CSV on the local machine using the same Logstash container performs at around 200 documents per second, which I'm fine with.
This is my pipeline. It actually works, but as mentioned, it is really slow:
input {
  elasticsearch {
    hosts => ["https://xxxxxxx"]
    user => "xxxx"
    password => "xxxx"
    docinfo => true
    query => '{
      "query": {
        "bool": {
          "must": [
            { "exists": { "field": "xxxxx" } }
          ],
          "filter": [
            { "range": {
                "start_dtime": {
                  "gte": "2019-03-12",
                  "lte": "2019-03-13"
                }
              }
            }
          ]
        }
      }
    }'
  }
}
filter {
  mutate {
    remove_field => [ "@version", "host", "message", "path" ]
    add_field => {
      "xxxx_date" => "unknown"
      "xxxxx_status" => "unknown"
      "xxxx_name" => "unknown"
    }
  }
  elasticsearch {
    hosts => ["xxxxxxxx"]
    user => "xxxx"
    password => "xxx"
    index => "xxxxxxxx*"
    query => "xxxxx: %{[xxxxx]}"
    fields => {
      "xxxx_date" => "xxxx_date"
      "xxxx_status" => "xxxx_status"
      "xxxx_name" => "xxxx_name"
    }
  }
}
output {
  elasticsearch {
    hosts => ["xxxxxxxx"]
    index => "%{[@metadata][_index]}"
    action => "update"
    document_id => "%{[@metadata][_id]}"
    user => "xxxx"
    password => "xxxx"
  }
}
My Logstash configuration (logstash.yml):
pipeline:
  workers: 5
  batch:
    size: 1000
    delay: 50
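In case it matters, this is roughly how I run the container. The official Logstash Docker image also accepts these settings as environment variables (e.g. pipeline.workers becomes PIPELINE_WORKERS); the image tag and paths below are just examples of my setup, not exact values:

services:
  logstash:
    image: docker.elastic.co/logstash/logstash:6.6.2   # example tag
    environment:
      # mirrors the logstash.yml settings above
      - PIPELINE_WORKERS=5
      - PIPELINE_BATCH_SIZE=1000
      - PIPELINE_BATCH_DELAY=50
    volumes:
      # pipeline config mounted from the local machine
      - ./pipeline:/usr/share/logstash/pipeline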
Index settings:
"settings" : {
  "index" : {
    "creation_date" : "1552728048556",
    "number_of_shards" : "1",
    "number_of_replicas" : "1",
Can anyone give me a hint on where to look, or how to tweak this so it runs a bit faster?
Thanks!
Jeroen