I could use some help. I am using Logstash with the logstash-input-mongodb plugin and the logstash-output-elasticsearch plugin.
Problem: Indexing a document in Elasticsearch takes a "type" and an "id" in the URI, e.g. PUT http://<host>/<index>/<type>/<id>. Without a filter, my configuration always sets the type to "logs" and lets Elasticsearch generate a new id for every document. This is a problem because restarting Logstash sends every document again, and each one is indexed under a new id, creating duplicates.
Goal: I need to parse the collection name and the id from the input and use them as the type (type = collection name) and the id (id = mongo_id).
Here is my config:
input {
    mongodb {
        uri => '<connectionString>'
        collection => '(collection1|collection2|collection3)'
        batch_size => 300
    }
}
filter {
    grok {
        match => [
            ????
        ]
    }
}
output {
    elasticsearch {
        hosts => ["<es-host>"]
        index => "<index>"
        flush_size => 50
        document_type => "<COLLECTIONTOPARSE>"
        document_id => "<IDTOPARSE>"
    }
}
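For the output side, the elasticsearch output's `document_id` and `document_type` options accept Logstash's `%{field}` sprintf syntax, which substitutes the value of a field on the event when it is sent. A minimal sketch, assuming the collection name has already been copied into a field called `collection_name` (a hypothetical name, set by whatever filter does the parsing) and that the mongodb input populates `mongo_id`, as the search results in this question suggest; the other options are left as in the config above:

```
output {
    elasticsearch {
        hosts => ["<es-host>"]
        index => "<index>"
        # collection_name is a hypothetical field, assumed to be set by a filter earlier in the pipeline.
        document_type => "%{collection_name}"
        # mongo_id is already on the event (it appears in _source in the search results).
        document_id => "%{mongo_id}"
    }
}
```

Because the id is then derived from the Mongo document rather than generated by Elasticsearch, re-sending the same document after a restart should overwrite the existing entry instead of adding a duplicate. If a referenced field is missing from an event, the `%{...}` text is left in the value literally, so all such events would share the same id.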
I think I need a grok match to parse the collection name and the id out of the Mongo document, but I'm not sure how to write it.
When I run this without the filter and search in Elasticsearch, the documents have the structure below. I've omitted most fields but kept the ones needed to parse out the collection name and mongo_id; I'm just not sure how to do it. Do I need a multiline filter? How do I parse these values and reference them in the output plugin?
{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 300,
        "max_score": 1,
        "hits": [
            {
                "_index": "index-name1",
                "_type": "logs",
                "_id": "AV6Wy_ltr_CCklAgCDn1",
                "_score": 1,
                "_source": {
                    "host": "<some host value>",
                    "@version": "1",
                    "@timestamp": "2017-09-18T21:01:43.291Z",
                    "logdate": "2015-09-30T18:25:02+00:00",
                    "mongo_id": "IDTOPARSE",
                    "_class": "word.word1.word2.word3.COLLECTIONTOPARSE.subcollection"
                }
            },
            ...
        ]
    }
}
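To check which fields each event carries before Elasticsearch wraps it in `_index`, `_type` and `_id`, one option is to temporarily add a `stdout` output with the `rubydebug` codec, which prints every event in full. A debugging sketch, meant to sit alongside the existing elasticsearch output and be removed afterwards:

```
output {
    # Temporary: print each event so the field names (mongo_id, _class, ...) can be checked.
    stdout { codec => rubydebug }
}
```

If each Mongo document shows up here as a single event with `mongo_id` and `_class` as top-level fields, the values can be read from those fields directly, without joining lines first.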
My regular expressions for extracting COLLECTIONTOPARSE and IDTOPARSE are below, but I don't know whether "match" will filter out everything except the match, or whether it will actually create variables I can use in the output plugin.
.?+.word3.([a-zA-Z]+)..?+",
OR .?.word3.([a-zA-Z]+)..?",
"mongo_id": "([a-zA-Z]+)",
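On the filter side, grok does not discard anything: each named capture in its pattern becomes a new field on the event, and existing fields such as `_class` and `mongo_id` are kept. Since `mongo_id` is already its own field, only the collection name needs parsing. A sketch, assuming the collection name is always the dot-separated segment right after `word3` in `_class`, and using a hypothetical target field `collection_name`; it swaps the positional groups above for grok's `(?<name>...)` named-capture syntax and matches against `_class` only:

```
filter {
    grok {
        # Match only against the _class field, not the whole event.
        # "word.word1.word2.word3.COLLECTIONTOPARSE.subcollection"
        #   -> collection_name = "COLLECTIONTOPARSE"
        match => { "_class" => "\.word3\.(?<collection_name>[a-zA-Z]+)\." }
    }
}
```

Events whose `_class` does not match get a `_grokparsefailure` tag and no `collection_name` field. One caveat on the `mongo_id` pattern: MongoDB ObjectIds are normally 24 hexadecimal characters, which include digits, so `[a-zA-Z]+` would not match them; reading the existing field avoids that regex altogether.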