AmazonLogstashPlugin to AmazonElasticsearch documentId question

skanjila · November 13, 2015, 6:42pm

Hello Folks,
This is my first time trying a Logstash workflow, so please bear with me. I am trying to set up the Logstash DynamoDB plugin (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLogstash.html) to connect to an Amazon-hosted Elasticsearch instance. On the output side, I need to use one of the fields in the source data set as the document_id for the Elasticsearch index. Let's say the field in the input DynamoDB stream is named foo and resides in table Bar. I have tried the following:

bin/logstash -e 'input {
  dynamodb {
    endpoint => "validendpoint"
    streams_endpoint => "validstream"
    view_type => "new_and_old_images"
    table_name => "Bar"
  }
}
output {
  amazon_es {
    hosts => ["somehost"]
    region => "someregion"
    index => "valid index"
    document_id => "%message[foo]"
  }
  stdout { }
}'

The above didn't work. I also tried message[0][foo], and that didn't work either. So, two questions: 1) Is it possible to do this without a filter between the input and output? 2) How do I access the foo field and use it as the document_id for my Amazon ES index?

I have read many sites on this, but the docs don't seem to be great. Any help would be much appreciated.

skanjila · November 15, 2015, 3:38pm

Ok, so I tried the following:

bin/logstash -e 'input {
  dynamodb {
    endpoint => "endpoint"
    streams_endpoint => "stream"
    view_type => "new_and_old_images"
    table_name => "MyTable"
  }
}
filter {
  ruby {
    code => "event['computed_id']=event['fieldname']"
  }
}
output {
  amazon_es {
    hosts => ["testhost"]
    region => "myregion"
    index => "myindex"
    document_id => "%{computed_id}"
  }
  stdout { }
}'

Incidentally, I am using Logstash version 1.5.4.

I get the following ruby exception:

Ruby exception occurred: undefined local variable or method `computed_id' for #<LogStash::Filters::Ruby:0x4612127f> {:level=>:error}

Is event not a valid array in this case? I noticed that dynamodb sends a string called message which contains the actual JSON message. I am wondering whether I need to do a JSON.parse on this and then extract the value of the field I need.
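A possible explanation for the exception above, assuming the command was typed into a POSIX-style shell exactly as shown: the single quotes around computed_id and fieldname close and reopen the outer single-quoted -e argument, so the shell drops them before Logstash ever sees the config. The Ruby sketch below approximates that with the standard library's Shellwords (a stand-in for the real shell, not part of Logstash) and then evaluates what is left. The empty event hash is a placeholder for a Logstash event.

```ruby
require 'shellwords'

# The filter line as it sits inside the outer single-quoted -e '...' argument.
typed = %q{-e 'code => "event['computed_id']=event['fieldname']"'}

# Shellwords follows Bourne-shell word rules, so this approximates the
# argument Logstash receives once the shell has removed the quotes.
received = Shellwords.split(typed).last
puts received

# Pull out the Ruby snippet between the double quotes and evaluate it.
# computed_id is now a bare identifier rather than a string key.
ruby_code = received[/"(.*)"/, 1]
event = {} # placeholder for the Logstash event (assumption)
error = nil
begin
  eval(ruby_code)
rescue NameError => e
  error = e
  puts "#{e.class}: #{e.message}"
end
```

If that is the cause, one way around it would be to save the config to a file and run it with bin/logstash -f, so that no shell quoting is involved. Even then, event['fieldname'] would probably be nil here, since the value sits inside the JSON string in message rather than at the top level of the event.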

Can someone share a working Logstash config that sets the document_id from the value of a particular field in the incoming data? I'd just like to see some working example(s) that I can mimic and tweak.

Your input/help is much appreciated.
Thanks

skanjila · November 16, 2015, 4:57pm

After many hours of hacking I finally figured it out. Here is the Logstash config that worked for me, in case it helps others:

bin/logstash -e 'input {
  dynamodb {
    endpoint => "someendpoint"
    streams_endpoint => "somestream"
    view_type => "new_and_old_images"
    table_name => "SomeTable"
  }
}
filter {
  json {
    source => "message"
    target => "doc"
  }
}
output {
  amazon_es {
    hosts => ["somehost"]
    region => "someregion"
    index => "someindex"
    document_id => "%{[doc][dynamodb][keys][fieldname][S]}"
  }
  stdout { }
}'

The field I was interested in is buried deep within the JSON hierarchy in the message. The only critical piece seems to be the json filter: it parses the source field (message) and stores the result on the event under the target field doc, which the document_id setting in the output then references.
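To make the moving parts concrete, here is a rough Ruby sketch of what the json filter and the document_id reference above do. It is not how Logstash implements them. The sample message, its value abc-123, and the field_reference helper are made up for illustration; only the dynamodb, keys, fieldname, S path comes from the config.

```ruby
require 'json'

# Hypothetical sample of the JSON string the dynamodb input puts in "message".
# Only the dynamodb -> keys -> fieldname -> S path is taken from the thread.
message = '{"dynamodb":{"keys":{"fieldname":{"S":"abc-123"}}}}'

# Roughly what the json filter does with source => "message", target => "doc":
# parse the string and attach the result to the event under "doc".
event = { 'message' => message }
event['doc'] = JSON.parse(event['message'])

# Roughly what a reference like [doc][dynamodb][keys][fieldname][S] resolves to:
# walk the nested hashes one bracketed key at a time.
def field_reference(event, reference)
  path = reference.scan(/\[([^\]]+)\]/).flatten
  path.reduce(event) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

document_id = field_reference(event, '[doc][dynamodb][keys][fieldname][S]')
puts document_id
```

Without the json filter, message stays a plain string, so there is no nested structure for the reference to walk.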

Regards

huytv593 · November 28, 2016, 4:26am

Thanks for your suggestion. I have two questions:

Can I set the DynamoDB key as the document_id?
What about DELETE events? Can using document_id help us remove items from ES once they have been deleted in DynamoDB?
