AmazonLogstashPlugin to AmazonElasticsearch documentId question

Hello Folks,
This is my first time trying this Logstash workflow, so please bear with me. I am trying to set up the Logstash DynamoDB plugin (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLogstash.html) to connect to an Amazon-hosted Elasticsearch instance. The one thing I need to do on the output side is use one of the fields in the source data to build the document_id for the Elasticsearch index. So let's say the field in the input DynamoDB stream is named foo and resides in table Bar. I have tried the following:

1) bin/logstash -e 'input {
  dynamodb {
    endpoint => "validendpoint"
    streams_endpoint => "validstream"
    view_type => "new_and_old_images"
    table_name => "Bar"
  }
}
output {
  amazon_es {
    hosts => ["somehost"]
    region => "someregion"
    index => "valid index"
    document_id => "%message[foo]"
  }
  stdout { }
}'

The above didn't work. I also tried message[0][foo], and that didn't work either. So, two questions: 1) is it possible to do this without a filter between the input and the output? 2) how do I access the foo field and use it as the document_id for my Amazon ES index?
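For what it's worth, here is a sketch (a hypothetical helper, not Logstash internals) of how I understand Logstash's sprintf-style references to work: a bracketed reference like %{[message][foo]} walks nested fields, while a bare "%message[foo]" is not a field reference at all and gets passed through literally.

```ruby
# Hypothetical resolver mimicking how %{[message][foo]} descends into
# nested event fields (illustration only, not Logstash's real code).
def resolve(event, path)
  path.inject(event) { |obj, key| obj.is_a?(Hash) ? obj[key] : nil }
end

event = { 'message' => { 'foo' => 'abc123' } }
resolve(event, ['message', 'foo'])   # => "abc123"

# If "message" were a plain string rather than a nested field,
# the same reference would resolve to nothing:
resolve({ 'message' => 'raw text' }, ['message', 'foo'])   # => nil
```

So even without a filter, the reference would at least need the %{[message][foo]} form, and it can only resolve if message is a real nested field rather than a plain string.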

I have read many a site on this, but the docs don't seem great; any help would be much appreciated.

Ok, so I tried the following:

bin/logstash -e 'input {
  dynamodb {
    endpoint => "endpoint"
    streams_endpoint => "stream"
    view_type => "new_and_old_images"
    table_name => "MyTable"
  }
}
filter {
  ruby {
    code => "event['computed_id']=event['fieldname']"
  }
}
output {
  amazon_es {
    hosts => ["testhost"]
    region => "myregion"
    index => "myindex"
    document_id => "%{computed_id}"
  }
  stdout { }
}'

Incidentally, I am using Logstash version 1.5.4.

I get the following ruby exception:

Ruby exception occurred: undefined local variable or method `computed_id' for #LogStash::Filters::Ruby:0x4612127f {:level=>:error}

Is event not a valid hash in this case? I noticed that the dynamodb input sends a string called message which contains the actual JSON payload, so I am wondering whether I need to do a JSON.parse on it and then extract the value of the field I need.
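To make my suspicion concrete, here is a sketch of what a ruby filter could do with that JSON string. The record layout and the key name "fieldname" are illustrative assumptions from what I see in my stream, not plugin guarantees.

```ruby
require 'json'

# The dynamodb input appears to store the whole stream record in
# "message" as a JSON string, so it must be parsed before any nested
# field can be read (record shape below is an assumption).
message = '{"dynamodb":{"keys":{"fieldname":{"S":"id-42"}}}}'
doc = JSON.parse(message)
computed_id = doc['dynamodb']['keys']['fieldname']['S']   # => "id-42"

# Inside a ruby filter this would be:
#   event['computed_id'] = JSON.parse(event['message'])['dynamodb']['keys']['fieldname']['S']
# so the output could then use document_id => "%{computed_id}".
```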

If you have this working, can someone share a successful Logstash config that sets the document_id from the value of a particular input field? I'd just like to see a working example that I can mimic and tweak.

Your input/help is much appreciated.
Thanks

After many hours of hacking I finally figured it out. Here is the Logstash config that worked for me, to help others:

bin/logstash -e 'input {
  dynamodb {
    endpoint => "someendpoint"
    streams_endpoint => "somestream"
    view_type => "new_and_old_images"
    table_name => "SomeTable"
  }
}
filter {
  json {
    source => "message"
    target => "doc"
  }
}
output {
  amazon_es {
    hosts => ["somehost"]
    region => "someregion"
    index => "someindex"
    document_id => "%{[doc][dynamodb][keys][fieldname][S]}"
  }
  stdout { }
}'

The field I was interested in is buried deep within the JSON hierarchy of the message. The critical piece turned out to be the json filter: its source and target store the parsed message in a temporary field called doc, which the output then references.
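In Ruby terms, the json filter above effectively does the following to each event (the exact stream record shape here mirrors my config and is an assumption about what the dynamodb input emits):

```ruby
require 'json'

# Equivalent of: filter { json { source => "message" target => "doc" } }
# The raw JSON string in "message" is parsed and stored under "doc".
event = { 'message' => '{"dynamodb":{"keys":{"fieldname":{"S":"id-42"}}}}' }
event['doc'] = JSON.parse(event['message'])

# The output's %{[doc][dynamodb][keys][fieldname][S]} then resolves to:
event['doc']['dynamodb']['keys']['fieldname']['S']   # => "id-42"
```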

Regards


Thanks for your suggestion. I have two questions:

  1. Can I set a DynamoDB key as the document_id?
  2. What about DELETE events? Can using the document_id help us remove items from ES when they have been deleted in DynamoDB?