I want to index tweets using logstash and elasticsearch. I am using the twitter plugin. Now the tweet that are sent contain too many useless information for my application. I would like to select only some fields and rename/make flat others and eventually divide one document in two distinct documents. For instance, let's say that this is a Twitter answer:
{
"tweet": {
"tweetId": 1025,
"tweetContent": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch",
"hashtags": ["stackOverflow", "elasticsearch"],
"publishedAt": "2017 23 August",
"analytics": {
"likeNumber": 400,
"shareNumber": 100,
}
},
"author":{
"authorId": 819744,
"authorAt": "the_expert",
"authorName": "John Smith",
"description": "Haha it's a fake description"
}
}
Now I want to generate the following two documents:
# indexed in twitter/tweet/1025 The id for this document should be the one from tweetId `"tweetId": 1025`
{
"content": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch", # this field has been renamed
"hashtags": ["stackOverflow", "elasticsearch"],
"date": "2017/08/23", # the date has been formated
"shareNumber": 100 # This field has been flattened
}
And the second document would be:
# Indexed in twitter/author/819744 The id for this document should be the one from authorId `"authorId": 819744 `
{
"authorAt": "the_expert",
"description": "Haha it's a fake description"
}
Is it possible? How can I do so?