Preprocess tweets before indexing

I want to index tweets using Logstash and Elasticsearch, with the twitter input plugin. The tweets that come in contain a lot of information that is useless for my application. I would like to keep only some fields, rename or flatten others, and possibly split one document into two distinct documents. For instance, let's say this is the incoming document:

{
    "tweet": {
       "tweetId": 1025,
       "tweetContent": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch",
       "hashtags": ["stackOverflow", "elasticsearch"],
       "publishedAt": "2017 23 August",
       "analytics": {
           "likeNumber": 400,
           "shareNumber": 100,
       }
    },
    "author":{
       "authorId": 819744,
       "authorAt": "the_expert",
       "authorName": "John Smith",
       "description": "Haha it's a fake description"
    }
}

Now I want to generate the following two documents:

# indexed in twitter/tweet/1025; the id for this document should be taken from `tweetId` (1025)
{
    "content": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch", # this field has been renamed
    "hashtags": ["stackOverflow", "elasticsearch"],
    "date": "2017/08/23", # the date has been formated
    "shareNumber": 100 # This field has been flattened
}

And the second document would be:

# indexed in twitter/author/819744; the id for this document should be taken from `authorId` (819744)
{
   "authorAt": "the_expert",
   "description": "Haha it's a fake description"
}

Is it possible? How can I do so?

Use the clone filter to, well, clone the original event into two events. Then use whatever filters you need to process each event, wrapping the filters in conditionals so that one set of filters applies to the original event and another set applies to the clone.
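
For illustration, here is a minimal pipeline sketch along those lines. It assumes the simplified field layout from the question (the real twitter input emits the full Twitter API payload, so the actual field paths will differ), a twitter input that is already configured, and an Elasticsearch version that still accepts index/type/id addressing via document_type; the hosts value is a placeholder for your own cluster:

filter {
    # Make a copy of every event. With the classic (non-ECS) behaviour the copy gets
    # its "type" field set to "author"; on newer Logstash with ECS compatibility the
    # clone name is added to "tags" instead, so the conditionals would check [tags].
    clone {
        clones => ["author"]
    }

    if [type] == "author" {
        # Author document: keep only the author fields and stash the id in @metadata.
        mutate {
            rename => {
                "[author][authorAt]"    => "authorAt"
                "[author][description]" => "description"
                "[author][authorId]"    => "[@metadata][doc_id]"
            }
            remove_field => ["tweet", "author"]
        }
    } else {
        # Tweet document: rename and flatten the tweet fields.
        mutate {
            rename => {
                "[tweet][tweetContent]"           => "content"
                "[tweet][hashtags]"               => "hashtags"
                "[tweet][analytics][shareNumber]" => "shareNumber"
                "[tweet][tweetId]"                => "[@metadata][doc_id]"
            }
        }
        # Parses "2017 23 August" into the "date" field (stored as an ISO8601
        # timestamp rather than literally "2017/08/23").
        date {
            match  => ["[tweet][publishedAt]", "yyyy dd MMMM"]
            target => "date"
        }
        mutate {
            remove_field => ["tweet", "author"]
        }
    }
}

output {
    if [type] == "author" {
        elasticsearch {
            hosts         => ["localhost:9200"]   # adjust to your cluster
            index         => "twitter"
            document_type => "author"
            document_id   => "%{[@metadata][doc_id]}"
        }
    } else {
        elasticsearch {
            hosts         => ["localhost:9200"]
            index         => "twitter"
            document_type => "tweet"
            document_id   => "%{[@metadata][doc_id]}"
        }
    }
}

Putting the id in [@metadata][doc_id] keeps it out of the indexed document itself, which matches the desired output above; if you also want the id stored as a regular field, rename it to a normal field instead and reference that field in document_id.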

