Importing tweets CSV date parsing

Hi all,

Trying to import tweet data from Twitter's recently released archives and running into issues with date fields. The goal is to convert the tweet_time field from the CSV as @timestamp to be used as time filter field. The field gets loaded as text/string instead of date. The account_creation_date is recognised correctly.

Data source:
https://about.twitter.com/en_us/values/elections-integrity.html#data
("Tweet information" link under Internet Research Agency)

Sample values & format from tweet_date column:
2014-07-24 19:20
2014-11-22 15:28
2017-03-13 22:08

For reference, account_creation_date field is formatted as follows (and is recognised as date, even without me explicitly defining its format):
2012-02-25
2014-04-29
2013-12-07

logstash config:

input {
      file {
          path => "/Users/gid/Downloads/test.csv" # 24-row test sample
          start_position => "beginning"
          sincedb_path => "/dev/null"
      }
}

filter {
    csv {
        separator => ","

        columns => ['tweetid', 'userid', 'user_display_name', 'user_screen_name',
       'user_reported_location', 'user_profile_description',
       'user_profile_url', 'follower_count', 'following_count',
       'account_creation_date', 'account_language', 'tweet_language',
       'tweet_text', 'tweet_time', 'tweet_client_name', 'in_reply_to_tweetid',
       'in_reply_to_userid', 'quoted_tweet_tweetid', 'is_retweet',
       'retweet_userid', 'retweet_tweetid', 'latitude', 'longitude',
       'quote_count', 'reply_count', 'like_count', 'retweet_count', 'hashtags',
       'urls', 'user_mentions', 'poll_choices']

        id => "tweetid"

        convert => {
            "follower_count" => "integer"
            "following_count" => "integer"
            "is_retweet" => "boolean"
            "quote_count" => "integer"
            "reply_count" => "integer"
            "like_count" => "integer"
            "retweet_count" => "integer"
            #"account_creation_date" => "date"
            "tweet_time" => "date_time"
        }
    }

    date {
        match => ["tweet_time", "yyyy/MM/dd HH:mm"]
        target => "tweet_time"
    }
}

output {
    elasticsearch {
        hosts => "localhost"
        index => "iratweetsv1"
        document_type => "tweet"
    }
}

How do I configure this correctly? I've tried many different setups but have not been able to parse these dates correctly. According to the mapping, this field is text:

"tweet_time": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }

As opposed to account_creation_date:

"account_creation_date": {
          "type": "date"
        }

EDIT: Running ES, Kibana and logstash version 6.5.3

Where did u get the mapping? Did you create the mapping for your index which specifies tweet_time to be a text field?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.