Lowercase hashtag twitter field


(Singa Forever) #1

Hello,
I'm collecting tweets from Twitter using ELK 5.x. For my application, I need to:

 1- Convert all collected Twitter hashtags to lowercase:

I used the mutate filter in my Logstash config file:

filter {
  mutate {
    lowercase => [ [entities][hashtags][text] ]
  }
}

But this is not working. (The hashtag value, text, is an element of the entities.hashtags object.)

Waiting for your help,
Thank you


(Magnus Bäck) #2

Have you tried double-quoting the field name?

         lowercase => [ "[entities][hashtags][text]" ]

If that doesn't help, show what an event looks like. Use a stdout { codec => rubydebug } output.
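
For reference, a minimal sketch of where that output goes in the pipeline file, assuming no other outputs need to change (it can sit alongside an existing elasticsearch output):

```
output {
  # Print every event to the console in a readable, nested form
  stdout { codec => rubydebug }
}
```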


(Singa Forever) #3

Hi Magnus,
Thank you for your quick answer. I tried double-quoting the field name, but nothing changed.
Output:

{
"extended_entities" => {
"media" => [
[0] {
"display_url" => "",
"source_user_id" => ,
"type" => "",
"media_url" => "",
"source_status_id" => ,
"url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
},
[1] {
"display_url" => "",
"source_user_id" => ,
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
},
[2] {
"display_url" => "",
"source_user_id" => ,
"type" => "",
"media_url" => "",
"source_status_id" => ,
"url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
},
[3] {
"display_url" => "",
"source_user_id" => ,
"type" => "",
"media_url" => "",
"source_status_id" => ,
"url" => "",
"indices" => [
[0] 58,
[1] 81
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
}
]
},
"in_reply_to_status_id_str" => nil,
"in_reply_to_status_id" => nil,
"created_at" => "",
"in_reply_to_user_id_str" => nil,
"source" => "",
"retweeted_status" => {
"extended_entities" => {
"media" => [
[0] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
},
[1] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
},
[2] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
},
[3] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
}
]
},
"in_reply_to_status_id_str" => nil,
"in_reply_to_status_id" => nil,
"created_at" => "",
"in_reply_to_user_id_str" => nil,
"source" => "",
"retweet_count" => 5,
"retweeted" => false,
"geo" => nil,
"filter_level" => "",
"in_reply_to_screen_name" => nil,
"is_quote_status" => false,
"id_str" => "",
"in_reply_to_user_id" => nil,
"favorite_count" => 10,
"id" => ,
"text" => "",
"place" => nil,
"lang" => "und",
"favorited" => false,
"possibly_sensitive" => false,
"coordinates" => nil,
"truncated" => false,
"entities" => {
"urls" => [],
"hashtags" => [
[0] {
"indices" => [
[0] 24,
[1] 27
],
"text" => "F1"
},
.....


(Magnus Bäck) #4

[entities][hashtags][text] isn't a valid field reference since [entities][hashtags] is an array. Two options:

  • If [entities][hashtags] never contains more than one element you can use [entities][hashtags][0][text] to reference the text subfield of the first element.
  • If you want to lowercase the text subfield of every element of the [entities][hashtags] array, you'll have to use a ruby filter. The mutate filter's lowercase option does support iterating over the elements of an array, but only arrays of strings; arrays of objects aren't supported.
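
A minimal sketch of the second option, assuming the Logstash 5.x event API (event.get and event.set) and the event layout shown in the rubydebug output above. It is untested against this exact pipeline, and the type checks are only a precaution for tweets whose hashtag entries are missing or malformed:

```
filter {
  ruby {
    code => '
      # Fetch the array of hashtag objects; nil when the tweet has none
      hashtags = event.get("[entities][hashtags]")
      if hashtags.is_a?(Array)
        hashtags.each do |tag|
          # Lowercase the text subfield of each element, skipping anything unexpected
          if tag.is_a?(Hash) && tag["text"].is_a?(String)
            tag["text"] = tag["text"].downcase
          end
        end
        # Write the modified array back to the event
        event.set("[entities][hashtags]", hashtags)
      end
    '
  }
}
```

With the sample event above, the first hashtag's text "F1" would become "f1". Depending on the Ruby version bundled with your Logstash release, downcase may only affect ASCII letters, so hashtags with accented or non-Latin characters are worth checking separately.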

(Singa Forever) #5

The solution was to use a ruby filter.
Thank you so much.

