Lowercase hashtag twitter field

Hello,
I'm collecting tweets from Twitter using ELK 5.x. For my application, I need to:

 1- Convert all collected Twitter hashtags to lowercase:

I used the mutate filter in my Logstash configuration file:

filter {
  mutate {
    lowercase => [ [entities][hashtags][text] ]
  }
}

But this is not working. (The hashtag text value is a subfield of each element of entities.hashtags.)

Waiting for your help,
Thank you

Have you tried double-quoting the field name?

         lowercase => [ "[entities][hashtags][text]" ]

If that doesn't help, show what an event looks like. Use a stdout { codec => rubydebug } output.

Hi Magnus,
Thank you for your quick answer. I tried double-quoting the field name, but nothing changed.
Output:

{
"extended_entities" => {
"media" => [
[0] {
"display_url" => "",
"source_user_id" => ,
"type" => "",
"media_url" => "",
"source_status_id" => ,
"url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
},
[1] {
"display_url" => "",
"source_user_id" => ,
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
},
[2] {
"display_url" => "",
"source_user_id" => ,
"type" => "",
"media_url" => "",
"source_status_id" => ,
"url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
},
[3] {
"display_url" => "",
"source_user_id" => ,
"type" => "",
"media_url" => "",
"source_status_id" => ,
"url" => "",
"indices" => [
[0] 58,
[1] 81
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"source_status_id_str" => "",
"media_url_https" => "",
"id" => ,
"source_user_id_str" => ""
}
]
},
"in_reply_to_status_id_str" => nil,
"in_reply_to_status_id" => nil,
"created_at" => "",
"in_reply_to_user_id_str" => nil,
"source" => "",
"retweeted_status" => {
"extended_entities" => {
"media" => [
[0] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
},
[1] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
},
[2] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
},
[3] {
"display_url" => "",
"indices" => [
],
"sizes" => {
},
"id_str" => "",
"expanded_url" => "",
"media_url_https" => "",
"id" => ,
"type" => "",
"media_url" => "",
"url" => ""
}
]
},
"in_reply_to_status_id_str" => nil,
"in_reply_to_status_id" => nil,
"created_at" => "",
"in_reply_to_user_id_str" => nil,
"source" => "",
"retweet_count" => 5,
"retweeted" => false,
"geo" => nil,
"filter_level" => "",
"in_reply_to_screen_name" => nil,
"is_quote_status" => false,
"id_str" => "",
"in_reply_to_user_id" => nil,
"favorite_count" => 10,
"id" => ,
"text" => "",
"place" => nil,
"lang" => "und",
"favorited" => false,
"possibly_sensitive" => false,
"coordinates" => nil,
"truncated" => false,
"entities" => {
"urls" => ,
"hashtags" => [
[0] {
"indices" => [
[0] 24,
[1] 27
],
"text" => "F1"
},
.....

[entities][hashtags][text] isn't a valid field reference since [entities][hashtags] is an array. Two options:

  • If [entities][hashtags] never contains more than one element you can use [entities][hashtags][0][text] to reference the text subfield of the first element.
  • If you want to lowercase all text subfields of all elements of the [entities][hashtags] array you'll have to use a ruby filter. The lowercase feature does support iterating over the elements of an array, but only arrays of strings. Arrays of objects aren't supported.
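
For the second option, the per-element logic could look roughly like the plain Ruby sketch below. It is not taken from this thread and has not been run inside a live pipeline; lowercase_hashtag_texts is a hypothetical helper name, and the second sample hashtag is made-up data for illustration.

```ruby
# Sketch only: lowercase the "text" subfield of every hashtag object.
# lowercase_hashtag_texts is a hypothetical helper, not part of Logstash.
def lowercase_hashtag_texts(hashtags)
  # Tweets without hashtags may have no array here; pass such values through.
  return hashtags unless hashtags.is_a?(Array)

  hashtags.map do |tag|
    if tag.is_a?(Hash) && tag["text"].is_a?(String)
      # Build a new hash so the other subfields (e.g. "indices") are kept as-is.
      tag.merge("text" => tag["text"].downcase)
    else
      tag
    end
  end
end

# Sample data shaped like the rubydebug output above.
sample = [
  { "indices" => [24, 27], "text" => "F1" },
  { "indices" => [30, 36], "text" => "Monaco" } # hypothetical second hashtag
]

result = lowercase_hashtag_texts(sample)
puts result.inspect
```

Inside a Logstash 5.x ruby filter, the same loop would presumably go in the code option, reading the array with event.get("[entities][hashtags]") and writing the result back with event.set("[entities][hashtags]", ...). If the hashtags you care about live under another parent field, the field reference would need to change accordingly.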

The solution was to use a ruby filter.
Thank you so much.
