Selecting values from JSON

Hello.
I am parsing Twitter with the following settings:

input {
  twitter {
    consumer_key => "XX"
    consumer_secret => "XX"
    oauth_token => "XX"
    oauth_token_secret => "XX"
    full_tweet => true
    use_samples => true
    languages => ["en", "de"]
  }
}

output {
  elasticsearch {
    hosts => ["10.0.20.51:9200"]
    index => "tweets-%{+YYYY.MM.dd}"
  }
}

I do not need the massive JSON with more than 900 fields being stored in my ES.

For example:
{
  "_index": "tweets-2018.07.24",
  "_type": "doc",
  "_id": "wE6hzGQB2mGdQWLhJvXj",
  "_version": 1,
  "_score": null,
  "_source": {
    "entities": {
      "hashtags": [],
      "urls": [
        {
          "expanded_url": "https://twitter.com/i/web/status/1021759356671598592",
          "display_url": "twitter.com/i/web/status/1…",
          "url": "https://t.co/TVqHFnvmUG",
          "indices": [
            117,
            140
          ]
        }
      ],
      "user_mentions": [],
      "symbols": []
    },
    "text": "En todo lo que va del año hasta ahora entraba a gim 10.20 pensando que era ese el horario (y encima llegaba tarde),… https://t.co/TVqHFnvmUG",
    "in_reply_to_user_id_str": null,
    "extended_tweet": {
      "full_text": "En todo lo que va del año hasta ahora entraba a gim 10.20 pensando que era ese el horario (y encima llegaba tarde), hoy me enteré que entrábamos a las 11🤦",
      "display_text_range": [
        0,
        154
      ],
      "entities": {
        "hashtags": [],
        "urls": [],
        "user_mentions": [],
        "symbols": []
      }
    },
    "quote_count": 0,
    "geo": null,
    "timestamp_ms": "1532441388658",
    "@timestamp": "2018-07-24T14:09:48.000Z",
    "favorited": false,
    "reply_count": 0,
    "truncated": true,
    "contributors": null,
    "in_reply_to_status_id_str": null,
    "place": null,
    "lang": "es",
    "is_quote_status": false,
    "@version": "1",
    "retweet_count": 0,
    "favorite_count": 0,
    "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
    "filter_level": "low"
  }
}

How can I extract only the following fields:
"@timestamp"
"lang"
etc.

using a filter?

filter {
  json {
    source => "@timestamp"
  }
}

It is all quite confusing to me.
If anyone could point me to the right place, or show how to keep only the given fields, that would be amazing.
Thanks!

I would use mutate+remove_field to remove the unwanted fields.
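A minimal sketch (the field names are just examples taken from the tweet above; list whatever you want to drop):

filter {
  mutate {
    # [outer][inner] field references also reach nested fields.
    remove_field => ["contributors", "geo", "[entities][urls]"]
  }
}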

Have a look at the prune filter.

Hello, @magnusbaeck, thanks for pointing out the prune filter.

First, I installed it:
./logstash-plugin install logstash-filter-prune
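
To confirm it is available (assuming the same bin directory as the install command), the plugin should appear in:

./logstash-plugin list | grep prune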

I have managed to prototype the filter:

filter {
  prune {
    whitelist_names => [
      "entities.hashtags.text",
      "entities.user_mentions.name",
      "entities.user_mentions.id",
      "lang",
      "coordinates",
      "retweeted_status.entities.hashtags.text",
      "retweeted_status.entities.user_mentions.name",
      "retweeted_status.entities.user_mentions.id",
      "text",
      "extended_tweet.full_text",
      "extended_tweet.entities.hashtags.text",
      "extended_tweet.entities.urls.url",
      "extended_tweet.entities.urls.expanded_url",
      "@timestamp"
    ]
  }
}

The JSON that I am parsing is:
https://pastebin.com/n69pHb7H

However, I am only matching top-level (non-nested) fields; the dotted names above do not match anything, because prune only operates on top-level field names:

Kibana output (excerpt):
{
  "lang": "en",
  "coordinates": null,
  "text": "RT @liamosaur: I just heard mansplaining referred to as \"correctile dysfunction\" and I'm pretty shook :rofl::rofl::rofl:",
  "@timestamp": "2018-07-26T08:03:23.000Z"
},
"fields": {
  "@timestamp": [
    "2018-07-26T08:03:23.000Z"
  ]
},
"sort": [
  1532592203000
]

I am working on this in the background.
If anyone has an idea how to access those nested objects, please comment.

At the moment I am doing the following:
filter {
  prune {
    whitelist_names => [
      "^entities$",
      "^lang$",
      "^coordinates$",
      "^retweeted_status$",
      "^text$",
      "^extended_tweet$",
      "^@timestamp$",
      "^user$"
    ]
  }
}
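
Since prune only works on top-level fields, a possible refinement (an untested sketch; the subfield names are just examples) would be to keep the whole top-level objects as above and then strip the unwanted subfields with mutate's nested field references:

filter {
  prune {
    whitelist_names => ["^entities$", "^lang$", "^text$", "^extended_tweet$", "^@timestamp$"]
  }
  mutate {
    # [outer][inner] references can remove nested subfields,
    # which prune's whitelist cannot express.
    remove_field => ["[entities][urls]", "[extended_tweet][display_text_range]"]
  }
}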

The reason for figuring out such precise filtering is to limit the daily data load into the ES cluster to the absolute minimum.
