Say I have a tool that takes a shortened URL like
{"url":"http://goo.gl/TTX8j5"} and returns the expanded URL like {"url":
"http://goo.gl/TTX8j5", "resolved_url": "http://www.elasticsearch.org/"}. It
will even do this when the link routes through, say, goo.gl to bit.ly to
tinyurl before reaching the final page. The "expanded_url" field present in
the Twitter stream only seems to go one hop.

Say I have a couple hundred million records already loaded that I'd like to
go back and expand the URLs for. I was all set to start pulling records out
with Python (my preferred language), updating them, and bulk loading them
back in. But then I realized that you can run scripts through the bulk
update function.

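A minimal sketch of what the pull, expand, and bulk-reload route could look like in Python. It assumes the expander accepts a POSTed JSON body at localhost:8888/urlexpansion (the HTTP method is a guess), and that each record arrives as a search hit carrying _index, _id and _source. The helper names (resolve, expanded_urls, bulk_update_body) are made up for illustration. Because a partial-document update replaces an array as a whole, every urls entry is copied with all of its existing fields.

```python
import json
import urllib.request

# Assumption: the expander accepts a POSTed JSON body at this address.
EXPANDER = "http://localhost:8888/urlexpansion"


def resolve(url, endpoint=EXPANDER):
    """POST {"url": ...} to the expander and return its resolved_url."""
    body = json.dumps({"url": url}).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["resolved_url"]


def expanded_urls(source, resolver=resolve):
    """Copy entities.urls, adding resolved_url to every entry."""
    urls = (source.get("entities") or {}).get("urls") or []
    return [dict(entry, resolved_url=resolver(entry["url"])) for entry in urls]


def bulk_update_body(hits, resolver=resolve):
    """Build a newline-delimited _bulk body of partial-document updates."""
    lines = []
    for hit in hits:
        urls = expanded_urls(hit["_source"], resolver)
        if not urls:
            continue  # nothing to expand in this record
        # Reuse whatever addressing metadata the hit carries.
        meta = {k: hit[k] for k in ("_index", "_type", "_id") if k in hit}
        lines.append(json.dumps({"update": meta}))
        lines.append(json.dumps({"doc": {"entities": {"urls": urls}}}))
    return "\n".join(lines) + "\n" if lines else ""
```

The returned string would be sent as the body of a _bulk request. Passing a stub as resolver makes it possible to try the payload building without the expander running.
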
Can I, with a data structure like the one below, send every
entities.urls.url to my REST call at localhost:8888/urlexpansion and insert
the resulting resolved_url into the entities.urls structure using the update
scripting function?

Further, is there a practical way for me to extract the IDs of all of the
records that have entries in the entities.urls array?

  "_source": {
    "filter_level": "medium",
    "contributors": null,
    "text": "",
    "geo": null,
    "retweeted": false,
    "in_reply_to_screen_name": null,
    "truncated": false,
    "lang": "und",
    "entities": {
      "symbols": [],
      "urls": [{
        "url": "https://t.co/XdXRudPXH5",
        "expanded_url": "https://blog.twitter.com/2013/rich-photo-experience-now-in-embedded-tweets-3",
        "display_url": "blog.twitter.com/2013/rich-phot\u2026",
        "indices": [80, 103]
      }],
      "hashtags": [],
      "user_mentions": []
    },
    "in_reply_to_status_id_str": null,
    "id":,
    "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
    "in_reply_to_user_id_str": null,
    "favorited": false,
    "in_reply_to_status_id": null,
    "retweet_count": 0,
    "created_at": "Tue Nov 26 01:03:06 +0000 2013",
    "in_reply_to_user_id": null,
    "favorite_count": 0,
    "id_str": "123",
    "place": null,
    "user": {
      ...
    },
    "coordinates": null
  }
}
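
For the ID question, one possible shape is a search that matches only documents where entities.urls.url has a value, paged through with something like the scroll API, keeping just the _id of each hit. The sketch below only builds the request body and reads IDs out of a response page. The function names are made up, and the exact wrapper is version-dependent: as far as I know, older Elasticsearch releases expose exists as a filter inside a filtered query rather than as a top-level query, and the _source switch may not be available there.

```python
def has_urls_query(field="entities.urls.url", page_size=1000):
    """Search body matching documents where the field has at least one value."""
    return {
        # Assumption: "exists" is usable as a query on the target version.
        "query": {"exists": {"field": field}},
        "size": page_size,
        "_source": False,  # only the hit metadata (_id) is needed
    }


def ids_from_page(response):
    """Extract the _id of every hit in one search (or scroll) response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]
```

Each page of results would be passed through ids_from_page until a page comes back empty.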