Storing Twitter Data - What would be the best choice?

I am currently working on a project involving Twitter Data.

The team has been advised not to use Elasticsearch to store tweets because it's not considered a best practice (they are thinking of using Amazon RDS...).

From what I've found on Google, the usual choices are MySQL or MongoDB.

I do not really see the difference between Elasticsearch and MongoDB when it comes to storing data.
What is your opinion?

Thanks a lot in advance!

I'd say: it depends. If every single tweet is critical for your use case, it is safer to also store them in a separate datastore (whatever it is: a database, a file system, S3...). That's easy to do with Logstash: collect tweets with the twitter input, then send them to both Elasticsearch and S3.
Just be aware that this requirement costs more, since you are basically storing the data twice. Note that Twitter is itself a kind of datastore, so you may be able to fetch a tweet from it again if one goes missing at some point.
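
To make the "store it twice" idea concrete, here is a minimal Python sketch. It is not the Logstash setup itself: the JSON Lines file stands in for S3 or a database, the plain dict stands in for Elasticsearch, and every function name here is hypothetical. The point is only that the archive is the source of truth and the search index can be rebuilt from it.

```python
import json
import tempfile
from pathlib import Path


def archive_tweet(archive_path: Path, tweet: dict) -> None:
    """Append one tweet to a JSON Lines archive (stand-in for S3 or a database)."""
    with archive_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(tweet) + "\n")


def index_tweet(index: dict, tweet: dict) -> None:
    """Put the tweet in the search index, keyed by id (stand-in for Elasticsearch)."""
    index[tweet["id"]] = tweet


def store_tweet(archive_path: Path, index: dict, tweet: dict) -> None:
    """Dual write: archive first, so the archive stays the source of truth."""
    archive_tweet(archive_path, tweet)
    index_tweet(index, tweet)


def rebuild_index(archive_path: Path) -> dict:
    """Rebuild the search index from the archive, e.g. after losing index data."""
    index = {}
    with archive_path.open(encoding="utf-8") as f:
        for line in f:
            index_tweet(index, json.loads(line))
    return index


def demo() -> tuple:
    """Store two tweets, drop the index, then recover it from the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "tweets.jsonl"
        index = {}
        for tweet in [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]:
            store_tweet(archive, index, tweet)
        index.clear()  # simulate losing the search index
        rebuilt = rebuild_index(archive)
        return len(rebuilt), rebuilt[2]["text"]
```

Calling demo() shows that nothing is lost when the index is wiped, which is what you are paying the extra storage for.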

If the requirements are not that strict, you can use Elasticsearch alone and avoid a lot of additional cost.
Elasticsearch has backups (snapshots), replication... so that might be enough to limit problems, if any.

I'd read this page: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

