I am working on a social media analysis project, focused on Twitter. I have imported tweets into daily indices with Logstash. Now I need to do some processing on this data and attach the results to each tweet — for example, sentiment scores, topic modeling, and other NLP-related processing.
Suppose my indices are organized in Elasticsearch such that every tweet is a document keyed by
status_id, and I have CSV files with the following header:
status_id, sentiment_score, num_words, topic_1, topic_2,...
Since I am new to Elasticsearch, I wanted some "best practice" advice on how to attach this "extra" data to each document. I understand that joining indices on a common field (e.g. status_id in this example) is not a good idea in Elasticsearch. So I thought I would create a Logstash .conf to read the CSV and attach the data to the corresponding index/document as a nested field.
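For reference, this is roughly the pipeline I have in mind — a sketch only, where the file path, index name, and hosts are placeholders for my setup, and routing each row to the correct daily index is still an open problem (the CSV would probably need to carry the tweet's date as well):

```conf
input {
  file {
    path => "/path/to/enrichment.csv"   # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"         # re-read the file on every run
  }
}

filter {
  csv {
    columns => ["status_id", "sentiment_score", "num_words", "topic_1", "topic_2"]
    skip_header => true
  }
  mutate {
    convert => {
      "sentiment_score" => "float"
      "num_words"       => "integer"
    }
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "tweets-2019.01.01"  # placeholder; must be the tweet's daily index
    document_id => "%{status_id}"       # assumes status_id was used as _id at import time
    action      => "update"             # partial update instead of indexing a new doc
  }
}
```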
I need to build a dashboard over this data and run some aggregations, and more "extra" data will likely be produced in the future that also needs to be attached to the tweets. Do you believe this approach would work? Are there alternative solutions?
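One alternative I have been considering is skipping Logstash for the enrichment step and sending partial updates through the bulk API myself. A minimal sketch of the idea in Python (the function name, the "tweets" index name, and the "enrichment" field name are my own inventions; it only builds the actions, which would then be passed to `elasticsearch.helpers.bulk`):

```python
import csv
import io

def build_update_actions(csv_text, index="tweets"):
    """Turn enrichment CSV rows into Elasticsearch bulk *update* actions.

    Assumes status_id was used as the document _id at import time, and
    nests all extra columns under a single "enrichment" field so future
    processing runs can add siblings without colliding with tweet fields.
    """
    actions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        status_id = row.pop("status_id")
        # All remaining columns in my CSVs are numeric.
        enrichment = {k: float(v) for k, v in row.items()}
        enrichment["num_words"] = int(enrichment["num_words"])
        actions.append({
            "_op_type": "update",        # partial update, not a re-index
            "_index": index,             # placeholder; should be the tweet's daily index
            "_id": status_id,
            "doc": {"enrichment": enrichment},
        })
    return actions
```

The actions would then be applied with something like `helpers.bulk(es_client, actions)`, which is why each dict carries `_op_type`, `_index`, and `_id`.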