Hey there.
I am working on a social media analysis project, mainly on Twitter. I have imported tweets into daily indices with Logstash. Now I need to do some processing on this data and attach the results to each tweet: for example sentiment, topic modeling, and other NLP-related processing.
Suppose my indices are organized in Elasticsearch such that every tweet is a document with document_id = status_id, and I have CSV files with the following header:
status_id, sentiment_score, num_words, topic_1, topic_2,...
Since I am new to Elasticsearch, I would like some best-practice advice on how to attach this "extra" data to each document. I understand that joining indices on a common field (e.g. status_id in this example) is not a good idea in Elasticsearch. So I thought I would create a Logstash .conf file that reads the CSV and attaches the data to the corresponding index/document as a nested field.
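To make the idea concrete, here is a minimal Python sketch of the same step done outside Logstash: it turns CSV rows into partial-update actions for the Elasticsearch `_bulk` API, keyed by status_id. It is only an illustration under some assumptions: the function name `csv_to_bulk_updates`, the index name, and the `nlp` field that holds the extra data are all hypothetical, and it assumes I already know which daily index each CSV file belongs to. It uses a plain object field rather than the `nested` mapping type.

```python
import csv
import io
import json


def _coerce(value):
    """Convert a CSV cell to int or float when possible, else keep the string."""
    value = value.strip()
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def csv_to_bulk_updates(csv_text, index_name):
    """Build a newline-delimited JSON body for the _bulk API.

    index_name is an assumption: the CSV itself does not say which
    daily index a tweet lives in, so the caller has to supply it.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    lines = []
    for row in reader:
        status_id = row.pop("status_id").strip()
        extra = {name: _coerce(value) for name, value in row.items()}
        # Action line: partial update of the document whose _id is the status_id.
        lines.append(json.dumps({"update": {"_index": index_name, "_id": status_id}}))
        # Payload line: merged into the existing tweet under the hypothetical "nlp" field.
        lines.append(json.dumps({"doc": {"nlp": extra}}))
    # The _bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

The returned string would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type. Because each action is a partial update, later batches of "extra" data could be merged into the same documents without reindexing the tweets.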
I need to build a dashboard over this data and run some aggregations. More "extra" data will possibly be created in the future and will also need to be attached to the tweets. So, do you think this approach would work? Are there any alternative solutions?
Many thanks.