I have the following use case. We are pushing file, directory, attribute and user data into Elasticsearch.
Now we want to do reporting on top of it.
The data needs to be transformed because the initial format does not allow the queries that are necessary.
For example, we have one scripted field that calculates the percentage of a user's files that are video files. The problem is that you cannot get the top 100 users based on this scripted field. So the idea is to take the whole index and create a new index with all the data again, but with the scripted field stored as a "normal" field in the index, so that I can run the queries I need. Finally, we want to display the information with Highcharts.
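To make the idea concrete, here is a minimal sketch of the transformation I have in mind, written in plain Python rather than against a real cluster. The field names (`user`, `extension`, `file_count`, `video_pct`) and the list of video extensions are assumptions for illustration only, not our actual mapping.

```python
from collections import defaultdict

# Assumed set of extensions that count as video files (illustrative only).
VIDEO_EXTENSIONS = {"mp4", "avi", "mkv", "mov"}


def summarize_users(file_docs):
    """Turn per-file documents into one summary document per user.

    Each input document is assumed to look like
    {"user": "alice", "extension": "mp4"}.
    The output documents carry the percentage as a plain stored field.
    """
    totals = defaultdict(int)
    videos = defaultdict(int)
    for doc in file_docs:
        user = doc["user"]
        totals[user] += 1
        if doc["extension"].lower() in VIDEO_EXTENSIONS:
            videos[user] += 1
    return [
        {
            "user": user,
            "file_count": totals[user],
            "video_pct": 100.0 * videos[user] / totals[user],
        }
        for user in totals
    ]


def top_users(summaries, n=100):
    """Return the n users with the highest precomputed video percentage."""
    return sorted(summaries, key=lambda s: s["video_pct"], reverse=True)[:n]
```

The summary documents would then be written to the new index, where `video_pct` can be sorted on like any other numeric field.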
Since I come from an old-fashioned ETL background, I would like to ask whether this is the right approach, and I have the following questions:
- Is it reasonable to perform the data transformation in Elasticsearch by using indexes that are built from other indexes?
- Would it be better to use MongoDB for the data transformations?
- Where would you store the final JSON document that contains the data used to display the information in Highcharts?
- What about transactional issues? The chart should be available all the time, even while the indexes are being recalculated. Is it true that, since the Highcharts data is stored in one single document in Elasticsearch, there will be no issues because Elasticsearch is transactional at the document level?
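For the last two questions, this is roughly the single chart document I am thinking of, again as a plain Python sketch. The document ID, the `chart_id` field and the overall layout are hypothetical; `categories` and `series` are only meant to mirror the shape Highcharts expects for axis categories and series data.

```python
import json

# Hypothetical fixed ID, so every recalculation overwrites the same document.
CHART_DOC_ID = "top-users-video-pct"


def build_chart_doc(summaries):
    """Build one JSON-serializable document holding all data for the chart.

    Each summary is assumed to look like {"user": "alice", "video_pct": 42.0}.
    """
    ordered = sorted(summaries, key=lambda s: s["video_pct"], reverse=True)
    return {
        "chart_id": CHART_DOC_ID,
        "categories": [s["user"] for s in ordered],
        "series": [
            {
                "name": "Video files (%)",
                "data": [round(s["video_pct"], 1) for s in ordered],
            }
        ],
    }


if __name__ == "__main__":
    example = [
        {"user": "alice", "video_pct": 50.0},
        {"user": "bob", "video_pct": 100.0},
    ]
    # The serialized form is what would be stored under CHART_DOC_ID.
    print(json.dumps(build_chart_doc(example), indent=2))
```

My assumption is that the recalculation job would build this document completely and then write it in one index request under the fixed ID, so a reader would see either the old or the new version.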
Thanks a lot for any insight.