I am an elastic noob running on version 6.8 of the stack, who has been searching for a good way, I believe, to aggregate data from 2 or more indices for about 2 days now.
Here is the scenario, I have an index that is acting as a data table that contains internal reference data like so: (Telephone user assignment)
Internal User assignments per telephone number
John B, 2145551212
John B, 2145551213
John B, 2145551214
Rachel B, 2145551215
Rachel B, 2145551216
Then I have an index which holds usage data for each telephone number, however, it is only assigned by telephone number like so:
2145551212, 500 (mins)
2145551213, 200 (mins)
2145551214, 300 (mins)
2145551215, 400 (mins)
2145551216, 400 (mins)
The end result that I want to report against looks like this:
JohnB used 1000 (sum of all lines) minutes in the previous billing month
Rachel B used 800 minutes in the previous billing month
So as you can see I can easily create all different types of graphs for the usage data from the external source. However if I want to map it to the people or entities using the data with internal assignemnts I need a way to join or map the data together. I thought that it would be really easy to create a reference index mapping the phone numbers to the end user, and then query both using an alias in Kibana, but so far I am not having any luck. I was planning to do the same thing to map service plans as well by using the telephone number as the key in another index in order to track costs and revenue as well.
I guess my question is multipart:
Should I be referencing multiple indices or just creating multiple documents within 1 index? (I just found out that multiple documents was possible)
Would this aggregation, if you will, be considered part of the data "normalization" process or should Kibana have the ability to map this data between indices?
(In other words, should my scripts which are formatting all of the externfal data from csv into json to begin with somehow be referencing a SQL database or something to add the fields from the reference phone number first?)
My design looks like this:
'usage_external' 'usage_internal' 'usage_plans' 'usage_cdrs'
'usage_external' has many thousand rows of data uploaded from our upstream carriers to our report server daily; 'usage_internal' and 'usage_plans' are very small mapping or assignment reference index'es; 'usage_cdrs' will be the next source of large data records that I attempt to bring into the mix at a later date. For now I just want to be sure that I am properly and effeciently using elasticsearch to normalize, aggregate, and report against this data.