Another multiple index question, I think

I am an Elastic noob running version 6.8 of the stack, and I have been searching for about two days now for a good way to aggregate data from two or more indices.

Here is the scenario: I have an index acting as a data table that contains internal reference data (telephone-to-user assignments), like so:

Internal User assignments per telephone number
John B, 2145551212
John B, 2145551213
John B, 2145551214
Rachel B, 2145551215
Rachel B, 2145551216

Then I have an index which holds usage data for each telephone number; however, it is keyed only by telephone number, like so:

2145551212, 500 (mins)
2145551213, 200 (mins)
2145551214, 300 (mins)
2145551215, 400 (mins)
2145551216, 400 (mins)

The end result that I want to report against looks like this:

JohnB used 1000 (sum of all lines) minutes in the previous billing month

JohnB,2145551212,500
JohnB,2145551213,200
JohnB,2145551214,300

Rachel B used 800 minutes in the previous billing month
RachelB,2145551215,400
RachelB,2145551216,400

So, as you can see, I can easily create all different types of graphs for the usage data from the external source. However, if I want to map that usage to the people or entities using it via the internal assignments, I need a way to join or map the data together. I thought it would be easy to create a reference index mapping the phone numbers to the end users and then query both through an alias in Kibana, but so far I am not having any luck. I was planning to do the same thing to map service plans, using the telephone number as the key in another index, in order to track costs and revenue as well.
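To make the document shapes concrete, here is roughly what I am indexing today (sketched in Kibana console syntax; the field names `user`, `phone_number`, and `minutes` are just what my ingest script happens to produce):

```
PUT usage_internal/_doc/1
{ "user": "John B", "phone_number": "2145551212" }

PUT usage_external/_doc/1
{ "phone_number": "2145551212", "minutes": 500 }
```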

I guess my question is multipart:

  1. Should I be referencing multiple indices or just creating multiple documents within one index? (I just found out that multiple documents were possible.)

  2. Would this aggregation, if you will, be considered part of the data "normalization" process, or should Kibana have the ability to map this data between indices?
    (In other words, should my scripts, which format all of the external data from CSV into JSON to begin with, somehow reference a SQL database or something to add the fields from the reference phone number first?)

My design looks like this:

'usage_external', 'usage_internal', 'usage_plans', 'usage_cdrs'

'usage_external' has many thousands of rows of data uploaded daily from our upstream carriers to our report server; 'usage_internal' and 'usage_plans' are very small mapping or assignment reference indices; 'usage_cdrs' will be the next source of large data records that I attempt to bring into the mix at a later date. For now I just want to be sure that I am properly and efficiently using Elasticsearch to normalize, aggregate, and report against this data.

Welcome!

Elasticsearch needs the related data to live in the same documents to provide the report you want. So what you can look at is having a user index that holds the user and phone number; then, when you index the usage data into another index, you can add the username to each document by doing a lookup against the user index to fetch it.
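In other words, each usage document ends up carrying the user as well, so a single index holds everything a report needs. A minimal sketch of what an enriched document could look like (field names are placeholders, not a fixed schema):

```
PUT usage/_doc/1
{
  "user": "John B",
  "phone_number": "2145551212",
  "minutes": 500
}
```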

As an alternative to indexing these rolled-up values: if you have the individual calls for each user, it might be easier to index those and let Elasticsearch aggregate the individual calls to produce the results you want.
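Either way, once the user is on each document, a terms aggregation with a sum sub-aggregation gives you the per-user totals directly. A sketch, assuming an index named `usage`, a keyword `user` field, a numeric `minutes` field, and a `@timestamp` date field for limiting the report to the previous month:

```
GET usage/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1M/M", "lt": "now/M" } }
  },
  "aggs": {
    "per_user": {
      "terms": { "field": "user" },
      "aggs": {
        "total_minutes": { "sum": { "field": "minutes" } }
      }
    }
  }
}
```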

If I am interpreting the explanation properly, you're saying that I would continue to ingest the usage data into the 'usage_external' index, then query 'usage_external' for phone numbers matching the 'usage_internal' index, and finally roll the resulting data into a 'usage_final' index (which currently does not exist)? And alternatively, I should consider processing the internal and external data together at the point where I currently normalize only the external data, before ingesting the complete records into a single "usage" index for Kibana to handle this task correctly?

Would you mind directing me to an example of option 1 (online documentation or personal experience)? I can most likely figure out option 2 on my own, since my current ingestion process is a shell script which models the JSON documents for Elasticsearch; alternatively, I could build a SQL database to do this and then export the results to Elasticsearch.

When you ingest the usage data, you could do a lookup into the number index at index time and add that data to the event. That way you don't need 3 indices.

Logstash can do this for you; it can even do the lookup against a phone number file if you have one, or against Elasticsearch if the mapping lives in an index.
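With the elasticsearch filter plugin, the lookup could look roughly like this (a sketch; the index and field names are assumptions based on your description):

```
filter {
  # Look up the owning user in usage_internal by phone number
  # and copy the matching user field onto the usage event.
  elasticsearch {
    hosts  => ["localhost:9200"]
    index  => "usage_internal"
    query  => "phone_number:%{[phone_number]}"
    fields => { "user" => "user" }
  }
}
```

The translate filter can do the same kind of lookup against a local dictionary file keyed by phone number, if you would rather not hit Elasticsearch at ingest time.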
