Join multiple Independent Indices


(Venkatesh) #1

Environment: Filebeat (1.0.0-rc2) --> Logstash (2.1.0) --> Elasticsearch (2.1.0) --> Kibana (4.3)
Use Case: Real-time analysis, monitoring, and search of application transaction logs and metrics
Domain: Telecom IT
Description:
Logstash collects logs from multiple applications and indexes them into ES, using a separate index for each application. For example:
Logs from App 1 would be indexed to Index_1
Logs from App 2 would be indexed to Index_2
Logs from App 3 would be indexed to Index_3

These logs cannot be inserted into the same index, as they have different formats.
Each index has a common field, say subscriber ID.

Requirement:
I want to search for a subscriber's information by joining the data across multiple indices. How can I achieve this? I have gone through a few options, such as parent-child relationships and denormalizing at index time. However, those may not be applicable in my use case.

Kindly suggest any approach.

Thanks
Venkatesh


(Vincent Tran) #2

You can absolutely have data in different formats, with different fields, in the same index. You just need to define them as different types. And since they share a common subscriber ID, you can query on that.
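
A minimal sketch of what such a query could look like, assuming the shared field is named `subscriber_id` and is mapped as not_analyzed, and that the indices are called `index_1` to `index_3` (all of these names are assumptions, not taken from the original setup). It only builds the path and body of a search request; sending it is left to whichever client is in use. The same body works against a single index holding several types.

```python
import json

def build_subscriber_search(indices, subscriber_id, field="subscriber_id"):
    """Return (path, body) for one search spanning several indices.

    `field` is a placeholder for whatever the common field is really called.
    """
    # Elasticsearch accepts a comma-separated list of indices in the URL path.
    path = "/" + ",".join(indices) + "/_search"
    # A term query matches the exact indexed value, so this assumes the
    # field is mapped as not_analyzed.
    body = {"query": {"term": {field: subscriber_id}}}
    return path, body

# Hypothetical index names and subscriber ID, for illustration only.
path, body = build_subscriber_search(["index_1", "index_2", "index_3"], "SUB-1001")
print("GET", path)
print(json.dumps(body, indent=2))
```

Note that this returns the matching documents from each source side by side; it does not merge them into joined rows.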


(Venkatesh) #3

Thanks Vincent for the idea. I have modeled the data as a different type for each source.
However, I am not able to get past the following.
Say in the normal RDBMS world, I have 2 tables for 2 sources of data.
Table A - CommonID, Col1A, Col2A, Col3A
Table B - CommonID, Col1B, Col2B
To join them: SELECT Col1A, Col2A, Col1B, Col2B FROM TableA A, TableB B WHERE A.CommonID = B.CommonID;

How can I achieve this in Kibana Discover? I have seen numerous discussions of this idea on the internet, but I couldn't find a straightforward, simple solution.

It would be great if you could give me some pointers or tips. :)


(David Pilato) #4

You can't really join in Elasticsearch unless you use parent-child, but Kibana does not support that.

It's definitely better to model your documents in a different way.
Just index everything in a single doc and forget relations.


(Venkatesh) #5

Thanks David. OK, in that case can you give me a high-level idea of how to do it?

I have multiple source applications from which Logstash collects data using various plugins such as Filebeat, File, and JDBC, and it currently indexes that data into ES under different mapping types.

In order to index all the info into a single doc, where and how can I join the data from the various sources before indexing to ES? Or is there any way to generate a "global" index with all the data from the different indices?

I am really finding it very interesting to think beyond the typical RDBMS ideas.
Thanks


(David Pilato) #6

Typically index something like:

{
 "Col1A": "",
 "Col2A": "",
 "Col3A": "",
 "Col1B": "",
 "Col2B": ""
}

But IMO you can't really do a join in LS, or it would be hard to do so. Maybe you could fetch existing data from Elasticsearch using https://www.elastic.co/guide/en/logstash/2.1/plugins-inputs-http.html but I'm not sure.
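
If the join is done outside Logstash before indexing, it might look roughly like the sketch below. It assumes both sources can be read as lists of records sharing a `CommonID` field (a hypothetical name, as are the sample values) and produces flat documents shaped like the one above, plus the shared ID.

```python
def join_on_common_id(rows_a, rows_b, key="CommonID"):
    """Inner-join two lists of dicts on `key`, returning flat merged documents."""
    # Group source B by key so each lookup is cheap; one ID may have several rows.
    b_by_id = {}
    for row in rows_b:
        b_by_id.setdefault(row[key], []).append(row)

    merged = []
    for row_a in rows_a:
        for row_b in b_by_id.get(row_a[key], []):
            doc = dict(row_a)   # copy the columns from A
            doc.update(row_b)   # add the columns from B (the key value is identical)
            merged.append(doc)
    return merged

# Hypothetical sample rows standing in for the two sources.
table_a = [{"CommonID": 1, "Col1A": "a1", "Col2A": "a2", "Col3A": "a3"}]
table_b = [{"CommonID": 1, "Col1B": "b1", "Col2B": "b2"},
           {"CommonID": 2, "Col1B": "x1", "Col2B": "x2"}]
docs = join_on_common_id(table_a, table_b)
print(docs)
```

Rows without a match on the other side are dropped here, as in an SQL inner join; keeping them would need a different merge rule.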

If you want to display multiple time series in Kibana, you should take a look at Timelion.


(Christian Dahlqvist) #7

When using time-based indices, it is often difficult to use parent-child relationships, as a limitation is that it requires all documents in a hierarchy to be present in the same shard. For time-based data it is therefore generally best to denormalise your data at index time. This typically works well when you have reasonably static data, e.g. customer or subscriber information, that can be added onto the time-based events.
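
As a rough illustration of denormalising at index time, the sketch below copies static subscriber details onto each event before it would be indexed. The field names (`subscriber_id`, `plan`, `region`) and the in-memory lookup table are assumptions made for the example; in practice the lookup would come from wherever the subscriber data lives.

```python
def denormalise(event, subscribers, id_field="subscriber_id"):
    """Return a copy of `event` with static subscriber fields attached."""
    enriched = dict(event)
    profile = subscribers.get(event.get(id_field))
    if profile is not None:
        # Nest the static data under one key to avoid clobbering event fields.
        enriched["subscriber"] = dict(profile)
    return enriched

# Hypothetical reference data and event, for illustration only.
subscribers = {"SUB-1001": {"plan": "prepaid", "region": "south"}}
event = {"@timestamp": "2015-12-01T10:00:00Z",
         "subscriber_id": "SUB-1001",
         "action": "recharge"}
print(denormalise(event, subscribers))
```

Events whose subscriber is unknown pass through unchanged, so a missing lookup entry does not block indexing.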

As long as you control the names and mappings for fields that are common across different types of data, e.g. customer id, you can create visualisations in Kibana based on different underlying indices and place these in a single dashboard. You can then filter on these common fields in the dashboard and the filter will be applied to all visualisations, irrespective of underlying index.


(Mark Harwood) #8

See "How can I use aggregations to query distinct values across all time grouped by first seen" and the links there to using an entity-centric indexing approach.

In your case the central "entity" would be a subscriber and you would be fusing data from multiple "event" indices.
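
To make the idea concrete, here is a small sketch that folds events from several indices into one summary document per subscriber. It is only an illustration of the entity-centric shape, not the approach described in the linked thread; the field names (`subscriber_id`, `@timestamp`, `first_seen`, `last_seen`, `event_counts`) and the index names are assumptions.

```python
def build_subscriber_entities(events, id_field="subscriber_id"):
    """Fold (index_name, event) pairs into one summary document per subscriber."""
    entities = {}
    for index_name, event in events:
        sub_id = event[id_field]
        ts = event["@timestamp"]
        entity = entities.setdefault(sub_id, {
            id_field: sub_id,
            "first_seen": ts,
            "last_seen": ts,
            "event_counts": {},
        })
        # ISO 8601 UTC timestamps in the same format compare correctly as strings.
        entity["first_seen"] = min(entity["first_seen"], ts)
        entity["last_seen"] = max(entity["last_seen"], ts)
        counts = entity["event_counts"]
        counts[index_name] = counts.get(index_name, 0) + 1
    return entities

# Hypothetical events as they might be read back from three event indices.
events = [
    ("index_1", {"subscriber_id": "SUB-1001", "@timestamp": "2015-12-01T10:00:00Z"}),
    ("index_2", {"subscriber_id": "SUB-1001", "@timestamp": "2015-12-01T09:30:00Z"}),
    ("index_3", {"subscriber_id": "SUB-2002", "@timestamp": "2015-12-01T11:00:00Z"}),
]
entities = build_subscriber_entities(events)
print(entities["SUB-1001"])
```

Each resulting document could then be indexed into a separate subscriber index, keyed by the subscriber ID, and searched or visualised like any other index.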

Cheers
Mark

