Kibana newbie here coming from a traditional SQL / relational DB environment and have some questions about how to best create / configure two separate report types I am looking to make.
Visualization 1:
I have data structured as follows:
ID Name Source Address
123 Joe Phone Book 123 Fake St
123 Joe Facebook 125 Fake St
456 Lisa Passport 50 Imaginary Lane
456 Lisa License 99 No Way
How do I create a visualization that gives me the ID and name of any record where someone has both a "Passport" and "License" entry for source?
Visualization 2:
I have a second master table where data has been amalgamated into a single record with the best quality data has already been selected for each ID / column:
ID Name Source Address
123 Joe Phone Book 123 Fake St
456 Lisa License 99 No Way
How would I create a visualization that compares where there is a delta for the data fields?
For example it would say for "123 - Joe" there is a difference between his address in "Phone Book" and "Facebook"
For Visualization 1, here's the closest idea I have off the top of my head:
Create a Data Table visualization with a Split Rows bucket using a Terms aggregation on Name. This should give you a list like:
Name Count
Joe 2
Lisa 2
Then add a filter on Source where the value is one of: Passport,License.
At this point, you should have a table with all unique Names, and any name with Count > 1 should be what you are looking for.
Now if you go under your Terms bucket and click "Advanced", you can put {"min_doc_count": 2} inside of the "JSON Input" field and it should display what you're looking for.
Now, this idea of doing it based on Count only works on the assumption that there are not duplicate Source entries per-user (e.g. Lisa will have no more than 1 entry with Source=Passport). Not sure if this applies to your situation or not.
As for Visualization 2, I'm not sure if I can think of a good way to achieve what you're looking for with the amalgamated data -- but if you took the original data and did a terms aggregation on the address (so you have a list of all unique addresses), you could then add a second subbucket terms agg on the Name. Then if you sort by Name, any name appearing more than once has a delta for the address.
I'm guessing this only partially answers both of your questions, but there may be some more magic you could do with scripting, perhaps by adding some fields that make querying a bit more straightforward... @ppisljar or @timroes might have some better ideas here that I'm not thinking of.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.