Comapre two excel reports in ES and Kibana

I need to compare two excel reports (old and new), each reports have ~200 columns and each row in the old excel has a corresponding row in new excel sheet, also each row has a unique mapping by a combination of fields.

I am planning to convert these excel reports into a json document and store it in elastic search with an index and type.

But am pretty new to elastic search and Kibana, could someone please help me in comparing the reports.

Basically I need to compare how much percentage each columns matches with old and new and other statistics?

This might be a good question for the ES-forum, since they do more data modeling.

But here's how I start:

  • every row is a document (ie. JSON object)
  • each column is a field (ie. JSON property)

To handle both sets, you could use two approaches:

  • store each collection of documents in a different index (so an index for sheet-old and sheet-new).
  • store in the same index, but add json-property to each document indicating to which excel-sheet it belongs.

@thomasneirynck Thanks for your reply.

I could store two excels in two differnt indices. Each document is unique by a combination of few fields also the same documents will be availble in both documents.

How can I create an aggregate key and compare two indices in Kibana? Is Kibana suitable for this or any other approch for this?

Regards,
Saravana

You can create an aggregate key using 'Scripted Fields'. In Kibana, starting from 5.0, there is a UI to do this: https://www.elastic.co/guide/en/kibana/current/scripted-fields.html.

a) As long as you can distinguish for each document which excell-sheet it belongs to from one of the field-values in the document, in Kibana you will be able to tease out both excell sheet; when designing visualizations, adding a query to the filter bar, etc...

b) If your documents are in different indices, that not a problem. In Kibana, you will need to create an "Index Pattern": https://www.elastic.co/guide/en/kibana/current/tutorial-define-index.html. This index pattern can contain wildcards, so it can match multiple indices at the same time. e.g.

excell-* would catch index excell-a and index excell-b.

Regardless, when you will be running queries and make comparisons between the two, you will still need a field-value per document that establishes which excell sheet they belong to.

So, if you want to e.g. make 2 line-charts that show metirics per excell-sheet, you will need (a). If you also want to have a clean index layout, and run index-level stats, you could opt to split them up with (b) and use an index-pattern with wildcard to run queries over both.

Thanks for your reply.

I have added excel specific property to distinguish the index it is from and also indexed both excels with different type on a same index. indexed both the excels with different type and same index name.

I get both the excel sheet documents in discovery.

now which type of chart I need to use to compare documents from both the types and get the stats of matching percentage?

i.e.

I need to find the correct document in b-sheet by an aggregate key [ uniqueness by combination of columns ] in a-sheet and do field by field comparison, and visualize the percentage of match for all the columns.

which type of chart or approach would be the better choice ?

You can look at scripted fields:

Write a field that compares the two fields. It will return a boolean (match/no match).

You can then use that boolean-field to perform aggregation queries in the visualize module. E.g. use a pie-chart to show the percentages.

Sure, will try and get back.

Thanks and Regards,
Saravana

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.