Compare Two Indexes

Hi,

I have one document in one index and another document in a different index. I want to compare the fields of these two documents across the two indexes. Is that possible? I searched but couldn't find a satisfactory answer.

Hello @Dinesh_Sharma

Could you provide a sample of the schema for the two indices and explain how you plan to match one to the other?

Thanks,
Matt

Hi @mattkime ,

I have two CSV files that contain around 10k rows each. I want to ingest them into two different indexes. Suppose the indexes are named A and B. In index A, every document contains a field IP1, and in index B, every document contains a field IP2. My aim is to check IP1==IP2. How can I do it?

CSV1
name,age,IP1

CSV2
name,age,IP2

Here IP1: IP address
IP2: IP Address

Note: I tried ingesting these two CSVs into a single index and performing IP1==IP2, but I found no way to do that in a single index. So I am thinking of doing this with two different indexes.

The simplest way is to get everything into a single index. Could you merge the CSVs before ingesting them? Failing that, you might ingest one CSV and then iterate through the data in the second CSV, updating the index.
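As a rough illustration of the "merge the CSVs before ingesting" idea, here is a minimal Python sketch. It assumes the column layout given earlier in the thread (name, age, IP1 and name, age, IP2); the shared `ip` column, the `source` column, the function name, and the sample rows are hypothetical additions for illustration only.

```python
import csv
import io


def merge_ip_csvs(csv_a, csv_b, ip_col_a="IP1", ip_col_b="IP2"):
    """Merge two CSV texts into one list of rows with a shared `ip` column.

    The `source` column is a hypothetical addition that records which file
    each row came from, so rows can still be told apart after merging.
    """
    merged = []
    for text, ip_col, source in ((csv_a, ip_col_a, "A"), (csv_b, ip_col_b, "B")):
        for row in csv.DictReader(io.StringIO(text)):
            merged.append({
                "name": row["name"],
                "age": row["age"],
                "ip": row[ip_col].strip(),  # normalise IP1/IP2 into one field
                "source": source,
            })
    return merged


# Made-up sample data, only to show the shape of the result.
csv1 = "name,age,IP1\nalice,30,10.11.1.2\nbob,25,10.11.1.3\n"
csv2 = "name,age,IP2\ncarol,41,10.11.1.2\n"
rows = merge_ip_csvs(csv1, csv2)
```

With both files sharing one `ip` field, a single index can be grouped on that field without needing two different field names.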

We have a transform example for something like this. However, if you only have 10k rows, you don't need a transform; you can do it in a single search request. You can use the scripted_metric aggregation from the example in the search request instead of using a transform.

I hope that gives you an idea.
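The single-search alternative mentioned above might be sketched as follows. This only builds the request body in Python; it does not send it. The field name `IP.keyword`, the aggregation name `by_ip`, the bucket limit, and the simplified reduce script are assumptions for illustration, and the script logic assumes each index has a single primary shard and each IP occurs at most once per index.

```python
import json


def build_compare_search(ip_field="IP.keyword", max_groups=10000):
    """Build a search body: a terms aggregation with a scripted_metric sub-aggregation."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "by_ip": {  # hypothetical aggregation name
                "terms": {"field": ip_field, "size": max_groups},
                "aggs": {
                    "compare": {
                        "scripted_metric": {
                            "map_script": "state.doc = new HashMap(params['_source'])",
                            "combine_script": "return state",
                            # Simplified check: exactly two shard states per IP.
                            "reduce_script": (
                                "if (states.size() != 2) { return 'count_mismatch'; } "
                                "return 'match';"
                            ),
                        }
                    }
                },
            }
        },
    }


body = build_compare_search()
request_json = json.dumps(body, indent=2)
```

A body like this would be sent to the `_search` endpoint of both indexes at once, for example `GET test1_index,test2_index/_search` with the index names used later in this thread.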

Hi,

I tried the transform way on two test indexes. Code is attached.

{
  "id" : "index_compare",
  "source" : {
    "index" : [
      "test1_index",
      "test2_index"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "compare"
  },
  "pivot" : {
    "group_by" : {
      "unique-id" : {
        "terms" : {
          "field" : "<unique-id-field>"
        }
      }
    },
    "aggregations" : {
      "compare" : {
        "scripted_metric" : {
          "map_script" : "state.doc = new HashMap(params['_source'])",
          "combine_script" : "return state",
          "reduce_script" : "if (states.size() != 2) {\n  return \"count_mismatch\"\n}\nif (states.get(0).equals(states.get(1))) {\n  return \"match\"\n} else {\n  return \"mismatch\"\n}"
        }
      }
    }
  }
}

I don't understand the unique identifier. Do I need to make any other changes to the above code? I want to get all documents where the IP field is the same in both indexes.

The contents of the two ingested CSV files are shown below:

new1.csv (ingested into test1_index)

[image: contents of new1.csv]

new2.csv (ingested into test2_index)

[image: contents of new2.csv]

The group_by defines which field(s) to use for grouping documents together.

If you specify the field name of the IP field, the pivot groups docs with the same IP together. You can also rename the output field, e.g.:

"group_by" : {
      "ip" : {
        "terms" : {
          "field" : "ip" 
        }
      }
}

Whatever you specify for field must match the field name that you used when indexing your docs.

Hi @Hendrik_Muhs,

Great, it is working like a charm :)

I have two more questions; please help with these too:

(1) If I ingest these two CSVs into a single index, is there still a way to compare the IP column as we are doing with two indexes?

(2) Is there any way to visualize the output of the transform I wrote above, or to see it in the Discover tab?

Yes, that's possible. For the transform it makes no difference if the data originates from 1 or 2 or more indexes.

If you want to visualize it, you need to create the transform. The example I gave used POST _transform/_preview, which is only the preview endpoint. A real transform is a task that you can create using the transform APIs. I suggest familiarizing yourself with them, starting from here.

With a continuous transform, you can let the transform update the destination index as new data comes in.

A real transform just writes the data into a new index, so you can use it like any other index and, e.g., run visualizations or Discover on it. If you use the transform APIs directly, that requires 1 additional step: the creation of a Kibana index pattern. But there is also a transform UI, and the UI can create the pattern for you.
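To make the difference between previewing and creating a transform concrete, here is a small Python sketch that turns a preview body into the two requests for creating and starting a real transform. It only builds the requests as (method, path, body) tuples and does not contact a cluster; the helper name `transform_requests` and the abbreviated sample body are hypothetical.

```python
def transform_requests(transform_id, preview_body):
    """Return the requests that would create and then start a transform.

    The ID goes in the URL path for the create request, so it is dropped
    from the body copied out of the preview request.
    """
    config = {key: value for key, value in preview_body.items() if key != "id"}
    return [
        ("PUT", f"/_transform/{transform_id}", config),          # create the transform
        ("POST", f"/_transform/{transform_id}/_start", None),    # start the task
    ]


# Abbreviated sample body; the pivot section is omitted here for brevity.
preview_body = {
    "id": "index_compare",
    "source": {"index": ["test1_index", "test2_index"]},
    "dest": {"index": "compare"},
}
requests_to_send = transform_requests("index_compare", preview_body)
```

Once the transform has run, the destination index (`compare` in this thread) can be given an index pattern in Kibana and explored like any other index.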

Created successfully. Thanks:)

Hi @Hendrik_Muhs ,

I used the code below to preview the transform. It is giving the wrong result: the IP "10.11.1.2" is present in both indexes, but the result is mismatch. Please find the screenshot attached.

POST /_transform/_preview?pretty
{
  "id": "index_compare",
  "source": {
    "index": [
      "test1_index",
      "test2_index"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "compare"
  },
  "pivot": {
    "group_by": {
      "unique-id": {
        "terms": {
          "field": "IP.keyword"
        }
      }
    },
    "aggregations": {
      "compare": {
        "scripted_metric": {
          "map_script": "state.doc = new HashMap(params['_source'])",
          "combine_script": "return state",
          "reduce_script": """ 
            if (states.size() != 2) {
              return "count_mismatch"
            }
            if (states.get(0).equals(states.get(1))) {
              return "match"
            } else {
              return "mismatch"
            }"""
        }
      }
    }
  }
}

The scripted metric is just an example; it compares two indices and aims to check whether the 2 indexes are equal.

What are you looking for? What should happen if IP1==IP2?

Hi @Hendrik_Muhs ,

If IP1==IP2 then it should simply return "match", but it is returning "mismatch". Do I need to modify this script? Please suggest any edits to the script.

The group_by already ensures that the IPs match.

The idea behind the example script is to ensure that the same doc exists in 2 indexes. For that, every group of documents must have a count of 2, which is what:

if (states.size() != 2)

checks. Next, it compares whether the full documents are the same. It sounds like you don't want that deep compare, so you can simplify the reduce script:

"reduce_script": """ 
            if (states.size() != 2) {
              return "count_mismatch"
            }
            return "match"
            """
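The difference between the deep compare and the simplified count check can be mimicked outside Elasticsearch. The Python sketch below is only an approximation of the logic described above, not the Painless script itself: it groups documents by IP and reports both verdicts per group. The function name, the `IP` field name, and the sample documents are made up for illustration.

```python
from collections import defaultdict


def compare_by_ip(docs_a, docs_b, ip_field="IP"):
    """Group docs from two indexes by IP and report both kinds of verdict."""
    groups = defaultdict(list)
    for doc in docs_a + docs_b:
        groups[doc[ip_field]].append(doc)

    result = {}
    for ip, docs in groups.items():
        if len(docs) != 2:
            # The IP is not present exactly once in each index.
            result[ip] = {"simple": "count_mismatch", "deep": "count_mismatch"}
        else:
            # Deep compare: every field must be identical, not just the IP.
            deep = "match" if docs[0] == docs[1] else "mismatch"
            result[ip] = {"simple": "match", "deep": deep}
    return result


# Hypothetical sample documents.
index_a = [
    {"name": "alice", "age": 30, "IP": "10.11.1.2"},
    {"name": "bob", "age": 25, "IP": "10.11.1.3"},
]
index_b = [{"name": "carol", "age": 41, "IP": "10.11.1.2"}]
result = compare_by_ip(index_a, index_b)
```

Here the IP 10.11.1.2 appears in both sample indexes, so the simplified check reports a match, while the deep compare reports a mismatch because the name and age fields differ. That is the same effect seen with the original example script.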

Hi @Hendrik_Muhs ,
The above suggested code is working like a charm. Thanks for the same.

(1) If I want to apply the transform to a single index, will the code below work for me?

POST /_transform/_preview?pretty
{
  "id": "index_compare",
  "source": {
    "index": [
      "compare_pim_index"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "compare"
  },
  "pivot": {
    "group_by": {
      "unique-id": {
        "terms": {
          "field": "IP.keyword"
        }
      }
    },
    "aggregations": {
      "compare": {
        "scripted_metric": {
          "map_script": "state.doc = new HashMap(params['_source'])",
          "combine_script": "return state",
          "reduce_script": """ 
            if (states.size() != 2) {
              return "count_mismatch"
            }
            return "match"
            """
        }
      }
    }
  }
}

(2) Instead of comparing documents that share the same field name, can I compare documents with two different field names? For example, some documents in an index will have IP1.keyword and some will have IP2.keyword. Will the code below work?

POST /_transform/_preview?pretty
{
  "id": "index_compare",
  "source": {
    "index": [
      "compare_pim_index"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "compare"
  },
  "pivot": {
    "group_by": {
      "unique-id": {
        "terms": {
          "field": "IP1.keyword",
          "field1": "IP2.keyword"
        }
      }
    },
    "aggregations": {
      "compare": {
        "scripted_metric": {
          "map_script": "state.doc = new HashMap(params['_source'])",
          "combine_script": "return state",
          "reduce_script": """ 
            if (states.size() != 2) {
              return "count_mismatch"
            }
            return "match"
            """
        }
      }
    }
  }
}

Grouping by terms can only take 1 field, so I think you have to stay with the 2-index approach. Is there a reason not to?


Hi @Hendrik_Muhs,

The two-index approach is working quite well. I was just trying to do it in one index so that we wouldn't need to change our existing setup. But since this test case is quite important to us, the two-index approach is also fine.

Thanks, buddy, for your help during this long conversation :)