Compare two indexes based on more than two fields

I have two indexes and want to get the lists of matched and unmatched documents based on one or more fields.
I tried the preview transform API, but it shows only up to 100 records. Is there another approach in Elasticsearch?

Thanks in advance.

Sample data (please assume more than 100 records). It is only meant to show my data schema:

curl -XPOST "http://localhost:9200/_bulk" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{ "index" : { "_index" : "testindex", "_id" : "1" } }
{ "id" :"1", "name" : "Abhihek Gupta", "gender" : "Male","age" : "21","indexname" : "kc_app"}
{ "index" : { "_index" : "testindex", "_id" : "2" } }
{ "id" :"2", "name" : "Kiran Kher", "gender" : "Female","age" : "22","indexname" : "kc_app"}
{ "index" : { "_index" : "testindex", "_id" : "3" } }
{"id" :"3", "name" : "Abhihek Banargee", "gender" : "Male","age" : "23","indexname" : "kc_app"}
{ "index" : { "_index" : "testindex", "_id" : "4" } }
{"id" :"4", "name" : "Kiran Gupta12", "gender" : "Male","age" : "25" ,"indexname" : "kc_app"}
{ "index" : { "_index" : "testindex", "_id" : "5" } }
{"id" :"5", "name" : "Kiran Gupta23", "gender" : "Male","age" : "25" ,"indexname" : "kc_app"}
'

curl -XPOST "http://localhost:9200/_bulk" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{ "index" : { "_index" : "testindex1", "_id" : "1" } }
{ "id" :"1", "name" : "Abhihek Gupta1", "gender" : "Male","age" : "21","indexname" : "Zero_app"}
{ "index" : { "_index" : "testindex1", "_id" : "2" } }
{ "id" :"2", "name" : "Kiran Kher", "gender" : "Female","age" : "22","indexname" : "Zero_app"}
{ "index" : { "_index" : "testindex1", "_id" : "3" } }
{"id" :"3", "name" : "Abhihek Banargee", "gender" : "Male","age" : "23","indexname" : "Zero_app"}
{ "index" : { "_index" : "testindex1", "_id" : "4" } }
{"id" :"4", "name" : "Kiran Gupta", "gender" : "feMale","age" : "25","indexname" : "Zero_app"}
{ "index" : { "_index" : "testindex1", "_id" : "5" } }
{"id" :"5", "name" : "Kiran Gupta", "gender" : "Male","age" : "25","indexname" : "Zero_app"}
'

I want to compare the data based on the id and name fields.

The transform preview API is only for checking what the resulting data would look like. To run the full comparison, you have to create a transform and start it.

The examples in the documentation use preview to quickly show the concept of the transform. To use it for real, you have to create and start the transform. The result of the comparison will then be in the destination index.
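
As a rough illustration of the create-and-start step, here is a minimal Python sketch that uses only the standard library. The endpoints `PUT _transform/<id>` and `POST _transform/<id>/_start` are the regular transform APIs. The rest is an assumption made for the example: the transform ID `compare_by_id` is hypothetical, `id.keyword` assumes the default dynamic mapping for the string `id` field, and the pivot is adapted from the index comparison example in the documentation, so please check it against the docs for your version before relying on it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # host used in the curl examples in this thread
TRANSFORM_ID = "compare_by_id"     # hypothetical transform ID


def build_transform_body(source_indices, dest_index, key_field):
    """Build a pivot transform config that groups documents by key_field."""
    return {
        "source": {"index": source_indices, "query": {"match_all": {}}},
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": {"unique-id": {"terms": {"field": key_field}}},
            "aggregations": {
                "compare": {
                    "scripted_metric": {
                        # Painless: keep the source of the document seen in each shard.
                        "map_script": "state.doc = new HashMap(params['_source'])",
                        "combine_script": "return state",
                        # Painless: exact comparison only, as in the docs example.
                        "reduce_script": (
                            "if (states.size() != 2) { return 'count_mismatch'; } "
                            "if (states.get(0).equals(states.get(1))) { return 'match'; } "
                            "return 'mismatch';"
                        ),
                    }
                }
            },
        },
    }


def build_requests(body):
    """Return the two calls needed: create the transform, then start it."""
    base = f"{ES_URL}/_transform/{TRANSFORM_ID}"
    return [("PUT", base, body), ("POST", f"{base}/_start", None)]


def send(method, url, body=None):
    """Send one JSON request to Elasticsearch and return the parsed response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage against a running cluster (not executed here):
# body = build_transform_body(["testindex", "testindex1"], "comparebyid", "id.keyword")
# for method, url, payload in build_requests(body):
#     print(send(method, url, payload))
```

Unlike preview, a started transform processes all source documents and writes the results to the destination index, which you can then search like any other index.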

Thank you @Hendrik_Muhs for the reply, it's working.

But I have another question. I want to filter the data in the destination index again, based on other fields.

Data schema of the destination index after creating and starting the transform:

{
  "_index": "comparebyid",
  "_id": "MPGsra40AGzOIw1d4LlG5dkAAAAAAAAA",
  "_score": 1,
  "_source": {
    "id_compare": "005c023e-dee9-402a-8841-f892855ecf3e",
    "compare": "match",
    "Document": [
      {
        "doc": {
          "clientName": "Domino's",
          "phone": "7787888888",
          "dOB": "",
          "id": "005c023e-dee9-402a-8841-f892855ecf3e",
          "typeData": "Zero_Data",
          "email": "Dom@kc.com"
        }
      },
      {
        "doc": {
          "id": "005c023e-dee9-402a-8841-f892855ecf3e",
          "typeData": "KC_App",
          "clientName": "Domino's",
          "dOB": ""
        }
      }
    ]
  }
},
{
  "_index": "comparebyid",
  "_id": "MNRSJju9ye_dP5Xyj6Hagc0AAAAAAAAA",
  "_score": 1,
  "_source": {
    "id_compare": "0085b9d3-d711-4230-9bce-f1466a509959",
    "compare": "match",
    "Document": [
      {
        "doc": {
          "id": "0085b9d3-d711-4230-9bce-f1466a509959",
          "typeData": "KC_App",
          "clientName": "Royal J",
          "dOB": ""
        }
      },
      {
        "doc": {
          "clientName": "Royal J",
          "phone": "7412369859",
          "dOB": "",
          "id": "0085b9d3-d711-4230-9bce-f1466a509959",
          "typeData": "Zero_Data",
          "email": "royal@gmail.com"
        }
      }
    ]
  }
},
{
  "_index": "comparebyid",
  "_id": "MLkJMJ5-GTvsSHBVLqcSwlQAAAAAAAAA",
  "_score": 1,
  "_source": {
    "id_compare": "010f1e4d-76e6-469c-bdcb-1d02810c9a30",
    "compare": "mismatch",
    "Document": [
      {
        "doc": {
          "id": "010f1e4d-76e6-469c-bdcb-1d02810c9a30",
          "typeData": "KC_App",
          "clientName": "Test SMSF 15",
          "dOB": ""
        }
      }
    ]
  }
},
{
  "_index": "comparebyid",
  "_id": "MArzZXNjUeCpZ4e4ZzCMCDQAAAAAAAAA",
  "_score": 1,
  "_source": {
    "id_compare": "01420872-ab2e-4ede-ac82-2b9efb676be2",
    "compare": "mismatch",
    "Document": [
      {
        "doc": {
          "clientName": "thiruka test",
          "phone": "9606571654",
          "dOB": "3/27/1991 12:00:00 AM",
          "id": "01420872-ab2e-4ede-ac82-2b9efb676be2",
          "typeData": "Zero_Data",
          "email": "thirukanaik@gmail.com"
        }
      }
    ]
  }
},

So, for documents with "compare": "mismatch", I want to get the matched and unmatched result again, this time based on the 'email' and 'dOB' fields.

Like this:
Document[0].doc.email = Document[1].doc.email, Document[0].doc.dOB = Document[1].doc.dOB => if at least 50% of these fields match,
it will give 'matched', otherwise 'mismatch'.

I hope my question is clear; if not, please let me know.
I look forward to your reply.

Thank you in advance.

The example provided in the docs is just a starter. It only does exact matches; however, you can implement fuzzy matching on your own using the Painless scripting language. For example, you could loop over the fields and count how many of them match. More sophisticated fuzzy matching, however, is probably tricky to implement efficiently in Painless.
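
To make the "loop over fields and count the matches" idea concrete, here is a small sketch of the logic in Python rather than Painless, so treat it as pseudocode for whatever script you end up writing. It assumes the `Document` array layout shown earlier in this thread, the field names `email` and `dOB`, and the 50% threshold from the question. The function name `compare_pair` and the rule that a field missing on either side counts as a non-match are choices made only for this example.

```python
def compare_pair(documents, fields=("email", "dOB"), threshold=0.5):
    """Label a 'Document' array as 'matched' or 'mismatch'.

    A field counts as matching only if both documents contain it and the
    values are equal. The label is 'matched' when the share of matching
    fields reaches the threshold.
    """
    # Without exactly two documents there is nothing to compare.
    if len(documents) != 2:
        return "mismatch"

    first = documents[0]["doc"]
    second = documents[1]["doc"]

    # Loop over the fields and count the ones that agree.
    matching = sum(
        1
        for name in fields
        if name in first and name in second and first[name] == second[name]
    )
    return "matched" if matching / len(fields) >= threshold else "mismatch"


# First hit from the destination index shown above, trimmed to the
# fields that matter here.
sample = [
    {"doc": {"clientName": "Domino's", "dOB": "", "email": "Dom@kc.com"}},
    {"doc": {"clientName": "Domino's", "dOB": ""}},
]
print(compare_pair(sample))  # matched: dOB agrees, email is missing on one side
```

Note that two empty `dOB` strings count as a match in this sketch; if empty values should be ignored, that has to be an extra condition in the loop.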
