Hi, I am trying to find an effective way to count unique contacts. I am using Elastic 8.15.
Index structure
{
id, email, phone, first, last
}
 
Data Example
{
1, user@gmail.com, 1111111, Tom, Hanks
},
{
2, user2@gmail.com, 1111111, Tom, Hanks
},
{
3, user@gmail.com, 22222, Tom, Hanks
},
{
4, user4@gmail.com, 22222, Kate, Ragan
},
 
Each record has something in common with another one, so all 4 records here are potential duplicates.
When I count using each field separately, I get:
email - 2
phone - 2
FullName - 3
The expected result is 4.
A contact can be identified by phone, or by email, or by first + last name.
How could I calculate that? I would be grateful for any ideas.
Thanks
             
RabBit_BR (andre.coelho), November 19, 2024, 7:01pm
Hi @andreyshiryaev13
You can add a new field that serves as a unique identifier built from properties of the document. Below, I used set processors in a simulated ingest pipeline to create fields that can be used as a unique key. I hope this helps as a starting point.
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "_source.unique_id_email_full_name",
          "value": "{{_source.email}}-{{_source.first}}-{{_source.last}}"
        }
      },
      {
        "set": {
          "field": "_source.unique_id_phone_full_name",
          "value": "{{_source.phone}}-{{_source.first}}-{{_source.last}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "1",
      "_source": {
        "email": "user@gmail.com",
        "phone": "1111111",
        "first": "Tom",
        "last": "Hanks"
      }
    },
    {
      "_index": "index",
      "_id": "2",
      "_source": {
        "email": "user2@gmail.com",
        "phone": "1111111",
        "first": "Tom",
        "last": "Hanks"
      }
    },
    {
      "_index": "index",
      "_id": "3",
      "_source": {
        "email": "user@gmail.com",
        "phone": "22222",
        "first": "Tom",
        "last": "Hanks"
      }
    },
    {
      "_index": "index",
      "_id": "4",
      "_source": {
        "email": "user4@gmail.com",
        "phone": "22222",
        "first": "Kate",
        "last": "Ragan"
      }
    }
  ]
}
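
For readers who want to see what those two keys look like on the sample data, here is a minimal client-side sketch in Python. It only mimics the string concatenation done by the set processors; it does not call Elasticsearch, and the names `docs` and `composite_keys` are illustrative, not part of any Elastic API.

```python
# Client-side illustration only: mimics the two set processors on the sample docs.
docs = [
    {"email": "user@gmail.com", "phone": "1111111", "first": "Tom", "last": "Hanks"},
    {"email": "user2@gmail.com", "phone": "1111111", "first": "Tom", "last": "Hanks"},
    {"email": "user@gmail.com", "phone": "22222", "first": "Tom", "last": "Hanks"},
    {"email": "user4@gmail.com", "phone": "22222", "first": "Kate", "last": "Ragan"},
]


def composite_keys(doc):
    """Build the same two concatenated keys the pipeline writes into each doc."""
    full_name = f"{doc['first']}-{doc['last']}"
    return {
        "unique_id_email_full_name": f"{doc['email']}-{full_name}",
        "unique_id_phone_full_name": f"{doc['phone']}-{full_name}",
    }


keys = [composite_keys(d) for d in docs]

# Distinct values per key, i.e. what a distinct count over each field would see.
distinct_email_name = {k["unique_id_email_full_name"] for k in keys}
distinct_phone_name = {k["unique_id_phone_full_name"] for k in keys}

print(len(distinct_email_name), len(distinct_phone_name))  # prints: 3 3
```

On this sample each key has three distinct values, because a concatenated key only matches when both of its parts are equal.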
 
Thank you for your answer, I will try it.
It looks like a multi-terms aggregation, but I need something different: not all 3 fields combined.
I have 3 criteria (phone or email or name), and each of them on its own can determine a duplicate row.
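
One possible way to express this "phone or email or name" rule is to treat records as nodes and link any two that share a value, then count the connected groups. The sketch below does that client-side in Python with a small union-find. It is not something the thread itself proposes, and it rests on several assumptions: the records have already been read out of the index into a list of dicts, matching is exact string equality, and matches are transitive (if 1 matches 3 and 3 matches 4, then 1, 3 and 4 are one contact). The names `records` and `duplicate_groups` are hypothetical.

```python
from collections import defaultdict

# Sample data from the question, assumed to be already fetched from the index.
records = [
    {"id": 1, "email": "user@gmail.com", "phone": "1111111", "first": "Tom", "last": "Hanks"},
    {"id": 2, "email": "user2@gmail.com", "phone": "1111111", "first": "Tom", "last": "Hanks"},
    {"id": 3, "email": "user@gmail.com", "phone": "22222", "first": "Tom", "last": "Hanks"},
    {"id": 4, "email": "user4@gmail.com", "phone": "22222", "first": "Kate", "last": "Ragan"},
]


def duplicate_groups(records):
    """Group record ids that are linked by a shared email, phone, or full name."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (criterion, value) -> id of the first record with that value
    for r in records:
        criteria = [
            ("email", r["email"]),
            ("phone", r["phone"]),
            ("name", (r["first"], r["last"])),
        ]
        for key in criteria:
            if key in first_seen:
                union(first_seen[key], r["id"])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(groups.values())


groups = duplicate_groups(records)
print(len(groups))  # number of distinct contacts: 1
print(groups)       # [[1, 2, 3, 4]]
```

On the sample data all four records end up in a single group, which agrees with the expected result of 4 potential duplicates. For a large index this in-memory approach may not scale, so it should be read as an illustration of the matching rule rather than a ready-made solution.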