Elasticsearch query to return only distinct results

Consider a table in which artifacts are mapped to publications:

------------------------------------------------------------------
   id   |   artifact_id |  publication_id |   A   |   B   |   C  |   
------------------------------------------------------------------
   1    |      1        |       1         |  _H_  |  ___  |  ___ |
------------------------------------------------------------------
   2    |      1        |       2         |  ___  |  _H_  |  ___ |
------------------------------------------------------------------
   3    |      2        |       1         |  ___  |  _H_  |  ___ |   
------------------------------------------------------------------

One artifact can have multiple publications, and the table has fields (columns) A, B, and C.

I'm searching for a keyword in fields A, B, and C with the following query_string query:


    {
        "query": {
            "query_string": {
                "query": "keyword_to_be_searched",
                "fields": ["A", "B", "C"]
            }
        },
        "from": starting_no,
        "size": results_per_page
    }
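As a minimal sketch, the same request body can be built in Python. The 1-based `page` argument and the function name `build_search_body` are hypothetical conveniences, not part of the original; the function simply computes `starting_no` as `(page - 1) * results_per_page` and lists each field as a separate array element.

```python
import json


def build_search_body(keyword: str, page: int, results_per_page: int) -> dict:
    """Build the query_string request body for a 1-based page number.

    `page` is a hypothetical parameter; the original query takes the
    offset (starting_no) directly.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    return {
        "query": {
            "query_string": {
                "query": keyword,
                # Each field is its own array element, not one "A, B, C" string.
                "fields": ["A", "B", "C"],
            }
        },
        # Offset of the first hit on this page.
        "from": (page - 1) * results_per_page,
        "size": results_per_page,
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("H", page=2, results_per_page=10), indent=2))
```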

If I search for the keyword H, which matches all three rows (as shown in the table), the response returns all three records:

    # Simplified, for illustration
    response: {
        "id": 1,
        "id": 2,
        "id": 3
    }

There are many hits for different publications, but many of those publications map to the same artifact.

In the end, I want to display each artifact only once (no repetition):

    # Simplified, for illustration
    Expected response: {
        "id": 1,
        "id": 3
    }

    OR

    Expected response: {
        "id": 2,
        "id": 3
    }
           

Either result is acceptable, since publication_id 1 and publication_id 2 (rows 1 and 2) are both mapped to the same artifact_id 1.
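To pin down the desired semantics, here is a minimal client-side sketch (the helper name `distinct_by_artifact` is hypothetical) that keeps the first hit per `artifact_id`, assuming the hits are already sorted by relevance. Deduplicating after the search like this breaks pagination, because a page of `size` hits can shrink to fewer distinct artifacts, which is why a query-side solution is needed.

```python
from typing import Dict, List


def distinct_by_artifact(hits: List[Dict]) -> List[Dict]:
    """Keep only the first hit seen for each artifact_id.

    Assumes `hits` is already sorted by relevance, so the first hit per
    artifact is the best-scoring one.
    """
    seen = set()
    result = []
    for hit in hits:
        artifact = hit["artifact_id"]
        if artifact not in seen:
            seen.add(artifact)
            result.append(hit)
    return result


# The three rows from the table above, all matching "H".
rows = [
    {"id": 1, "artifact_id": 1, "publication_id": 1},
    {"id": 2, "artifact_id": 1, "publication_id": 2},
    {"id": 3, "artifact_id": 2, "publication_id": 1},
]
print([hit["id"] for hit in distinct_by_artifact(rows)])  # [1, 3]
```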

How do I rewrite the query to group by (or select distinct) artifact_id while still supporting pagination?

Check out field collapsing; it might help with this use case.
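A minimal sketch of what a collapsed request body might look like, built in Python (the function name is hypothetical). It adds a `collapse` clause on `artifact_id` to the original query, so Elasticsearch returns only the top-sorted hit per artifact, and `from`/`size` then page over those collapsed results. This assumes `artifact_id` is mapped as a `keyword` or numeric field with doc values enabled, which collapsing requires.

```python
def build_collapsed_body(keyword: str, starting_no: int, results_per_page: int) -> dict:
    """query_string search collapsed on artifact_id: one top hit per artifact."""
    return {
        "query": {
            "query_string": {
                "query": keyword,
                "fields": ["A", "B", "C"],
            }
        },
        # Field collapsing: keep only the top-sorted document per artifact_id.
        # Assumes artifact_id is a keyword or numeric field with doc values.
        "collapse": {"field": "artifact_id"},
        # from/size page over the collapsed results.
        "from": starting_no,
        "size": results_per_page,
    }
```

The dict can be sent as the JSON body of a `_search` request against whichever index holds the table.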


Hey @spinscale, thanks for the response.

I was able to get results with distinct artifacts.

For pagination, I followed this article: Total hits with field collapsing.
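With collapsing, `hits.total` counts all matching documents, not the number of collapsed groups, so it overcounts pages. One common workaround (a sketch, not necessarily what the linked article does) is to add a `cardinality` aggregation on `artifact_id` and derive the page count from it. The aggregation name `distinct_artifacts` and both helper names are hypothetical. Note that `cardinality` is an approximate count, so for large numbers of artifacts the page count may be slightly off.

```python
import math


def add_distinct_count(body: dict) -> dict:
    """Return a shallow copy of a search body with a cardinality aggregation.

    The aggregation name "distinct_artifacts" is arbitrary (hypothetical).
    """
    new_body = dict(body)
    new_body["aggs"] = {
        "distinct_artifacts": {"cardinality": {"field": "artifact_id"}}
    }
    return new_body


def total_pages(response: dict, results_per_page: int) -> int:
    """Number of pages of distinct artifacts, read from the aggregation.

    hits.total counts matching documents, not collapsed groups, so it
    would overcount pages here.
    """
    distinct = response["aggregations"]["distinct_artifacts"]["value"]
    return math.ceil(distinct / results_per_page)
```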
