What should the Elasticsearch query be for the data scenario below?


(karan shah) #1

Hello everyone,
I would like to discuss a scenario for retrieving millions of records from Elasticsearch.
I am indexing the Author model shown below in Elasticsearch, using the NEST client from a .NET application.
My models are described below.

Author
    AuthorKey            string
    AuthorName           string
    AuthorLastName       string
    List<AddressInfo>    nested (i.e. list of addresses)
    List<Study>          nested (i.e. list of studies)

AddressInfo
    Address              string
    Email                string
    EntryDate            date

Study
    PMID                 int
    PublicationDate      date
    PublicationType      string
    Content              string

We have almost 10 million authors, and each author has completed at least 3 studies,
so there are approximately 30 million study records in the Elasticsearch index.
I would like to search by PublicationDate, PublicationType, MeshTerms and Content,
and display the authors sorted in descending order of the number of each author's studies that match the search criteria.
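
One possible starting point for the ranking part is sketched below as a Python dict holding the search body (Python is used only for illustration; the same JSON could be sent through NEST). It is an untested sketch under several assumptions: `Study` is mapped as a `nested` field, `Study.PublicationType` is mapped as a keyword, and the function name `build_author_query` and its parameters are hypothetical. The idea is that each matching study scores exactly 1 through `constant_score`, and `score_mode: "sum"` adds those up, so the author's `_score` should equal the number of matching studies. MeshTerms and Content are not covered here.

```python
import json


def build_author_query(date_from, date_to, publication_type=None, page_size=10):
    """Build a search body that ranks authors by their number of matching studies.

    Hypothetical helper; assumes "Study" is a nested field in the index mapping.
    """
    # Conditions that a single nested Study document must satisfy.
    study_filters = [
        {"range": {"Study.PublicationDate": {"gte": date_from, "lt": date_to}}}
    ]
    if publication_type is not None:
        # Assumes PublicationType is indexed as a keyword (exact match).
        study_filters.append({"term": {"Study.PublicationType": publication_type}})

    return {
        "size": page_size,
        "query": {
            "nested": {
                "path": "Study",
                # Sum the per-study scores into the author's score.
                "score_mode": "sum",
                "query": {
                    # Every matching study contributes a fixed score of 1.0.
                    "constant_score": {
                        "filter": {"bool": {"filter": study_filters}},
                        "boost": 1.0,
                    }
                },
            }
        },
        # Highest number of matching studies first.
        "sort": [{"_score": {"order": "desc"}}],
    }


body = build_author_query("2019-01-01", "2020-01-01")
print(json.dumps(body, indent=2))
```

With 10 million authors, the result would still need to be paged rather than fetched in one request; that part is left out of the sketch.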

For example, below is sample JSON data matching my structure:

{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "AddressInfo": [
        {
          "Address": "Gopipura,Surat",
          "Email": "karan.j.shah@email.com",
          "EntryDate": "2010-01-17T06:32:18.306Z"
        },
        {
          "Address": "vesu,Surat",
          "Email": "shah.karan657@email.com",
          "EntryDate": "2015-01-17T06:32:18.306Z"
        },
        {
          "Address": "Navasari,Surat",
          "Email": "karansh@email.com",
          "EntryDate": "2014-01-17T06:32:18.306Z"
        }
      ],
      "Study": [
        {
          "PMId": 1000,
          "PublicationDate": "2019-01-17T06:35:52.178Z",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1001,
          "PublicationDate": "2019-01-16T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 1003,
          "PublicationDate": "2011-01-15T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "AddressInfo": [
        {
          "Address": "Gopipura1,Surat",
          "Email": "dharan.j.shah@email.com",
          "EntryDate": "2014-01-17T06:32:18.306Z"
        },
        {
          "Address": "vesu1,Surat",
          "Email": "dharan.karan657@email.com",
          "EntryDate": "2013-01-17T06:32:18.306Z"
        },
        {
          "Address": "Navasari1,Surat",
          "Email": "dharansh@email.com",
          "EntryDate": "2012-01-17T06:32:18.306Z"
        }
      ],
      "Study": [
        {
          "PMId": 2000,
          "PublicationDate": "2011-01-16T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 2001,
          "PublicationDate": "2011-01-16T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 2002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 2003,
          "PublicationDate": "2015-01-15T05:55:14.947Z",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    }
  ]
}

1. I would like to retrieve all authors who have at least one study published in 2019 (i.e. filter on PublicationDate), together with each author's filtered study count and address info. For the address info, only the 2 most recent addresses of each author are needed, determined by the EntryDate of AddressInfo.

Expected output:

{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "AddressInfo": [
        {
          "Address": "vesu,Surat",
          "Email": "shah.karan657@email.com",
          "EntryDate": "2015-01-17T06:32:18.306Z"
        },
        {
          "Address": "Navasari,Surat",
          "Email": "karansh@email.com",
          "EntryDate": "2014-01-17T06:32:18.306Z"
        }
      ],
      "StudyCount": 3
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "AddressInfo": [
        {
          "Address": "Gopipura1,Surat",
          "Email": "dharan.j.shah@email.com",
          "EntryDate": "2014-01-17T06:32:18.306Z"
        },
        {
          "Address": "vesu1,Surat",
          "Email": "dharan.karan657@email.com",
          "EntryDate": "2013-01-17T06:32:18.306Z"
        }
      ],
      "StudyCount": 1
    }
  ]
}
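
The expected output above can also be reproduced in memory, which pins down the intended semantics independently of any query. Below is a minimal Python sketch using a trimmed copy of the sample data (emails and publication types omitted). The helper names `parse` and `expected_result` are made up for illustration, and this is not an Elasticsearch query, only a reference for checking a query's results against.

```python
from datetime import datetime


def parse(ts):
    # Timestamps in the sample look like 2019-01-17T06:35:52.178Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def expected_result(authors, year, address_limit=2):
    rows = []
    for author in authors:
        # Count only the studies published in the requested year.
        count = sum(1 for s in author["Study"]
                    if parse(s["PublicationDate"]).year == year)
        if count == 0:
            continue  # author has no study in that year
        # Keep the most recent addresses, newest EntryDate first.
        latest = sorted(author["AddressInfo"],
                        key=lambda a: parse(a["EntryDate"]),
                        reverse=True)[:address_limit]
        rows.append({
            "AuthorKey": author["AuthorKey"],
            "AuthorName": author["AuthorName"],
            "AuthorLastName": author["AuthorLastName"],
            "AddressInfo": latest,
            "StudyCount": count,
        })
    # Authors with the most matching studies come first.
    rows.sort(key=lambda r: r["StudyCount"], reverse=True)
    return {"Authors": rows}


SAMPLE = [
    {"AuthorKey": "Author1", "AuthorName": "karan", "AuthorLastName": "shah",
     "AddressInfo": [
         {"Address": "Gopipura,Surat", "EntryDate": "2010-01-17T06:32:18.306Z"},
         {"Address": "vesu,Surat", "EntryDate": "2015-01-17T06:32:18.306Z"},
         {"Address": "Navasari,Surat", "EntryDate": "2014-01-17T06:32:18.306Z"}],
     "Study": [
         {"PMId": 1000, "PublicationDate": "2019-01-17T06:35:52.178Z"},
         {"PMId": 1001, "PublicationDate": "2019-01-16T05:55:14.947Z"},
         {"PMId": 1002, "PublicationDate": "2019-01-15T05:55:14.947Z"},
         {"PMId": 1003, "PublicationDate": "2011-01-15T05:55:14.947Z"}]},
    {"AuthorKey": "Author2", "AuthorName": "dharan", "AuthorLastName": "shah",
     "AddressInfo": [
         {"Address": "Gopipura1,Surat", "EntryDate": "2014-01-17T06:32:18.306Z"},
         {"Address": "vesu1,Surat", "EntryDate": "2013-01-17T06:32:18.306Z"},
         {"Address": "Navasari1,Surat", "EntryDate": "2012-01-17T06:32:18.306Z"}],
     "Study": [
         {"PMId": 2000, "PublicationDate": "2011-01-16T05:55:14.947Z"},
         {"PMId": 2001, "PublicationDate": "2011-01-16T05:55:14.947Z"},
         {"PMId": 2002, "PublicationDate": "2019-01-15T05:55:14.947Z"},
         {"PMId": 2003, "PublicationDate": "2015-01-15T05:55:14.947Z"}]},
]

result = expected_result(SAMPLE, 2019)
print([(a["AuthorKey"], a["StudyCount"]) for a in result["Authors"]])
# prints [('Author1', 3), ('Author2', 1)]
```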

Please suggest a suitable Elasticsearch query to produce this expected output.
Thank you.

