Hello everyone,
I would like to discuss a scenario that involves retrieving millions of records from Elasticsearch.
I am indexing the Author model shown below in Elasticsearch, using the NEST client in a .NET application.
My models are as follows.
Author
    AuthorKey string
    AuthorName string
    AuthorLastName string
    List<AddressInfo> Nested (i.e. list of AddressInfo)
    List<Study> Nested (i.e. list of Study)
AddressInfo
    Address string
    Email string
    EntryDate date
Study
    PMID int
    PublicationDate date
    PublicationType string
    Content string
We have almost 10 million authors, and each author has completed at least 3 studies.
So there are approximately 30 million study records in the Elasticsearch index.
Now I would like to search on PublicationDate, PublicationType, MeshTerms and Content,
and display the author data sorted in descending order of each author's filtered study count, i.e. the number of that author's studies that match the given search criteria.
For example,
below is sample JSON data that follows my structure:
{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "AddressInfo": [
        { "Address": "Gopipura,Surat", "Email": "karan.j.shah@email.com", "EntryDate": "2010-01-17T06:32:18.306Z" },
        { "Address": "vesu,Surat", "Email": "shah.karan657@email.com", "EntryDate": "2015-01-17T06:32:18.306Z" },
        { "Address": "Navasari,Surat", "Email": "karansh@email.com", "EntryDate": "2014-01-17T06:32:18.306Z" }
      ],
      "Study": [
        { "PMId": 1000, "PublicationDate": "2019-01-17T06:35:52.178Z", "PublicationType": [ "ClinicalTrial", "Medical" ] },
        { "PMId": 1001, "PublicationDate": "2019-01-16T05:55:14.947Z", "PublicationType": [ "ClinicalTrial", "Medical" ] },
        { "PMId": 1002, "PublicationDate": "2019-01-15T05:55:14.947Z", "PublicationType": [ "ClinicalTrial1", "Medical2" ] },
        { "PMId": 1003, "PublicationDate": "2011-01-15T05:55:14.947Z", "PublicationType": [ "ClinicalTrial1", "Medical3" ] }
      ]
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "AddressInfo": [
        { "Address": "Gopipura1,Surat", "Email": "dharan.j.shah@email.com", "EntryDate": "2014-01-17T06:32:18.306Z" },
        { "Address": "vesu1,Surat", "Email": "dharan.karan657@email.com", "EntryDate": "2013-01-17T06:32:18.306Z" },
        { "Address": "Navasari1,Surat", "Email": "dharansh@email.com", "EntryDate": "2012-01-17T06:32:18.306Z" }
      ],
      "Study": [
        { "PMId": 2000, "PublicationDate": "2011-01-16T05:55:14.947Z", "PublicationType": [ "ClinicalTrial", "Medical" ] },
        { "PMId": 2001, "PublicationDate": "2011-01-16T05:55:14.947Z", "PublicationType": [ "ClinicalTrial", "Medical" ] },
        { "PMId": 2002, "PublicationDate": "2019-01-15T05:55:14.947Z", "PublicationType": [ "ClinicalTrial1", "Medical2" ] },
        { "PMId": 2003, "PublicationDate": "2015-01-15T05:55:14.947Z", "PublicationType": [ "ClinicalTrial1", "Medical3" ] }
      ]
    }
  ]
}
1. I would like to retrieve all authors who have a study published in the year 2019 (i.e. filter on PublicationDate and return every matching author together with the filtered study count),
along with their address info (we only need the last 2 addresses of each author, which can be determined from the EntryDate of the address info).
Expected Output:
{
"Authors": [
{
"AuthorKey": "Author1",
"AuthorName": "karan",
"AuthorLastName": "shah",
"AddressInfo": [
{
"Address": "vesu,Surat",
"Email": "shah.karan657@email.com",
"EntryDate": "2015-01-17T06:32:18.306Z"
},
{
"Address": "Navasari,Surat",
"Email": "karansh@email.com",
"EntryDate": "2014-01-17T06:32:18.306Z"
}
],
"StudyCount": 3
},
{
"AuthorKey": "Author2",
"AuthorName": "dharan",
"AuthorLastName": "shah",
"AddressInfo": [
{
"Address": "Gopipura1,Surat",
"Email": "dharan.j.shah@email.com",
"EntryDate": "2014-01-17T06:32:18.306Z"
},
{
"Address": "vesu1,Surat",
"Email": "dharan.karan657@email.com",
"EntryDate": "2013-01-17T06:32:18.306Z"
}
],
"StudyCount": 1
}
]
}
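To make the expected result unambiguous, here is a minimal in-memory Python sketch of the logic I am after. It is not an Elasticsearch query and would not scale to millions of records; it only reproduces the expected output from the sample data. The names author, summarize and SAMPLE are hypothetical helpers made up for this illustration, PublicationType is left out because the 2019 filter does not need it, and I am assuming that authors with no matching study are excluded.

```python
from datetime import datetime


def parse(ts):
    # The sample timestamps end in "Z"; replace it with an explicit UTC
    # offset so datetime.fromisoformat accepts them on older Pythons too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def author(key, first, last, addresses, studies):
    # Hypothetical helper: builds one author document from compact tuples.
    return {
        "AuthorKey": key,
        "AuthorName": first,
        "AuthorLastName": last,
        "AddressInfo": [
            {"Address": a, "Email": e, "EntryDate": d} for a, e, d in addresses
        ],
        "Study": [{"PMId": p, "PublicationDate": d} for p, d in studies],
    }


# Trimmed copy of the sample data above (PublicationType omitted).
SAMPLE = [
    author("Author1", "karan", "shah", [
        ("Gopipura,Surat", "karan.j.shah@email.com", "2010-01-17T06:32:18.306Z"),
        ("vesu,Surat", "shah.karan657@email.com", "2015-01-17T06:32:18.306Z"),
        ("Navasari,Surat", "karansh@email.com", "2014-01-17T06:32:18.306Z"),
    ], [
        (1000, "2019-01-17T06:35:52.178Z"),
        (1001, "2019-01-16T05:55:14.947Z"),
        (1002, "2019-01-15T05:55:14.947Z"),
        (1003, "2011-01-15T05:55:14.947Z"),
    ]),
    author("Author2", "dharan", "shah", [
        ("Gopipura1,Surat", "dharan.j.shah@email.com", "2014-01-17T06:32:18.306Z"),
        ("vesu1,Surat", "dharan.karan657@email.com", "2013-01-17T06:32:18.306Z"),
        ("Navasari1,Surat", "dharansh@email.com", "2012-01-17T06:32:18.306Z"),
    ], [
        (2000, "2011-01-16T05:55:14.947Z"),
        (2001, "2011-01-16T05:55:14.947Z"),
        (2002, "2019-01-15T05:55:14.947Z"),
        (2003, "2015-01-15T05:55:14.947Z"),
    ]),
]


def summarize(authors, year, last_n=2):
    rows = []
    for a in authors:
        # Filtered study count: studies published in the requested year.
        count = sum(
            1 for s in a["Study"] if parse(s["PublicationDate"]).year == year
        )
        if count == 0:
            continue  # assumption: authors without a matching study are dropped
        # Keep only the most recent addresses, newest EntryDate first.
        newest = sorted(
            a["AddressInfo"], key=lambda x: parse(x["EntryDate"]), reverse=True
        )[:last_n]
        rows.append({
            "AuthorKey": a["AuthorKey"],
            "AuthorName": a["AuthorName"],
            "AuthorLastName": a["AuthorLastName"],
            "AddressInfo": newest,
            "StudyCount": count,
        })
    # Authors with the highest filtered study count come first.
    rows.sort(key=lambda r: r["StudyCount"], reverse=True)
    return {"Authors": rows}
```

Calling summarize(SAMPLE, 2019) on this data yields Author1 with StudyCount 3 followed by Author2 with StudyCount 1, each with their two most recent addresses. What I need is the Elasticsearch equivalent of this filter, count, sort and address selection.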
Please suggest a suitable Elasticsearch query to achieve this expected output.
Thank you.