I have 2 indices, one contains all the courses and another index contains all the student registrations. Is there a way I can query the course index and get the count of registrations from the registration index in one query?
Thanks!
I have 2 indices, one contains all the courses and another index contains all the student registrations. Is there a way I can query the course index and get the count of registrations from the registration index in one query?
Thanks!
I am not sure I completely understand the use case. As ever, a couple of sample documents from each index would help a lot.
I presume there is there some field in common in both indices, e.g. courseId or similar?
e.g. the course index contains lots more fields/information on that specific course, name/ dates/whatever, but also a courseId field that appears in the other index. And the registration index has information on the students and the courses they registered to take, e.g. name/semester/dates/whatever, and each doc has a courseId that would match the other index.
My feeling is the answer to your question is yes, but would need to see some docs to a) be sure and b) maybe help construct the required query/aggregation.
I think ES|QL + enrich can help here.
I assumed you had something like this, docs in 2 indices, students and courses, and some common field.
Same sort of thing will work if the common field were the student, as long as there is something you can use to effectively JOIN. Depending on how your data is you can enrich either the student or course index with data from the other one, for this data it seemed to me more natural the way I did for this toy problem.
curl -s -k -H "Content-Type: application/x-ndjson" -XPOST -u "${EUSER}:${EPASS}" https://${EHOST}:${EPORT}/students/_bulk --data-binary "@student-index"
curl -s -k -H "Content-Type: application/x-ndjson" -XPOST -u "${EUSER}:${EPASS}" https://${EHOST}:${EPORT}/courses/_bulk --data-binary "@course-index"
% head -6 student-index course-index
==> student-index <==
{ "index" : {} }
{ "studentId" : "101" , "courseId" : "A1" }
{ "index" : {} }
{ "studentId" : "103" , "courseId" : "C3" }
{ "index" : {} }
{ "studentId" : "102" , "courseId" : "A1" }
==> course-index <==
{ "index" : {} }
{ "courseId" : "A1", "courseName" : "Geography" }
{ "index" : {} }
{ "courseId" : "B2", "courseName" : "History" }
{ "index" : {} }
{ "courseId" : "C3", "courseName" : "Math" }
then you can use an enrich processor to effectively do a join on the common field, in this case its courseId. something like:
PUT /_enrich/policy/my-policy
{
"match": {
"indices": "courses",
"match_field": "courseId",
"enrich_fields": ["courseName"]
}
}
and execute that policy with
PUT /_enrich/policy/my-policy/_execute?wait_for_completion=true
Then following ES|QL will work:
FROM students
| STATS perCourseCount = COUNT(*) BY courseId
| ENRICH my-policy
| SORT perCourseCount
| KEEP courseName , perCourseCount , courseId
| LIMIT 100
you will see something like:
This is what I am looking for. Thank you Kevin.
Cool, note that you will need to re-execute your enrich policy if the source index is updated, so to reflect the new values. The special enriched index is viewable in _cat/indices. If the indices are large, you might wish to do equiv of:
PUT /_enrich/policy/my-policy/_execute?wait_for_completion=false
and let it execute in background. If the dataset is massive, the above is not really designed to scale.
There might be further enhancements for JOIN-like stuff in ES|QL going forward, I've read hints about that a few times, but as I dont work for elastic I've no inside info on what/when something might come.
You can close this topic by accepting an answer. I wish you luck with your project.
© 2020. All Rights Reserved - Elasticsearch
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant logo are trademarks of the Apache Software Foundation in the United States and/or other countries.