Getting the count of documents in one index based on the ID on another index

kko · February 20, 2025, 5:20pm

I have 2 indices, one contains all the courses and another index contains all the student registrations. Is there a way I can query the course index and get the count of registrations from the registration index in one query?

Thanks!

RainTown · February 20, 2025, 8:57pm

I am not sure I completely understand the use case. As ever, a couple of sample documents from each index would help a lot.

I presume there is there some field in common in both indices, e.g. courseId or similar?

e.g. the course index contains lots more fields/information on that specific course, name/ dates/whatever, but also a courseId field that appears in the other index. And the registration index has information on the students and the courses they registered to take, e.g. name/semester/dates/whatever, and each doc has a courseId that would match the other index.

My feeling is the answer to your question is yes, but would need to see some docs to a) be sure and b) maybe help construct the required query/aggregation.

RainTown · February 21, 2025, 10:58am

I think ES|QL + enrich can help here.

I assumed you had something like this, docs in 2 indices, students and courses, and some common field.

Same sort of thing will work if the common field were the student, as long as there is something you can use to effectively JOIN. Depending on how your data is you can enrich either the student or course index with data from the other one, for this data it seemed to me more natural the way I did for this toy problem.

curl -s -k -H "Content-Type: application/x-ndjson" -XPOST -u "${EUSER}:${EPASS}" https://${EHOST}:${EPORT}/students/_bulk --data-binary "@student-index"

curl -s -k -H "Content-Type: application/x-ndjson" -XPOST -u "${EUSER}:${EPASS}" https://${EHOST}:${EPORT}/courses/_bulk --data-binary "@course-index"

% head -6 student-index course-index
==> student-index <==
{ "index" : {} }
{ "studentId" : "101" , "courseId" : "A1" }
{ "index" : {} }
{ "studentId" : "103" , "courseId" : "C3" }
{ "index" : {} }
{ "studentId" : "102" , "courseId" : "A1" }

==> course-index <==
{ "index" : {} }
{ "courseId" : "A1", "courseName" : "Geography" }
{ "index" : {} }
{ "courseId" : "B2", "courseName" : "History" }
{ "index" : {} }
{ "courseId" : "C3", "courseName" : "Math" }

then you can use an enrich processor to effectively do a join on the common field, in this case its courseId. something like:

PUT /_enrich/policy/my-policy
{
  "match": {
    "indices": "courses",
    "match_field": "courseId",
    "enrich_fields": ["courseName"]
  }
}

and execute that policy with

PUT /_enrich/policy/my-policy/_execute?wait_for_completion=true

Then following ES|QL will work:

FROM students
| STATS perCourseCount = COUNT(*) BY courseId
| ENRICH my-policy
| SORT perCourseCount
| KEEP courseName , perCourseCount , courseId
| LIMIT 100

you will see something like:

kko · February 21, 2025, 1:53pm

This is what I am looking for. Thank you Kevin.

RainTown · February 21, 2025, 2:13pm

Cool, note that you will need to re-execute your enrich policy if the source index is updated, so to reflect the new values. The special enriched index is viewable in _cat/indices. If the indices are large, you might wish to do equiv of:

PUT /_enrich/policy/my-policy/_execute?wait_for_completion=false

and let it execute in background. If the dataset is massive, the above is not really designed to scale.

There might be further enhancements for JOIN-like stuff in ES|QL going forward, I've read hints about that a few times, but as I dont work for elastic I've no inside info on what/when something might come.

You can close this topic by accepting an answer. I wish you luck with your project.

Topic		Replies	Views
Pull data from multiple index Elasticsearch	1	32	October 15, 2024
[Query] Count exist field in elasticsearch Elasticsearch	1	455	March 21, 2020
Querying in Elasticsearch Elasticsearch	4	340	May 23, 2018
How to perform join between two indexes in ES? Elasticsearch	5	309	September 2, 2022
Count in other index based on Current Index field Elasticsearch	6	1646	December 30, 2019

Getting the count of documents in one index based on the ID on another index

Related topics