Elasticsearch querying time

Hi all,

There is a use case, where I need some help from you guys. I wonder how should I do aggregation and which method would be the best.

Let's take an example. May be a school register.

Having 3 fields,
student_name, course_taken, grade_achieved, total_pages. (this is index1) (_id = student_course)
student_name, course_taken, course_book, book_page_name (this is index2) (_id = student_course_page)

I have 2 indices, each having 200,000 documents.
Out of these 200,000, there is one redundant field say name. Its count is around 7K. I want to compute total_pages of index1 out of index2.

Option 1:
So shall I run an 7K aggregations on index2 to compute total pages?
something like

{
"query":{
"match":{
//match student name
//match course_taken
// calculate number of pages
}
}
}

Also, this computation is per hour. So basically, updating index1, index2 and calculating total_pages for 7K students each hour.

Option 2:
I can create a hashmap in python to handle such aggregation on the raw document before ingesting it into elasticsearch, calculating total_pages and inserting the answered data.

Which one should I go for?

Kindly guide. Thanks in advance!

To be clear, you want a total sum of total_pages for each individual student_name?

PS - We aren't all guys :slight_smile:

Yes you are correct.

PS - Sorry about that! I take that back.

Thanks for clarifying!

You should be able to put together a terms aggregation on the name, and then sum the pages.

Thanks for the reply. The blocker is these are 2 different indices, since terms aggregation works when 2 fields belong to the same index, I wonder how to include input from another index.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.