Hi all,
There is a use case where I need some help from you guys. I'm wondering how I should do the aggregation, and which method would be best.
Let's take an example, maybe a school register.
I have two indices with these fields:
index1 (_id = student_course): student_name, course_taken, grade_achieved, total_pages
index2 (_id = student_course_page): student_name, course_taken, course_book, book_page_name
Each index has 200,000 documents.
Across these 200,000 documents, one field is shared between the two indices, say student_name, with around 7K distinct values. I want to compute total_pages in index1 from the page documents in index2.
Option 1:
So should I run 7K aggregations on index2 to compute total_pages?
something like
{
  "query": {
    "bool": {
      "must": [
        { "match": { "student_name": "..." } },
        { "match": { "course_taken": "..." } }
      ]
    }
  }
  // plus an aggregation to calculate the number of pages
}
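A minimal sketch of what one such per-student query body could look like, built in Python. The index/field names come from the example above; the value_count aggregation (counting matching page documents as "total pages") and the .keyword sub-field are my assumptions about what "calculate number of pages" means here:

```python
def build_pages_query(student_name, course_taken):
    """Return an Elasticsearch _search body that matches one
    student/course pair in index2 and counts its page documents.

    The value_count aggregation on book_page_name.keyword is an
    assumption; any per-bucket metric could go in its place."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "must": [
                    {"match": {"student_name": student_name}},
                    {"match": {"course_taken": course_taken}},
                ]
            }
        },
        "aggs": {
            "total_pages": {
                "value_count": {"field": "book_page_name.keyword"}
            }
        },
    }

body = build_pages_query("alice", "math")
```

Note that firing this once per student/course pair means roughly 7K round trips every hour; a single terms (or composite) aggregation over student_name/course_taken could return all the counts in one request, which may be worth comparing against Option 2.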
Also, this computation runs every hour. So basically, each hour I'd be updating index1 and index2 and recalculating total_pages for the 7K students.
Option 2:
I can create a hashmap in Python to handle this aggregation on the raw documents before ingesting them into Elasticsearch, calculating total_pages up front and inserting the pre-computed value.
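Option 2 could look like the following sketch, assuming the raw documents are plain dicts with the fields named above; the helper name and sample data are made up for illustration:

```python
from collections import defaultdict

def precompute_total_pages(raw_docs):
    """Aggregate index2-style page documents in Python before ingestion:
    count pages per (student_name, course_taken) pair.

    Counting one page per document is an assumption, mirroring the
    value_count approach on the Elasticsearch side."""
    totals = defaultdict(int)
    for doc in raw_docs:
        totals[(doc["student_name"], doc["course_taken"])] += 1
    return totals

# Hypothetical raw documents, one per book page:
docs = [
    {"student_name": "alice", "course_taken": "math", "book_page_name": "p1"},
    {"student_name": "alice", "course_taken": "math", "book_page_name": "p2"},
    {"student_name": "bob", "course_taken": "math", "book_page_name": "p1"},
]
totals = precompute_total_pages(docs)
# The resulting counts could then be written into total_pages on the
# index1 documents at ingest time, instead of queried back out later.
```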
Which one should I go for?
Kindly guide. Thanks in advance!