Hi all,
There is a use case where I need some help from you guys. I'm wondering how I should do the aggregation, and which method would be best.
Let's take an example, maybe a school register.
I have two indices with these fields:
index1 (_id = student_course): student_name, course_taken, grade_achieved, total_pages
index2 (_id = student_course_page): student_name, course_taken, course_book, book_page_name
Each index has 200,000 documents.
Across these 200,000 documents, one field is shared between the two indices, say student_name, with around 7K distinct values. I want to compute total_pages in index1 from the page documents in index2.
Option 1:
So should I run 7K aggregations on index2 to compute total_pages?
something like
{
  "query": {
    "bool": {
      "must": [
        { "match": { "student_name": "..." } },
        { "match": { "course_taken": "..." } }
      ]
    }
  }
  // plus an aggregation to calculate the number of pages
}
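A minimal sketch of what one such per-student query body could look like, built in Python. The index/field names come from the example above; the value_count aggregation (counting matching page documents as "total pages") and the .keyword sub-field are my assumptions about what "calculate number of pages" means here:

```python
def build_pages_query(student_name, course_taken):
    """Return an Elasticsearch _search body that matches one
    student/course pair in index2 and counts its page documents.

    The value_count aggregation on book_page_name.keyword is an
    assumption; any per-bucket metric could go in its place."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "must": [
                    {"match": {"student_name": student_name}},
                    {"match": {"course_taken": course_taken}},
                ]
            }
        },
        "aggs": {
            "total_pages": {
                "value_count": {"field": "book_page_name.keyword"}
            }
        },
    }

body = build_pages_query("alice", "math")
```

Note that firing this once per student/course pair means roughly 7K round trips every hour; a single terms (or composite) aggregation over student_name/course_taken could return all the counts in one request, which may be worth comparing against Option 2.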
Also, this computation runs every hour. So basically, each hour I'd be updating index1 and index2 and recalculating total_pages for the 7K students.
Option 2:
I can create a hashmap in Python to handle this aggregation on the raw documents before ingesting them into Elasticsearch, calculating total_pages up front and inserting the pre-computed value.
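Option 2 could look like the following sketch, assuming the raw documents are plain dicts with the fields named above; the helper name and sample data are made up for illustration:

```python
from collections import defaultdict

def precompute_total_pages(raw_docs):
    """Aggregate index2-style page documents in Python before ingestion:
    count pages per (student_name, course_taken) pair.

    Counting one page per document is an assumption, mirroring the
    value_count approach on the Elasticsearch side."""
    totals = defaultdict(int)
    for doc in raw_docs:
        totals[(doc["student_name"], doc["course_taken"])] += 1
    return totals

# Hypothetical raw documents, one per book page:
docs = [
    {"student_name": "alice", "course_taken": "math", "book_page_name": "p1"},
    {"student_name": "alice", "course_taken": "math", "book_page_name": "p2"},
    {"student_name": "bob", "course_taken": "math", "book_page_name": "p1"},
]
totals = precompute_total_pages(docs)
# The resulting counts could then be written into total_pages on the
# index1 documents at ingest time, instead of queried back out later.
```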
Which one should I go for?
Kindly guide. Thanks in advance!