Benchmarking sequence of operations

tanenblatt · April 25, 2018, 7:50pm

Being relatively new to using Rally, maybe I'm missing something, but:

Our system uses Elasticsearch to select some documents, then uses data from those documents to select other documents, then uses data from those to select yet another set, and so on until we reach some end state. This may require three or four levels of queries. We are trying to understand how different implementations of these queries affect performance, so Rally seems to be the perfect tool. However, we don't see how to run one operation after another within a challenge, as opposed to running one operation n times and then the next operation n times. Our concern is how caching would affect the two approaches: it seems that running a sequence of 4 operations n times would give different results than running each of the 4 operations n times individually.

Christian_Dahlqvist · April 26, 2018, 6:22am

For a complex series of operations like that with dependencies, I would probably recommend treating the full sequence of calls as a single operation. You can do that by creating a custom runner that executes the full series of operations and reports on it as a single unit. I did this when I wanted to simulate Kibana queries in my eventdata track. If you need control over how the tasks for this runner are created, you can handle that through a custom parameter source. Here is a link to what I used for my Kibana runner.
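To make the idea concrete, here is a minimal sketch of what such a runner and parameter source could look like in a track's `track.py`. It is not the Kibana runner mentioned above. It assumes a recent Rally version where runners are coroutines registered with `async_runner=True` (Rally versions from around the time of this thread used plain synchronous functions instead). The operation names `chained-search` and `chained-search-source` and the parameters `link-field`, `target-field`, `seed-terms`, `initial-body`, and `max-levels` are all made up for illustration.

```python
import asyncio
import random


async def chained_search(es, params):
    """Run a chain of dependent searches and report it as one operation."""
    index = params["index"]
    link_field = params["link-field"]      # hypothetical: field read from each hit
    target_field = params["target-field"]  # hypothetical: field queried at the next level
    body = params["initial-body"]
    levels = 0
    hits_seen = 0
    for _ in range(params.get("max-levels", 4)):
        response = await es.search(index=index, body=body)
        hits = response["hits"]["hits"]
        levels += 1
        hits_seen += len(hits)
        # Feed values from this level's hits into the next level's query.
        values = [h["_source"][link_field] for h in hits if link_field in h["_source"]]
        if not values:
            break  # end state reached: nothing left to follow
        body = {"query": {"terms": {target_field: values}}}
    # "weight" and "unit" tell Rally to count the whole chain as one operation;
    # the remaining keys are extra, illustrative meta-data.
    return {"weight": 1, "unit": "ops", "levels": levels, "hits": hits_seen}


class ChainedSearchParamSource:
    """Builds the parameters for each invocation of the runner above."""

    def __init__(self, track, params, **kwargs):
        self._params = params

    def partition(self, partition_index, total_partitions):
        # Every client draws from the same seed terms in this sketch.
        return self

    def params(self):
        # Pick a different starting term per invocation (hypothetical "seed-terms").
        seed = random.choice(self._params["seed-terms"])
        return {
            "index": self._params["index"],
            "link-field": self._params["link-field"],
            "target-field": self._params["target-field"],
            "max-levels": self._params.get("max-levels", 4),
            "initial-body": {"query": {"term": {self._params["target-field"]: seed}}},
        }


def register(registry):
    # Rally calls register() when it loads the track's track.py.
    registry.register_runner("chained-search", chained_search, async_runner=True)
    registry.register_param_source("chained-search-source", ChainedSearchParamSource)
```

An operation in the track would then reference the runner through its `operation-type` and the parameter source through `param-source`. Because the runner returns a weight of 1, the latency Rally reports covers the whole chain rather than each individual query.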

danielmitterdorfer · April 27, 2018, 6:35am

I think Christian has described quite well how you can implement this currently. I have raised https://github.com/elastic/rally/issues/486 in Rally so maybe we can support this out of the box at some point.

Christian_Dahlqvist · April 27, 2018, 7:04am

As the second phase of queries depends on the results from the first, I suspect a custom runner is the only way to go for this scenario. Being able to sequence operations is, however, a very useful addition for cases where they can all be driven off the same configuration.

danielmitterdorfer · April 27, 2018, 8:27am

Yes, if you need to feed data from one query to the next, I definitely agree.

tanenblatt · May 1, 2018, 1:42pm

Actually, there are cases where being able to sequence operations would be extremely helpful for us. One way we'd use that would be to run specific sets of canned sequences against a standard data set and compare them against previous sequences that provide the same results. In other words, we might have some set of operations Oa: { op1, op2, op3 } that is replaced by the semantically equivalent Ob: { op1, op2a, op2b, op3 }. We would do this in our testing to ensure that we haven't introduced any delays.
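Until something like that is supported out of the box, one way to approximate canned sequences is a small custom runner that executes a fixed list of query bodies in order and reports them as one operation. The following is only a sketch under the same assumptions as any custom runner in a recent Rally version (async runner, registered in `track.py`). The operation name `search-sequence`, the `bodies` parameter, the index name, and the example queries standing in for op1, op2, op2a, op2b, and op3 are all hypothetical.

```python
import asyncio


async def search_sequence(es, params):
    """Run a fixed list of search bodies in order as a single operation."""
    bodies = params["bodies"]  # hypothetical parameter: one query body per step
    for body in bodies:
        await es.search(index=params["index"], body=body)
    # Weight 1 makes Rally report the whole sequence as one operation.
    return {"weight": 1, "unit": "ops", "steps": len(bodies)}


def register(registry):
    registry.register_runner("search-sequence", search_sequence, async_runner=True)


# Placeholder queries standing in for op1, op2, op2a, op2b, and op3.
OP1 = {"query": {"term": {"type": "order"}}}
OP2 = {"query": {"range": {"total": {"gte": 100}}}}
OP2A = {"query": {"range": {"total": {"gte": 100, "lt": 500}}}}
OP2B = {"query": {"range": {"total": {"gte": 500}}}}
OP3 = {"query": {"match_all": {}}}

# Parameters for two operations that can be benchmarked against the same data set.
SEQUENCE_OA = {"index": "docs", "bodies": [OP1, OP2, OP3]}
SEQUENCE_OB = {"index": "docs", "bodies": [OP1, OP2A, OP2B, OP3]}
```

Defining Oa and Ob as two operations of this type in the track means each sequence is measured as a unit, so the reported latencies for the two can be compared directly across runs.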

