Scrolling vs Sizing vs Batching

Hi, I've been reading as much as I can, but I am still lost on a couple of terms and how they interact and differ:

  1. Scroll -> I don't understand this concept and how it interacts with size

  2. size (for an ES input for example) -- If I set a size of 50, how does a scroll of 10s affect that? Will it take 10s to go through those results? Will it wait 10s before calling ES again? Is it a case of whichever happens first?

Then the pipeline.batch.size setting adds to my confusion. When I start Logstash and set pipeline.batch.size to 500, size to 50 and scroll to 10s, what takes precedence?

I would love to be able to know how to actually manipulate these variables without just arbitrarily choosing. Any guidelines? I've looked all over and I feel like I am just as lost as when I started this quest :sweat_smile:

Scroll -> I don't understand this concept and how it interacts with size

When a query returns thousands or even millions of documents you don't want to fetch all of them in a single HTTP request, so Logstash asks ES to slice the result into smaller chunks (each of no more than size documents). This server-side chunking costs memory, since ES needs to remember every active scroll context, i.e. which documents were included in the results and how many the client has already fetched. The scroll option controls the maximum time that this context is kept alive before ES removes it. The timer is reset on every fetch, so the scroll time doesn't have to cover the whole result set, but it does need to be long enough for Logstash to process all the documents in one chunk before it asks for the next.
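To make the chunking concrete, here is a minimal Python sketch of the idea. It's a toy model and not the Elasticsearch API: ToyScroll, next_page, drain and the simulated clock are hypothetical names made up for illustration. It assumes that every fetch renews the keep-alive timer and that the client keeps fetching until a page comes back empty.

```python
class ScrollExpired(Exception):
    """Raised when a page is requested after the context timed out."""


class ToyScroll:
    """Toy stand-in for a server-side scroll context (not the real ES API)."""

    def __init__(self, docs, size, keep_alive):
        self.docs = list(docs)
        self.size = size              # max documents per page
        self.keep_alive = keep_alive  # seconds the context survives between fetches
        self.pos = 0                  # how many documents the client has fetched
        self.expires_at = keep_alive  # context is created at t=0

    def next_page(self, now):
        if now > self.expires_at:
            raise ScrollExpired(
                f"context expired at t={self.expires_at}s, asked at t={now}s"
            )
        page = self.docs[self.pos:self.pos + self.size]
        self.pos += len(page)
        self.expires_at = now + self.keep_alive  # each fetch renews the timer
        return page


def drain(scroll, seconds_per_page):
    """Fetch pages until one comes back empty; return the page sizes seen."""
    now, sizes = 0, []
    while True:
        page = scroll.next_page(now)
        if not page:
            return sizes
        sizes.append(len(page))
        now += seconds_per_page  # simulated time spent processing this page


# 120 matching documents, size=50, scroll=10s, 2s of processing per page
pages = drain(ToyScroll(range(120), size=50, keep_alive=10), seconds_per_page=2)
print(pages)  # [50, 50, 20]
```

In this model size only decides how many documents arrive per request, while the keep-alive only matters for the gap between two consecutive requests.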

Then the pipeline.batch.size setting adds to my confusion. When I start Logstash and set pipeline.batch.size to 500, size to 50 and scroll to 10s, what takes precedence?

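Nothing, because you're comparing apples to oranges. The three options control different things: size is the maximum number of documents ES returns per scroll request, scroll is how long ES keeps the scroll context alive between requests, and pipeline.batch.size is the maximum number of events a Logstash pipeline worker collects before running them through the filters and outputs. None of them overrides another.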

So if my Logstash takes 20s to parse 50 documents, and I set size to 50 but scroll to 10s, I'll get an error?

Yes, if you have more than 50 documents.
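For anyone who wants to see that timeline spelled out, here is a rough Python sketch. It's a simplified model rather than what Logstash actually runs: simulate and its parameters are hypothetical names, and it assumes the first page arrives at t=0, every fetch renews the timer, and the client only makes as many fetches as there are pages of results.

```python
import math


def simulate(total_docs, size, keep_alive, seconds_per_page):
    """Return (docs_received, error) for a toy scroll session."""
    pages_needed = math.ceil(total_docs / size)
    now = 0                  # simulated clock, in seconds
    expires_at = keep_alive  # context is created at t=0
    received = 0
    for _ in range(pages_needed):
        if now > expires_at:
            # The context was dropped before this fetch arrived.
            return received, (
                f"scroll context expired at t={expires_at}s, "
                f"next fetch at t={now}s"
            )
        received += min(size, total_docs - received)
        expires_at = now + keep_alive  # each fetch renews the timer
        now += seconds_per_page        # time spent processing this page
    return received, None


# size=50, scroll=10s, 20s to process each page
print(simulate(200, 50, 10, 20))  # fails after the first 50 documents
print(simulate(50, 50, 10, 20))   # one page only, so no follow-up fetch
print(simulate(200, 50, 30, 20))  # scroll=30s is long enough
```

In this model the fix is either a longer scroll or a smaller size, so that one page is processed within the keep-alive.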
