Very Slow Scrolling

blanks · June 7, 2017, 7:30am

I want to get all my data (logs) out of Elasticsearch with the elastic package for R. I use the scroll API (size 10000), but it takes forever (9 minutes) to get 2.5 million documents. I have one node with an SSD and 16 GB of RAM, of which 8 GB are reserved for Elasticsearch. Indices are created monthly, with one shard per index.
CPU usage is around 20% and heap usage is between 3 and 4 GB.

Any ideas what the problem could be, or is this normal?
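Here is a quick back-of-the-envelope check using only the numbers above. It is a minimal Python sketch (the thread itself uses R). It assumes every scroll page is full and that the 9 minutes is end-to-end wall time.

```python
# Figures reported in the question.
docs = 2_500_000          # documents retrieved
elapsed_s = 9 * 60        # 9 minutes in seconds
page_size = 10_000        # scroll "size" parameter

docs_per_s = docs / elapsed_s          # overall throughput
round_trips = -(-docs // page_size)    # ceiling division: number of full pages fetched
s_per_page = elapsed_s / round_trips   # average wall time per page

print(f"{docs_per_s:.0f} docs/s, {round_trips} pages, {s_per_page:.2f} s/page")
# prints: 4630 docs/s, 250 pages, 2.16 s/page
```

That works out to roughly 2 seconds per 10,000-document page.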

Christian_Dahlqvist · June 7, 2017, 7:58am

What does your scroll query look like? What is the size of your documents?

blanks · June 8, 2017, 6:54am

My query is quite long:
NOT uriStem: images AND NOT uriStem: includes AND NOT uriStem: favicon.ico AND NOT uriStem:style AND NOT uriStem: *sta AND NOT uriStem: *png AND NOT uriStem: *zip AND NOT uriStem: *txt AND NOT uriStem: *csv AND NOT uriStem: pdf AND NOT userAgent: www.bla.com AND NOT userAgent: www.blub.com AND NOT userAgent: www.bing.com AND NOT userAgent: www.baidu.com AND NOT uriStem:robots.txt AND requestHost: tada AND NOT uriStem: test AND NOT uriStem: leer.asp AND NOT leer.htm AND NOT uriStem: portal.asp AND NOT uriStem.keyword:"/" AND NOT uriStem: main.asp AND NOT uriStem: Popup AND NOT uriStem: Calendar.asp
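The sketch below shows which of these clauses are leading-wildcard terms, the kind Christian points to later in the thread. It splits the query on AND and flags values that begin with "*". It only understands the "NOT field: value" and AND forms used here and is not a real Lucene query parser. The query string is an excerpt of the one above.

```python
import re
from typing import List

# Excerpt of the query from the post above (not the full query).
QUERY = ("NOT uriStem: images AND NOT uriStem: *sta AND NOT uriStem: *png "
         "AND NOT uriStem: *zip AND NOT uriStem: *txt AND NOT uriStem: *csv "
         "AND requestHost: tada")

def leading_wildcard_clauses(query: str) -> List[str]:
    """Return the 'field: value' clauses whose value starts with '*'."""
    hits = []
    for clause in re.split(r"\bAND\b", query):
        body = re.sub(r"^NOT\s+", "", clause.strip())  # drop the negation
        _field, _sep, value = body.partition(":")      # "uriStem: *png" -> value " *png"
        if value.strip().startswith("*"):
            hits.append(body)
    return hits

print(leading_wildcard_clauses(QUERY))
# ['uriStem: *sta', 'uriStem: *png', 'uriStem: *zip', 'uriStem: *txt', 'uriStem: *csv']
```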

But it makes no difference if my query is just: serverName: blub
The remaining scroll parameters are: scroll = "1m", size = 10000
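The thread uses R's elastic package. For illustration, the same scroll loop with these parameters is sketched in Python. The sketch assumes a client object with elasticsearch-py 7.x style search, scroll and clear_scroll methods; the index name and query passed in are hypothetical. It also sorts by _doc, which the Elasticsearch documentation describes as the cheapest order for scrolling when result order does not matter.

```python
from typing import Any, Dict, Iterator

def scroll_all(es: Any, index: str, query: Dict[str, Any],
               scroll: str = "1m", size: int = 10_000) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one scroll page at a time.

    `es` is assumed to behave like an elasticsearch-py 7.x client.
    """
    # The initial search opens the scroll context and returns the first page.
    resp = es.search(index=index, body={"query": query, "sort": ["_doc"]},
                     scroll=scroll, size=size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Fetch the next page; this also renews the keep-alive ("1m").
            resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the search context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)
```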

The size of my data is:
Documents: 39,344,890
Data: 14GB
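Combining these figures with the numbers from the first post gives an approximate transfer rate. This is a rough sketch. It assumes "14GB" means GiB and reflects the on-disk store size, which is only a proxy for the JSON actually returned by the scroll.

```python
# Figures from the thread.
total_docs = 39_344_890
total_bytes = 14 * 1024**3      # "Data: 14GB", assumed to mean GiB of on-disk store
fetched_docs = 2_500_000
elapsed_s = 9 * 60

avg_doc_bytes = total_bytes / total_docs              # average stored size per document
fetched_mib = fetched_docs * avg_doc_bytes / 1024**2  # approximate volume scrolled out
mib_per_s = fetched_mib / elapsed_s

print(f"~{avg_doc_bytes:.0f} B/doc, ~{fetched_mib:.0f} MiB fetched, ~{mib_per_s:.1f} MiB/s")
# prints: ~382 B/doc, ~911 MiB fetched, ~1.7 MiB/s
```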

Christian_Dahlqvist · June 8, 2017, 6:58am

Leading wildcard queries are very, very inefficient, so several of these combined with all the NOT clauses explain your poor performance. If these criteria are known upfront, could you perhaps analyse each record with respect to these at ingest time and add a simple flag in order to simplify this and make it much more efficient?
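Below is a minimal sketch of the suggested ingest-time flag, written in Python for illustration. The rule lists are an excerpt of the query above, and the field name "exclude" is hypothetical. Suffix checks (endswith) stand in for the leading wildcards such as *png. The plain substring checks only approximate how the analyzed uriStem and userAgent fields would match.

```python
# Hypothetical exclusion rules, excerpted from the query earlier in the thread.
EXCLUDED_URI_SUFFIXES = ("sta", "png", "zip", "txt", "csv")    # replaces *sta, *png, ...
EXCLUDED_URI_SUBSTRINGS = ("images", "includes", "favicon.ico", "robots.txt")
EXCLUDED_AGENT_SUBSTRINGS = ("www.bing.com", "www.baidu.com")

def add_exclude_flag(record: dict) -> dict:
    """Return a copy of a log record with a boolean 'exclude' field added."""
    uri = record.get("uriStem", "").lower()
    agent = record.get("userAgent", "").lower()
    excluded = (
        uri.endswith(EXCLUDED_URI_SUFFIXES)
        or any(s in uri for s in EXCLUDED_URI_SUBSTRINGS)
        or any(s in agent for s in EXCLUDED_AGENT_SUBSTRINGS)
    )
    return {**record, "exclude": excluded}
```

With such a flag indexed, the search could shrink to something like exclude:false AND requestHost: tada, which avoids the wildcard clauses entirely.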

blanks · June 8, 2017, 12:19pm

Yeah, I also thought the wildcards were the problem, but why do I get similar performance with a simple query like this:
serverName: bla
It takes the same time with the same result size.

You mean flag them while processing in Logstash? Sadly that's not an option, because those parameters can change over time.

system · July 6, 2017, 12:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic	Category	Replies	Views	Activity
Elastic Search Nest Slow Scroll Speed	Elasticsearch	5	513	April 26, 2021
How to get data more than 10000 in elasticsearch	Elasticsearch	27	21639	January 17, 2018
Scroll/Scan API, pockets of slow responses while scrolling	Elasticsearch	5	1808	July 6, 2017
Elastic Search - Scrolling for Not too many documents?	Elasticsearch	2	454	March 19, 2019
Scrolling performance	Elasticsearch	5	1751	July 6, 2017
