Scrolling using low-level .NET driver

I am a real newbie to Elasticsearch. I'm at the stage where I copy and paste source code from the examples I can find and try to edit it intelligently.

I'm having a very hard time finding .NET examples for scrolling through results.

I dynamically create a query string (converting a user-specified Boolean search expression into an Elasticsearch query) that I want to run against my documents.

By the time the code invokes the .NET driver, the query is in a variable (queryText).

Here's a sample queryText value:

{
    "_source" : "talentId",
    "query" : {
        "bool" : {
            "must" : [ { "match_phrase" : {  "freeText" : "java"  } }  ]
        }
    }
} 

The result set can be in the millions.

Currently I'm just getting the first 10 results (even though I know that there are millions of matching documents).

I want to efficiently scroll through the result.

I have not been able to figure out how to code it.

My non-scrolling version:

StringResponse result = client.LowLevel.Search<StringResponse>(searchIndex, queryText);

I've tried many variations but I get lost (and the online elasticsearch documentation assumes a level of expertise in ES that I simply don't have (yet)):

var scanResults = client.ScrollAll<StringResponse>("1m", 5, s => s.Search(t => t.Index(SearchIndex).Query(q => queryText)))

Help!

-Thanks in advance
David

If you're just starting out, you may want to begin with the high level client, NEST, as it maps out the request and response types for all APIs, which makes it easier to get started.

NEST: high level client

Basic scroll usage

Basic scroll usage with the high level client looks like this:

var client = new ElasticClient(settings);	
var scroll = new Time(1, TimeUnit.Minute);	

var searchResponse = client.Search<dynamic>(s => s
	.Scroll(scroll)
	.Source(sf => sf
		.Includes(f => f
			.Field("talentId")
		)
	)
	.Query(q => q
		.Bool(b => b
			.Must(mu => mu
				.MatchPhrase(mp => mp
					.Field("freeText")
					.Query("java")
				)
			)
		)
	)
);
	
while (searchResponse.IsValid && searchResponse.Documents.Any())
{
	DoSomething(searchResponse);	
	searchResponse = client.Scroll<dynamic>(scroll, searchResponse.ScrollId);
}

// tell Elasticsearch to clear the scroll now we have finished with it
var clearScrollResponse = client.ClearScroll(c => c.ScrollId(searchResponse.ScrollId));

private static void DoSomething<T>(ISearchResponse<T> response) where T : class
{
	// do something with response
}

The first call to the search API with a scroll value initiates the scroll. Then, while a response contains documents, keep asking for the next batch of results using the scroll API, doing something with each batch in DoSomething<T>. Once a response comes back with no documents, clear the scroll. You probably want to change dynamic to a POCO type that maps to the _source in the targeted indices.
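
If it helps to see that loop on its own, here is a minimal sketch that separates the control flow from the client calls. ScrollLoop and Pages are hypothetical names, not part of NEST or Elasticsearch.Net; the four delegates stand in for the search, scroll, "has documents" check and clear scroll calls shown above.

```csharp
using System;
using System.Collections.Generic;

public static class ScrollLoop
{
    // Hypothetical helper: yields each non-empty page, then always runs the cleanup.
    public static IEnumerable<TPage> Pages<TPage>(
        Func<TPage> first,              // initial search request with a scroll value
        Func<TPage, TPage> next,        // scroll request using the previous page's scroll id
        Func<TPage, bool> hasDocuments, // true while the page still contains hits
        Action<TPage> clear)            // clear scroll request for the last page seen
    {
        var page = first();
        try
        {
            while (hasDocuments(page))
            {
                yield return page;
                page = next(page);
            }
        }
        finally
        {
            // Runs when the pages are exhausted or the consumer stops enumerating early.
            clear(page);
        }
    }
}
```

With the NEST example above, first would wrap the client.Search<dynamic>(...) call, next would be page => client.Scroll<dynamic>(scroll, page.ScrollId), hasDocuments would be page => page.IsValid && page.Documents.Any(), and clear would call client.ClearScroll with the last page's ScrollId. Putting the clear in a finally block means the scroll context is released even if processing stops early.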

Using ScrollAll

Basic scroll usage is fine, but perhaps you want to slice a scroll into multiple partitions, allowing you to scroll them concurrently. You can use the ScrollAll observable helper method for this.

Taking the previous example and converting it to use ScrollAll:

var client = new ElasticClient();	
var scroll = new Time(1, TimeUnit.Minute);

// slices should be set to something meaningful for the target index,
// based on the number of shards
var slices = Environment.ProcessorCount;

var scrollAllObservable = client.ScrollAll<dynamic>(scroll, slices, sc => sc
	.MaxDegreeOfParallelism(slices)
	.Search(s => s
		.Source(sf => sf
			.Includes(f => f
				.Field("talentId")
			)
		)
		.Query(q => q
			.Bool(b => b
				.Must(mu => mu
					.MatchPhrase(mp => mp
						.Field("freeText")
						.Query("java")
					)
				)
			)
		)
	)
);

var waitHandle = new ManualResetEvent(false);
ExceptionDispatchInfo e = null;

var scrollObserver = new ScrollAllObserver<dynamic>(
	next => 
	{
		DoSomething(next.SearchResponse);
	},
	error => 
	{
		e = ExceptionDispatchInfo.Capture(error);
		waitHandle.Set();
	},
	() => waitHandle.Set()
);

// initiate observing
scrollAllObservable.Subscribe(scrollObserver);

// wait until all scrolling is complete.
waitHandle.WaitOne();

// if an exception was captured, throw it
if (e != null)
	e.Throw();

Take a look at the documentation for choosing a number of slices.
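
As a rough starting point while reading that documentation: the Elasticsearch sliced scroll docs note that using more slices than shards makes the first calls slow, so one option is to cap the slice count at the number of primary shards. A minimal sketch, where ScrollSlices.Choose is a hypothetical helper and the shard count is assumed to be known already (for example from the index settings):

```csharp
using System;

public static class ScrollSlices
{
    // Hypothetical helper: never ask for more slices than there are primary shards,
    // and never for more than the machine has processors to work through them.
    public static int Choose(int primaryShards, int processorCount)
    {
        if (primaryShards < 1)
            throw new ArgumentOutOfRangeException(nameof(primaryShards));
        if (processorCount < 1)
            throw new ArgumentOutOfRangeException(nameof(processorCount));

        return Math.Min(primaryShards, processorCount);
    }
}
```

In the ScrollAll example this would replace the slices assignment with something like ScrollSlices.Choose(primaryShards, Environment.ProcessorCount), where primaryShards is whatever your index is configured with.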

Elasticsearch.Net: low level client

Scrolling with the low level client is very similar to the basic usage with the high level client:

var client = new ElasticLowLevelClient();
var scroll =  TimeSpan.FromMinutes(1);

var searchRequestParameters = new SearchRequestParameters
{
	Scroll = scroll
};

var searchResponse = client.Search<DynamicResponse>("posts", PostData.Serializable(
new 
{
	_source = "talentId",
	query = new 
	{
		@bool = new
		{
			must = new[] 
			{
				new { match_phrase = new { freeText = "java" } }	
			}
		}
	}
}), searchRequestParameters);

while (searchResponse.Success && ((List<dynamic>)searchResponse.Body["hits"]["hits"]).Any())
{
	DoSomething(searchResponse);
	
	searchResponse = client.Scroll<DynamicResponse>(PostData.Serializable(
	new 
	{
		scroll = "1m",
		scroll_id = searchResponse.Body["_scroll_id"].ToString()
	}));
}

// tell Elasticsearch to clear the scroll now we have finished with it
var clearScrollResponse = client.ClearScroll<DynamicResponse>(PostData.Serializable(
new 
{
	scroll_id = searchResponse.Body["_scroll_id"].ToString()
}));

private static void DoSomething(DynamicResponse response)
{
	// do something with response
}

This uses DynamicResponse, a response implementation that returns the response body as a dynamic type. You could use StringResponse, BytesResponse, etc. instead and handle the deserialization of the response in a way that better suits your needs.
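
Since the original question already has the query as a JSON string and uses StringResponse, here is a minimal sketch of handling the deserialization yourself. It assumes System.Text.Json is available (.NET Core 3.0 or later; Json.NET would work in a similar way) and that the response body has the usual _scroll_id and hits.hits shape. ScrollPage.Parse is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class ScrollPage
{
    // Hypothetical helper: pulls the scroll id and the talentId of every hit
    // out of the raw JSON body of a search or scroll response.
    public static (string ScrollId, List<string> TalentIds) Parse(string responseBody)
    {
        using (var document = JsonDocument.Parse(responseBody))
        {
            var root = document.RootElement;
            var scrollId = root.GetProperty("_scroll_id").GetString();

            var talentIds = new List<string>();
            foreach (var hit in root.GetProperty("hits").GetProperty("hits").EnumerateArray())
            {
                // _source holds only talentId because the query asks for "_source": "talentId".
                // ToString() copes with the field being stored as a string or as a number.
                talentIds.Add(hit.GetProperty("_source").GetProperty("talentId").ToString());
            }

            return (scrollId, talentIds);
        }
    }
}
```

The string passed to Parse would be the Body of the StringResponse returned by the low level client's Search<StringResponse>(searchIndex, queryText, searchRequestParameters) call, and then of each Scroll<StringResponse>(...) call, looping until TalentIds comes back empty.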
