Hi,
I am having trouble working out how to join multiple scrolled search results before returning them from a search function. My search function is as follows:
public JsonNode scrollSearch(SearchSourceBuilder searchSourceBuilder) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    List<JsonNode> data = new LinkedList<>();
    SearchRequest searchRequest = new SearchRequest(ES_INDEX)
            .scroll(ES_SCROLL);

    // Add additional required parameters to query
    searchSourceBuilder.timeout(ES_TIMEOUT_MS)
            .size(ES_BULK_SIZE);
    searchRequest.source(searchSourceBuilder);

    // Perform search query
    SearchResponse searchResponse = client.search(searchRequest);
    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();

    while (searchHits != null && searchHits.length > 0) {
        // Add results to List object as they are retrieved
        for (SearchHit hit : searchHits) {
            data.add(mapper.valueToTree(hit));
        }

        // Using the Scroll ID, get the next set of results
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId)
                .scroll(ES_SCROLL);
        searchResponse = client.searchScroll(scrollRequest);
        scrollId = searchResponse.getScrollId();
        searchHits = searchResponse.getHits().getHits();
    }

    // Clear Scrolled Query data from Server
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest);
    boolean succeeded = clearScrollResponse.isSucceeded();
    if (!succeeded) {
        LOGGER.log(Level.WARNING, "Scroll Query not cleared");
    }

    // The List's toString() output is re-parsed here, so the result is a JSON array
    return mapper.readTree(data.toString());
}
The problem with the above is that the returned JSON is still wrapped in a single array, [{....}]. When I use the low-level Java API client I don't have this issue, as it returns a normal response object, {....}. The reason I am trying to use the high-level Java API is that I am not restricted to 10k results with it.
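To illustrate what I mean, here is a minimal sketch using only the JDK, with plain JSON strings standing in for the serialized SearchHit objects. The class name, the method names and the "hits" wrapper key are made up for this example and are not part of any Elasticsearch API; it only shows why joining through List.toString() gives me an array, and the kind of single-object shape I am after.

```java
import java.util.LinkedList;
import java.util.List;

public class ScrollJoinSketch {

    // What my function effectively does: List.toString() renders "[a, b]",
    // which is then parsed as a JSON array.
    static String joinAsArray(List<String> hitsJson) {
        return hitsJson.toString();
    }

    // Hypothetical alternative: wrap the joined hits in one JSON object.
    // The "hits" key is an assumption chosen for this example only.
    static String joinAsObject(List<String> hitsJson) {
        return "{\"hits\":[" + String.join(",", hitsJson) + "]}";
    }

    public static void main(String[] args) {
        List<String> hits = new LinkedList<>();
        hits.add("{\"_id\":\"1\"}");
        hits.add("{\"_id\":\"2\"}");

        System.out.println(joinAsArray(hits));  // array form: [{...}, {...}]
        System.out.println(joinAsObject(hits)); // object form: {"hits":[...]}
    }
}
```

In the real function the list holds JsonNode values rather than strings, so the same wrapping would presumably be done with Jackson instead of string concatenation.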
Can anyone suggest a better way to join the results before returning them? This function may be called more than 1,000 times in a single user process, with upwards of 100k results. I have been looking at this issue for a couple of days now, and I am sure I am missing something.
Thanks,
AoT