JAVA API: Joining Scrolled Search Results


#1

Hi,

I am having some issues working out how to join multiple scrolled search results before returning them from a search function. My search function is as follows:

public JsonNode scrollSearch(SearchSourceBuilder searchSourceBuilder) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    List<JsonNode> data = new LinkedList<>();

    SearchRequest searchRequest = new SearchRequest(ES_INDEX)
            .scroll(ES_SCROLL);

    // Add additional required parameters to the query
    searchSourceBuilder.timeout(ES_TIMEOUT_MS)
                       .size(ES_BULK_SIZE);
    searchRequest.source(searchSourceBuilder);

    // Perform the initial search query
    SearchResponse searchResponse = client.search(searchRequest);

    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();

    while (searchHits != null && searchHits.length > 0) {
        // Add results to the list as they are retrieved
        for (SearchHit hit : searchHits) {
            data.add(mapper.valueToTree(hit));
        }

        // Using the scroll ID, fetch the next page of results
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId)
                .scroll(ES_SCROLL);
        searchResponse = client.searchScroll(scrollRequest);
        scrollId = searchResponse.getScrollId();
        searchHits = searchResponse.getHits().getHits();
    }

    // Clear the scroll context from the server
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest);
    if (!clearScrollResponse.isSucceeded()) {
        LOGGER.log(Level.WARNING, "Scroll query not cleared");
    }

    return mapper.readTree(data.toString());
}

The problem with the above is that the returned string is still wrapped up in a single array [{....}]. When I use the low-level Java API client, I don't have this issue, as it returns a normal response {....}. The reason I am trying to use the high-level Java API is that I am not restricted to 10k results with it.

Can anyone suggest a better way to join the results before returning them? This function may be called over 1,000 times in a single user process, with upwards of 100k results. I have been looking at this issue for a couple of days now, and I am sure I am missing something :wink:

Thanks,

AoT


(David Pilato) #2

The problem with the above, is that the returned string is still wrapped up in a single array [{....}].

Do you mean that data.toString() gives you the wrong result? I believe this is expected, as you created your data object as a List...
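To illustrate the point about the List: joining several JSON documents can only ever yield an array, because there is no single root object. A minimal stdlib-only sketch (no Elasticsearch or Jackson involved; the method names and the envelope shape are hypothetical, not from the thread) shows the array form and one common alternative, wrapping the array in an envelope object:

```java
import java.util.List;
import java.util.StringJoiner;

public class JoinHitsSketch {
    // Joining several JSON documents inherently produces a JSON array...
    static String asArray(List<String> hits) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        hits.forEach(joiner::add);
        return joiner.toString();
    }

    // ...unless the array is deliberately wrapped in an envelope object,
    // which gives the caller a single {....}-shaped response.
    static String asEnvelope(List<String> hits) {
        return "{\"total\":" + hits.size() + ",\"hits\":" + asArray(hits) + "}";
    }

    public static void main(String[] args) {
        List<String> hits = List.of("{\"id\":1}", "{\"id\":2}");
        System.out.println(asArray(hits));    // [{"id":1},{"id":2}]
        System.out.println(asEnvelope(hits)); // {"total":2,"hits":[{"id":1},{"id":2}]}
    }
}
```

The same envelope idea would apply to the original function: build the wrapper once, after the scroll loop, rather than trying to make the joined hits look like a single document.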

The reason I am trying to use the high level JAVA API, is that I am not restricted to 10k results with it.

That's an incorrect assertion. Whatever API you are using, you are limited by default to 10k results with a "classic" search, but not with a "scroll" search.

Can anyone suggest a better way to join the results before returning them.

I believe that what you are doing looks fine, BUT if you have 1 billion results you are most likely going to blow up your client JVM...
I'm not sure what you are doing with the result set afterwards. What is your use case?
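One way to avoid blowing up the client JVM is not to accumulate all hits at all, but to hand each page to a caller-supplied callback as it arrives. This is a hedged stdlib-only sketch of that pattern (the `scrollEach` name and the fake pages are hypothetical; in the real code the pages would come from `client.search(...)` and `client.searchScroll(...)`):

```java
import java.util.List;
import java.util.function.Consumer;

public class StreamingScrollSketch {
    // Stand-in for the scroll loop: feeds hits to a callback one page at a
    // time instead of collecting them all into one list, so memory stays
    // bounded by the page size rather than by the total hit count.
    static void scrollEach(List<List<String>> pages, Consumer<String> onHit) {
        for (List<String> page : pages) {   // one "scroll page" at a time
            page.forEach(onHit);            // hand each hit to the caller
        }                                   // nothing is retained here
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
                List.of("{\"id\":1}", "{\"id\":2}"),
                List.of("{\"id\":3}"));
        // The caller decides what to keep; here it just counts hits.
        int[] count = {0};
        scrollEach(pages, hit -> count[0]++);
        System.out.println(count[0]); // 3
    }
}
```

The design choice is the same one David hints at: if the result set can be arbitrarily large, the consumer of the function, not the function itself, should decide how much to retain.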


#3

I will say, my normal day job is not programming in Java (I have not done any Java coding for close to 15 years). Can you recommend a more efficient way to consolidate the data before returning it to another function for processing?

Sorry, that was what I was trying to say :grin:

Yeah, we did a query today that returned just over 16 million results, which crashed the client that tried to load them. I think I will have to put in a hard limit, just to avoid crashing the client. But we did manage to process 250k results in about a minute.
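The hard-limit idea above can be sketched as stopping the scroll loop once a cap is reached. This is a stdlib-only illustration (the `collectCapped` name and the fake pages are hypothetical; in the real function the check would sit inside the `while` loop over scroll pages, followed by the usual clear-scroll call):

```java
import java.util.ArrayList;
import java.util.List;

public class CappedCollectSketch {
    // Collect hits from successive pages but stop once maxHits is reached,
    // so a runaway query cannot exhaust the client JVM's heap.
    static List<String> collectCapped(List<List<String>> pages, int maxHits) {
        List<String> out = new ArrayList<>();
        for (List<String> page : pages) {
            for (String hit : page) {
                if (out.size() >= maxHits) {
                    return out; // cap reached: abandon the rest of the scroll
                }
                out.add(hit);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
                List.of("a", "b", "c"),
                List.of("d", "e"));
        System.out.println(collectCapped(pages, 4)); // [a, b, c, d]
    }
}
```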

As for the use case, it is for real time analytics of transactional data. We probably should use Hadoop, but others made the decision based on what they knew, and they did not like having to plan their queries and wait for results.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.