JAVA API: Joining Scrolled Search Results


#1

Hi,

I am having some issues working out how to join multiple scrolled search results before returning them from a search function. My search function is as follows:

public JsonNode scrollSearch(SearchSourceBuilder searchSourceBuilder) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    List<JsonNode> data = new LinkedList<>();

    SearchRequest searchRequest = new SearchRequest(ES_INDEX)
            .scroll(ES_SCROLL);

    // Add additional required parameters to the query
    searchSourceBuilder.timeout(ES_TIMEOUT_MS)
                       .size(ES_BULK_SIZE);
    searchRequest.source(searchSourceBuilder);

    // Perform the initial search query
    SearchResponse searchResponse = client.search(searchRequest);

    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();

    while (searchHits != null && searchHits.length > 0) {
        // Add results to the list as they are retrieved
        for (SearchHit hit : searchHits) {
            data.add(mapper.valueToTree(hit));
        }

        // Using the scroll ID, fetch the next page of results
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId)
                .scroll(ES_SCROLL);
        searchResponse = client.searchScroll(scrollRequest);
        scrollId = searchResponse.getScrollId();
        searchHits = searchResponse.getHits().getHits();
    }

    // Clear the scroll context from the server
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest);
    if (!clearScrollResponse.isSucceeded()) {
        LOGGER.log(Level.WARNING, "Scroll query not cleared");
    }

    return mapper.readTree(data.toString());
}

The problem with the above is that the returned string is still wrapped up in a single array [{....}]. When I use the low-level Java API client, I don't have this issue, as it returns a normal response {....}. The reason I am trying to use the high-level Java API is that I am not restricted to 10k results with it.

Can anyone suggest a better way to join the results before returning them? This function may be called over 1,000 times in a single user process, with upwards of 100k results. I have been looking at this issue for a couple of days now, and I am sure I am missing something :wink:

Thanks,

AoT


(David Pilato) #2

The problem with the above, is that the returned string is still wrapped up in a single array [{....}].

Do you mean that data.toString() gives you the wrong result? I believe this is expected, as you created your data object as a List...
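To illustrate the point about the List: joining several JSON documents can only ever yield an array, because there is no single root object. A minimal stdlib-only sketch (no Elasticsearch or Jackson involved; the method names and the envelope shape are hypothetical, not from the thread) shows the array form and one common alternative, wrapping the array in an envelope object:

```java
import java.util.List;
import java.util.StringJoiner;

public class JoinHitsSketch {
    // Joining several JSON documents inherently produces a JSON array...
    static String asArray(List<String> hits) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        hits.forEach(joiner::add);
        return joiner.toString();
    }

    // ...unless the array is deliberately wrapped in an envelope object,
    // which gives the caller a single {....}-shaped response.
    static String asEnvelope(List<String> hits) {
        return "{\"total\":" + hits.size() + ",\"hits\":" + asArray(hits) + "}";
    }

    public static void main(String[] args) {
        List<String> hits = List.of("{\"id\":1}", "{\"id\":2}");
        System.out.println(asArray(hits));    // [{"id":1},{"id":2}]
        System.out.println(asEnvelope(hits)); // {"total":2,"hits":[{"id":1},{"id":2}]}
    }
}
```

The same envelope idea would apply to the original function: build the wrapper once, after the scroll loop, rather than trying to make the joined hits look like a single document.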

The reason I am trying to use the high level JAVA API, is that I am not restricted to 10k results with it.

That's an incorrect assertion. Whatever API you are using, you are limited by default to 10k results with a "classic" search, but not with a "scroll" search.

Can anyone suggest a better way to join the results before returning them.

I believe that what you are doing looks fine, BUT if you have 1 billion results you are most likely going to blow up your client JVM...
I'm not sure what you are doing with the result set afterwards. What is your use case?
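One way to avoid blowing up the client JVM is not to accumulate all hits at all, but to hand each page to a caller-supplied callback as it arrives. This is a hedged stdlib-only sketch of that pattern (the `scrollEach` name and the fake pages are hypothetical; in the real code the pages would come from `client.search(...)` and `client.searchScroll(...)`):

```java
import java.util.List;
import java.util.function.Consumer;

public class StreamingScrollSketch {
    // Stand-in for the scroll loop: feeds hits to a callback one page at a
    // time instead of collecting them all into one list, so memory stays
    // bounded by the page size rather than by the total hit count.
    static void scrollEach(List<List<String>> pages, Consumer<String> onHit) {
        for (List<String> page : pages) {   // one "scroll page" at a time
            page.forEach(onHit);            // hand each hit to the caller
        }                                   // nothing is retained here
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
                List.of("{\"id\":1}", "{\"id\":2}"),
                List.of("{\"id\":3}"));
        // The caller decides what to keep; here it just counts hits.
        int[] count = {0};
        scrollEach(pages, hit -> count[0]++);
        System.out.println(count[0]); // 3
    }
}
```

The design choice is the same one David hints at: if the result set can be arbitrarily large, the consumer of the function, not the function itself, should decide how much to retain.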


#3

I will say, my normal day job is not programming in Java (I have not done any Java coding for close to 15 years). Can you recommend a more efficient way to consolidate the data before returning it to another function for processing?

Sorry, that was what I was trying to say :grin:

Yeah, we did a query today that returned just over 16 million results, which crashed the client that tried to load them. I think I will have to put in a hard limit, just to avoid crashing the client. But we did manage to process 250k results in about a minute.
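The hard-limit idea above can be sketched as stopping the scroll loop once a cap is reached. This is a stdlib-only illustration (the `collectCapped` name and the fake pages are hypothetical; in the real function the check would sit inside the `while` loop over scroll pages, followed by the usual clear-scroll call):

```java
import java.util.ArrayList;
import java.util.List;

public class CappedCollectSketch {
    // Collect hits from successive pages but stop once maxHits is reached,
    // so a runaway query cannot exhaust the client JVM's heap.
    static List<String> collectCapped(List<List<String>> pages, int maxHits) {
        List<String> out = new ArrayList<>();
        for (List<String> page : pages) {
            for (String hit : page) {
                if (out.size() >= maxHits) {
                    return out; // cap reached: abandon the rest of the scroll
                }
                out.add(hit);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
                List.of("a", "b", "c"),
                List.of("d", "e"));
        System.out.println(collectCapped(pages, 4)); // [a, b, c, d]
    }
}
```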

As for the use case, it is for real time analytics of transactional data. We probably should use Hadoop, but others made the decision based on what they knew, and they did not like having to plan their queries and wait for results.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.