Scroll concept slows down search performance in a newer Elasticsearch version (PHP client)

Hi all,

I have migrated my application's Elasticsearch version from 2.4 to 6.3 and saw a drop in performance. In the old version, the size parameter in the query request was 1000000000, and with it I retrieved 2,719,674 documents in just 42 seconds. In the new version we need to use scroll with size=10000, which returns the same result in 426 seconds. That is not acceptable.
The code is as follows:

```php
  $client  = $this->_createClient();
  $aParams = $this->_setContentType($aParams);

  // The initial search opens the scroll context and returns the first batch.
  $docs = $client->search($aParams);

  $iScrollId      = $docs['_scroll_id'];   // scroll IDs are strings despite the "i" prefix
  $aHits          = $docs['hits']['hits'];
  $aIDs           = array();
  $aHitHighlights = array();

  while (is_array($aHits) && count($aHits) > 0) {
    foreach ($aHits as $aHit) {
      // Document IDs look like "<id>_<suffix>"; keep only the leading part.
      $iID = explode('_', $aHit['_id'])[0];
      if ($this->_aSourceFields != null) {
        $aIDs[$iID] = $aHit['_source'];
      }
      else {
        $aIDs[] = $iID;
      }
      if ($this->bHighlight && isset($aHit['highlight'])) {
        $aHitHighlights[$iID] = $aHit['highlight'];
      }
    }
    // Fetch the next batch and keep the scroll context alive for another minute.
    $aEachScroll = $client->scroll(array('scroll_id' => $iScrollId, 'scroll' => '1m'));
    $iScrollId   = $aEachScroll['_scroll_id'];
    $aHits       = $aEachScroll['hits']['hits'];
  }
  // Release the server-side scroll context once all batches are read.
  $client->clearScroll($this->_deleteScrollRequest($iScrollId));
```

In the first request, the scroll value is "10s". Please tell me if there is anything I can do to optimize this.

Thanks and regards,
Kshitij Yelpale

You can change index.max_result_window in your index settings to 1000000000 if you want to be able to retrieve 1000000000 documents in one request again.
Not sure I'd do this, though, because of the memory pressure.
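For reference, a minimal sketch of raising that limit from PHP, assuming the elasticsearch-php client's `indices()->putSettings()` call; the helper `buildMaxResultWindowParams` and the index name `my_index` are hypothetical. `index.max_result_window` is a dynamic setting (default 10000), so it can be changed on a live index, but raising it only moves the memory risk, it does not remove it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build update-settings parameters that raise
 * index.max_result_window on a single index.
 */
function buildMaxResultWindowParams(string $index, int $window): array
{
    return [
        'index' => $index,
        'body'  => [
            'index' => [
                // Upper bound on from + size for a normal search; the default is 10000.
                'max_result_window' => $window,
            ],
        ],
    ];
}

// Usage with a real client (assumption: elasticsearch-php 6.x; not executed here):
// $client->indices()->putSettings(buildMaxResultWindowParams('my_index', 1000000000));
```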

Hi David,

Yes, exactly. That's why Elastic introduced the scroll concept.

Bye.

No, scroll was there from the beginning.

Anyway, if you were happy with 1000000000 and it still works for your use case, then just use it again.

Is there any other way?

I don't think there is. What is the problem?

The search takes longer.

With that solution?

Or with the "normal" scroll solution?

If I use index.max_result_window = 1000000000, it can cause memory issues (is there any way to avoid this?). For the scroll approach, can you please suggest a solution that boosts performance and returns results as fast as the older version did?

If you want the old behavior, use the index setting as I said and accept the risk of memory pressure.

If you want a safer option, use scroll, at the price of more time to read all the content.

Note that you can speed up extraction by sorting on _doc, as stated in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
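A minimal sketch of a scroll that sorts on `_doc`, in the same array-parameter style as the code in the question. The helper names `buildDocSortedScrollSearch` and `drainScroll` are hypothetical, and `$client` is assumed to expose `search()`, `scroll()` and `clearScroll()` like elasticsearch-php 6.x; the exact argument shape of `clearScroll()` differs between client versions, so check yours.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: initial search request for a scroll sorted on _doc.
 * The index name and query are supplied by the caller.
 */
function buildDocSortedScrollSearch(string $index, array $query, int $size = 10000, string $keepAlive = '1m'): array
{
    return [
        'index'  => $index,
        'scroll' => $keepAlive,   // how long the scroll context lives between requests
        'size'   => $size,        // hits returned per batch
        'body'   => [
            'query' => $query,
            'sort'  => ['_doc'],  // index order, so no relevance ranking is needed
        ],
    ];
}

/**
 * Hypothetical helper: read every batch of a scroll, call $onHit for each hit,
 * then clear the scroll context. Returns the number of hits processed.
 * Assumption: $client follows the elasticsearch-php 6.x array-argument style.
 */
function drainScroll(object $client, array $searchParams, callable $onHit, string $keepAlive = '1m'): int
{
    $response = $client->search($searchParams);
    $scrollId = $response['_scroll_id'] ?? null;
    $count    = 0;

    while (!empty($response['hits']['hits'])) {
        foreach ($response['hits']['hits'] as $hit) {
            $onHit($hit);
            $count++;
        }
        // Ask for the next batch and extend the scroll context.
        $response = $client->scroll([
            'scroll_id' => $scrollId,
            'scroll'    => $keepAlive,
        ]);
        $scrollId = $response['_scroll_id'] ?? $scrollId;
    }

    if ($scrollId !== null) {
        // Assumption: this client version accepts scroll_id as a top-level parameter.
        $client->clearScroll(['scroll_id' => $scrollId]);
    }
    return $count;
}
```

The change that matters for speed is `'sort' => ['_doc']`. Hits then come back in index order rather than ranked by `_score`, so use it only when relevance order does not matter, which is the case when exporting every matching document.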

Hi David,

Thank you. Could you tell me how sorting by "_doc" improves scroll performance, i.e. the internal logic?

Bye

It's because Elasticsearch does not have to reorder documents while reading them; it just reads them in the order they come. The default sort on _score may have to reorder the documents, which means that the deeper you go, the longer it takes. IIRC.
