Does ES make documents searchable before they are completely indexed?


(Bjarne Petersen) #1

Ok, I'm a novice, so I apologize for silly questions.

When we bulk-index a set of documents and immediately afterwards make a request that should include those documents, PHP triggers a 'json_decode: integer overflow' error. I suspect that is because some of the documents have missing values.

If we wait 1-2 minutes everything works as expected, and a solution might be along the lines suggested in https://github.com/ruflin/Elastica/issues/717.

I am curious as to why there are documents with missing values in the index (if that is indeed the case).

When documents are created in bulk, does ES create "placeholders" for those documents and then fill in the values, or are we doing something silly in our bulk indexing (in PHP)?

  • Bjarne

(Ramy) #2

Hi Bjarne,

Have you tried this plugin: https://github.com/jprante/elasticsearch-jdbc
It makes indexing your data very easy, but check which versions it supports.

cheers, Ramy


(Bjarne Petersen) #3

Hi Ramy

The plugin looks nice, but I think it is beyond what I am looking for. Indexing data is not really the issue in my case; it is handled through the PHP library, which has been working nicely for several years.

We are adding a new feature where we add a batch of documents (<200) to an existing set of xxx millions across 3 indexes. Shortly afterwards we make a request that matches (only) those documents (in one index). The returned JSON contains an integer that PHP cannot handle (64 bit vs 32 bit), which suggests that the retrieved data are incomplete (entries without values).
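One way to narrow this down is to look at the raw response and see which field actually carries the oversized integer. The following is only a minimal sketch, written in Python rather than PHP so it runs standalone; the sample response and the field name `views` are made up for illustration and are not taken from our index.

```python
import json

# Signed 32-bit range, i.e. what a 32-bit integer type can hold.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def find_wide_ints(value, path="$"):
    """Yield (path, number) for every integer outside the signed 32-bit range."""
    if isinstance(value, bool):
        return  # bool is a subclass of int in Python; it is never an overflow
    if isinstance(value, int):
        if not INT32_MIN <= value <= INT32_MAX:
            yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_wide_ints(item, f"{path}.{key}")
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from find_wide_ints(item, f"{path}[{i}]")


# Hypothetical search response fragment, for illustration only.
sample = (
    '{"hits": {"total": 2, "hits": ['
    '{"_id": "a", "_source": {"views": 12}},'
    '{"_id": "b", "_source": {"views": 9223372036854775807}}]}}'
)

wide = list(find_wide_ints(json.loads(sample)))
for path, number in wide:
    print(path, number)
```

Running the same kind of scan on a response captured right after the bulk request, and again a few minutes later, would show whether the oversized value comes from the documents themselves or from something else in the response.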

1-2 minutes later everything works as expected. I don't think we have previously had a case where we add a batch of data and retrieve it shortly after.

My question is: when indexing a batch of data, does ES make the data searchable before the data set is complete, or does the PHP library have a glitch, or is our code doing something wrong?

I have been thinking of making the bulk size smaller, but in our small test the bulk size has been on the order of 5-10 items.


(Ramy) #4

Did you try to play with the refresh interval? It might help: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-update-settings.html#bulk
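Besides the index-level refresh interval, the bulk request itself accepts a `refresh` parameter, so the call only returns once the new documents are visible to search. A rough sketch of what such a request could look like, in Python for illustration (nothing is sent over the network). The host, the index name `articles`, and the documents are hypothetical. It also assumes a newer Elasticsearch version: `refresh=wait_for` exists from 5.0 on, while older releases take `refresh=true` and also need a `_type` in each action line.

```python
import json
from urllib.parse import urlencode


def build_bulk_request(base_url, index, docs, refresh="wait_for"):
    """Return (url, body) for a _bulk request; this only builds the request."""
    lines = []
    for doc_id, source in docs.items():
        # Each document is an action line followed by its source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    body = "\n".join(lines) + "\n"  # the bulk body must end with a newline
    url = f"{base_url}/_bulk?{urlencode({'refresh': refresh})}"
    return url, body


# Hypothetical host, index, and documents.
url, body = build_bulk_request(
    "http://localhost:9200",
    "articles",
    {"1": {"title": "first"}, "2": {"title": "second"}},
)
print(url)
print(body, end="")
```

The body would be sent with `Content-Type: application/x-ndjson`. Forcing or waiting for a refresh on every small bulk has a cost, so it is probably only worth it for requests that are searched right away.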


(Bjarne Petersen) #5

Found the problem and it was our code.

Thanks for the help and sorry for the noise.

