Does ES make documents searchable before they are completely indexed?

mekanix · August 10, 2015, 12:10pm

Ok, I'm a novice, so I apologize for silly questions.

When we create a set of documents for bulk indexing and immediately afterwards make a request that should include those documents, PHP triggers a 'json_decode: integer overflow' error. I suspect that is because some of the documents have missing values.

If we wait 1-2 minutes, everything works as expected. A solution might be along the lines suggested in https://github.com/ruflin/Elastica/issues/717.

I am curious as to why there are documents with missing values in the index (if that is indeed the case).

When doing bulk creation of documents, is ES creating "placeholders" for those documents and then filling in the values, or are we doing something silly when doing bulk indexing (in PHP)?

Bjarne

remram · August 10, 2015, 12:27pm

Hi Bjarne,

Have you tried this plugin? https://github.com/jprante/elasticsearch-jdbc
It makes indexing your data very easy, but take care to check which versions are supported.

cheers, Ramy

mekanix · August 11, 2015, 7:29am

Hi Ramy

The plugin looks nice, but I think it is beyond what I am looking for. Indexing data is not really the issue in my case; it is handled through the PHP library, which has been working nicely for several years.

We are adding a new feature where we add a bulk of data (<200) to an existing set of xxx millions across 3 indexes, and shortly afterwards make a request that matches (only) those data (in one index). The returned JSON contains an integer that PHP cannot handle (64 bit vs 32 bit), indicating that the retrieved data are incomplete (entries without values).
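One way to tell whether the overflow comes from a legitimately large number rather than from a missing value would be to scan the raw response for integers that do not fit in 32 bits. Below is a minimal sketch of that check, written in Python purely for illustration (our real code is PHP). The sample response and the field name `ts` are made up, and `find_wide_ints` is a hypothetical helper, not part of any library.

```python
import json

# Signed 32-bit integer range (assumption: the PHP build uses 32-bit ints).
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def find_wide_ints(value, path="$"):
    """Yield (path, number) for every integer outside the signed 32-bit range."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; JSON true/false are not numbers.
        return
    if isinstance(value, int):
        if not INT32_MIN <= value <= INT32_MAX:
            yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_wide_ints(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from find_wide_ints(child, f"{path}[{i}]")


# Made-up response fragment: one millisecond timestamp, one small integer.
sample = (
    '{"hits": {"total": 2, "hits": ['
    '{"_id": "a", "_source": {"ts": 1439208600000}}, '
    '{"_id": "b", "_source": {"ts": 12}}]}}'
)
wide = list(find_wide_ints(json.loads(sample)))
for where, number in wide:
    print(where, number)
```

If such values turn up, the documents are probably complete and the problem is on the decoding side. In PHP, the `JSON_BIGINT_AS_STRING` option of `json_decode` may be worth a look in that case.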

1-2 minutes later everything works as expected. I don't think we have previously had a case where we add a bulk of data and retrieve it shortly after.

My question is: when indexing a bulk of data, does ES make the data searchable before the data set is complete, or does the PHP library have a glitch, or is it our code that does something wrong?

I have been thinking of making the bulk size smaller, but in our small test the bulk size has been on the order of 5-10 items.

remram · August 11, 2015, 12:28pm

Did you try to play with the refresh interval? It might help: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-update-settings.html#bulk
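As far as I know, ES indexes each document as a whole and makes it visible to search at the next refresh (roughly once per second by default), so a half-filled document should not show up in results. If the new documents must be searchable right after the bulk call, the bulk endpoint also accepts a `refresh` query parameter, which is a different knob from the refresh interval setting linked above. Here is a rough sketch of what such a request could look like, in Python for illustration only. It builds the request body and path without sending anything; the index name, ids, and fields are placeholders, and older ES versions also expect a `_type` in the action line.

```python
import json


def build_bulk_body(index, docs):
    """Return an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells ES to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the complete document, sent in one piece.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def bulk_path(refresh=False):
    """Return the bulk endpoint path, optionally asking for an immediate refresh."""
    return "/_bulk?refresh=true" if refresh else "/_bulk"


# Placeholder data, not taken from any real index.
docs = [("1", {"title": "first"}), ("2", {"title": "second"})]
body = build_bulk_body("my-index", docs)
path = bulk_path(refresh=True)
print("POST", path)
print(body, end="")
```

Forcing a refresh on every bulk request costs indexing throughput, so it is probably best kept for small batches that are queried straight away.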

mekanix · August 12, 2015, 3:05pm

Found the problem and it was our code.

Thanks for the help and sorry for the noise.
