Elasticsearch Data refresh + miss issue

sandsriv · March 2, 2020, 7:06am

Just to give some background:

1. We have a 3-node Elasticsearch cluster running as a Docker service. Each VM is configured with CPU 116 and RAM 132G.
2. We have a microservice-based application that both writes to and reads from the Elasticsearch store.
2.1 We have a running process flow in which ~11k parent records and ~14k child records are streamed from Kafka; these records are then processed further and fed to another microservice application.
During this process, if processing of a child record fails because its parent record has not been streamed yet or is missing, we store (write) the child record as a document in Elasticsearch using the bulk API through a Spring Boot repository.
The bulk requests are not large, since the data volumes above are only a small sample of test data.
As soon as there is no more data left to store, all of the stored documents are read back from Elasticsearch through the Spring Boot repository API and re-processed.
However, we have observed that Elasticsearch never returns the full data set; only part of it is returned.
We are using the default Elasticsearch “refresh_interval” of “1s”.
Neither explicitly calling the index Refresh API from the program nor adding an explicit 1-second sleep yields the expected result.
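One way to take refresh timing out of the picture is to have the bulk request itself wait for visibility. Below is a minimal sketch that calls the REST `_bulk` endpoint directly with Java 11's `java.net.http.HttpClient` instead of going through Spring Data. The class and method names, the `_doc` mapping type, the host, the index name and the sample documents are all assumptions. With `refresh=wait_for`, the bulk call returns only after a refresh has made its operations searchable. The bulk API also answers HTTP 200 when individual items fail, so the sketch checks the response's top-level `errors` flag before any retry reads the data.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkWriteSketch {

    // Builds the NDJSON body for _bulk: one action line plus one source line per
    // document, every line terminated by '\n' (including the last one).
    // Assumes values are already valid JSON objects and ids need no escaping.
    static String bulkBody(String index, String type, Map<String, String> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type)
              .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // refresh=wait_for: reply only once these operations are visible to search
    // (it waits for a refresh rather than forcing an immediate one).
    static String bulkUrl(String host) {
        return "http://" + host + ":9200/_bulk?refresh=wait_for";
    }

    // Rough check of the top-level "errors" flag; relies on Elasticsearch's
    // default compact JSON output (no ?pretty). A JSON parser would be safer.
    static boolean hasItemErrors(String bulkResponse) {
        return bulkResponse.contains("\"errors\":true");
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";    // hypothetical host
        String index = args.length > 1 ? args[1] : "mytestindex"; // hypothetical; index names must be lowercase
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("child-1", "{\"parentId\":\"p-1\"}");            // hypothetical documents
        docs.put("child-2", "{\"parentId\":\"p-2\"}");

        HttpRequest request = HttpRequest.newBuilder(URI.create(bulkUrl(host)))
            .header("Content-Type", "application/x-ndjson")
            .POST(HttpRequest.BodyPublishers.ofString(bulkBody(index, "_doc", docs)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() != 200 || hasItemErrors(response.body())) {
            System.err.println("Bulk request had failures: " + response.body());
        }
    }
}
```

If `hasItemErrors` ever reports failures, the per-item `status` and `error` fields in the response show which child documents were never stored.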

The index settings (from http://{{efk-host}}:9200/myTestindex/_settings):
{
  "myTestindex": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "provided_name": "myTestindex",
        "max_result_window": "1000000",
        "creation_date": "1582875738571",
        "number_of_replicas": "1",
        "uuid": "E3LHbDq6THemXIbcuX_czQ",
        "version": {
          "created": "6050499"
        }
      }
    }
  }
}

We have seen that the record count during the retry does not match the data actually written to Elasticsearch. We are simply printing the document count in the application.
For example, if the count during the retry is 1000, the count fetched again just after the retry is 1050.
This delta varies from run to run, even with the same data set.
It seems that either the explicit Refresh API invocation is somehow not working or there is a configuration issue.
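To check whether the refresh or the read path is at fault, the count can be taken straight from the REST API instead of through the repository. Below is a minimal sketch that assumes Java 11's `HttpClient` and placeholder host and index values. It calls `POST /{index}/_refresh`, checks that no shard reported a failure, and then reads `GET /{index}/_count`. The regex-based field extraction and the class and method names are hypothetical simplifications; a JSON library would be more robust. If this count matches the number of documents written while the repository read still returns fewer, the gap is more likely on the read side than in the refresh.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefreshCountSketch {

    // Extracts the first numeric value of "field" from a compact JSON body,
    // e.g. "count" from a _count response or "failed" from _shards.
    static long extract(String field, String json) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*(\\d+)").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing " + field + " in " + json);
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholders: host and a lowercase index name.
        String base = "http://" + (args.length > 0 ? args[0] : "localhost") + ":9200/"
            + (args.length > 1 ? args[1] : "mytestindex");
        HttpClient client = HttpClient.newHttpClient();

        // Explicit refresh, issued only after every bulk response has been received.
        HttpRequest refresh = HttpRequest.newBuilder(URI.create(base + "/_refresh"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
        String refreshBody = client.send(refresh, HttpResponse.BodyHandlers.ofString()).body();
        if (extract("failed", refreshBody) > 0) {
            System.err.println("Refresh failed on some shards: " + refreshBody);
        }

        // Number of documents currently visible to search.
        HttpRequest count = HttpRequest.newBuilder(URI.create(base + "/_count")).GET().build();
        long docs = extract("count", client.send(count, HttpResponse.BodyHandlers.ofString()).body());
        System.out.println("Documents visible to search: " + docs);
    }
}
```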

Note: Our application ensures that the retry (reading from Elasticsearch) is performed only after all of the child data has been stored in Elasticsearch.
The data size is not even 1% of the actual production use case.
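To read every stored document back during the retry, the scroll API returns all hits in batches instead of a single page capped by `size`. Below is a minimal sketch of the requests involved. It assumes the 6.x endpoints `POST /{index}/_search?scroll=1m` for the first batch and `POST /_search/scroll` for each following one. The batch size, keep-alive and helper names are hypothetical, and the HTTP calls and hit parsing are left out. Sorting on `_doc` is the cheapest order for scrolling. A scroll reflects the index as it was when the first request ran, so it should start only after the writes are visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScrollReadSketch {

    static final String KEEP_ALIVE = "1m"; // hypothetical keep-alive per batch

    // First batch: the scroll parameter opens a scroll context on the index.
    static String firstUrl(String host, String index) {
        return "http://" + host + ":9200/" + index + "/_search?scroll=" + KEEP_ALIVE;
    }

    // Match all documents, sorted by _doc, batchSize hits per batch.
    static String firstBody(int batchSize) {
        return "{\"size\":" + batchSize + ",\"query\":{\"match_all\":{}},\"sort\":[\"_doc\"]}";
    }

    // Following batches: POST this URL with the latest _scroll_id until a batch
    // returns an empty hits array, then DELETE /_search/scroll with the same
    // scroll_id to release the context.
    static String nextUrl(String host) {
        return "http://" + host + ":9200/_search/scroll";
    }

    static String nextBody(String scrollId) {
        return "{\"scroll\":\"" + KEEP_ALIVE + "\",\"scroll_id\":\"" + scrollId + "\"}";
    }

    // Pulls _scroll_id out of a search response (simplified string matching).
    static String scrollId(String searchResponse) {
        Matcher m = Pattern.compile("\"_scroll_id\"\\s*:\\s*\"([^\"]+)\"").matcher(searchResponse);
        if (!m.find()) {
            throw new IllegalArgumentException("no _scroll_id in response");
        }
        return m.group(1);
    }
}
```

Counting the hits across all batches and comparing the total with the `_count` value gives a direct check on whether the read path is dropping documents.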

system · March 30, 2020, 7:06am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
