Rest client v5.5.0 compatibility with v5.6.x


#1

Are there any known compatibility issues when REST client v5.5.0 indexes data into Elasticsearch 5.6.2? We have intermittently been seeing indexing requests take more than 10 s.

Our app uses the Elasticsearch REST client v5.5.0, and the cluster was recently upgraded to v5.6.2.


(Mark Walkom) #2

I haven't seen or heard of anything, though that is not canon.

Are you using Monitoring to check the status of your cluster while this happens?


#3

What exactly should I look for in the monitoring tool? Are there any specific charts you want me to look at?


(Mark Walkom) #4

Merges, GC, refreshes, CPU increases.
It's hard to say because this could be anything, but look for anomalies around the time the slow requests occur.

Also, you're using _bulk, right?


#5

No. Our use case is more like a real-time update, so we are not using _bulk.

But our write rate is very low, and this is happening in a test environment where we write data sequentially. Think of an integration test run: index a record, read it, then delete it. The record size is about 1-2 KB.


(David Pilato) #6

Yeah. This is expected. You are basically doing an fsync on every single operation, which is costly.

Look at https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html#java-rest-high-document-bulk-processor

I love this class.
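
For readers unfamiliar with what a bulk request batches together, here is a minimal sketch of the newline-delimited body that `_bulk` expects: one action line followed by one source line per document, with a trailing newline. The index name, type name, and document contents are hypothetical, and the helper `build_bulk_body` is illustrative only; it is not part of any Elasticsearch client.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON payload for POST /_bulk.

    `docs` maps a document id to its source dict. Each document
    contributes an action line and a source line.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: tells Elasticsearch where to index the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


# Hypothetical index, type, and documents, for illustration only.
body = build_bulk_body("test-index", "doc", {"1": {"name": "a"}, "2": {"name": "b"}})
```

Sending several documents in one such request means they share a single request round trip instead of paying that cost once per document.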

You could change the "index.translog.durability" index setting to "async" at your own risk (but, well, it is an integration test here): https://www.elastic.co/guide/en/elasticsearch/reference/5.6/index-modules-translog.html#_translog_settings_2
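
As a rough sketch of that settings change, the snippet below builds the JSON body for an update-settings request (`PUT /<index>/_settings`). The index name `test-index` and the helper `durability_settings` are assumptions made for illustration; the resulting path and body can be sent with whichever HTTP or REST client you already use.

```python
import json

# The two values documented for index.translog.durability.
VALID_MODES = ("request", "async")


def durability_settings(mode):
    """Return the JSON body that sets index.translog.durability."""
    if mode not in VALID_MODES:
        raise ValueError("durability must be one of %s" % (VALID_MODES,))
    return json.dumps({"index": {"translog": {"durability": mode}}})


# Hypothetical index name; send `body` with PUT to `path`.
path = "/test-index/_settings"
body = durability_settings("async")
```

With "async", the translog is fsynced in the background at an interval rather than on every request, so operations acknowledged since the last sync can be lost if the node fails. That trade-off is usually acceptable for a disposable test index and rarely for production data.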


#7

Thank you @dadoonet .

Additional FYI:

  1. We are using ES REST client v5.5.0, but the cluster is at v5.6.2. This is a pre-production test cluster.
  2. When we index, we invoke the refresh API immediately afterwards. I know it is not the most efficient way to make data available sooner for searching, but since our write rates are low, we thought we could live with this inefficiency. If there is a better way to handle this scenario, then please let us know.

Based on what you have suggested and my use case, I cannot use the BulkProcessor API. In addition, if an fsync happens on each write by default, do I still need to invoke the refresh API to make the document searchable?
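
For context on this question: the translog fsync concerns durability, while a refresh is what makes a document visible to search, so the two are separate. One possible alternative to a separate refresh call is the `refresh` query parameter on the index request itself, where `wait_for` makes the request return only after a refresh has made the document searchable. The sketch below only builds the request path; the index, type, and id values are hypothetical, and `index_path` is an illustrative helper, not a client API.

```python
from urllib.parse import quote, urlencode

# Values accepted by the `refresh` parameter on index requests.
REFRESH_VALUES = ("true", "wait_for", "false")


def index_path(index, doc_type, doc_id, refresh=None):
    """Build the path for PUT /<index>/<type>/<id>, optionally with ?refresh=..."""
    path = "/" + "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    if refresh is not None:
        if refresh not in REFRESH_VALUES:
            raise ValueError("refresh must be one of %s" % (REFRESH_VALUES,))
        path += "?" + urlencode({"refresh": refresh})
    return path


# Hypothetical index, type, and id, for illustration only.
path = index_path("test-index", "doc", "1", refresh="wait_for")
```

Whether this is faster than an explicit refresh in a sequential test is something to measure, since `wait_for` blocks until the next scheduled refresh instead of forcing one.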

EDIT:
I wanted to add that the behavior I describe in item 2 above is actually running live in our production cluster. We were evaluating v5.6.2 so that we can upgrade production to that newer version. For that, we have a bunch of integration tests that run sequentially, and during that test run we ran into this issue. We are going to try to create a new cluster with v5.5.0 to see whether the issue persists. We had not seen this issue previously.

Thanks.

