Hi,
We are testing Elasticsearch against a third-party S3-compatible object store using the repository analysis API and had a question regarding retry behavior. We found this snippet in S3RetryingInputStream.java:
try (AmazonS3Reference clientReference = blobStore.clientReference()) {
    final GetObjectRequest getObjectRequest = new GetObjectRequest(blobStore.bucket(), blobKey);
    configureRequestForMetrics(getObjectRequest, blobStore, Operation.GET_OBJECT, purpose);
    if (currentOffset > 0 || start > 0 || end < Long.MAX_VALUE - 1) {
        assert start + currentOffset <= end
            : "requesting beyond end, start = " + start + " offset=" + currentOffset + " end=" + end;
        getObjectRequest.setRange(Math.addExact(start, currentOffset), end);
    }
    this.currentStreamFirstOffset = Math.addExact(start, currentOffset);
    final S3Object s3Object = SocketAccess.doPrivileged(() -> clientReference.client().getObject(getObjectRequest));
    this.currentStreamLastOffset = Math.addExact(currentStreamFirstOffset, getStreamLength(s3Object));
    this.currentStream = s3Object.getObjectContent();
    return;
}
This snippet suggests that Elasticsearch maintains an offset so that it can resume the GetObjectRequest after a failure. However, since the retried request carries no preconditions (e.g. If-Match / If-Unmodified-Since), there is no guarantee that the object's contents are the same on the subsequent attempt. A retried ranged read could therefore mix bytes from an older version of the object with bytes from a newer version written while S3RetryingInputStream was reading.
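For reference, this is roughly the mechanism we expected to find: pin the ETag returned by the first attempt and send it as an If-Match precondition on every retry, so a resumed read either continues against the same object version or fails outright. This is only a hypothetical sketch against the AWS SDK for Java v1 (ETagPinnedReader and all of its names are ours, not Elasticsearch's); note that SDK v1's getObject returns null rather than throwing when a matching-ETag constraint is not satisfied:

import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

// Hypothetical sketch, not Elasticsearch code: resume ranged reads of a single
// object while ensuring every attempt sees the same object version.
final class ETagPinnedReader {

    private final AmazonS3 client;
    private final String bucket;
    private final String key;
    private String pinnedETag; // ETag observed on the first attempt

    ETagPinnedReader(AmazonS3 client, String bucket, String key) {
        this.client = client;
        this.bucket = bucket;
        this.key = key;
    }

    // Open the object at the given byte range; on retries, insist (via
    // If-Match) that the object still has the ETag from the first attempt.
    InputStream openAt(long offset, long end) throws IOException {
        final GetObjectRequest request = new GetObjectRequest(bucket, key);
        request.setRange(offset, end);
        if (pinnedETag != null) {
            request.withMatchingETagConstraint(pinnedETag); // sent as If-Match
        }
        final S3Object s3Object = client.getObject(request);
        if (s3Object == null) {
            // SDK v1 signals an unmet ETag constraint by returning null:
            // the object was overwritten since our first attempt.
            throw new IOException("object [" + key + "] changed during read (ETag mismatch)");
        }
        if (pinnedETag == null) {
            pinnedETag = s3Object.getObjectMetadata().getETag();
        }
        return s3Object.getObjectContent();
    }
}

With something like this, a mid-stream overwrite would surface as an error (which the caller could turn into a fresh read from offset 0) instead of silently splicing two versions together.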
Is there something else in Elasticsearch that would ensure a consistent view of the object here?