Hi,
I'm using ECK on GKE. Almost every time I make a modification (initial apply, edit) to the Elasticsearch object, the request times out, but only barely: if I increase the timeout, the command usually succeeds in just over 30 seconds. For example, here's the output of kubectl edit es elasticsearch-eck --request-timeout=60s:
I0212 15:26:57.363492   68694 round_trippers.go:438] PUT https://<master>/apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/elasticsearch-eck?timeout=1m0s 200 OK in 30093 milliseconds
This isn't so bad locally, because I can easily pass a longer timeout to kubectl. However, the same slowness affects the elastic-operator, which runs into these timeouts as well.
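Concretely, bumping the client-side timeout just looks like this (the -v=6 flag is only there to surface the round-trip timing shown above, and elasticsearch-eck.yaml is simply my local copy of the manifest at the end of this post):

kubectl edit es elasticsearch-eck --request-timeout=60s -v=6
kubectl apply -f elasticsearch-eck.yaml --request-timeout=60s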
For example, here I'm trying to reduce the number of nodes, but the operator is stuck, repeatedly failing to update the annotation for minimum_master_nodes:
...
E 2020-02-12T18:45:51.190380453Z Updating minimum master nodes 
I 2020-02-12T18:45:51.203436Z Request Body:  <trimmed>
I 2020-02-12T18:45:51.203671Z curl -k -v -XPUT  -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "User-Agent: elastic-operator/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer  <trimmed>" 'https://10.0.16.1:443/apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/elasticsearch-eck' 
E 2020-02-12T18:45:51.505008214Z Retrieving cluster state 
E 2020-02-12T18:46:01.504865401Z Retrieving cluster state 
E 2020-02-12T18:46:11.505025417Z Retrieving cluster state 
I 2020-02-12T18:46:21.206976Z PUT https://10.0.16.1:443/apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/elasticsearch-eck 504 Gateway Timeout in 30003 milliseconds 
I 2020-02-12T18:46:21.207016Z Response Headers: 
I 2020-02-12T18:46:21.207022Z     Audit-Id: a9ab3f93-c36c-4c28-a224-f97a03001822 
I 2020-02-12T18:46:21.207027Z     Content-Type: application/json 
I 2020-02-12T18:46:21.207030Z     Content-Length: 187 
I 2020-02-12T18:46:21.207034Z     Date: Wed, 12 Feb 2020 18:46:21 GMT 
I 2020-02-12T18:46:21.207082Z Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Timeout: request did not complete within requested timeout 30s","reason":"Timeout","details":{},"code":504} 
E 2020-02-12T18:46:21.207784068Z Ending reconciliation run 
E 2020-02-12T18:46:21.207793754Z Reconciler error 
...
This repeats for many hours; only occasionally does a request magically get through and let the operator make progress.
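For context, those lines come from the elastic-operator pod. With the default elastic-system install, the same requests can be tailed with:

kubectl logs -n elastic-system statefulset/elastic-operator --timestamps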
Is there a way to bump this timeout? Or is there something else to look into regarding why these operations seem to take exactly the wrong amount of time?
I don't think there's anything terribly fancy in the config:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-eck
spec:
  version: 6.8.6
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      processors: 8
      reindex.remote.whitelist: "*:9200"
      thread_pool.index.queue_size: 500
      thread_pool.write.queue_size: 500
      xpack.security.authc:
        anonymous:
          username: anonymous
          roles: superuser
          authz_exception: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 44Gi
            limits:
              memory: 44Gi
          env:
          - name: ES_JAVA_OPTS
            value: -Xmx12g -Xms12g -XX:-UseParallelGC -XX:-UseConcMarkSweepGC -XX:+UseG1GC
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard
  http:
    service:
      metadata:
        annotations:
          cloud.google.com/load-balancer-type: Internal
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
Thanks,
James