Max value for elasticsearch.requestTimeout in kibana.yml configuration


#1

Hi,

I'm getting the following error if I increase the lookup duration to longer than the last 12 hours.

Searching the forum, I found that it's related to the value of the elasticsearch.requestTimeout setting in the /etc/kibana/kibana.yml file.

But what's the maximum value for this setting?

I have 100GB of heap memory configured for ES on the server.

When I tried a couple of very large numbers, Kibana took a long time to start up and the GUI buttons malfunctioned, although the GUI eventually came up.

Thanks in advance!

  • Young

(Lee Drengenberg) #2

I don't know if there's a maximum value for that timeout, but it's rare for users to have to set it higher. I think we should take a look at what is causing your dashboard to take that long to load.

How many visualizations do you have on the dashboard?

You might want to try opening each visualization that is used and see if one of those is taking a long time to load.

On most visualizations you can click the little arrow near the bottom left (circled below), and then select Statistics (in red box below) to see the Request Duration.
NOTE: This box isn't visible in IE 11 browser (a bug).

Do you have just one large Elasticsearch node, or a cluster?

Regards,
Lee


#3

Hi Lee,

Thanks for your response.

I have only 1 dashboard with 3 graphs in it and my ELK 6.0 is running on one single node with 48 CPU cores and 256 GB of memory.

For each of the visualizations, I tried increasing the duration to 24h, and each of them timed out with the same error.

For the statistics, got the following for 12h.

And, got the following error for 24h:

  • Young

(Lee Drengenberg) #4

Hi George,

Are Elasticsearch and Kibana running on local hardware you have? Or is some or all of it in the Cloud? It seems very strange that your 3 different visualizations would all take 25277ms +/- 1ms while the numbers of hits differ by orders of magnitude. It really seems like there's some large latency (about 25 seconds) introduced somewhere.

I'm thinking it might help to turn on the slow log on your data index (looks like it might be packetbeat-*). Can you try to follow the steps here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html

Thanks,
Lee

Also, any load balancers or proxies anywhere between your browser, Kibana, and Elasticsearch?


(Lee Drengenberg) #5

George, I'm also wondering if you still have the default values in this visualization for Interval, and Size. If not, what are they now? You might have to click on this image to see the whole thing.


#6

Hi Lee,

Yes, both Elasticsearch 6 and Kibana 6 are running on the same local hardware.

I looked up the page you referenced and added the recommended entries in my /etc/elasticsearch/log4j2.properties file, as below:

# cat /etc/elasticsearch/log4j2.properties
status = error

# log action execution errors for easier debugging
logger.action.name = org.elasticsearch.action
logger.action.level = debug

appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n

appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.-10000m%n
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.log.gz
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 128MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.fileIndex = nomax
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
appender.rolling.strategy.action.condition.type = IfFileName
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
appender.rolling.strategy.action.condition.nested_condition.type = IfAccumulatedFileSize
appender.rolling.strategy.action.condition.nested_condition.exceeds = 2GB

rootLogger.level = info
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.rolling.ref = rolling

appender.deprecation_rolling.type = RollingFile
appender.deprecation_rolling.name = deprecation_rolling
appender.deprecation_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_deprecation.log
appender.deprecation_rolling.layout.type = PatternLayout
appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.-10000m%n
appender.deprecation_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_deprecation-%i.log.gz
appender.deprecation_rolling.policies.type = Policies
appender.deprecation_rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.deprecation_rolling.policies.size.size = 1GB
appender.deprecation_rolling.strategy.type = DefaultRolloverStrategy
appender.deprecation_rolling.strategy.max = 4

logger.deprecation.name = org.elasticsearch.deprecation
logger.deprecation.level = warn
logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
logger.deprecation.additivity = false

appender.index_search_slowlog_rolling.type = RollingFile
appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
appender.index_search_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_search_slowlog.log
appender.index_search_slowlog_rolling.layout.type = PatternLayout
appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.-10000m%n
appender.index_search_slowlog_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_search_slowlog-%d{yyyy-MM-dd}.log
appender.index_search_slowlog_rolling.policies.type = Policies
appender.index_search_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_search_slowlog_rolling.policies.time.interval = 1
appender.index_search_slowlog_rolling.policies.time.modulate = true

logger.index_search_slowlog_rolling.name = index.search.slowlog
logger.index_search_slowlog_rolling.level = trace
logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
logger.index_search_slowlog_rolling.additivity = false

appender.index_indexing_slowlog_rolling.type = RollingFile
appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
appender.index_indexing_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_indexing_slowlog.log
appender.index_indexing_slowlog_rolling.layout.type = PatternLayout
appender.index_indexing_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.-10000m%n
appender.index_indexing_slowlog_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_indexing_slowlog-%d{yyyy-MM-dd}.log
appender.index_indexing_slowlog_rolling.policies.type = Policies
appender.index_indexing_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_indexing_slowlog_rolling.policies.time.interval = 1
appender.index_indexing_slowlog_rolling.policies.time.modulate = true

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
index.indexing.slowlog.level: info
index.indexing.slowlog.source: 1000

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

logger.index_indexing_slowlog.name = index.indexing.slowlog.index
logger.index_indexing_slowlog.level = trace
logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
logger.index_indexing_slowlog.additivity = false

Thanks.

  • Young

#7

Hi Lee,

I have the following custom interval of 15m instead of auto.

I just tried changing it to "auto" but still got the same timeout error.

Thanks.

  • Young


(Lee Drengenberg) #8

So if that slow log setting worked and you refresh one of those visualizations, you should see a new log file under /var/log/elasticsearch or your elasticsearch/logs/ directory (depending on how you installed). I think the log file name should contain the index name, like packetbeat...log. I'm not sure it will help, though, since it may just show the same query that we can already see by looking at the request in Kibana.
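
A side note on the slow log: the index.search.slowlog.threshold.* and index.indexing.slowlog.threshold.* entries are index-level settings, so as far as I know they are normally applied through the index settings API (PUT <index>/_settings) rather than read from log4j2.properties. Below is a minimal Python sketch that only builds such a request. The host http://localhost:9200 and the packetbeat-* pattern are assumptions taken from this thread, and build_slowlog_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_slowlog_request(base_url, index_pattern, thresholds):
    """Build (but do not send) a PUT <index>/_settings request."""
    body = json.dumps(thresholds).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Same search query thresholds as quoted earlier in the thread.
SEARCH_THRESHOLDS = {
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms",
}

# Assumed host and index pattern; adjust to your installation.
request = build_slowlog_request(
    "http://localhost:9200", "packetbeat-*", SEARCH_THRESHOLDS
)

# Nothing is sent until you explicitly call:
#   urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching the cluster.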


If you go to Kibana Management > Advanced Settings you could try lowering this setting:

histogram:barTarget
Default: 50
Attempt to generate around this many bars when using "auto" interval in date histograms

I changed mine from 50 to 25, and that increases the interval that Auto selects. So for my Last 24 hours timespan, the auto interval changed from 30 minutes to 1 hour (as seen in the x-axis legend).
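
To illustrate why a lower bar target gives a larger interval, here is a simplified Python sketch. It is not Kibana's actual algorithm: it just divides the time span by the bar target and rounds up to the next "round" bucket size. The NICE_INTERVALS list and the auto_interval name are hypothetical.

```python
from datetime import timedelta

# Hypothetical list of "round" bucket sizes; Kibana's real list and
# rounding rules differ in detail.
NICE_INTERVALS = (
    [timedelta(minutes=m) for m in (1, 5, 10, 30)]
    + [timedelta(hours=h) for h in (1, 3, 12, 24)]
)


def auto_interval(time_span, bar_target):
    """Pick the smallest round interval that keeps the bar count <= bar_target."""
    raw = time_span / bar_target  # exact bucket size for bar_target bars
    for candidate in NICE_INTERVALS:
        if candidate >= raw:
            return candidate
    return NICE_INTERVALS[-1]


last_24h = timedelta(hours=24)
print(auto_interval(last_24h, 50))  # 24h / 50 = 28.8 min, rounds up to 30 min
print(auto_interval(last_24h, 25))  # 24h / 25 = 57.6 min, rounds up to 1 hour
```

Under these assumptions the sketch reproduces the change I saw (30 minutes to 1 hour), and fewer, larger buckets mean less work per request.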


Also, feedback from Elasticsearch developers: a single Elasticsearch node with a 100GB heap will be a lot slower than just running multiple nodes (with the default heap size) on the same server.

I think it has to do with garbage collection on a very large heap.
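
For reference, the general Elasticsearch heap guidance, as I understand it, is to give the heap no more than half of the machine's RAM and to keep it below roughly 32 GB so the JVM can still use compressed object pointers. A small Python sketch of that rule of thumb follows. The 31 GB cap is an assumption (the exact cutoff depends on the JVM), and suggested_heap_gb is a hypothetical helper.

```python
def suggested_heap_gb(ram_gb, compressed_oops_cap_gb=31):
    """Return a heap size in GB: at most half of RAM, capped below ~32 GB.

    The 31 GB default cap is an assumed safe value, not an exact limit.
    """
    return min(ram_gb // 2, compressed_oops_cap_gb)


# The server in this thread has 256 GB of RAM.
print(suggested_heap_gb(256))  # 31
```

By this rule of thumb, a 100 GB heap is far above the suggested per-node size, which fits the advice to run several smaller nodes instead.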


(Lee Drengenberg) #9

Also, on that particular visualization, "Top hosts creating traffic", have you increased this size field drastically? I'm just trying to think of anything that could explode the results.


#10

Decreased from 50 to 25 on the Kibana setting:

Also, changed it to "auto":

Then, restarted all the services but the same error popped up.

The only reason I increased the heap size to 100GB was in case it could help with these errors.

Apparently, it didn't help at all.

No matter what heap size I had, I have consistently gotten these errors whenever the duration is > 24h.

Thanks.

  • Young

#11

No, I didn't touch the size field; it has always been 5.


#12

No. The local hardware is in our LAN and I connect from my MacBook-Pro laptop directly to Kibana @ http://:5601 using Chrome (v62).


(Lee Drengenberg) #13

Are you running on Ubuntu? I haven't had this problem myself on Ubuntu but here's a post about an issue and a solution;


#14

I'm using CentOS 7.4.


#15

FYI, after I changed the duration to 24h, I see the following lines appearing in the log file:


(Lee Drengenberg) #16

Maybe we should jump back to this for a moment.

Could you please try setting elasticsearch.requestTimeout to something like 90000 (90 seconds) and see if one of those visualizations will load 24 hours of data? I'm thinking that if 12 hours is taking under 30 seconds, then 24 hours should take around 60 seconds. We might just inch our way towards you getting everything working.
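
The back-of-the-envelope reasoning above can be written out as a small Python sketch. It assumes query time scales linearly with the time range, which is only a guess, and that the roughly 25277 ms request duration mentioned earlier was for the 12-hour range. The 1.5x headroom factor and the estimate_timeout_ms name are my own assumptions.

```python
import math


def estimate_timeout_ms(measured_ms, measured_hours, target_hours, headroom=1.5):
    """Linearly extrapolate a request duration, add headroom, round up to whole seconds."""
    projected_ms = measured_ms * target_hours / measured_hours
    return math.ceil(projected_ms * headroom / 1000) * 1000


# About 25277 ms for 12 hours, extrapolated to 24 hours with 50% headroom.
print(estimate_timeout_ms(25277, 12, 24))  # 76000
```

Under those assumptions the estimate comes out below 90000, so 90 seconds should leave some margin for a 24-hour range.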


Another test is to take the request that the "Top hosts creating traffic" visualization is running and use curl to run it on the Elasticsearch server directly (ssh onto that machine to avoid any network latency, etc.) to see how long it takes. If it's really taking that long, then I think you should start a new post on the Elasticsearch discuss channel, because it's not something Kibana can fix.

time curl "http://localhost:9200/packetbeat-*/doc/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "time_zone": "America/Chicago",
        "min_doc_count": 1
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "source.ip",
            "size": 5,
            "order": {
              "1": "desc"
            }
          },
          "aggs": {
            "1": {
              "sum": {
                "field": "source.stats.net_bytes_total"
              }
            }
          }
        }
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}
    },
    "require_field_match": false,
    "fragment_size": 2147483647
  },
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    "@timestamp",
    "last_time",
    "start_time",
    "tls.client_certificate.not_after",
    "tls.client_certificate.not_before",
    "tls.server_certificate.not_after",
    "tls.server_certificate.not_before"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        },
        {
          "query_string": {
            "query": "type: flow",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1512513175834,
              "lte": 1512599575834,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "filter": [],
      "should": [],
      "must_not": []
    }
  }
}'

The time for the response on my little VM was 95ms, but I have much less data than you. Some of the important details are in the beginning part of the response:

{"took":11,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0},"hits":{"total":41741,
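
If it helps, here is a minimal Python sketch for pulling those fields out of a saved response. The took value is the time in milliseconds that Elasticsearch itself spent on the search, so comparing it with the wall-clock time reported by the time command shows how much latency comes from outside Elasticsearch. The response above is truncated, so the sample below closes it with empty placeholder hits (an assumption), and summarize_search_response is a hypothetical helper.

```python
import json


def summarize_search_response(raw_json):
    """Extract the timing and shard health fields from a search response."""
    response = json.loads(raw_json)
    shards = response["_shards"]
    return {
        "took_ms": response["took"],           # time spent inside Elasticsearch
        "timed_out": response["timed_out"],    # True if the search hit its timeout
        "all_shards_ok": shards["failed"] == 0,
        "total_hits": response["hits"]["total"],
    }


# Same leading fields as the response above; the tail is a placeholder.
SAMPLE = (
    '{"took":11,"timed_out":false,'
    '"_shards":{"total":10,"successful":10,"skipped":0,"failed":0},'
    '"hits":{"total":41741,"max_score":0.0,"hits":[]}}'
)

summary = summarize_search_response(SAMPLE)
print(summary)
```

A took value far below the wall-clock time would point at something between the client and Elasticsearch rather than at the query itself.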

#17

I just added elasticsearch.requestTimeout: 90000 to /etc/kibana/kibana.yml and it's working now for the last 24h!

The values I had tried before that didn't work were something like 9999999999.

Thank you very much, Lee!

  • Young

#18

I got the following:


(Lee Drengenberg) #19

Good to hear you're getting data for a longer time period!

I think that reducing that heap size, along with starting up more Elasticsearch nodes (even on the same server), is recommended to improve your performance. But again, that's something you would get better information on in the Elasticsearch channel.

