Dreaded Kibana timeout on dashboard

Hello,

I would like some help with this "$?$&!!\$#$%@%" problem :smiley:

I have a ~70 GB daily index - 3 shards, 1 replica - on two servers with 64 GB RAM, a 31.94 GB heap, and an 8-core 3 GHz CPU each.
Yet if I try to go over 12 hours of data on a dashboard (7 visualizations or fewer - it is really annoying as hell and I haven't noticed any real "pattern"...) I get a damn Kibana timeout...

Could any of you give me a hand with this?

I've run tests... heap usage is OK.

Apparently it's really the CPU that goes sky high (load average 14-15, CPU usage stuck at 100% for quite a while).
Heap usage is about 21 GB out of the 31.94 GB...

Nobody, seriously?

I have a couple of questions about your situation:

  1. What version of Kibana are you on?
    On later versions you have the Search Profiler, which can help you detect which visualization is clogging things up (see the query sketch after this list):
    https://www.elastic.co/blog/a-profile-a-day-keeps-the-doctor-away-the-elasticsearch-search-profiler
    You can check the statistics there.

  2. You can also try to increase the timeout in the kibana.yml file:
    elasticsearch.requestTimeout: 30000

This is the default value in milliseconds; you can try to increase it.
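On Kibana 4.5 you won't have the Search Profiler UI, but you can hit the underlying Elasticsearch profile flag directly. A minimal sketch, assuming a hypothetical daily index name and a simple 12-hour range query (note that on 2.x only the query phase is profiled; aggregation profiling needs Elasticsearch 5.0+):

    # Hypothetical index name - point this at one of your daily indices.
    curl -XGET 'localhost:9200/logstash-2017.01.01/_search?pretty' -d '
    {
      "profile": true,
      "size": 0,
      "query": {
        "range": { "@timestamp": { "gte": "now-12h" } }
      }
    }'

The "profile" section of the response breaks down, per shard, where the query time is actually being spent.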

I already did the "requestTimeout" thing.

Unfortunately it was a no go... it's literally as if it was not even taken into consideration...
As for the version of Kibana, I'm running 4.5.4.

Thank you.

If you query a volume of data that returns without a timeout and look at the responses, e.g. through the Chrome developer tools, is there any visualisation that takes much longer than the others?

Sorry, but... could you explain a little bit more?

I can load all 7 visualizations at once for 15 minutes of data, no problem at all; up to 12 hours in fact (most of the time), BUT I cannot go past that...

What exactly do you want me to do or "provide"?

If you request a few hours' worth of data, you should be able to view the statistics (which include the time the aggregation took to run) for each visualisation from within Kibana. If you look at the different visualisations and how long they took, are there any that stand out?

I'll give it a shot.
Using the F12 developer mode? Any specific tabs I should provide you information from?

btw, thank you

In each visualisation there is an upwards arrow in the lower left corner. If you click on this you should see a button that says 'Statistics', which will provide this information.

Here are the statistics for the last 12 hours:

[screenshots of the Statistics panel for each visualisation]

And a last one that is simply a "search view" in my dashboard (so no stats).

It looks like you have some visualisations that require a lot of computation, especially the bottom one. If you are maxing out all CPU cores on both machines while querying, you may have hit the limit for what your cluster can handle, and may need to either try to make your dashboards less computationally intensive or scale out.

IF scaling out is not an option... what options do I have left?
BTW, I just "removed" the bottom visualisation and tried to load the last 7 days.

It's a no go...

24 hours, same thing.
12 hours, same thing... so right now the CPU is loaded up like hell!

CPU usage is 23% on one node with a load average of 14, and 10% on the other with a load average of 8.

Once the load dropped, I could load a view of the last 24 hours.

How long is your retention period?

We plan to be able to visualize up to three months of data (one month at once if needed, but up to 3 months active), with 1 year in archives.

Upgrading to the latest 5.x release might actually help you. The 5.x releases have improved how caching works for indices covered entirely by the time period queried. If you used indices (with a single primary shard) that covered a smaller time period, e.g. a few hours, a good portion of the indices would be able to cache results for queries spanning longer time periods, resulting in faster response times. There is even a new rollover API that would allow you to create indices of a certain size irrespective of time period which may make this even easier to manage. The drawback with this approach is naturally that you may end up with a larger number of shards, which could become a problem for long term retention.
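To illustrate the rollover idea, here is a minimal sketch against a 5.x cluster; the alias and index names ("logs_write", "logs-000001") and the conditions are just assumptions for the example, not taken from your setup:

    # Create the first index with a write alias (names are hypothetical).
    curl -XPUT 'localhost:9200/logs-000001?pretty' -d '
    {
      "aliases": { "logs_write": {} }
    }'

    # Call periodically (e.g. from cron); a new index is created once either condition is met.
    curl -XPOST 'localhost:9200/logs_write/_rollover?pretty' -d '
    {
      "conditions": {
        "max_age": "4h",
        "max_docs": 50000000
      }
    }'

Your ingest pipeline would then write to the alias, so each physical index stays small enough for its results to be cached for queries that span it entirely.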

OK, so... if I understand correctly so far... I'm fuck*d?

So... how would I adequately calculate what I would need in terms of CPU power, if I was ever to scale horizontally...?
Please.

Why is upgrading not an option?

I'm running the Siemonster stack. It's sort of an all-in-one bundle with OSSEC etc...
So I assume it's version dependent, plus you said that in the long term I would have a problem with the number of shards...

If we assume 4 hours per index with 1 primary and 1 replica shard, you would get 360 shards per month. With that, 3 months of online retention should be fine with your current setup. You may even be able to stretch it quite a bit further, so I would not worry about that.
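Spelling that calculation out: 24 h / 4 h = 6 indices per day; 6 indices x (1 primary + 1 replica) = 12 shards per day; 12 x 30 days = 360 shards per month, or roughly 1,080 shards across 3 months of online retention.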

Apart from upgrading or making the visualisations more lightweight, I do not really have any good suggestions at the moment. Maybe someone else will chime in?

So my problem right now is how my "shards" are made?