We are running 0.18.6 in our production EC2 environment. We have three
Java web application servers, two search servers, and two database
servers, all EC2 large instances. We originally had the database and
search servers on the same machines, but we were seeing high CPU usage
and couldn't determine the cause, so we split them up. What we are
seeing now is that each day during our peak hours (8am - 6pm) the
search instances run their CPUs at 60-80%. Today we had an issue where
we had to restart the search servers: the web servers were dropping
connections to the search servers, causing timeouts for users, while
the search servers were at 80-85% CPU.
Here are the details:
We are averaging about 1500 users logged in at a time during our peak
hours.
We have two search instances with 0.18.6 and three search clients (one
on each web server).
Each search instance stores its data on a 200GB EBS volume.
Our application is a multi-tenant application and currently we have
about 200 tenants.
Each tenant gets its own index with the default settings (5 shards per
index, etc).
Our application logs almost every user request (to both the database
and the search cluster).
We currently have around 10,000,000 documents across all of the
indexes.
We are currently at 45,000 open file descriptors.
Our CPU usage is between 60-80% daily on both search servers (I can
send screenshots of the EC2 charts).
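To make the file descriptor pressure concrete, here is a rough
back-of-the-envelope estimate based on the numbers above. The
segments-per-shard and files-per-segment figures are assumptions for
illustration, not measured values:

```python
# Rough file-descriptor estimate for an index-per-tenant layout.
tenants = 200
shards_per_index = 5   # Elasticsearch default
replicas = 1           # default: one replica of each primary shard
servers = 2

total_shards = tenants * shards_per_index * (1 + replicas)
shards_per_server = total_shards // servers

# Assume each shard holds ~10 Lucene segments and each segment keeps
# ~4 files open -- these vary widely in practice, plus sockets and
# translogs come on top.
files_per_shard = 10 * 4
fds_per_server = shards_per_server * files_per_shard

print(total_shards)       # 2000 shards cluster-wide
print(shards_per_server)  # 1000 shards per server
print(fds_per_server)     # ~40000 open files per server
```

Under those assumptions each server ends up holding on the order of
1,000 shards and ~40k open files, which is in the same ballpark as the
45,000 descriptors we observe, and every new tenant adds another 10
shards cluster-wide.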
My thoughts:
Maybe logging every request a user makes through our site to
Elasticsearch is causing this?
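If the request logging stays in Elasticsearch, one thing that could
reduce the per-request indexing overhead is buffering log entries in
the application and flushing them periodically through the bulk API
instead of indexing one document per request. A minimal sketch of
building a bulk request body (the index name, type name, and fields
are hypothetical):

```python
import json

def build_bulk_body(index, docs):
    """Build an Elasticsearch _bulk request body (newline-delimited
    JSON: one action line followed by one source line per document)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index,
                                            "_type": "request"}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline

# Hypothetical request-log documents buffered in the app:
docs = [
    {"tenant": "acme", "path": "/search", "ms": 42},
    {"tenant": "acme", "path": "/login", "ms": 8},
]
body = build_bulk_body("request-logs", docs)
print(body.count("\n"))  # 4 lines: one action + one source per document
```

The body would then be POSTed to the cluster's _bulk endpoint on
whatever schedule fits the logging volume (e.g. every few seconds or
every N entries).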
I don't think that creating a new index for every customer is the way
to go here, but I'm not sure what the best approach is. We are going
to hit our 64k file descriptor limit very soon.
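One common alternative to index-per-tenant is a small number of shared
indexes with a tenant field on every document, where every query is
wrapped in a filter on that field, so the shard count no longer grows
with the tenant count. A sketch of what such a tenant-scoped query
could look like in the 0.x query DSL (the field name tenant_id is an
assumption):

```python
import json

def tenant_query(tenant_id, user_query):
    """Wrap an arbitrary user query in a filter that restricts
    results to a single tenant's documents."""
    return {
        "query": {
            "filtered": {  # 'filtered' query from the 0.x query DSL
                "query": user_query,
                "filter": {"term": {"tenant_id": tenant_id}},
            }
        }
    }

q = tenant_query("acme", {"match_all": {}})
print(json.dumps(q, sort_keys=True))
```

The trade-off is that all tenants share the same mappings and the
application must enforce the tenant filter on every query path.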
I'm looking for ways to improve our setup. Thoughts?