Production cluster slows down after 15-20 days of starting the services

I'd suggest starting with the nodes stats API, which reports per-node figures such as JVM heap usage and garbage collection activity: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html and https://www.elastic.co/guide/en/elasticsearch/guide/current/_monitoring_individual_nodes.html
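
Since the slowdown takes weeks to appear, it may help to sample those stats on a schedule and watch the trend rather than look at a single snapshot. Here is a rough sketch of that idea in Python. The address `http://localhost:9200` and the function names are my own assumptions, and it assumes the cluster is reachable without authentication. It pulls `/_nodes/stats/jvm` and extracts heap usage plus old-generation GC counters for each node.

```python
import json
from urllib.request import urlopen


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch JVM stats for all nodes. base_url is an assumed address."""
    with urlopen(f"{base_url}/_nodes/stats/jvm", timeout=10) as resp:
        return json.load(resp)


def summarize_jvm(stats):
    """Return one row per node: name, heap used %, old-gen GC count and time."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        old_gc = jvm.get("gc", {}).get("collectors", {}).get("old", {})
        rows.append({
            # Fall back to the node id if the response carries no name.
            "node": node.get("name", node_id),
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "old_gc_count": old_gc.get("collection_count"),
            "old_gc_millis": old_gc.get("collection_time_in_millis"),
        })
    return rows
```

Calling `summarize_jvm(fetch_jvm_stats())` from a cron job and appending the rows to a file would give you a history to compare against once the cluster starts to drag. A heap percentage that keeps climbing and never drops back after old-generation collections would point towards memory pressure.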

If that doesn't help, then perhaps take a heap dump and open it in Eclipse Memory Analyzer. Be aware, though, that you'll need a decent understanding of ES's internals to make sense of the classes and their hierarchy.
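
One way to capture the dump is the JDK's `jmap` tool. The sketch below just wraps that command in Python. The helper names and the default output path are made up for illustration, and it assumes `jmap` is on the PATH and is run as the same user as the Elasticsearch process. Note that dumping pauses the JVM while the file is written, and the `live` option forces a full GC first, so pick a quiet moment.

```python
import subprocess


def build_heap_dump_command(pid, out_path="es-heap.hprof", live_only=True):
    """Build the jmap argument list for a binary heap dump of process pid."""
    # "live" restricts the dump to reachable objects (triggers a full GC).
    options = "live," if live_only else ""
    return ["jmap", f"-dump:{options}format=b,file={out_path}", str(pid)]


def dump_heap(pid, out_path="es-heap.hprof", live_only=True):
    """Run jmap against the given pid; raises CalledProcessError on failure."""
    subprocess.run(build_heap_dump_command(pid, out_path, live_only), check=True)
```

The resulting `.hprof` file can then be opened in Eclipse Memory Analyzer.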