I am running a 3-node Elasticsearch 2.4.0 cluster that mainly indexes PDF files. I was using Base64.getEncoder().encodeToString(bytes) for the PDF content, but this method uses a deprecated String constructor internally, which causes some Tika exceptions. So I changed to Base64.getEncoder().encode(bytes), but then the index size increased so dramatically that my EC2 instance ran out of disk space.
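A minimal Java sketch contrasting the two calls. The class and method names are hypothetical. encodeAsString builds the same String as encodeToString(bytes) without the deprecated constructor. doubleEncoded illustrates one possible, unverified cause of the size jump: if the byte[] returned by encode(bytes) is later treated as raw binary and Base64-encoded a second time by whatever serializes the document, each pass adds roughly a third to the size. Whether the indexing code actually does this is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Sizes {

    // Same characters as encodeToString(raw), but the String is built with a
    // non-deprecated constructor and an explicit charset (Base64 output is pure ASCII).
    static String encodeAsString(byte[] raw) {
        byte[] encoded = Base64.getEncoder().encode(raw);
        return new String(encoded, StandardCharsets.US_ASCII);
    }

    // Hypothetical failure mode: the byte[] from encode(raw) is itself
    // Base64-encoded again, as if it were the original binary content.
    static String doubleEncoded(byte[] raw) {
        byte[] once = Base64.getEncoder().encode(raw);
        return Base64.getEncoder().encodeToString(once);
    }

    public static void main(String[] args) {
        byte[] pdf = new byte[300_000]; // stand-in for real PDF bytes
        String once = encodeAsString(pdf);
        String twice = doubleEncoded(pdf);
        System.out.println("raw bytes:     " + pdf.length);
        System.out.println("encoded once:  " + once.length());
        System.out.println("encoded twice: " + twice.length());
        System.out.println("matches encodeToString: "
                + once.equals(Base64.getEncoder().encodeToString(pdf)));
    }
}
```

Each Base64 pass turns every 3 input bytes into 4 output characters, so a single pass grows 300,000 bytes to 400,000 characters and a second pass grows that to 533,336.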
Has anybody seen this before? Environment: Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-74-generic x86_64); java version "1.8.0_91" (Java(TM) SE Runtime Environment build 1.8.0_91-b14, Java HotSpot(TM) 64-Bit Server VM build 25.91-b14, mixed mode).