Elasticsearch version (bin/elasticsearch --version):
5.6.11
Plugins installed:
JVM version (java -version):
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux myserver 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Currently I'm running into a disk space issue on my server and seem to have found the culprit: this index is reportedly ~40 GB in size. That sounds ridiculous, because the average list of coords that I save is about 5 elements.
This index has 11k documents.
What I have tried:
Deleted the index and filled it up again; this resulted in a MUCH MUCH smaller index on Elasticsearch (40 GB -> 14 MB??), which is why I'm very skeptical as to whether this will work at all.
Tried to research through Google without finding any concrete solutions.
This setup is known to generate big indexes, in particular if you are indexing shapes that cover big areas (compared to the given precision).
This is one of the reasons we introduced a new indexing strategy in version 6.6. It still has a few limitations, but if they do not affect your use case I would recommend upgrading to the latest version of Elasticsearch and using that strategy instead.
If upgrading is not an option, I would recommend setting the parameter distance_error_pct or lowering the precision of the indexed shapes.
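For reference, on a 5.x geo_shape mapping the parameter is set alongside the precision. A sketch (the index name, the type name `doc`, and the exact values are assumptions based on this thread):

```
PUT my_polygons_1
{
  "mappings": {
    "doc": {
      "properties": {
        "polygon": {
          "type": "geo_shape",
          "precision": "1m",
          "distance_error_pct": 0.025
        }
      }
    }
  }
}
```

Note that both parameters can only be set at index-creation time, so changing them means creating a new index.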
You are indeed correct; somehow ES turned it into a "float" instead of a geo_shape. However, when I now create the "new" index under a different name and try to _reindex, it won't let me because of the following error: [polygon] is defined as an object in mapping [my_polygons_1] but this name is already used for a field in other types
So _reindex doesn't work at all in this case, and perhaps I need to find a different way to see IF filling this new index results in smaller disk usage?
I think you are not creating the mapping before trying to reindex and that is the reason it is not working. The reindex command does not copy the mapping of the source index so it is trying to generate a dynamic mapping instead.
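For example (a sketch; index and field names are placeholders based on this thread), create the destination index with the explicit geo_shape mapping first, then reindex into it:

```
PUT my_polygons_2
{
  "mappings": {
    "doc": {
      "properties": {
        "polygon": { "type": "geo_shape", "precision": "1m" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "my_polygons_1" },
  "dest":   { "index": "my_polygons_2" }
}
```

With the mapping already in place, _reindex only copies documents and never has to guess field types dynamically.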
Also, I just did a test with your suggested distance_error_pct, which is (to put it mildly) a huge performance improvement already.
Which brings me to the question:
Say I add said parameter to my mapping, how much is a "reasonable" value to set it to? Currently I set it to 0.025 and saw a huge improvement. Does this mean it allows an error percentage of 2.5% based on my 1-meter precision?
The parameter is a bit more complex and it works as follows:
When the parameter distance_error_pct is set, the algorithm computes the length of the bounding box of the provided shape. This length is multiplied by the value of that parameter and the result is the precision used to index the shape.
If the length of the diagonal of the bounding box of the shape is 10 meters, then the computed precision for that shape is 0.25 meters. Because this is finer than the given precision of 1m, the final precision will be 1m.
If the length of the diagonal is 100 meters, then the precision will be 2.5m.
If the length of the diagonal is 1000 meters, then the precision will be 25m.
and so on...
In summary, you will end up with your shapes indexed at variable precision depending on the area covered by the bounding box of such shape, with a maximum precision given by the precision parameter.
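The rule above can be sketched as follows (an illustration of the behavior described here, not the actual Elasticsearch/Lucene code; the function name is mine):

```python
# Sketch: how distance_error_pct determines the per-shape indexing
# precision, clamped by the mapping's `precision` parameter.
def effective_precision(diagonal_m, distance_error_pct=0.025, precision_m=1.0):
    """Return the precision (in meters) used to index a shape whose
    bounding-box diagonal is diagonal_m meters long."""
    computed = diagonal_m * distance_error_pct
    # The shape is never indexed finer than the mapping's precision.
    return max(computed, precision_m)

# 10 m diagonal   -> 0.25 m computed, clamped to 1 m
# 100 m diagonal  -> 2.5 m
# 1000 m diagonal -> 25 m
```

So the relative error stays roughly constant at 2.5% of each shape's size, while small shapes bottom out at the 1m precision floor.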
I applied the mentioned distance_error_pct to an index where precision isn't all that important for now, and the result for a small subset of 100 documents already seems massive: we went from 280 MB to 1 MB.
I do wonder if this could be applied elsewhere. I also use large, very accurate (lots of coords) polygons, but after reading through the documentation I don't think it's a good idea there (I need accuracy, though I'm still determining how accurate it needs to be).
Again, this seems to be the fix, and I am very grateful for your help!
Great to hear that this solution is good enough for you.
Note that if you require precision you will have to upgrade to take advantage of the new indexing strategy. In addition, be aware that you are running an unmaintained version, as it has reached EOL.