Why is my Elasticsearch index using so much disk space?

Elasticsearch version (bin/elasticsearch --version):
5.6.11
Plugins installed:

JVM version (java -version):
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux myserver 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I'm currently running into a disk space issue on my server and seem to have found the culprit: supposedly, according to some people, this index is ~40 GB in size. That sounds ridiculous, because the average list of coordinates that I save has about 5 elements.

This index has 11k documents.

What I have tried:

  • Deleted the index and filled it up again. This resulted in a MUCH smaller index in Elasticsearch (40 GB -> 14 MB??), which is why I'm very skeptical about whether this will work at all.
  • Tried researching through Google, without finding any concrete solutions.

The contents I wanted to post were too big, so here is a pastebin: https://pastebin.com/eMMiK8QD

I really hope someone has some insight into why it could possibly be this big.

I had a look at your mapping and I believe the cause is the following field:

    "polygon": {
      "type": "geo_shape",
      "tree": "quadtree",
      "precision": "1.0m"
    }

This setup is known to generate big indices, particularly when you index shapes that cover large areas relative to the given precision, because each shape is indexed as the set of grid cells that cover it.

This is one of the reasons we introduced a new indexing strategy in version 6.6. There are still a few limitations, but if they do not affect your use case I would recommend upgrading to the latest version of Elasticsearch and using that strategy instead.

If upgrading is not an option, I would recommend setting the distance_error_pct parameter or lowering the precision of the indexed shapes.
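
For illustration, here is a minimal sketch of what the field mapping could look like with that parameter added, written as a Python dict and printed as JSON. The value 0.025 is only an example and not a recommendation; the other settings are copied from the mapping above. As far as I recall from the geo_shape documentation, distance_error_pct defaults to 0 once precision is set explicitly, which would explain why it has to be set by hand here.

```python
import json

# Field mapping from this thread, with distance_error_pct added.
# 0.025 is an illustrative value only; pick one that matches your accuracy needs.
polygon_mapping = {
    "polygon": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "1.0m",
        "distance_error_pct": 0.025,
    }
}

# Print the JSON fragment that would go under "properties" in the index mapping.
print(json.dumps(polygon_mapping, indent=2))
```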

Hope this helps.

Thanks a bunch @Ignacio_Vera,

Do you happen to have an idea why a simple "_reindex" from index_1 to index_2 made such a huge difference in size?

Thanks in advance!

I am skeptical that the reindex command has indexed the polygon field as a geo_shape. Could you check the mapping of the second index?

You are indeed correct: somehow ES turned it into a "float" instead of a geo_shape. However, when I now create the "new" index under a different name and try to _reindex, it won't let me because of the following error:
[polygon] is defined as an object in mapping [my_polygons_1] but this name is already used for a field in other types

So _reindex doesn't work at all in this case. Perhaps I need to find a different way to see whether filling this new index results in smaller disk usage?

I think you are not creating the mapping before trying to reindex and that is the reason it is not working. The reindex command does not copy the mapping of the source index so it is trying to generate a dynamic mapping instead.
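
A rough sketch of the order of operations described above: first create the destination index with an explicit mapping, then run the reindex. The script only builds and prints the two request bodies and does not talk to a cluster. The type name `my_type` is hypothetical (the real mapping lives in the pastebin and is not reproduced here), and only the polygon field is shown.

```python
import json

DEST_INDEX = "my_polygons_2"
DOC_TYPE = "my_type"  # hypothetical mapping type name; 5.x mappings are keyed by type


def build_requests(source_index, dest_index, doc_type):
    """Return the two requests in order: create index with mapping, then reindex."""
    create_index = (
        "PUT",
        "/" + dest_index,
        {
            "mappings": {
                doc_type: {
                    "properties": {
                        # Only the field discussed in this thread; add the rest
                        # of the source index's mapping here as well.
                        "polygon": {
                            "type": "geo_shape",
                            "tree": "quadtree",
                            "precision": "1.0m",
                        }
                    }
                }
            }
        },
    )
    reindex = (
        "POST",
        "/_reindex",
        {"source": {"index": source_index}, "dest": {"index": dest_index}},
    )
    return [create_index, reindex]


for method, path, body in build_requests("my_polygons_1", DEST_INDEX, DOC_TYPE):
    print(method, path)
    print(json.dumps(body, indent=2))
```

After the reindex, checking the mapping of the destination index should show polygon as geo_shape rather than a dynamically generated type.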

Ah! But I have explicitly created the new index to be exactly like the other one. The only differences are the name, the UUID and the creation_date.

Not sure if I follow, could you share the commands you are using for reindex (creation of new index, mapping and reindex command)?

My bad, hopefully the following mappings can give some more insight:
https://pastebin.com/gpJsNx0i

and the part that's currently giving me errors is:

POST /_reindex
{
  "source": {
    "index": "my_polygons_1"
  },
  "dest": {
    "index": "my_polygons_2"
  }
}

which results in the error:

[polygon] is defined as an object in mapping [my_polygons_1] but this name is already used for a field in other types

I also just ran a test with your suggested "distance_error_pct", which (to put it mildly) is a huge improvement already.

Which brings me to my question:
Say I add this parameter to my mapping, what is a "reasonable" value to set it to? I currently have it set to 0.025, and that was a huge improvement. Does this mean it allows an error of 2.5% relative to my 1 meter precision?

Thanks again in advance!

The parameter is a bit more complex and it works as follows:

When the parameter distance_error_pct is set, the algorithm computes the length of the diagonal of the bounding box of the provided shape. This length is multiplied by the value of that parameter and the result is the precision used to index the shape.

If the length of the diagonal of the bounding box of the shape is 10 meters, then the computed precision for that shape is 0.25 meters. Because this is finer than the configured precision, the final precision will be 1m.

If the length of the diagonal is 100 meters, then the precision will be 2.5m.

If the length of the diagonal is 1000 meters, then the precision will be 25m.
and so on...

In summary, you will end up with your shapes indexed at variable precision depending on the area covered by the bounding box of each shape, with the finest possible precision given by the precision parameter.
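
The arithmetic above can be sketched in a few lines of Python. This is a simplified model of the explanation in this post, not the actual implementation; the function name and its arguments are made up for illustration.

```python
def effective_precision_m(diagonal_m, distance_error_pct, precision_m):
    """Simplified model: the precision a shape ends up indexed at, in meters.

    diagonal_m         -- length of the diagonal of the shape's bounding box
    distance_error_pct -- value of the mapping parameter (e.g. 0.025)
    precision_m        -- configured precision (e.g. 1.0 for "1.0m")
    """
    candidate = diagonal_m * distance_error_pct
    # The result is never finer than the configured precision.
    return max(candidate, precision_m)


for diagonal in (10, 100, 1000):
    result = effective_precision_m(diagonal, 0.025, 1.0)
    print(f"{diagonal} m diagonal -> {result:g} m precision")
```

With 0.025 and a 1m precision this reproduces the three cases above: 1m, 2.5m and 25m.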

Hope it makes sense.

Thanks a lot Ignacio!

I applied the mentioned distance_error_pct to an index where the precision isn't all that important for now, and the result for a small subset of 100 documents is massive already. We went from 280 MB to 1 MB.

I do wonder whether this could be applied elsewhere. I also use large, very accurate polygons (lots of coordinates), but after reading through the documentation I don't think it is a good idea there (I need accuracy, but I am still determining how accurate it needs to be).

Again, this seems to be the fix, and I am very grateful for your help!

Great to hear that this solution is good enough for you.

Note that if you require precision, you will have to upgrade to take advantage of the new indexing strategy. In addition, be aware that you are running an unmaintained version, as it has reached EOL.

Thanks, this will do for now, and I'm aware of the EOL. Luckily, the upgrade to a newer ES has been talked about and will (eventually) happen :)
