Upgrade from elastic 1.3.2 to 2.3.1 and more space for the indexes

saiprasad_mishra · May 6, 2016, 8:56pm

Hi All

I am seeing nearly double the disk usage after migrating the data for one index from ES 1.3.2 to 2.3.1, even with doc_values set to false for our not_analyzed fields.

The index size was 772 GB in 1.3.2 and became 1.3 TB in 2.3.1.
I see that the norms files are much bigger now.

Is this expected? If so, we will need to factor in more capacity for the new version.

Regards
Sai

warkolm · May 7, 2016, 1:46am

Did you run a force merge after you had finished indexing?

saiprasad_mishra · May 7, 2016, 2:31am

Thanks for the reply Mark

The size I mentioned was measured just after the import from knapsack finished.
I just finished running a force merge and the size has still not gone down; rather, it has increased to 1.8 TB.

I will dig further. Please let me know if you have seen this kind of issue before.

Regards
Sai

s1monw · May 9, 2016, 9:21am

Hey, did you flush and refresh after the force merge? Unless you refresh, the space used by segments that existed prior to the force merge won't be released, and that will likely result in higher disk usage. Also, can you tell us how you are measuring the size? If it is an API call on ES, please post the result here; if not, please show the output of your command.

saiprasad_mishra · May 9, 2016, 8:12pm

Thanks Simon for your time
I am running du -h on the Elasticsearch data directory in env1 (1.3.2) and in env2, the new cluster (2.3.1). Below is an example for shard 17.

On env1 (version 1.3.2):

32G /data/elasticsearch/CDQARTH-ES-PROD/nodes/0/indices/products_t0_v9/17/index

On env2 (version 2.3.1):

97G ./data/elasticsearch/xxx.prod.xxx.com-data2/CDQARTH-ES-PROD/nodes/0/indices/products_t0_v9/17/index

Here is one .nvd file, which alone is 63 GB:

-rw-r--r--. 1 elasticsearch elasticsearch 63G May 6 22:27 _2nn.nvd
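
To see which Lucene file types account for the space, per-extension totals can be more telling than a single du -h figure. The following is only a minimal sketch and not something posted in the thread: the function names are hypothetical, and it assumes nothing more than that the shard's index directory is readable on the local file system.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Sum file sizes (in bytes) under index_dir, grouped by file extension."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files without an extension (e.g. segments_2g) are keyed by their full name.
            ext = os.path.splitext(name)[1] or name
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def report(totals):
    """Return lines such as '.nvd 63.0 GiB', largest first."""
    gib = 1024 ** 3
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ["%s %.1f GiB" % (ext, size / gib) for ext, size in ordered]
```

Calling report(size_by_extension(path)) on the shard 17 index directory of each cluster would give two lists that can be compared side by side.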

saiprasad_mishra · May 9, 2016, 8:17pm

Yes, automatic refresh is enabled and runs every 15 minutes.
I also refreshed manually, but I did not flush, after the force merge down to 1 segment:

curl -XPOST 'http://search01.prod.elasticsearch.xxx.xxx.com/products_t0_v9/_forcemerge?max_num_segments=1'

Regards
Sai

dom · May 9, 2016, 11:20pm

I have the same issue, coming from 1.7.2 to 2.3.1.
The space usage has gone up a lot in the newly upgraded environment. I am disabling the "_all" field for all the types in the new index, even though I wasn't doing that in the old version.

In my case I think the space has gone up more than 10 fold. I have geo_point and geo_shape fields, for what it's worth, which was also the case in the old version.

In the old index/version the space usage was about 14 KB per doc.
In the new index/version the space usage is about 1 MB per doc, even though I disabled indexing for a lot of fields that were indexed in the old version.

dom · May 10, 2016, 11:25am

I created a test set, old version vs. new version, with the same mapping and the same set of documents. Below are the stats from the head plugin.

ES 2.3.1
test-local-01
size: 8.43Gi (8.43Gi)
docs: 5,029 (6,762)

ES 1.7.2
test-local-01
size: 34.8Mi (34.8Mi)
docs: 5,029 (5,172)

I will hold off on the upgrade until this is fixed.
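
As a quick sanity check on the head plugin figures above, the per-document footprint can be worked out directly. This is a small sketch: the helper name is made up for illustration, and it assumes that "Gi" and "Mi" in the head plugin output mean GiB and MiB.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def bytes_per_doc(size_bytes, docs):
    """Average on-disk bytes per document."""
    return size_bytes / docs


new = bytes_per_doc(8.43 * GIB, 5029)   # ES 2.3.1 test index
old = bytes_per_doc(34.8 * MIB, 5029)   # ES 1.7.2 test index

print("2.3.1: %.2f MiB per doc" % (new / MIB))
print("1.7.2: %.2f KiB per doc" % (old / 1024))
print("growth: %.0fx" % (new / old))
```

With these inputs the new index comes out at roughly 1.7 MiB per doc against roughly 7 KiB in the old one, a growth factor of about 248.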

warkolm · May 10, 2016, 8:48pm

Can you post/link to your mappings and a data sample?

dom · May 10, 2016, 11:02pm

Sure. It is also worth mentioning that indexing is prohibitively slower in the new version.
In the old version it would take no more than 2 minutes to index those 5029 documents; in the new version it took a few hours. We have about 8 million docs in our production index, so it is impossible for us to upgrade at the moment.

Here is our mapping gist. How do you want the data? One sample doc for each type?

Thanks for your help.

warkolm · May 11, 2016, 4:01am

That's not right at all.
I know Simon has commented on here so let us take a look and get back to you.

connor · May 13, 2016, 11:22am

We seem to be having the same issue.

Importing data from Couchbase 4.0 Community into Elastic 2.3.1, the document size seems to have increased nearly 4 fold, and the multiple kept growing as more data was imported.

We are currently running Couchbase 3.0 Community importing into Elastic 1.5.2. Our documents are typically very small (between 100 and 150 bytes per doc). On ES 1.5.2 the average doc size is about 150 B. On ES 2.3.1, after only 2.5M docs had been imported, the average size per doc was 500 B.

Also, import speed has slowed down significantly, but not sure where the issue for import speed would reside.

connor · May 13, 2016, 11:49am

Out of interest for anyone running into the same issues, we also have an issue logged on the elasticsearch-transport-couchbase github page:

saiprasad_mishra · May 16, 2016, 6:41pm

Hi All
My apologies for the late reply.
Just to follow up with my observations:

ES 1.3.2 index size was 877 GB.
ES 2.3.1 index size just after the import became 1.13 TB. This may be because doc values default to true: I have 2 fields with a specific mapping for which I did not set doc_values to false, so this is explainable. All other not_analyzed fields have doc_values set to false, and _all is also disabled.

But after this import I ran a force merge (_forcemerge?max_num_segments=1) and the size increased to 1.78 TB, which clearly means that the older segments (all the files related to those segments) are not getting deleted. The index size now remains at 1.78 TB.

As I said earlier, it is the .nvd file that takes the most space in a given shard.
I see only one segments_2g file in each shard.
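
For readers following along, the mapping choices described above might look roughly like the following in ES 2.x. This is only a sketch: the type name `product` and the field name `sku` are hypothetical, since the actual mapping was not posted in the thread.

```python
import json

# Hypothetical type and field names; only the settings mirror what the post describes.
mapping = {
    "mappings": {
        "product": {
            "_all": {"enabled": False},  # _all disabled
            "properties": {
                "sku": {
                    "type": "string",
                    "index": "not_analyzed",
                    # Doc values default to true in 2.x for not_analyzed strings,
                    # so they have to be switched off explicitly per field.
                    "doc_values": False,
                }
            },
        }
    }
}

# JSON body that could be sent when creating the index.
print(json.dumps(mapping, indent=2))
```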

Regards
Sai

s1monw · May 17, 2016, 7:29am

Apparently it's really hard to figure out what is going on without any way to reproduce it. Can somebody try to provide a reproduction that we can run on our end without downloading tons of data?

Also, can somebody provide the segments stats for this index:

curl -XGET 'http://localhost:9200/_segments?verbose=true'

saiprasad_mishra · May 19, 2016, 12:07am

Hi Simon

Please find the segment details in the gists below.

1.3.2 segments

https://gist.github.com/saiprasadmishra/b2deb792c4c21b6cbb3fb3ad3efa6098

segment details of my index 132

{
  "_shards": {
    "total": 48,
    "successful": 48,
    "failed": 0
  },
  "indices": {
    "products_t0_v9": {
      "shards": {
        "0": [

This file has been truncated.

2.3.1 segments

https://gist.github.com/saiprasadmishra/c57eb22793ff7e125d7d6f5b2660facf

segment details of my index 231

{
  "_shards": {
    "total": 24,
    "successful": 24,
    "failed": 0
  },
  "indices": {
    "products_t0_v9": {
      "shards": {
        "0": [

This file has been truncated.

s1monw · May 19, 2016, 7:34am

Hey, thanks for the stats. I wonder if you can provide the output with ?verbose=true, since I wanted to see the lower-level details. I also wonder why all these shards have more than one segment if you ran a force merge with commit=true&wait_for_completion=true.

There is also a difference between the number of search segments and the number of committed segments. This indicates that you didn't refresh after the force merge. Old segment files are deleted once they go out of scope, so there might be open readers still holding on to the segments.
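
The mismatch described above can be checked mechanically from a `_segments` response. The sketch below assumes the response layout of the segments API in these versions (per shard copy: `num_committed_segments`, `num_search_segments`, and a `segments` map whose entries carry `size_in_bytes` and `search`). The function name is hypothetical and the sample numbers are invented purely for illustration.

```python
def summarize_segments(response, index):
    """Return {shard_id: [one summary dict per shard copy]} from a _segments response."""
    summary = {}
    for shard_id, copies in response["indices"][index]["shards"].items():
        rows = []
        for copy in copies:
            segments = list(copy["segments"].values())
            rows.append({
                "committed": copy["num_committed_segments"],
                "search": copy["num_search_segments"],
                "bytes": sum(s["size_in_bytes"] for s in segments),
                "searchable_bytes": sum(s["size_in_bytes"] for s in segments if s["search"]),
                # Differing counts suggest a flush or refresh is still pending.
                "counts_match": copy["num_committed_segments"] == copy["num_search_segments"],
            })
        summary[shard_id] = rows
    return summary


# Invented sample: one merged segment is committed, two older ones are still searchable.
sample = {"indices": {"products_t0_v9": {"shards": {"0": [{
    "num_committed_segments": 1,
    "num_search_segments": 2,
    "segments": {
        "_a": {"size_in_bytes": 100, "committed": False, "search": True},
        "_b": {"size_in_bytes": 40, "committed": False, "search": True},
        "_c": {"size_in_bytes": 120, "committed": True, "search": False},
    },
}]}}}}

print(summarize_segments(sample, "products_t0_v9"))
```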

While looking at it I realized that the distribution of documents might be totally different, since we changed the hash function, which rules out a shard-by-shard comparison. So it's possible that shard X is much bigger in 2.x than in 1.x just because of the different distribution. I think what we need is more insight into what takes up the space. Can you run force_merge again and then flush AND refresh, so we can really tell what the differences are? Also, please get us the indices stats too: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html
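
The requested sequence (force merge, then flush, then refresh, then collect the segments and indices stats) could be scripted along these lines. This is a hedged sketch rather than an official procedure: the function names are hypothetical, and the base URL is an assumption that has to point at a node of the cluster in question.

```python
import json
import urllib.request


def maintenance_requests(base_url, index):
    """Ordered (method, url) pairs: force merge, flush, refresh, then read the stats."""
    root = "%s/%s" % (base_url.rstrip("/"), index)
    return [
        ("POST", root + "/_forcemerge?max_num_segments=1"),
        ("POST", root + "/_flush"),
        ("POST", root + "/_refresh"),
        ("GET", root + "/_segments?verbose=true"),
        ("GET", root + "/_stats"),
    ]


def run(base_url, index):
    """Send each request in order and return the decoded JSON responses."""
    results = []
    for method, url in maintenance_requests(base_url, index):
        # POST requests are sent with an empty body.
        data = b"" if method == "POST" else None
        req = urllib.request.Request(url, data=data, method=method)
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read().decode("utf-8")))
    return results
```

For example, run("http://localhost:9200", "products_t0_v9") would return five JSON documents, the last two being the segments and indices stats to post back to the thread. A force merge on an index of this size can take a long time, so the first call may block for a while.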

s1monw · May 19, 2016, 7:37am

So 2 min for 5029 docs is already nuts. Can you explain how you index your data? I.e., I am interested in things like:

bulk vs. doc-a-time indexing
index settings
do you call any APIs per indexing request

I can see some geo fields in your mapping, do you have complex polygons you are indexing? Can you provide some data that takes so long including the mapping etc. so we can try to reproduce?
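
For anyone unsure about the bulk vs. doc-at-a-time distinction above: a bulk request carries many documents in one newline-delimited body, instead of one HTTP request per document. Below is a minimal sketch of building such a body for the `_bulk` endpoint. The helper name, the type name `place` and the document fields are hypothetical and not taken from the posted mapping.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk request body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("test-local-01", "place", [
    ("1", {"name": "first"}),
    ("2", {"name": "second"}),
])
print(body)
```

The resulting string would be sent as the body of a single POST to `/_bulk`, whereas doc-at-a-time indexing would issue one request per document.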

dom · May 19, 2016, 3:09pm

Simon
We are using couchbase transport plugin, it takes docs from couchbase and indexes them into Elasticsearch.

I am guessing the plugin uses bulk indexing, as things get slightly better when increasing threadpool.bulk.queue_size (still so slow that it is not usable).

The CB plugin does all the indexing, I never index directly in ES, we create/update in CB then the plugin uses XDCR replication to replicate into ES.

Our shapes are all circles, nothing complex, and there are no specific index settings apart from the mapping. The difference is stark, as I have the same mapping running in the old and the new version (note that there is a different CB transport plugin version for each ES version).

The easiest way I can show you this is to have an online meeting or something to show the real deal in our test machines.

s1monw · May 24, 2016, 1:28pm

dom,

I need you to take variables out of the picture. Can you try to manually index stuff into ES without the Couchbase plugin? It's crucial for me to figure out what is going on. Can you also paste your index settings, please?
