Extreme memory pressure

Hello, I've started importing documents to my server. It will be close to 400,000 documents when finished.

While importing using the Bulk API (batches of 10,000), memory goes up to 99%. I manage to do a few of these bulk imports and they index, but after a while the server crashes and restarts. Memory doesn't go down from what I can see. Or how long after indexing does it take to free up memory?
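
Roughly, each batch I send looks like this (simplified; the index name, IDs, and second document here are just illustrative):

    POST /horses/_bulk
    { "index": { "_id": "1" } }
    { "horse_name": "srumbs prima donna" }
    { "index": { "_id": "2" } }
    { "horse_name": "another horse name" }
    ...and so on, up to 10,000 documents per request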

I started with the free dev server, but I tried upgrading it to 4 GB of memory and had the same problem.

I'm starting to think something is wrong with my analyzer: it creates too many tokens, which uses too much memory.

I want to be able to search on "horse_name". Horse names are at most 18 characters long and can contain whitespace and characters like . , ` '

Could anyone point me in the right direction on this problem with importing documents, and take a look at whether the analyzer below is correct for my case?

I think this should be fairly basic and not consume so much memory that it demands an expensive server. The document count will not grow much, maybe 10,000 to 20,000 a year. Updates to existing documents will be more frequent.

"analysis": {
        "filter": {
          "horse_information_ngram_filter": {
            "token_chars": [
              "letter",
              "whitespace",
              "punctuation"
            ],
            "min_gram": "2",
            "type": "nGram",
            "max_gram": "8"
          },
          "filter_shingle": {
            "max_shingle_size": "4",
            "min_shingle_size": "2",
            "type": "shingle"
          }
        },
        "analyzer": {
          "horse_information_ngram_analyzer": {
            "filter": [
              "lowercase",
              "filter_shingle",
              "horse_information_ngram_filter"
            ],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }

Cluster ID is 404f6f.

Hi,

This is more of a generic question on how to reduce memory usage in Elasticsearch, which is a better fit for the Elasticsearch forum, so I've moved the question there.

That said, it looks like you should reduce your bulk size a lot, as the bulk requests get queued and consume all the memory.

– Alex

Well, I guess this thread contains several questions: some are generic Elasticsearch questions, but some are related to ES Cloud.

Sorry, I'm just starting out with this, so I'm kind of fresh :slight_smile:

The problem right now is that my cluster is down and I can't find a way in the Cloud console to bring it up.

The more generic question is: how can my documents, which take up a bit more than 1.5 GB of file space, consume just as much memory? I must have configured something wrong… And what can I do to reduce it? The server is not under any sort of traffic pressure yet.

I don't understand what you mean about bulk size, sorry!

Thanks for pointing me in the right direction!

Can anyone help me optimize this in any way? :innocent:

@slackday, I have a few questions:

  1. How many Elasticsearch nodes are running in your cluster?
  2. How much memory is allocated to each node?
  3. You mention a batch of 10000, is that 10000 documents in a single bulk index request?
  4. Do you have multiple threads doing the indexing, or just one?

Hello @am87

  1. I have one node running in the cluster, using the default ES Cloud setup.
  2. I started with the free tier server, which has 1 GB of memory. I had to scale up from the free two-week server to the second tier with 2 GB of memory, since the free one kept hanging after I indexed my documents.
  3. Yes, I did batches of 10,000 documents. It worked fine and my documents are indexed.
  4. I did it with one thread.

Indexing was fine. The problem is that the server uses a lot of memory after the indexing. My documents take 1.64 GB on disk and the memory consumption is about the same, which makes me suspect there is something wrong with my indexing. The server comes with 2 GB of memory and 64 GB of disk space. So should document size and memory usage really be 1:1?

Otherwise I won't be able to scale this any further, because upgrading more is costly.

I'd suggest you try to lower your bulk size to 1,000 and see if that makes any difference.

I know it's indexing successfully, but ES does a lot of work in the background with your data, sometimes even after you've indexed a document. I think reducing the bulk size should resolve this. Just try it. If it works but you don't like the throughput, you can add some concurrency.

I am not an expert in n-gram analyzers, but I don't think that's the issue here.

OK, thanks, I will try that.

And I will also try to debug my analyzer. If I look at the results using the _analyze API

/_analyze?analyzer=horse_information_ngram_analyzer&text=srumbs prima donna

it produces something like 238 tokens for an 18-character name (18 characters is the maximum length). Most of these are useless, like:

      {
         "token": "a d",
         "start_offset": 0,
         "end_offset": 18,
         "type": "word",
         "position": 0
      },
      {
         "token": "a do",
         "start_offset": 0,
         "end_offset": 18,
         "type": "word",
         "position": 0
      },
      {
         "token": "a don",
         "start_offset": 0,
         "end_offset": 18,
         "type": "word",
         "position": 0
      },
      {
         "token": "a donn",
         "start_offset": 0,
         "end_offset": 18,
         "type": "word",
         "position": 0
      },
      {
         "token": "a donna",
         "start_offset": 0,
         "end_offset": 18,
         "type": "word",
         "position": 0
      },

I should limit it to the beginning of words at least. If I understand correctly, all tokens are kept in memory, and that could be part of the explanation for why these documents use so much memory.
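
Something like this is roughly what I have in mind: switch to an edge n-gram filter so only word prefixes are indexed, and maybe drop the shingle filter too, since those cross-word tokens seem to come from the shingles. This is just a sketch (the filter/analyzer names are placeholders, and on newer versions the filter type is spelled edge_ngram instead of edgeNGram):

    "analysis": {
      "filter": {
        "horse_information_edge_ngram_filter": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "8"
        }
      },
      "analyzer": {
        "horse_information_edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "horse_information_edge_ngram_filter"
          ]
        }
      }
    }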

Thanks for your input :slight_smile: