Hello, I've started importing documents to my server. There will be close to 400,000 when finished.
While importing with the Bulk API (batches of 10,000), memory goes up to 99%. I manage to do a few of these bulk imports and they index, but after a while the server crashes and restarts. Memory doesn't go down from what I can see. Or how long after indexing does it take for the memory to be freed?
I started with the free dev server, then tried upgrading it to 4 GB of memory, but I have the same problem.
I'm starting to think there is something wrong with my analyzer, that it creates too many tokens and therefore uses too much memory.
I want to be able to search on "horse_name". Horse names are at most 18 characters long and can contain whitespace and characters like . , ` '
Could anyone point me in the right direction on this importing problem, and take a look at whether the analyzer is correct for my case?
I think this should be kind of basic and shouldn't consume so much memory that it demands an expensive server. The number of documents will not increase much, maybe 10,000 or 20,000 a year. Updates to existing documents will be more frequent.
This is more of a generic question on how to reduce memory usage in Elasticsearch, which is a better fit for the Elasticsearch forum, so I've moved the question there.
That said, it looks like you should reduce your bulk size a lot, as the bulk requests get queued and consume all the memory.
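For example, something along these lines with the official Python client and its bulk helper (just a sketch; the cloud ID, index name, and documents are placeholders):

```python
from elasticsearch import Elasticsearch, helpers

# Placeholder connection details for an Elastic Cloud deployment
es = Elasticsearch(cloud_id="<your-cloud-id>", api_key="<your-api-key>")

def generate_actions(docs):
    # docs can be any iterable of dicts, e.g. rows streamed from your source file
    for doc in docs:
        yield {"_index": "horses", "_source": doc}

docs = [{"horse_name": "Red Rum"}, {"horse_name": "Miss O'Hara"}]  # stand-in data

# chunk_size=1000 sends requests of 1,000 documents instead of 10,000,
# so each bulk request stays small and the queue doesn't eat all the memory
helpers.bulk(es, generate_actions(docs), chunk_size=1000)
```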
Well, I guess this thread contains several questions. Some are generic Elasticsearch questions, but some of this is related to ES Cloud.
Sorry, I'm just getting started with this, so I'm kind of new to it.
The problem right now is that my cluster is down and I can't find a way in the Cloud console to bring it up.
The more generic question is: how can my documents, which take a bit more than 1.5 GB of file space, consume just as much memory? I must have configured something wrong… And what can I do to reduce it? The server is not under any sort of traffic pressure yet.
I don't understand what you mean about bulk size, sorry!
I have one node running in the cluster, running the default ES Cloud setup.
I started with the free-tier server, which has 1 GB of memory. I had to scale up from the free two-week server to the second tier with 2 GB of memory, since the free one kept hanging after I indexed my documents.
Yes, I did batches of 10,000 documents. It worked fine and my documents are indexed.
I did it in one thread.
Indexing was fine. The problem is that the server uses a lot of memory after the indexing. My documents take 1.64 GB on disk and the memory consumption is about the same, which makes me suspect there is something wrong with my indexing. The server comes with 2 GB of memory and 64 GB of disk space. So should document size and memory really be 1:1?
Otherwise I won't be able to scale this any further, because upgrading more is costly.
I'd suggest you try lowering your bulk size to 1,000 and see if that makes any difference.
I know it's indexing successfully, but ES does a lot of work in the background with your data, sometimes even after you've indexed a document. I think reducing the bulk size should resolve this. Just try it. If it works but you don't like the throughput, you can add some concurrency.
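If the smaller batches work but feel too slow, here's a rough sketch of adding concurrency with the Python client's parallel_bulk helper (again, connection details and index/field names are just placeholders):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(cloud_id="<your-cloud-id>", api_key="<your-api-key>")

def generate_actions(docs):
    for doc in docs:
        yield {"_index": "horses", "_source": doc}

docs = [{"horse_name": "Red Rum"}] * 5000  # stand-in data

# parallel_bulk sends small chunks from a couple of threads; the results
# generator has to be consumed for the requests to actually be sent
for ok, item in helpers.parallel_bulk(
    es, generate_actions(docs), chunk_size=1000, thread_count=2
):
    if not ok:
        print("failed:", item)
```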
I am not an expert in ngram analyzers, but I don't think that's the issue here.
I should limit it to the beginnings of words at least. If I understand correctly, all tokens are kept in memory, and that could be part of the explanation for why these documents use so much memory.
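Something like this is what I have in mind, assuming an edge_ngram tokenizer capped at the 18-character name length (a sketch with the 8.x Python client; the index name, analyzer names, and connection details are placeholders, not my current config):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(cloud_id="<your-cloud-id>", api_key="<your-api-key>")

# edge_ngram only emits prefixes (beginnings of words), which produces far fewer
# tokens than a full ngram tokenizer; max_gram 18 matches the longest horse name
settings = {
    "analysis": {
        "analyzer": {
            "horse_name_analyzer": {
                "type": "custom",
                "tokenizer": "horse_name_edge_ngram",
                "filter": ["lowercase"],
            }
        },
        "tokenizer": {
            "horse_name_edge_ngram": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 18,
                # splits on whitespace and punctuation; add "punctuation" here
                # if characters like . , ` ' should stay inside the tokens
                "token_chars": ["letter", "digit"],
            }
        },
    }
}

mappings = {
    "properties": {
        "horse_name": {
            "type": "text",
            "analyzer": "horse_name_analyzer",
            # plain analyzer at query time so the search terms themselves
            # are not expanded into ngrams again
            "search_analyzer": "standard",
        }
    }
}

es.indices.create(index="horses", settings=settings, mappings=mappings)
```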