ElasticSearch with CouchDB and memory consumption


(yojimbo87) #1

Hi folks! Let's say I have data in CouchDB that is indexed
for searching by ElasticSearch through the River. I want to ask:

  1. How does ES deal with a situation where the dataset stored in CouchDB
    and indexed by ES does not fit into memory?

  2. Does ES need to store the whole dataset in memory for indexing
    purposes, or does it only store part of the original data?

Thanks for your time and help.


(David Pilato) #2

Hi

Not sure I understand your question, so forgive me if I answer another
question :wink:

ES will index the full document you provide from CouchDB to ES.
But you can define a mapping before starting the river to ignore fields.
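For example, something like this (the index/type names and the `big_blob` field here are just placeholders; check the put-mapping docs for your version):

```shell
# Hypothetical example: create the index and define a mapping up front,
# so that one large field is neither indexed nor stored by ES.
curl -XPUT 'http://localhost:9200/my_db/'
curl -XPUT 'http://localhost:9200/my_db/my_db/_mapping' -d '{
  "my_db" : {
    "properties" : {
      "big_blob" : { "type" : "string", "index" : "no", "store" : "no" }
    }
  }
}'
```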

Hope that answers your question.

David.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(yojimbo87) #3

Thanks David, this answered my second question. However, I would also
like to know what happens when my dataset doesn't fit into memory.
Is it still possible to use ES when there is not enough
RAM to hold all documents and index them?



(David Pilato) #4

I don't know if you are talking about the individual size of each document you
get from CouchDB or the global size of your ES index.

You are talking about memory. Do you mean disk space?

Let me say that I have never seen ES have problems managing individual
documents, even large ones (more than 1000 elements in an array, with more
than a hundred fields each).

That said, I was running out of disk space in my production cluster last week
and ES handled it very well:

  • It sent information back to the client that the document had not been indexed
  • It let users perform searches without any problem

Not sure I answered your fears...

Cheers
David.



(yojimbo87) #5

By memory I meant RAM - the data fits on disk, but not into RAM. For
example, my CouchDB dataset is 10 GB, but I have only 2 GB of RAM - how
does ES deal with this situation, when only ~1/5 of the original dataset
can fit into RAM?



(David Pilato) #6

I suppose it will depend on the complexity/size of each document.
Also, if you are not using sorting and facets, ES will handle it very well.

I mean that I was able to manage 3 million documents on a single laptop
(with a size of about 100 Mb of data) with only 1.5 Mb of RAM allocated to the
JVM.

But memory problems begin when I start to use facets with a match_all
query... (a bad/mad idea for sure!)

So it really depends on how complex your data is and what your use
cases will be.

David.



(Gabriel Farrell) #7

On Mon, Nov 21, 2011 at 5:49 PM, David Pilato david@pilato.fr wrote:

I suppose it will depends on the complexity/size of each document.
Also, if you are not using sorting and facets, ES will handle it very well.

I mean that I was able to manage 3 million documents on a single laptop
(with a size of about 100 Mb of datas) with only 1.5 Mb RAM allocated to the
jvm.

1.5MB allocated to the JVM? Are you sure? That's awfully small.



(David Pilato) #8

Yes, sure - I was running 32-bit Windows.

David :wink:
@dadoonet



(Shay Banon) #9

Lucene and ElasticSearch require a certain amount of memory to operate. It
starts with Lucene, which holds parts of the inverted index in memory to
improve search performance (this can be controlled), and continues with
ElasticSearch, for things like faceting on fields. If there isn't enough
memory, then you will usually get a failure logged (OutOfMemoryError) and you
need to make sure to allocate more memory. The nodes info and nodes stats
APIs give statistics regarding memory usage and boundaries.
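For example (endpoint paths here follow the 0.x documentation and may differ in other versions):

```shell
# Nodes info: reports configured boundaries, e.g. the JVM max heap.
curl -XGET 'http://localhost:9200/_cluster/nodes?pretty=true'

# Nodes stats: reports current usage, e.g. JVM heap used and cache sizes.
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
```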



(yojimbo87) #10

So if I understand it correctly - I can have CouchDB, which will
durably persist my data on disk, and the size of this dataset can be
greater than the amount of RAM (up to some limit, of course) that
will be used by ES to provide search/ad-hoc query functionality on my
dataset. What I need is ad-hoc querying for my CouchDB dataset, but
I was worried about what would happen if the dataset stored in CouchDB
on disk were greater than the amount of RAM that can be assigned to ES
for managing search/query functionality on top of my dataset.



(David Pilato) #12

Just want to add something:

ES will not search directly within your dataset.
You will have to index all of your data in ES (manually, with the CouchDB
river, ...).

So, once your data is indexed, even if you shut down CouchDB, you will
be able to search it.

Not sure that's what you imagined by having "search/ad-hoc query
functionality on your dataset".

David



(yojimbo87) #13

Thanks David for having patience with me.
Let's say I'm in this situation:

  • CouchDB is responsible for adding/updating/deleting data and keeping it durable
  • my dataset in CouchDB takes about 10 GB of disk space
  • the server has 2 GB of RAM
  • CouchDB doesn't support dynamic ad-hoc querying, and map/reduce doesn't suit my needs
  • I need to be able to search/query my entire dataset dynamically for
    documents based on their field values (that's why I would like to evaluate
    ES for this functionality)
  • I need ES only for search functionality among the dataset documents -
    add/edit/delete would be taken care of by CouchDB

My concern is:
I understand that ES needs to index the entire dataset from CouchDB
before I can start searching/querying the data, but if my CouchDB
dataset takes 10 GB of disk space, wouldn't ES need ~10 GB of RAM to
index these documents (assuming that I don't want to ignore any
fields)? To be more clear, I would like to know how ES indexes data -
whether it stores the index only in RAM for fast access, or also on disk
(in case the dataset can't fit into RAM). I guess the latter is how ES
works, so I would then have 10 GB of data in CouchDB and ~10 GB of data
indexed by ES (some data in RAM and most data on disk). Sorry if I'm
being too annoying with my concern, but I would like to make things
clear in my head.


(David Pilato) #14

We've got about the same project: data in CouchDB, and a Java batch that
fetches data from CouchDB using the _changes API and sends each CouchDB doc
to ES. We don't use the CouchDB river, but I recommend using it to start
evaluating ES, as it's really easy to set up.

The river manages add/update/delete, so it will be very easy for you.
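Registering the river is a single call; a sketch (host, port, and the db/index names are examples - see the couchdb-river README for the full option list):

```shell
# Register a CouchDB river: ES follows the _changes feed of "my_db"
# and indexes every document into the ES index "my_db".
curl -XPUT 'http://localhost:9200/_river/my_db/_meta' -d '{
  "type" : "couchdb",
  "couchdb" : {
    "host" : "localhost",
    "port" : 5984,
    "db" : "my_db"
  },
  "index" : {
    "index" : "my_db",
    "type" : "my_db"
  }
}'
```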

What I can suggest is to test it and form your own opinion of ES. I'm pretty
sure you're going to love it :smiley:

So build a "small platform" - 2 Gb RAM and less than 100 Gb disk space - and
go for it.
As I told you before, you can run ES on a laptop.

ES uses RAM to store its indexes only if you ask for it (see
http://www.elasticsearch.org/guide/reference/index-modules/store.html). By
default, ES uses the local file system to store the Lucene indexes.
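For example, the store type can be chosen per index at creation time (index names here are made up; "niofs" is the usual on-disk default, "memory" holds the whole index in RAM):

```shell
# Default-style index: Lucene files live on the local file system.
curl -XPUT 'http://localhost:9200/my_db_fs/' -d '{
  "settings" : { "index.store.type" : "niofs" }
}'

# In-memory index: the whole Lucene store is held in RAM
# (not what you want for a 10 GB dataset on a 2 GB machine).
curl -XPUT 'http://localhost:9200/my_db_mem/' -d '{
  "settings" : { "index.store.type" : "memory" }
}'
```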

ES uses a lot of RAM if you are doing faceting or sorting. So for test
purposes, you can start with 2 Gb RAM.
But you will be more comfortable going to production if you have more than
one node, with fast disks (SSD) and a lot of memory.

HTH
David


