Elastic search

Mohit_Anchlia · August 7, 2011, 7:04pm

I am new to Elasticsearch and trying to understand the concepts. I am
trying to find information about:

How it distributes and replicates data for HA.
Where it stores the data.
Optimization techniques.

Paul_Loy · August 7, 2011, 10:56pm

ES shards and replicates indexes. It is what I would call 'statically
sharded' - that is you specify up front the number of shards and replicas
you want and that's how many there will be. Shards and replicas are then
allocated to nodes in your cluster.
Up to you:
http://www.elasticsearch.org/guide/reference/index-modules/store.html
Depends upon your use case. Everyone's data and everyone's indexes will
be different.
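One way to picture why the shard count is fixed up front: a document is routed to a shard by hashing a routing key (by default its ID) and taking the result modulo the number of primary shards. The sketch below assumes that hash-modulo scheme. `pick_shard` is a hypothetical helper, not an Elasticsearch API, and `zlib.crc32` is only a stand-in for whatever hash Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_key: str, number_of_shards: int) -> int:
    """Map a routing key to a shard number with hash-modulo routing.

    zlib.crc32 is only a stand-in hash for this sketch; it is not the
    hash function Elasticsearch itself uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    # Hypothetical document IDs, routed across 5 primary shards.
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the shard number depends on the modulus, changing the number of primary shards would send existing documents to different shards, which is why that number is chosen when the index is created. The replica count does not enter the formula.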

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy

Mohit_Anchlia · August 8, 2011, 5:34am

On Sun, Aug 7, 2011 at 3:56 PM, Paul Loy keteracel@gmail.com wrote:

ES shards and replicates indexes. It is what I would call 'statically
sharded' - that is you specify up front the number of shards and replicas
you want and that's how many there will be. Shards and replicas are then
allocated to nodes in your cluster.

Is there a link where I can read how to configure that? Also, does it
make it HA? For example, if one node goes down, does searching remain
unaffected?

Up to you:
Elasticsearch Platform — Find real-time answers at scale | Elastic

How do I decide which one to use? I also see it integrates with CouchDB.
With TBs of data, is it OK to keep it on the file system?

Depends upon your use case. Everyone's data and everyone's indexes will
be different.

Are there any general guidelines that might be applicable to everyone,
or that at least give a little more insight into designing this
efficiently?

Paul_Loy · August 8, 2011, 11:19am

On Mon, Aug 8, 2011 at 6:34 AM, Mohit Anchlia mohitanchlia@gmail.com wrote:

Is there a link where I can read how to configure that? Also, does it
make it HA? For example, if one node goes down, does searching remain
unaffected?

Basic configuration will be the index settings where you can set the number
of shards and the number of replicas of an index.

What's awesome with ES is that you can specify this on a per index basis. So
more critical indices can have a higher number of replicas.
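As a minimal sketch of those per-index settings, the snippet below builds the JSON body that would be sent when creating an index. The setting names `number_of_shards` and `number_of_replicas` are the real index settings; the helper `index_settings_body`, the index names and the counts are hypothetical, and no request is actually sent.

```python
import json


def index_settings_body(number_of_shards: int, number_of_replicas: int) -> str:
    """Build the JSON body for creating an index with explicit shard settings."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least 1 shard and a non-negative replica count")
    body = {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Hypothetical example: a critical index gets more replicas than a log index.
    print("PUT /orders")
    print(index_settings_body(5, 2))
    print("PUT /logs")
    print(index_settings_body(5, 0))
```

Each index carries its own body, so the two indexes here can sit in the same cluster with different replica counts.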

Regarding HA, the way I understand it (and Shay can probably correct me if
I'm wrong), there is a 'master' node for a shard. If that node dies, another
node with a replica is voted the 'master'. So searches should not be
impacted if a node goes down. Obviously, if you had just enough nodes for one
per shard and a node goes down, then one node will have to serve searches for
2 shards and so may be slower. So while you can still run searches, you'll
need to think about redundancy in your cluster.
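The effect of replicas on a node failure can be illustrated with a toy model. This is only a sketch under simplifying assumptions: copies are placed round-robin, one node fails at a time, and the functions `allocate` and `searchable_shards` are hypothetical. It does not reproduce Elasticsearch's actual allocation logic.

```python
from typing import Dict, List, Set


def allocate(number_of_shards: int, number_of_replicas: int,
             nodes: List[str]) -> Dict[str, Set[int]]:
    """Spread every copy of every shard over the nodes, round-robin.

    Copy 0 stands for the primary, copies 1..number_of_replicas for replicas.
    Copies of one shard always land on distinct nodes.
    """
    copies = number_of_replicas + 1
    if copies > len(nodes):
        raise ValueError("toy model needs at least one node per shard copy")
    placement: Dict[str, Set[int]] = {node: set() for node in nodes}
    for shard in range(number_of_shards):
        for copy in range(copies):
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].add(shard)
    return placement


def searchable_shards(placement: Dict[str, Set[int]], failed_node: str) -> Set[int]:
    """Return the shards that still have at least one copy after a node fails."""
    surviving: Set[int] = set()
    for node, shards in placement.items():
        if node != failed_node:
            surviving |= shards
    return surviving


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    print(searchable_shards(allocate(3, 1, nodes), "node-a"))
    print(searchable_shards(allocate(3, 0, nodes), "node-a"))
```

With one replica per shard, every shard still has a copy after `node-a` fails, but the remaining nodes hold more shards each. With zero replicas, the shard that lived only on `node-a` is no longer searchable.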

Up to you:
Elastic — The Search AI Company | Elastic

How do I decide which one to use? I also see it integrates with CouchDB.
With TBs of data, is it OK to keep it on the file system?

This will be better answered by one of the guys on this list that also
pushes TBs of data. I'm only at the GBs size so I use S3 for a gateway just
to be sure. I guess the quick answer is you can scale out to meet your
needs! If FS is a bottleneck you can add more nodes!?

Depends upon your use case. Everyone's data and everyone's indexes will
be different.

Are there any general guidelines that might be applicable to everyone,
or that at least give a little more insight into designing this
efficiently?

Lots, and it really is dependent on your data and how you want to search it.
Some tips I've used are to use filters as much as possible, which seems to
have given us a very stable, low latency ES cluster.
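To illustrate the filter tip, here is a sketch of a search body that keeps the free-text part scored and moves an exact-match condition into a filter. It assumes the `bool` query with a `filter` clause as found in current Elasticsearch releases; the release discussed in this thread expressed filters with a different syntax. The field names `title` and `status` and the helper `filtered_search_body` are hypothetical.

```python
import json


def filtered_search_body(text: str, status: str) -> str:
    """Build a search body: scored full-text match plus a non-scoring filter."""
    body = {
        "query": {
            "bool": {
                # Scored part: relevance matters for the free-text match.
                "must": [{"match": {"title": text}}],
                # Filter part: a yes/no condition that does not affect
                # scoring and that Elasticsearch is able to cache.
                "filter": [{"term": {"status": status}}],
            }
        }
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(filtered_search_body("distributed search", "published"))
```

Conditions that only include or exclude documents (status flags, date ranges, tenant IDs) are the usual candidates for the filter part.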

Mohit_Anchlia · August 10, 2011, 3:51pm

Are there any recommendations on when to use a DB rather than the file system?

Our use case is simple:

We have tons of column names and values in NoSQL column families
that we need search capabilities on, since Cassandra isn't really
very good when you need lots of indexes. These are mostly
distinct values.

We have XML docs that have attributes that we need to search on.
These have low cardinality.

kimchy · August 13, 2011, 4:45pm

What kind of recommendations are you after? Not sure I understand the
question properly to answer it...

Mohit_Anchlia · August 13, 2011, 5:25pm

On Sat, Aug 13, 2011 at 9:45 AM, Shay Banon kimchy@gmail.com wrote:

What kind of recommendations are you after? Not sure I understand the
question properly to answer it...

How do I decide whether to use the file system or CouchDB? What would be
the reason people would choose one over the other? Is it just because you
can see the data in some form directly in the DB?

kimchy · August 13, 2011, 7:43pm

Still not understanding... Use a file system or use CouchDB? How does that
relate to Elasticsearch? If it doesn't, I can still try and help :), but I
need more info. Do you want to store blobs on the file system?

Mohit_Anchlia · August 14, 2011, 4:38am

On Sat, Aug 13, 2011 at 12:43 PM, Shay Banon kimchy@gmail.com wrote:

Still not understanding... Use a file system or use CouchDB? How does that
relate to Elasticsearch? If it doesn't, I can still try and help :), but I
need more info. Do you want to store blobs on the file system?

From what I understand, indexes are stored somewhere on disk. And
from the link http://www.elasticsearch.org/guide/reference/index-modules/store.html
it looks like you have various options. So I am trying to understand:
should they be stored on the file system or in some DB like CouchDB?

Doesn't elasticsearch store indexed data somewhere?

James_Cook · August 14, 2011, 1:00pm

Hi Mo,

There seems to be a disconnect with your questions and some fundamental
understanding of how ES (and Lucene) works. I think you need to read the
website a bit more, especially take a look at the video:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html

Index storage is under the control of Lucene, and the store page
(http://www.elasticsearch.org/guide/reference/index-modules/store.html) you
link to describes your options with simplefs, niofs being file-based,
memory being memory-based, and mmapfs being a hybrid of the two. I'm not
sure where you got the idea that indexes can also be stored in a DB like
CouchDB.
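For reference, the store type is chosen per index through the `index.store.type` setting. The sketch below builds such a settings body. The accepted names are taken from the list in this post and are an assumption for any other release, since the set of store types varies by Elasticsearch version; `store_settings_body` is a hypothetical helper.

```python
import json

# Store types as named in this thread; other releases may offer a different set.
STORE_TYPES_FROM_THREAD = {"simplefs", "niofs", "mmapfs", "memory"}


def store_settings_body(store_type: str) -> str:
    """Build index settings that select how Lucene stores the index files."""
    if store_type not in STORE_TYPES_FROM_THREAD:
        raise ValueError("unknown store type for this sketch: " + store_type)
    body = {"settings": {"index": {"store": {"type": store_type}}}}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(store_settings_body("niofs"))
```

All of these options are about how a node reads and writes its own Lucene files, so none of them involves an external database.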

There is the concept of a River which is a bridge between CouchDB (and
others) and ES. A River will receive push changes or periodically will pull
changes from a source (like CouchDB, not sure if CouchDB River pushes or
pulls) and index the data it receives. This is a technique that can be used
to put things for searching into ES without the developer having to
specifically index documents into ES. It has nothing to do with how data is
stored in ES.

-- jim

Mohit_Anchlia · August 14, 2011, 10:41pm

Thanks for clarifying. I will go through that presentation.
