Have a couple of questions on ES

Subramanian_Narayana · October 18, 2012, 5:14am

Hi,

I am new to ES. I am using ES via the logstash "embedded" setting for a
standalone logstash setup.

However, I am planning to use ES in a centralized logstash setup. Hence I
need to install ES on a separate cluster.

Along these lines, I have the following couple of questions.

Can I install ES on a Hadoop cluster that has HDFS? Can ES use HDFS as the
file system instead of plain disk? What are the pros and cons of using HDFS
as the file system for ES? If it is a performance issue, what is the impact?
Is it 3x, 10x, etc.?

I saw in an older thread that by selecting the HDFS gateway, we can replicate
and store the ES index in HDFS. Does "index" mean all the data, or is it an
index like a MySQL index?

How do we delete data from ES? Can you give a pointer to a doc/tutorial on
how to delete indexed data from ES?

Thanks in advance.

Subbu

--

radu_gheorghe · October 18, 2012, 9:16am

Hello Subramanian,

On Thu, Oct 18, 2012 at 8:14 AM, Subramanian Narayanan
ping2sriram@gmail.com wrote:

Hi,

I am new to ES. I am using ES via the logstash "embedded" setting for a
standalone logstash setup.

However, I am planning to use ES in a centralized logstash setup. Hence I
need to install ES on a separate cluster.

Along these lines, I have the following couple of questions.

Can I install ES on a Hadoop cluster that has HDFS? Can ES use HDFS as the
file system instead of plain disk? What are the pros and cons of using HDFS
as the file system for ES? If it is a performance issue, what is the impact?
Is it 3x, 10x, etc.?

I haven't done any benchmarks, but I would use the local gateway if that
is an option. It's the recommended and better-tested choice.

I saw in an older thread that by selecting the HDFS gateway, we can replicate
and store the ES index in HDFS. Does "index" mean all the data, or is it an
index like a MySQL index?

"Index" in the context of Elasticsearch usually refers to something
like a "database" in MySQL. It may store the source (which is the
default and basically means all the data) or not, in which case you only
have the inverted index, as in MySQL index terminology. More
information on "source" here:
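
As a rough illustration of the "source or not" distinction, a mapping can switch off the stored source for a type through the `_source` field. The sketch below only builds the mapping body. The index name `myindex` and the type name `logs` are made up for the example, and the exact mapping syntax depends on your ES version, so please check it against the docs for the release you run.

```shell
# Sketch: build a mapping body that disables the stored _source for one type.
# "logs" and "myindex" are hypothetical names, not taken from this thread.
source_off_mapping() {
    # $1 = type name
    printf '{"%s": {"_source": {"enabled": false}}}\n' "$1"
}

# Example request (commented out: it needs a running node and an existing index):
# curl -XPUT 'localhost:9200/myindex/logs/_mapping' -d "$(source_off_mapping logs)"

# Print the body so it can be inspected first.
source_off_mapping logs
```

With the source disabled you can still search, but you cannot get the original document back, so for a logstash setup you would normally keep the default.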

How do we delete data from ES? Can you give a pointer to a doc/tutorial on
how to delete indexed data from ES?

If it applies to your use case, the recommended option is to have
rolling indices (e.g. one index per day, week, or month) and
remove old data by simply deleting old indices. Like:

curl -XDELETE localhost:9200/old_index

This is very fast, basically like removing the corresponding files from disk.
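
To show how this could be automated for daily indices, here is a minimal sketch. It assumes indices are named `<prefix>-YYYY.MM.DD` (the pattern logstash uses by default) and that you feed it the list of index names yourself, one per line. The function name `delete_cmds` and the `ES_URL` variable are made up for the example. It only prints the DELETE commands, so nothing is removed until you run them.

```shell
# Sketch: print curl commands that delete daily indices older than a cutoff.
# Assumes daily indices named <prefix>-YYYY.MM.DD.
ES_URL="${ES_URL:-localhost:9200}"   # assumed address, as in the curl example

# delete_cmds PREFIX CUTOFF   (CUTOFF written as YYYY.MM.DD)
# Reads index names on stdin, one per line, and prints a DELETE command
# for every index whose date suffix is strictly before the cutoff.
delete_cmds() {
    awk -v prefix="$1-" -v cutoff="$2" -v url="$ES_URL" '
        index($0, prefix) == 1 {
            day = substr($0, length(prefix) + 1)
            # Zero-padded YYYY.MM.DD strings sort in chronological order,
            # so a plain string comparison is enough here.
            if (day ~ /^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ && day < cutoff)
                print "curl -XDELETE " url "/" $0
        }'
}

# Demo with a hard-coded list: only the 2012.10.10 index is before the cutoff.
printf 'logstash-2012.10.10\nlogstash-2012.10.18\n' | delete_cmds logstash 2012.10.18
```

Once the printed commands look right, they can be piped to `sh` to actually run them.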

If you don't have that option, you can use TTL:

and old data will be automatically deleted after the specified time.
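
A minimal sketch of what enabling TTL could look like, assuming the `_ttl` mapping field as it exists in ES releases from around this time (later releases dropped it, so check your version). The type name `logs`, the index name `myindex` and the `7d` default are made up for the example. The function only builds the mapping body.

```shell
# Sketch: build a mapping body that enables _ttl with a default lifetime.
# "logs", "myindex" and "7d" are hypothetical values, not from this thread.
ttl_mapping() {
    # $1 = type name, $2 = default TTL (for example 7d)
    printf '{"%s": {"_ttl": {"enabled": true, "default": "%s"}}}\n' "$1" "$2"
}

# Example request (commented out: it needs a running node and an existing index):
# curl -XPUT 'localhost:9200/myindex/logs/_mapping' -d "$(ttl_mapping logs 7d)"

# Print the body so it can be inspected first.
ttl_mapping logs 7d
```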

Or, you can manually delete all documents that match a certain query:
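
A minimal sketch of such a request, assuming the delete-by-query endpoint of ES releases from around this time (`DELETE /<index>/_query` with a `q` parameter). Newer releases changed this API, so verify it against your version. The index name, the query and the helper name `delete_by_query_url` are made up for the example.

```shell
# Sketch: build the URL for a delete-by-query request.
ES_URL="${ES_URL:-localhost:9200}"   # assumed address, as in the curl example

# delete_by_query_url INDEX QUERY
# QUERY is a simple query string such as field:value. It is not URL-encoded
# here, so keep it free of spaces and special characters.
delete_by_query_url() {
    printf '%s/%s/_query?q=%s\n' "$ES_URL" "$1" "$2"
}

# Example request (commented out: it deletes data on a running node):
# curl -XDELETE "$(delete_by_query_url logs-2012.10 'host:web01')"

# Print the URL so it can be inspected first.
delete_by_query_url logs-2012.10 'host:web01'
```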

Please note that when documents are deleted from an index (as opposed
to when you delete a whole index), they're only marked for deletion,
and will be physically removed when segments are merged. The way it
actually happens depends on the merge policy:

Either way, merging implies quite heavy I/O activity, which is why
rolling indices are better for performance.

--

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene
