Why is the index not written to HDFS?


(Mohit Anchlia) #1

I have the hadoop plugin with the hdfs gateway, but what I am seeing is that
indexes are still being written locally. Can you please help me understand
why they are being written locally?

ls -ltr data/elasticsearch/nodes/0/indices/twitter/0/index/
total 12
-rw-r--r-- 1 root root 0 May 14 15:33 write.lock
-rw-r--r-- 1 root root 20 May 14 15:33 segments.gen
-rw-r--r-- 1 root root 58 May 14 15:33 segments_1
-rw-r--r-- 1 root root 8 May 14 15:33 _checksums-1337034789114

hadoop fs -ls /elasticsearch
# returns nothing

config:

gateway.type: hdfs
gateway.hdfs.uri: hdfs://db1:54310
gateway.hdfs.path: elasticsearch


(Mohit Anchlia) #2

I finally got this working. Does anyone know when elasticsearch writes data
to hdfs? Is it almost real time?

If I kill the node can I lose some data?


(Berkay Mollamustafaoglu-2) #3

There are two ways to configure ES for persistence (so that the data survives
a full cluster restart):

  1. Local gateway, where the data persists on the servers
  2. Shared or central gateway (S3, Hadoop, or shared file system), where data
    is stored elsewhere.

In either case, data is still stored locally. With the shared gateway, data
is restored from that data store when a node restarts. For more
information, I highly recommend reading the docs thoroughly:
http://www.elasticsearch.org/guide/reference/modules/gateway/
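
As a rough illustration of the two options, here is a minimal config sketch. It assumes the gateway setting names used by elasticsearch releases from this period; the hdfs values are simply the ones posted earlier in this thread, and the exact names should be checked against the docs for your version.

```yaml
# Option 1: local gateway. After a full cluster restart, state and
# indices are recovered from each node's own data directory.
gateway.type: local

# Option 2: shared gateway (HDFS shown, reusing the values from this thread).
# Nodes still keep a working copy of the index on local disk.
# gateway.type: hdfs
# gateway.hdfs.uri: hdfs://db1:54310
# gateway.hdfs.path: elasticsearch
```

Only one gateway.type can be active at a time, which is why the second option is commented out here.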

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


(Mohit Anchlia) #4

Yes, I read that and have also done recovery testing, which seems to
recover everything. My question was: when does elasticsearch write/commit
data to Hadoop? Is it synchronous or asynchronous? Should I expect to lose
any data that might still be in elasticsearch memory? Just trying to
understand the basics.


(Shay Banon) #5

By default, elasticsearch will snapshot the data to HDFS every 10 seconds.
I answered in the other thread you posted regarding using the local gateway.
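
For reference, a minimal sketch of how that interval might be tuned. It assumes the `index.gateway.snapshot_interval` setting described in the gateway documentation for elasticsearch releases of this era; the `30s` value is purely illustrative, and the first three settings are the ones from the original post.

```yaml
# Shared HDFS gateway, as configured earlier in this thread
gateway.type: hdfs
gateway.hdfs.uri: hdfs://db1:54310
gateway.hdfs.path: elasticsearch

# Assumed setting name; verify it against the docs for your version.
# Snapshots are periodic, so anything indexed after the most recent
# snapshot is not yet in HDFS until the next one completes.
index.gateway.snapshot_interval: 30s
```

Because the snapshot is periodic rather than synchronous, changes made since the last snapshot exist only on the nodes themselves until the next snapshot runs.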

