Newbie help

Hi there,

I'm quite new to ES and I'm currently doing some experiments to evaluate it.

The experiment consisted of the following:

1. Only one ES node up, using a shared fs gateway dir (the only configuration I changed).
2. A process doing live Twitter indexing (using its filtered streaming API).
3. Win7 with only one hard disk.

After ~6h of indexing, I did a hard reboot of my computer and restarted ES.

In the end, I noticed the following:

1. The gateway data dir had ~7000 files totaling ~9GB.
2. It took ~2h for the ES node to become available.
3. There were ~45k tweets indexed (not all tweets were indexed, due to the applied filters).

So, with so few documents indexed, why did the cluster recovery take so
long? What configuration affects this behavior? And finally, why were there
so many files in the gateway dir? Is there any way to compact them? (Maybe
this slowed down recovery.)

Thanks in advance,
Thiago Souza

Hi,

The number of files in the gateway is very strange; it should not be that
high. Where do you store the gateway?

-shay.banon

In reply to thiago tcostasouza@gmail.com, Thu, Sep 16, 2010 at 6:24 PM.

Hi Shay,

Thanks for your reply!

I used the fs gateway (in a directory). That number is the sum across all
the data dir subdirectories (there were 5 of them); each one had ~1500 files.
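To reproduce per-subdirectory numbers like these (and compare them with the ~7000 files / ~9GB total), a small script can walk the gateway directory. This is a minimal sketch, assuming you pass the gateway location yourself; `summarize_dir` and `print_summary` are hypothetical helpers, not part of ES, and sizes are reported in GiB.

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def summarize_dir(root: str) -> Dict[str, Tuple[int, int]]:
    """Map each immediate subdirectory of `root` to (file count, total bytes).

    Files sitting directly in `root` are ignored; nested files are counted
    toward the top-level subdirectory that contains them.
    """
    summary: Dict[str, Tuple[int, int]] = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue
        count = 0
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
        summary[entry.name] = (count, total)
    return summary


def print_summary(root: str) -> None:
    """Print one line per subdirectory plus a grand total."""
    total_files = total_bytes = 0
    for name, (count, size) in summarize_dir(root).items():
        print(f"{name}: {count} files, {size / 2**30:.2f} GiB")
        total_files += count
        total_bytes += size
    print(f"total: {total_files} files, {total_bytes / 2**30:.2f} GiB")
```

Usage: call `print_summary()` with the path you configured as the fs gateway location.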

Regards,
Thiago Souza

In reply to Shay Banon shay.banon@elasticsearch.com, Thu, Sep 16, 2010 at 15:58.

Is that fs location also on the local drive? Did you delete the work dir
in elasticsearch before restarting (you shouldn't, if you want fast
recovery)? Oh, and one more thing: when you say it took 2h for the nodes
to become available, what does "available" mean for you? That you can
search on them, or a GREEN status in the cluster health API?
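The second meaning of "available" can be measured by polling the cluster health API (`GET /_cluster/health`, whose `status` field is red, yellow or green) and recording how long it takes to reach a target color. A minimal sketch; the poll interval, the 3h cap and the function names are assumptions. One caveat: on a single node with replicas configured (the default is 1 replica), replica shards cannot be allocated, so health tops out at yellow; the sketch therefore defaults to `target="yellow"`.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Health colors ranked from worst to best.
_RANK = {"red": 0, "yellow": 1, "green": 2}


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET `url` and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_status(
    base_url: str = "http://localhost:9200",
    target: str = "yellow",
    poll_interval: float = 5.0,
    max_wait: float = 3 * 3600.0,
    fetch: Callable[[str], dict] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll /_cluster/health until status >= `target`; return seconds waited.

    Raises TimeoutError once `max_wait` seconds pass without reaching it.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        status: Optional[str]
        try:
            status = fetch(base_url + "/_cluster/health").get("status")
        except (OSError, ValueError):
            # Node not accepting HTTP yet, request timed out, or non-JSON body.
            status = None
        if status is not None and _RANK.get(status, -1) >= _RANK[target]:
            return elapsed
        if elapsed >= max_wait:
            raise TimeoutError(f"health still {status!r} after {elapsed:.0f}s")
        sleep(poll_interval)
```

Usage: start `wait_for_status()` right after launching the node; the returned number of seconds is the recovery time by the health-status definition. The `fetch`, `sleep` and `clock` parameters exist only so the loop can be exercised without a live node.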

One more thing: if you can recreate it, you can use the indices status
API, which will give you details on how long the recovery took and what
data was reused from the local work dir. It would be great if you could
gist it.
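Capturing the indices status response for a gist can be scripted too. This sketch assumes the indices status API is served at `GET /_status`, as in the 0.x releases this thread concerns; newer releases dropped this API, and whether recovery details appear by default may depend on the version, so treat the path as an assumption and check the docs for your version. The sketch does not interpret any fields; it just saves the raw JSON, indented. `dump_indices_status` is a hypothetical name.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET `url` and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dump_indices_status(
    path: str,
    base_url: str = "http://localhost:9200",
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Fetch the (assumed) /_status response and write it to `path` for a gist."""
    body = fetch(base_url + "/_status")
    with open(path, "w", encoding="utf-8") as out:
        # Sorted keys and indentation make the file easy to read and diff.
        json.dump(body, out, indent=2, sort_keys=True)
        out.write("\n")
    return body
```

Usage: run `dump_indices_status("indices_status.json")` once the node has recovered, then paste the file into a gist.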

-shay.banon

In reply to Thiago Souza tcostasouza@gmail.com, Thu, Sep 16, 2010 at 10:03 PM.

Hi Shay,

Is that fs location also on the local drive?
Yes.

Did you delete the work dir in elasticsearch before restarting?
No.

What does "available" mean for you?
I mean that I could not access http://localhost:9200/_search?q=*:* (the
browser didn't respond).

I didn't know that the status API would give me these details; I'll check
it.
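Thiago's working definition of "available" (the match-all search URL answering in a browser) can also be checked from a script, which makes it easier to timestamp exactly when the node starts responding. A minimal sketch; `search_responds` is a hypothetical name and the 10-second timeout is an assumption. A 200 here only shows that the HTTP layer and search answered; it may not mean every shard has finished recovering.

```python
import urllib.request


def search_responds(base_url: str = "http://localhost:9200", timeout: float = 10.0) -> bool:
    """Return True if the match-all query-string search returns HTTP 200 within `timeout`."""
    url = base_url + "/_search?q=*:*"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False
```

Calling it in a loop with a short sleep and logging the first `True` gives the recovery time by this definition.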

Cheers

In reply to Shay Banon shay.banon@elasticsearch.com, Thu, Sep 16, 2010 at 20:32.