Turning on shared fs gateway

Hi there. If I turn on the shared FS gateway after an index has been
created and then restart the node, I "lose" all my indices. Does this mean
the gateway always overrides the cluster state?

Regards

--

You can't change a gateway for indices that have been created already.

Best regards,

Jörg

In reply to Vinicius Carvalho, Friday, November 30, 2012 11:58:39 PM UTC+1.

--

Thanks Jorg,

that will save me some time :)

I'm trying to come up with a solution for a problem we are having right
now. Our index is only updated once a day. We re-index around 5-10% of the
index size (all the data that changed during the day).

The problem is that indexing on the live cluster takes over 5x longer than
on, for instance, a separate node.

I was trying to isolate a node, index on it, and then plug it back into the
cluster, using a shared gateway to do that.

I wonder if there's an easier or better way of doing this.

I'm still experimenting with different scenarios.

Regards

In reply to Jörg Prante, Friday, November 30, 2012 7:36:04 PM UTC-5.

--

How large is your index? How long does your update indexing take? My
suggestion: create an alias for the existing index. Add the updates to a
fresh second index, since that is faster than modifying an existing index.
Then, once the second index with the updates is ready, extend the alias to
the second index. The next day, repeat the process, and drop old indices as
needed. This is an Elasticsearch technique known as "rolling indices".
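For illustration, here is a minimal Python sketch of the "extend the alias" step. It assumes the JSON body format of the indices aliases API (POST /_aliases with add and remove actions); the helper name extend_alias_body and the index names are hypothetical, and the function only builds the request body, it does not talk to a cluster.

```python
import json


def extend_alias_body(alias, new_index, retired_indices=()):
    """Build a body for the indices aliases API (POST /_aliases).

    Hypothetical helper: the alias is extended to new_index, and any index in
    retired_indices is removed from the alias in the same request. Nothing is
    sent to a cluster; the caller posts the JSON with its own client.
    """
    actions = [{"add": {"index": new_index, "alias": alias}}]
    for index in retired_indices:
        actions.append({"remove": {"index": index, "alias": alias}})
    return {"actions": actions}


# Day 2: extend the "all" alias to the (hypothetical) index holding the updates.
print(json.dumps(extend_alias_body("all", "all-2012-12-01")))
```

Removing an index from the alias does not delete it; dropping an old index is a separate delete-index call, which this sketch leaves out.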

Cheers,

Jörg

--

Hi Jorg,

thanks for all the help.

Our index is around 100 GB. An update run takes 1 hour on a single node,
but around 5 hours when it is executed against a live node in the cluster
(4 nodes).

I've just checked, and we are actually re-indexing all the data every day.
That's a flaw on our end that we need to fix, since only 10% of it or less
is updated each day. Part of the problem is that the data comes from a
legacy DBMS that does not have version columns for the entries.

We tried the alias approach, but it did not work.

We first created an alias named all and pointed it to an index named
all-yyyy-mm-dd.

We served all requests through the all alias.

The next day we created an index named all-yyyy-mm-dd+1 (the following
day's date).

Once the indexing process had finished, we pointed the all alias to the
latest index.
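The daily steps above can be sketched in Python as follows, again assuming the aliases API body format (POST /_aliases with remove and add actions); daily_index_name and swap_alias_body are hypothetical helpers that only build names and JSON. Both actions go in the same request so the alias is switched in one step; the aliases API is documented to apply the actions of one request atomically, but that is worth verifying for ES 0.19.11.

```python
import json
from datetime import date, timedelta


def daily_index_name(day, prefix="all"):
    """Index name following the all-yyyy-mm-dd scheme used in this thread."""
    return f"{prefix}-{day.isoformat()}"


def swap_alias_body(alias, old_index, new_index):
    """Aliases API body that moves `alias` from old_index to new_index.

    Remove and add are sent together in one request, so searches on the
    alias should never see it pointing at no index or at both.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


today = date(2012, 11, 30)
old_index = daily_index_name(today)                      # all-2012-11-30
new_index = daily_index_name(today + timedelta(days=1))  # all-2012-12-01
print(json.dumps(swap_alias_body("all", old_index, new_index)))
```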

What we saw is that clients could not connect to the node (we started
getting NoNodeAvailableException on the clients).

Regards

BTW: ES 0.19.11

In reply to Jörg Prante, Friday, November 30, 2012 7:51:38 PM UTC-5.

--

NoNodeAvailableException indicates that your clients are not properly
connected to the cluster nodes, which is unrelated to index aliasing.
There may be other issues under the hood; check your log files.

Still, the rolling index strategy is the one I would choose. You are right
that you are struggling with a full re-indexing strategy, which your data
source forces on you and which adds significant overhead.

What surprises me is that it takes 5 hours on 4 nodes and 1 hour on 1
node. With the same data volume, it should be much faster on 4 nodes than
on 1. What is the load on the 4-node system? How do you connect to it: do
you connect to all of the nodes? Which client do you use? Do query response
times suffer while indexing?

Note that it is advisable to create the new index with the refresh
interval set to -1 and without replica shards. Add the replicas and
restore the refresh setting after indexing has completed. This should
reduce the load and speed up indexing somewhat, though it does not remove
the 90% overhead of re-indexing data that has not changed.
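A minimal Python sketch of the two settings bodies, assuming the create-index body format ({"settings": {...}}) and the update-settings API (PUT /<index>/_settings). The helper names are hypothetical, and the restored values (1 replica, 1s refresh interval) are placeholders; use whatever your cluster normally runs with.

```python
import json


def bulk_load_settings():
    """Create-index body for the new index: refresh disabled, no replicas."""
    return {"settings": {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}}


def post_load_settings(replicas=1, refresh_interval="1s"):
    """Update-settings body (PUT /<index>/_settings) once indexing is done.

    The defaults are placeholders, not values taken from this thread.
    """
    return {"index": {"number_of_replicas": replicas, "refresh_interval": refresh_interval}}


print(json.dumps(bulk_load_settings()))  # sent when creating the new index
print(json.dumps(post_load_settings()))  # sent after the bulk load completes
```

Only after the second body has been applied (and the replicas have been allocated) would the alias be switched to the new index.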

Best regards,

Jörg

In reply to Vinicius Carvalho, Saturday, December 1, 2012 2:18:44 AM UTC+1.

--