DOS attack Elasticsearch with Mappings

So an Elasticsearch cluster I help run had an interesting issue last week
around mappings, and I wanted to get the community's thoughts on how to
handle it.

Issue:
Our cluster went into utter chaos one morning for no apparent reason. We
had nodes dropping constantly (both master and data nodes) and lots of
network exceptions in our log files. The cluster kept going red from all
the dropped nodes and was totally unresponsive to external commands.

Some Background:
Our cluster is fairly open to our users, meaning they can index whatever
they want without needing approval (this may have to change based on what
happened). The content stored is usually generated from .NET objects and
serialized with the Newtonsoft JSON serializer.

Cause:
After six hours of investigation while trying to get our cluster stable,
this is what we found:

We had a new document type (around 30,000 documents) indexed into the
cluster over a one-hour window containing the .NET equivalent of a
dictionary in JSON format. When a dictionary is serialized to JSON, it ends
up as a JSON object containing a list of properties and values. The current
behavior of Elasticsearch is to generate a mapping definition for each
field name in a JSON object, so when you serialize a dictionary, every
'key' in the dictionary gets its own mapping definition. It turns out this
can lead to nasty consequences when indexed into Elasticsearch...
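
To make that concrete, here is a minimal sketch of what this looks like
against the REST API (hypothetical index, type, and key names; untested):

  # Two documents whose dictionary happens to contain different keys:
  curl -XPUT 'localhost:9200/myindex/mytype/1' -d '{
    "attributes": { "order-4711-status": "shipped" }
  }'
  curl -XPUT 'localhost:9200/myindex/mytype/2' -d '{
    "attributes": { "order-4712-status": "pending" }
  }'

  # The dynamically generated mapping now has one field definition per
  # dictionary key, and it keeps growing with every new key that shows up:
  curl 'localhost:9200/myindex/_mapping/mytype?pretty'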

Essentially, every document contained its own list of unique keys, which
resulted in Elasticsearch generating mapping definitions for all of them.
We found this out by noticing that the type with the dictionary
continuously had its mappings updated (based on the master node log files).
The continual updating of the mappings (which are part of the overall
cluster state) caused the master nodes to lock up on the updates,
effectively stopping all other cluster operations. On further
investigation, the state file was over 70 MB by the time we ended up
stopping the cluster. Stopping the cluster was the only way to stop updates
to the mappings. We suspect the large mapping file was one of the major
reasons for nodes dropping; connections would time out during the large
file copy (I'm assuming the state is passed around the nodes in the
cluster).
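
For anyone hitting something similar, these are the sorts of checks that
would have shown the problem earlier (a rough sketch, assuming a 1.x
cluster reachable on localhost):

  # Cluster-state updates (e.g. mapping changes) queued up on the master:
  curl 'localhost:9200/_cluster/pending_tasks?pretty'

  # Rough byte size of the cluster state; the mappings live in the
  # metadata section:
  curl -s 'localhost:9200/_cluster/state/metadata' | wc -c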

Solution:
As previously mentioned, we had to stop the cluster. We then had to make
sure that all indexing operations were stopped. Upon restarting the cluster
we deleted all documents of the poisonous document type (which took a
while). This resulted in a much smaller state file and a stable cluster.
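
For reference, one way to do this kind of cleanup (a sketch only, with
hypothetical index and type names, and assuming a 1.x cluster where the
delete mapping API is still available):

  # Stop all indexing first, then drop the offending type. On 1.x the
  # delete mapping API removes the type's mapping and, as far as I recall,
  # its documents along with it:
  curl -XDELETE 'localhost:9200/myindex/_mapping/mytype'

  # Verify the mapping is gone and the cluster state has shrunk:
  curl 'localhost:9200/myindex/_mapping?pretty'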

Prevention:
So this is my real question for the community: what is the correct way to
prevent this in the future (or does one already exist)? We could obviously
start reviewing what goes into our cluster more closely, but should there
be a feature in Elasticsearch to prevent this (assuming it doesn't already
exist)? I'm assuming there are a number of users who have clusters where
they don't review everything that goes in. So would it make sense for
Elasticsearch to provide some feature to prevent this issue, which is
effectively the equivalent of a DOS attack on the cluster?

Thanks for reading this and I look forward to your responses!

-Josh Montgomery


If the cluster is that open to users, I don't think it'd be easy to prevent
a malicious user from intentionally DOSing it. But in this case I think you
could make the default for all fields be non-dynamic. That way users have
to intentionally send all mapping updates. It'd prevent this sort of
unintentional DOS.

I think this is a setting that you can change, and I think it would only
affect new indices, but I admit to not having done it and I'm going from a
vague memory of seeing a setting somewhere.
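
Something along these lines, maybe (untested sketch, hypothetical template
name; an index template only applies to indices created after it exists):

  # Make every new type reject fields that aren't explicitly mapped.
  # Use "dynamic": false instead to silently ignore unmapped fields.
  curl -XPUT 'localhost:9200/_template/strict_by_default' -d '{
    "template": "*",
    "mappings": {
      "_default_": {
        "dynamic": "strict"
      }
    }
  }'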

Nik

So you can modify the dynamic mapping setting to be off or strict. But that
means everything that goes into the cluster would have to go through a
review process, which is very time consuming. The primary goal of our
system is to provide a general-purpose backend for search that users can
have access to immediately. If we had to turn off dynamic mapping, we
couldn't deliver on that goal. Maybe a potential solution is a setting that
limits the number of fields that can be indexed for a type?
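
In the meantime, the best we can probably do is watch the mapping size
ourselves, something like this (very rough, hypothetical index and type
names):

  # Very rough count of field definitions in a type's mapping; alert if it
  # keeps climbing:
  curl -s 'localhost:9200/myindex/_mapping/mytype' | grep -o '"type"' | wc -l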


Hi Joshua,

Was the issue tied to the byte size of the mappings or the fact that they
contained lots of fields? I'm asking because there was a performance
inefficiency in versions < 1.3.0 that caused every field introduction to
perform in quadratic time[1]. It probably doesn't solve your problem but
I'm wondering if it could be related.

[1] "Improve performance for many new fields introduction in mapping" by
kimchy, Pull Request #6707, elastic/elasticsearch:
https://github.com/elastic/elasticsearch/pull/6707


--
Adrien Grand
