Impact of very large number of aliases


(Gordon Tillman) #1

Greetings All,

Are there any known performance issues to consider if we use a very large
number of aliases in an ES cluster? The use case that motivates this
question is as follows.

Here is an extremely simplified representation of some data we want to
index:

{
    "parent": "",
    "name": "",
    "type": "file" | "container"
}

The is a UUID. Our plan is to start with a single over-sharded index that
stores all of these objects, using the "parent" value as the routing key.
If a given parent container ends up holding so many objects that searches
within it become slow, we can re-index that container's documents into
their own index and update the container's alias accordingly.

This means that every time we create a container, we also create an alias
to use when searching that container's children, so the number of aliases
grows with the number of containers, which will get large.
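To make the proposal concrete, here is a minimal Python sketch of the two `_aliases` request bodies it implies, built as plain dicts so nothing needs a running cluster. The shared index name `objects`, the dedicated index name, and the `children-<uuid>` alias naming scheme are hypothetical. It also assumes the "parent" field is mapped as an unanalyzed string, because a `term` filter would not match a UUID that the analyzer had split into tokens.

```python
import json
import uuid

# Hypothetical name for the shared, over-sharded index.
SHARED_INDEX = "objects"


def alias_name(container_id: str) -> str:
    """Hypothetical naming scheme: one alias per container."""
    return "children-" + container_id


def create_container_alias(container_id: str) -> dict:
    """Body for POST /_aliases: a filtered alias on the shared index.

    "routing" sends searches only to the shard holding this parent's
    children; the term filter (assumes "parent" is not analyzed) restricts
    hits to documents whose parent is this container.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": SHARED_INDEX,
                    "alias": alias_name(container_id),
                    "routing": container_id,
                    "filter": {"term": {"parent": container_id}},
                }
            }
        ]
    }


def move_container_alias(container_id: str, dedicated_index: str) -> dict:
    """Body for POST /_aliases that repoints the alias after the container's
    documents have been re-indexed into dedicated_index.

    The dedicated index only holds this container's children, so the new
    alias needs neither routing nor a filter.
    """
    name = alias_name(container_id)
    return {
        "actions": [
            {"remove": {"index": SHARED_INDEX, "alias": name}},
            {"add": {"index": dedicated_index, "alias": name}},
        ]
    }


if __name__ == "__main__":
    cid = str(uuid.uuid4())
    print(json.dumps(create_container_alias(cid), indent=2))
    print(json.dumps(move_container_alias(cid, "objects-" + cid), indent=2))
```

Alias actions sent in a single `_aliases` request are applied atomically, which is what lets the swap happen without a window in which the alias points nowhere.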

Thanks in advance for any feedback you may have.

--gordon

--


(Igor Motov) #2

It primarily depends on your definition of "very large" and how often you
are going to update the aliases. The major issue you might run into is that
the entire list of aliases is part of the cluster state, and the cluster
state is sent to all nodes on every cluster state update (which happens,
for example, whenever you add or change an alias). So I would suggest
testing the creation and deletion of aliases to determine whether you get
acceptable response times.
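A test along those lines could look like the following minimal Python sketch, which adds and then removes aliases one request at a time and records each request's latency. The cluster URL, the existing index `objects`, and the `bench-alias-N` names are assumptions for a throwaway test cluster; the `send` parameter exists only so the loop can be exercised without a cluster.

```python
import json
import time
import urllib.request
from statistics import median
from typing import Callable, List

# Assumption: a disposable test cluster at this address.
ES_URL = "http://localhost:9200"


def post_aliases(body: dict) -> None:
    """POST one body to the _aliases endpoint; urlopen raises HTTPError
    on an error status, so a failed alias change stops the benchmark."""
    req = urllib.request.Request(
        ES_URL + "/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def time_alias_churn(index: str, count: int,
                     send: Callable[[dict], None] = post_aliases) -> List[float]:
    """Add `count` aliases one per request, then remove them one per request.

    Returns per-request latencies in seconds: the first `count` entries are
    the adds, the rest are the removes. Each request is a separate alias
    change, so each one triggers its own cluster state update.
    """
    names = ["bench-alias-%d" % i for i in range(count)]
    latencies = []
    for action in ("add", "remove"):
        for name in names:
            body = {"actions": [{action: {"index": index, "alias": name}}]}
            start = time.perf_counter()
            send(body)
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    lat = time_alias_churn("objects", 1000)
    n = len(lat) // 2
    print("add:    median %.4fs, last %.4fs" % (median(lat[:n]), lat[n - 1]))
    print("remove: median %.4fs" % median(lat[n:]))
```

Comparing the latency of the last adds with the median is a rough way to see whether alias changes slow down as the alias list, and with it the cluster state, grows.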

On Monday, November 5, 2012 12:06:41 PM UTC-5, Gordon Tillman wrote:


(Gordon Tillman) #3

Igor, thank you very much for the information. I appreciate your time and
trouble.

-- gordon

On Monday, November 5, 2012 5:40:45 PM UTC-6, Igor Motov wrote:

It primarily depends on your definition of a very large number and how
often you are going to update the aliases. The major issue that you might
run into here is that the entire list is a part of the cluster state and it
is sent to all nodes on each cluster state update (which happens when you
add or change an alias for example). So, I would suggest testing creation
and deletion of aliases to determine if you can get acceptable response
times.

On Monday, November 5, 2012 12:06:41 PM UTC-5, Gordon Tillman wrote:

Greetings All,

Are there any known performance issues to consider in the event that we
use a very large number of aliases in an ES cluster? The use case that
motivates this question is a follows.

Here is an extremely- simplified representation of some data we want to
index:

{
"parent": "",
"name": "",

   "type": "file" | "container",

}

The is a UUID. Our thoughts are to have an
over-sharded index to start with for storing all of these objects in, using
the "parent" as a route. If a given parent container ends up with enough
objects in it such that searches for the objects inside that container
start to become non-performant, we can re-index that container's
information into its own index and update the alias for that container
accordingly.

This means that any time we create a container, we create alias to use
when searching that container's list of children, and the number of
containers in the system will get large.

Thanks in advance for any feedback you may have.

--gordon

--

