Index performance with large arrays


(SteveM) #1

Hi there. I'm new to ES and would appreciate some advice on design concepts
around large arrays.

I am writing a help tip feature that pops up a message each time a user
logs in. The user can flag a checkbox if they do not want to see this
particular tip again.

After playing with ElasticSearch the solution I came up with involved using
a HelpTip document which contains an array of UserIds (identifying the
users who have flagged that they do not want to see this tip again).

Example 1:
HelpTip
{
"title": "Need help getting started?",
"text": "Watch our overview video",
"userArray": ["id1", "id2"]
}

I know ES can cope with large arrays but I wonder if there would be
performance issues if this array grew to 4000+ IDs. This record would be
regularly re-indexed (each time a new user ID is added to the array). Would
there be performance issues when indexing a document containing a large
array field?

Is this a sensible approach or would I be better using a relational model
and holding the Help Tip info and the list of users in separate documents,
then parsing them using two separate calls from my application?

Example 2:
HelpTip
{
"title": "Need help getting started?",
"text": "Watch our overview video"
}

HelpTipUserFlags
{
"HelpTipId": "1",
"UserId": "ID1"
}
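
A minimal sketch of the two-call approach in Example 2, using plain Python lists as stand-ins for the two document types. No Elasticsearch client is involved; the function name `tips_for_user`, the `id` field on each tip, and the second tip are assumptions added for illustration only.

```python
# Stand-ins for the two document types in Example 2.
# The "id" field on each tip is an assumption; the original example omits it.
help_tips = [
    {"id": "1", "title": "Need help getting started?", "text": "Watch our overview video"},
    {"id": "2", "title": "Hypothetical second tip", "text": "Placeholder text"},
]
help_tip_user_flags = [
    {"HelpTipId": "1", "UserId": "ID1"},
]


def tips_for_user(user_id, tips, flags):
    """Return the tips that the given user has not dismissed."""
    # Call 1 (simulated): fetch the flags belonging to this user.
    dismissed = {f["HelpTipId"] for f in flags if f["UserId"] == user_id}
    # Call 2 (simulated): fetch the tips, skipping the dismissed ones.
    return [t for t in tips if t["id"] not in dismissed]


visible = tips_for_user("ID1", help_tips, help_tip_user_flags)
print([t["id"] for t in visible])
```

With this layout, dismissing a tip means indexing one small flag document, and the HelpTip document itself is never re-indexed.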

Hope this makes sense. Thanks in advance for any help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9cba2c32-6266-4b87-b708-83ee64499dbf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Since the user can opt out, I assume you have fewer opt-outs than opt-ins,
so you should use the opt-outs in an "and not" filter :)

In that case, I would create an opt out index in the form index/type/id

users/optouts/

with docs containing a quite short array of opt outs

{ "optouts": [ "id1", "id2", ... "idn" ] }

so you can get the doc, read the opt out array, and add it as an "and not"
filter to your help tip query.
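
One possible reading of the step above, sketched as a plain Python function that builds a search request body. It assumes each opt-out doc belongs to one user and that its array holds the document IDs of the help tips that user dismissed. The function name `build_help_tip_query` is hypothetical. The exclusion is written as a `bool` query with a `must_not` clause around an `ids` query, which is the form used by more recent Elasticsearch versions; this is a substitute for the "and"/"not" filters that existed when this thread was written.

```python
def build_help_tip_query(optouts):
    """Build a search body that excludes the help tips listed in `optouts`.

    Assumption: `optouts` holds help tip document IDs.
    """
    if not optouts:
        # Nothing to exclude, so match every help tip.
        return {"query": {"match_all": {}}}
    return {
        "query": {
            "bool": {
                # Exclude every document whose _id is in the opt-out list.
                "must_not": [{"ids": {"values": list(optouts)}}]
            }
        }
    }


# Simulated result of fetching the user's opt-out doc.
optout_doc = {"optouts": ["id1", "id2"]}
body = build_help_tip_query(optout_doc["optouts"])
print(body)
```

The resulting dictionary would be sent as the body of a search against the help tip index.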

You could also add this optouts array to the user index, but this depends
on your overall design. If you want to remove the opt outs, you could
simply drop the optouts mapping type.

Regarding the array length, you can add as many values as you like; ES can
handle that. If the docs get long (I mean thousands of entries), it will
take substantial time just to fetch them, so I think you should prefer
a model that keeps each doc as short as possible.
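
To get a rough feel for how the document grows, here is a small sketch that measures the JSON payload of the Example 1 document for a short and a long array. The IDs are synthetic ("id1", "id2", ...), so real IDs such as UUIDs would make the payload larger. The helper names are hypothetical, and the script only measures serialized size; it says nothing about actual fetch or indexing time in ES.

```python
import json


def example1_doc(n_users):
    """Build the Example 1 document with n_users synthetic user IDs."""
    return {
        "title": "Need help getting started?",
        "text": "Watch our overview video",
        "userArray": [f"id{i}" for i in range(1, n_users + 1)],
    }


def payload_bytes(doc):
    """Size in bytes of the document serialized as UTF-8 JSON."""
    return len(json.dumps(doc).encode("utf-8"))


for n in (2, 4000):
    print(n, payload_bytes(example1_doc(n)))
```

Every added user ID means sending and storing the whole document again, so the cost of each re-index grows with the length of the array.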

Jörg

On Mon, Jul 21, 2014 at 4:58 PM, Steve Mee steve@genialgenetics.com wrote:




(SteveM) #3

Thanks for the response Jörg. That tells me exactly what I need to know...
stay away from very large arrays here in my design :)

Cheers - Steve


