Array type limitations?


(michael-4) #1

Hey guys.

I'm curious what the limitations of an array-type field are. I'm
using ES to store an array of social-network follower IDs for each of my
users, and this can sometimes get big (10M+ items). Is this "okay" with
arrays, or should I be using something else, like a nested type? My mapping
is as follows:

"follower_ids": {
  "type": "string",
  "index_name": "follower_id",
  "norms": {
    "enabled": false
  },
  "index": "no",
  "index_options": "docs"
}

It's worth mentioning that I'm also using a "terms" filter with a lookup
"path" on this array field.

Your input/feedback is much appreciated!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8a5c2f5b-1292-409f-9196-be8f0a00282f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

If you want to use more than 1024 terms, you will hit Lucene's max clause
limit (1024 clauses per boolean query by default).

Managing an array with 10M+ items is not a good idea. You'd have to iterate
over it yourself for every append or modification, which will take a very
long time (and a lot of space).
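
To illustrate the cost: an Elasticsearch update reindexes the whole
document, so appending to a large array means rewriting the entire array
every time. Below is a rough sketch in plain Python that makes no
Elasticsearch calls. The function name `append_followers` and the
dict-as-document model are assumptions made for illustration only.

```python
def append_followers(doc, new_ids):
    """Simulate an append: the whole document is rebuilt, not patched.

    Returns the new document and the number of array elements that had
    to be rewritten in order to apply the append.
    """
    merged = list(doc["follower_ids"])  # copy every existing ID
    merged.extend(new_ids)              # then add the new batch
    new_doc = dict(doc, follower_ids=merged)
    return new_doc, len(merged)


# Hypothetical document with 100,000 existing followers.
doc = {"user_id": "u1", "follower_ids": [str(i) for i in range(100_000)]}
batch = [str(i) for i in range(100_000, 105_000)]

doc, rewritten = append_followers(doc, batch)
print(rewritten)  # 105000 elements rewritten to add 5000
```

The work per append grows with the size of the existing array, not with
the size of the batch being added.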

Maybe you will find this interesting for your model design:

http://de.slideshare.net/martijnvg/document-relations

Jörg



(michael-4) #3

Thanks for the response, Jörg.

When I filter by follower_ids, I actually use Elasticsearch's terms lookup
feature, so I never run into the 1024 max clause limit.
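
For readers unfamiliar with it, a terms lookup points the filter at a field
of another document instead of listing the terms inline. Here is a sketch of
the filter body as a Python dict. The index name `users`, the type `user`,
the filtered field `author_id`, and the helper name are all hypothetical,
and the layout follows the Elasticsearch 1.x terms lookup syntax as I
understand it.

```python
import json


def terms_lookup_filter(field, index, doc_type, doc_id, path):
    """Build a terms filter that fetches its terms from another document."""
    return {
        "terms": {
            field: {
                "index": index,    # index holding the lookup document
                "type": doc_type,  # type of the lookup document
                "id": doc_id,      # ID of the lookup document
                "path": path,      # field in that document holding the terms
            }
        }
    }


# Hypothetical usage: match documents whose author_id is one of u1's followers.
lookup = terms_lookup_filter("author_id", "users", "user", "u1", "follower_ids")
print(json.dumps(lookup, indent=2))
```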

That said, because I append 5k IDs to that field at a time, you are correct
-- appending 5k IDs to an array with millions of elements can be costly
(>5s).

Judging by the link you shared, are you suggesting I use an array of nested
objects to store the IDs? Something like
filter_ids: [ { id: 1 }, { id: 2 }, { id: 3 }, ... ]? If so, wouldn't
appending 5k objects to a nested array that already contains millions be
just as costly as appending to a plain array?
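
One alternative in the spirit of this question, and not necessarily what
the linked slides recommend, is to store each follower relation as its own
small document, so that an append never rewrites existing data. The sketch
below builds a bulk request body for that layout. The index name
`followers`, the type `follower`, the `user:follower` ID scheme, and the
field names are assumptions.

```python
import json


def bulk_follower_actions(user_id, follower_ids,
                          index="followers", doc_type="follower"):
    """Build a newline-delimited bulk body with one document per follower."""
    lines = []
    for fid in follower_ids:
        # Action line: where the document goes and which ID it gets.
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": f"{user_id}:{fid}"}}
        # Source line: the relation itself.
        source = {"user_id": user_id, "follower_id": fid}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


body = bulk_follower_actions("u1", ["10", "11", "12"])
print(body)
```

With this layout, appending 5k followers costs 5k small index operations
regardless of how many followers already exist. The trade-off is that a
terms lookup reads its terms from a single document, so filtering would
need a different query strategy.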

On Wednesday, July 23, 2014 6:21:12 PM UTC-4, Jörg Prante wrote:

If you want to use more than 1024 terms, you will hit the Lucene max
clause limit.

Managing array is not a good idea with 10M+ items. You'd have to iterate
it by yourself for appending/modifying which will take a very long time
(and space).

Maybe you find this interesting for your model design

http://de.slideshare.net/martijnvg/document-relations

Jörg


