I have finally got a use-case for per-user indexing. I was wondering what
people's opinions are on the best way to check for, and then create, an index
and type mapping on the fly.
Don't forget that each shard is a Lucene instance, so if you have a
million users, you will need a LOT of boxes and memory to cope with
that.
i.e.
1. get data
2. check if index exists and create if not
3. check if mapping exists and create if not
4. index data
How should I go about steps 2 and 3 in an optimal way?
You can get a list of all known indices, but that may not be terribly
efficient.
It may be better to just try it and catch the error, e.g.:
create the index -> catch the error if it already exists
put the mapping -> should just work, if the mapping doesn't exist
or hasn't changed
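The "try it and catch the error" flow can be sketched roughly as below. This is only an illustration: `InMemoryCluster`, `IndexAlreadyExists`, `create_index` and `put_mapping` are hypothetical names standing in for whatever client you use, not the API of any real Elasticsearch library, and the stub exists only so the sketch runs without a cluster.

```python
class IndexAlreadyExists(Exception):
    """Hypothetical error for creating an index that already exists."""


class InMemoryCluster:
    """Hypothetical stand-in for a real client so the sketch runs offline."""

    def __init__(self):
        self.indices = {}  # index name -> {document type -> mapping}

    def create_index(self, index):
        if index in self.indices:
            raise IndexAlreadyExists(index)
        self.indices[index] = {}

    def put_mapping(self, index, doc_type, mapping):
        # In this stub, re-sending an unchanged mapping is a harmless overwrite.
        self.indices[index][doc_type] = mapping


def ensure_index_and_mapping(client, index, doc_type, mapping):
    """Steps 2 and 3: create optimistically instead of checking first.

    Returns True if this call created the index, False if it already existed.
    """
    try:
        client.create_index(index)
        created = True
    except IndexAlreadyExists:
        created = False  # already there, so carry on
    client.put_mapping(index, doc_type, mapping)
    return created


if __name__ == "__main__":
    cluster = InMemoryCluster()
    mapping = {"properties": {"title": {"type": "string"}}}
    # Calling twice is safe: the second call skips creation and re-puts the mapping.
    ensure_index_and_mapping(cluster, "user_42", "note", mapping)
    ensure_index_and_mapping(cluster, "user_42", "note", mapping)
```

The point of the design is that the common case (index already exists) costs one failed create call rather than a listing of every index in the cluster.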
Thanks Clint. As always, answers tend to create more questions!
Hmm... so are per-user indexes not a good idea? We are expecting lots of
users. Will ES definitely keep an instance running for each index even if
that index has not been written to or read from for a while?
"not a good idea" depends on your application, really, but you say that
you will have lots of users, so:
My understanding is that yes, at least one primary shard must be alive
for each index. I say "at least one" although by default you would have
more than one. (You can specify the number of shards at index creation time.)
Why not just store a user_id in each document that needs to be filtered
by user? It will be way more efficient.
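As a rough sketch of the single-index approach: every document carries a `user_id`, and every search is restricted to that user with a term filter. The field name `body` and the helper names are made up for illustration, and the query uses the `bool` query with a `filter` clause as found in recent Elasticsearch versions; older releases express the same idea with different syntax.

```python
import json


def with_owner(doc, user_id):
    """Return a copy of the document tagged with its owner's id."""
    return {**doc, "user_id": user_id}


def search_body_for_user(user_id, text):
    """Build a search body that only matches one user's documents."""
    return {
        "query": {
            "bool": {
                # "body" is a hypothetical text field on the documents.
                "must": {"match": {"body": text}},
                # The filter restricts hits to this user and is not scored.
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


if __name__ == "__main__":
    doc = with_owner({"body": "per-user indexing notes"}, user_id=42)
    print(json.dumps(doc))
    print(json.dumps(search_body_for_user(42, "indexing"), indent=2))
```

All users then share one set of shards, so the number of Lucene instances no longer grows with the number of users.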
Another thought (and apologies for confusing the conversation by
adding another Paul): if you need per-user mappings, you could have a single
index with as many shards as needed and make each user their own
document type, since mappings are defined per document type.
It's basically the same: multi-type support within an index is implemented by
adding a _type field to each indexed document and automatically filtering on
it when applicable.