How to create user indexes on the fly

Paul_Loy · August 11, 2010, 3:12pm

Hi,

I have finally got a use-case for per-user indexing. I was wondering what
people's opinions are on the best way to check for, and then create, an
index and type mapping on-the-fly.

i.e.

1. get data
2. check if index exists and create if not
3. check if mapping exists and create if not
4. index data

How should I go about steps 2 and 3 optimally?

Thanks in advance,

Paul

(PS: I never know whether to use "indexes" or "indices".)

--

Paul Loy
paul@keteracel.com
http://justgiving.com/thetrafalgarway - 300 miles, 2 bicycles, 36 hours

Clinton_Gormley · August 11, 2010, 3:29pm

Hiya

> I have finally got a use-case for per-user indexing. I was wondering
> what people's opinions are on the best way to check for, and then
> create, an index and type mapping on-the-fly.

Don't forget that each shard is a Lucene instance, so if you have a
million users, you will need a LOT of boxes and memory to cope with
that.
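To put rough numbers on this: assuming the defaults of early Elasticsearch releases (5 primary shards and 1 replica per index; later versions changed these, so check yours), the number of shard copies, each one a Lucene index, grows linearly with the number of per-user indices. A minimal arithmetic sketch:

```python
def shard_copies(num_indices: int, primaries: int = 5, replicas: int = 1) -> int:
    """Total shard copies (each one a Lucene index) across the cluster.

    The defaults of 5 primaries and 1 replica are an assumption
    matching early Elasticsearch releases.
    """
    # Every primary shard also has `replicas` copies.
    return num_indices * primaries * (1 + replicas)


# One index per user, one million users.
print(shard_copies(1_000_000))  # 10000000
# One shared index holding every user's documents.
print(shard_copies(1))  # 10
```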

> i.e.
> 1. get data
> 2. check if index exists and create if not
> 3. check if mapping exists and create if not
> 4. index data
> How should I go about steps 2 and 3 optimally?

You can get a list of all known indices, but that may not be terribly
efficient.

It may be better to just try it and catch the error, e.g.:

- create the index -> catch the error if it already exists
- put the mapping -> should just work if the mapping doesn't exist yet
  or hasn't changed
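A minimal sketch of steps 2-4 using that create-and-catch approach. `FakeClient`, `IndexAlreadyExistsError` and all method names below are hypothetical stand-ins so the control flow runs on its own; a real client's calls and error types will differ.

```python
class IndexAlreadyExistsError(Exception):
    """Hypothetical error raised when creating an index that already exists."""


class FakeClient:
    """Hypothetical in-memory stand-in for a search client, for illustration only."""

    def __init__(self):
        self.indices = {}  # index name -> {type name: mapping}
        self.docs = []

    def create_index(self, name):
        if name in self.indices:
            raise IndexAlreadyExistsError(name)
        self.indices[name] = {}

    def put_mapping(self, index, doc_type, mapping):
        # Re-putting an unchanged mapping simply overwrites it here.
        self.indices[index][doc_type] = mapping

    def index_doc(self, index, doc_type, doc):
        self.docs.append((index, doc_type, doc))


def index_for_user(client, index, doc_type, mapping, doc):
    # Step 2: try to create the index and swallow the "already exists"
    # error, instead of listing every index first.
    try:
        client.create_index(index)
    except IndexAlreadyExistsError:
        pass
    # Step 3: put the mapping unconditionally.
    client.put_mapping(index, doc_type, mapping)
    # Step 4: index the data.
    client.index_doc(index, doc_type, doc)


client = FakeClient()
mapping = {"properties": {"title": {"type": "text"}}}  # illustrative mapping
index_for_user(client, "user_42", "doc", mapping, {"title": "first"})
index_for_user(client, "user_42", "doc", mapping, {"title": "second"})  # no error
print(len(client.docs))  # 2
```

Catching the error costs one failed request per already-existing index, rather than a listing of all indices on every write.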

clint

Paul_Loy · August 11, 2010, 3:46pm

Thanks Clint. As always, answers tend to create more questions!

>> I have finally got a use-case for per-user indexing. I was wondering
>> what people's opinions are on the best way to check for, and then
>> create, an index and type mapping on-the-fly.
>
> Don't forget that each shard is a Lucene instance, so if you have a
> million users, you will need a LOT of boxes and memory to cope with
> that.

Hmm... so are per-user indexes not a good idea? We *are* expecting lots of
users. Will ES definitely keep an instance running for each index even if
that index has not been written to or read from for a while?


Clinton_Gormley · August 11, 2010, 3:54pm

Hi Paul

>> Don't forget that each shard is a Lucene instance, so if you have a
>> million users, you will need a LOT of boxes and memory to cope with
>> that.
>
> Hmm... so are per-user indexes not a good idea? We are expecting lots
> of users. Will ES definitely keep an instance running for each index
> even if that index has not been written to or read from for a while?

"not a good idea" depends on your application, really, but you say that
you will have lots of users, so:

My understanding is that yes, at least one primary shard must be alive
for each index. I say "at least one" because by default you would have
more than one (you can specify the number of shards at index creation
time).
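For example, a per-user index could be created with a single primary shard and no replicas to keep its footprint down. The sketch below only builds the request body using the `number_of_shards` and `number_of_replicas` index settings, which would be sent with the create-index request (e.g. `PUT /user_42`, where the index name is hypothetical); the exact body layout has varied between Elasticsearch versions, so treat it as an assumption to check against yours.

```python
import json


def create_index_body(shards: int, replicas: int) -> dict:
    # The primary shard count is chosen here, at index creation time.
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


print(json.dumps(create_index_body(1, 0)))
# {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
```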

Why not just store a user_id in each document that needs to be filtered
by user? It will be way more efficient.
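A sketch of the single-index alternative: every document carries a `user_id` field, and each search adds a term filter on it. This uses the `bool` query with a `filter` clause from later Elasticsearch versions (the 2010-era syntax was a `filtered` query instead); the field names `user_id` and `body` are hypothetical.

```python
import json


def user_search_body(user_id: str, text: str) -> dict:
    # Full-text match on a hypothetical "body" field, restricted to one
    # user's documents by a term filter on user_id. Clauses in "filter"
    # do not contribute to scoring.
    return {
        "query": {
            "bool": {
                "must": {"match": {"body": text}},
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


print(json.dumps(user_search_body("42", "bicycles"), indent=2))
```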

clint

ppearcy · August 11, 2010, 4:05pm

Another thought (and apologies for confusing the conversation by
adding another Paul): if you need per-user mappings, you could have a
single index with as many shards as needed and make each user their own
document type, since mappings are defined per document type.

Regards,
Paul P


Paul_Loy · August 11, 2010, 4:15pm

Ah, that's interesting. Are there any overheads to doing this rather
than adding a userId, as Clint suggested?

I guess this will make searches quick, since by using the type you're
essentially pre-filtering.

Thanks Paul and Clint!


kimchy · August 11, 2010, 5:04pm

Hi,

It's basically the same: multi-type support within an index is
implemented by adding a _type field to each indexed document and
automatically filtering on it when applicable.
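A toy in-memory model of that explanation: searching within a type is just an implicit filter on the `_type` field, so it does the same work as an explicit filter on a `user_id` field. The `search` function and the documents are illustrative only, not Elasticsearch internals. Note also that later Elasticsearch versions removed support for multiple mapping types per index, so the per-user-type approach applies only to older releases.

```python
def search(docs, predicate, doc_type=None):
    """Toy model of type-scoped search within one index.

    When a type is requested, an implicit filter on the _type field is
    applied before the caller's own predicate.
    """
    hits = docs
    if doc_type is not None:
        hits = [d for d in hits if d["_type"] == doc_type]
    return [d for d in hits if predicate(d)]


docs = [
    {"_type": "user_1", "user_id": "1", "title": "ride"},
    {"_type": "user_2", "user_id": "2", "title": "ride"},
]

# Per-user type: the _type filter is implicit.
by_type = search(docs, lambda d: d["title"] == "ride", doc_type="user_1")
# Shared type: the user filter is explicit.
by_field = search(docs, lambda d: d["title"] == "ride" and d["user_id"] == "1")
print(by_type == by_field)  # True
```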

-shay.banon
