A couple of questions about keys and routing


(Chris Berry-2) #1

Greetings,

I have a couple of questions about keys and routing.

Let’s imagine that I have

  1. A set of time-based Indexes. The indexes are time-based because the
    overall set is unbounded (it grows by over 1M/day). Thus, we'll have, say,
    an Index per Quarter to keep them bounded individually and therefore
    maintain predictable performance.
  2. The documents stored in these Indexes are “keyed” by MID and “routed
    to” by PID.
  3. Every document has a unique MID (its _id) and contains a PID field
    that is not unique across documents (many documents share the same PID).
  4. The common use case is to want all the MIDs for a given PID.
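
To make the setup above concrete, here is a minimal sketch of how a document could be addressed under this scheme. The index naming pattern (`docs-2014-q2`), the `docs` prefix, and the helper names are assumptions for illustration only; nothing in the post specifies them.

```python
from datetime import date


def quarterly_index_name(prefix: str, day: date) -> str:
    """Return a per-quarter index name such as 'docs-2014-q2' (hypothetical pattern)."""
    quarter = (day.month - 1) // 3 + 1
    return f"{prefix}-{day.year}-q{quarter}"


def index_request(prefix: str, mid: str, pid: str, day: date) -> dict:
    """Describe where a document goes: keyed by MID, routed by PID."""
    return {
        "index": quarterly_index_name(prefix, day),  # bounded, time-based index
        "id": mid,        # unique document key
        "routing": pid,   # all documents sharing a PID land on the same shard
    }
```

Because every document with the same PID carries the same routing value, the "all MIDs for a given PID" query only has to visit one shard per index.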

So no problem there. (Hopefully that makes sense…)

My first question: when I look up by MID in a given Index, and I do not have
a PID (routing key), is that an inefficient lookup?
I.e., will it have to scan all of the Shards to find it?
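
A toy model may help frame this question. Elasticsearch picks a shard by hashing the routing value modulo the number of primary shards, and the routing value defaults to the document id when none is given. The sketch below imitates that with `zlib.crc32` as a stand-in hash (an assumption; it is not the hash Elasticsearch uses), and `ToyIndex` is a hypothetical class, not a client API.

```python
import zlib
from typing import List, Optional, Tuple


def shard_for(routing_value: str, num_shards: int) -> int:
    """Pick a shard from a routing value (crc32 is only a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


class ToyIndex:
    """In-memory imitation of one index split into shards."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[dict] = [dict() for _ in range(num_shards)]

    def index(self, mid: str, doc: dict, routing: str) -> None:
        # The document is stored on the shard chosen by its routing value (the PID).
        self.shards[shard_for(routing, len(self.shards))][mid] = doc

    def get(self, mid: str, routing: Optional[str] = None) -> Optional[dict]:
        # Without a routing value, the id itself is hashed, which usually
        # points at a different shard than the one the PID selected.
        key = routing if routing is not None else mid
        return self.shards[shard_for(key, len(self.shards))].get(mid)

    def find_on_all_shards(self, mid: str) -> Tuple[Optional[dict], int]:
        # Fan-out: every shard is asked, as a search without routing would do.
        found = None
        for shard in self.shards:
            if mid in shard:
                found = shard[mid]
        return found, len(self.shards)
```

In this model, a lookup by MID with the PID touches one shard, while a lookup by MID alone either probes a shard chosen from the MID (and may miss) or has to ask every shard.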

And my second question is really more about design.
I also need to be able to look up a document by its CID, and I don’t
necessarily know which time-based Index I will find it in (e.g., it may
have been inactive and then been resurrected).
I was hoping to avoid having some sort of metadata Index that could yield
this info for me (i.e., given a CID, return its PID and current Index)
because it will end up being unbounded (we are talking billions of entries
eventually).
Perhaps this won’t really matter because it is only looked up by key, which
should be fast as it would go to the correct Shard.
And each entry is only a few bytes? But still, it seems like I would
be creating an eventual problem.
Although the alternative, looking in every time-based index, seems much
worse.
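
The two options weighed above can be compared in a small sketch. Plain dictionaries stand in for the time-based indexes and for the metadata index; the index names and helper functions are hypothetical and only illustrate the trade-off, not a recommended design.

```python
from typing import Dict, Optional, Tuple

# index name -> {document key -> document}
Indexes = Dict[str, Dict[str, dict]]
# document key -> (PID, index name); this is the "metadata Index" idea
Lookup = Dict[str, Tuple[str, str]]


def find_by_scanning(indexes: Indexes, key: str) -> Tuple[Optional[str], int]:
    """Probe every time-based index, newest name first; return (index, probes)."""
    probes = 0
    for name in sorted(indexes, reverse=True):
        probes += 1
        if key in indexes[name]:
            return name, probes
    return None, probes


def find_via_lookup(lookup: Lookup, key: str) -> Optional[Tuple[str, str]]:
    """One keyed read that yields the PID and the index currently holding the key."""
    return lookup.get(key)
```

Scanning costs one probe per index and grows with every new quarter, while the lookup table stays at a single keyed read but itself grows by one small entry per document.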

Any advice would be greatly appreciated.

Thanks,
— Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/83c68139-773d-4000-8eee-2e619c6302b5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Chris Berry-2) #2

I apologize for having to reply to my own message.
But please replace all CID w/ MID in my original message. I started editing
the text and inadvertently hit Send.
Thanks,
-- Chris

On Friday, June 6, 2014 1:03:20 PM UTC-5, Chris Berry wrote:

[quoted copy of message #1 omitted]


(system) #3