Greetings,
I have a couple of questions about keys and routing.
Let’s imagine that I have:
- A set of time-based Indexes. The indexes are time-based because the
overall set is unbounded (it grows by over 1M documents/day). Thus, we'll
have, say, an Index per quarter to keep each one bounded and therefore
maintain predictable performance.
- The documents stored in these Indexes are “keyed” by MID and “routed”
by PID.
- Every document has a unique MID (its _id) and contains a PID field.
PIDs are not unique across documents: many documents share the same PID.
- The common use case is to fetch all the MIDs for a given PID.
So no problem there. (Hopefully that makes sense…)
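Elasticsearch documents its shard selection as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the _id. A minimal Python sketch of why routing by PID serves the "all MIDs for a PID" case, assuming a stand-in crc32 hash and 5 primary shards (both illustrative, not Elasticsearch's actual hash or any real setting):

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # illustrative; fixed when the Index is created


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # hash(routing) % number_of_primary_shards. crc32 is a stand-in hash for
    # illustration; Elasticsearch uses its own hash, so real shard numbers differ.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# (MID, PID) pairs: every document is routed by its PID, not by its MID.
docs = [("mid-1", "pid-A"), ("mid-2", "pid-A"), ("mid-3", "pid-B")]
for mid, pid in docs:
    print(mid, pid, "-> shard", shard_for(pid))
```

mid-1 and mid-2 always land on the same shard because they share pid-A, so a query for all MIDs of pid-A sent with routing pid-A only needs that one shard (it still has to filter on the PID field, since other PIDs can hash to the same shard).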
My first question: when I look up a document by MID in a given Index and
I do not have its PID (the routing key), is that an inefficient lookup?
I.e., will it have to scan all of the Shards to find it?
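In Elasticsearch a GET by _id is sent to a single shard chosen from the routing value, which defaults to the _id itself; a document indexed with a custom routing value has to be fetched with that same routing value. A search that carries no routing goes to every shard. The toy model below illustrates what that means for a MID-only lookup; the crc32 hash, the shard count, and every function name are illustrative assumptions, not Elasticsearch APIs.

```python
import zlib
from typing import Optional

NUM_SHARDS = 5  # illustrative primary shard count for one quarterly Index


def shard_for(routing: str) -> int:
    # hash(routing) % number_of_primary_shards; crc32 is a stand-in, not Elasticsearch's hash.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


# Each shard is modeled as a dict of _id (MID) -> document.
shards = [{} for _ in range(NUM_SHARDS)]


def index_doc(mid: str, pid: str) -> None:
    # Indexed with routing=PID, so the PID (not the MID) picks the shard.
    shards[shard_for(pid)][mid] = {"mid": mid, "pid": pid}


def get(mid: str, routing: Optional[str] = None) -> Optional[dict]:
    # A GET asks exactly one shard; with no routing given, the _id itself is used.
    return shards[shard_for(routing if routing is not None else mid)].get(mid)


def search_by_mid(mid: str):
    # With no routing at all, every shard has to be asked; returns (doc, shards_asked).
    doc = None
    for shard in shards:
        if mid in shard:
            doc = shard[mid]
    return doc, len(shards)


index_doc("mid-1", "pid-A")
print(get("mid-1", routing="pid-A"))  # found: the PID points at the right shard
print(get("mid-1"))                   # None unless MID and PID happen to hash to the same shard
print(search_by_mid("mid-1"))         # found, but only after asking all NUM_SHARDS shards
```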
And my second question is really more about design.
I also need to be able to look up a document by its MID, and I don’t
necessarily know which time-based Index I will find it in (e.g. it may
have been inactive and then been resurrected).
I was hoping to avoid having some sort of metadata Index that could give
me this info (i.e. given a MID, return its PID and current Index),
because it will end up being unbounded (we are talking billions of
entries eventually).
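The metadata Index described above amounts to a directory keyed by MID that points at a (PID, Index) pair, followed by a routed GET. A minimal in-memory sketch of that two-step lookup, assuming a hypothetical get(index, doc_id, routing) callable in place of a real client call; MidDirectory, fetch, and the "docs-2014q1" Index name are all made up for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Location:
    pid: str    # routing key of the document
    index: str  # time-based Index that currently holds it


class MidDirectory:
    """In-memory stand-in for a hypothetical MID -> (PID, Index) metadata Index.

    In Elasticsearch this would be a separate Index whose _id is the MID, so a
    GET by MID is routed (by default, via the _id) to exactly one shard.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, Location] = {}

    def record(self, mid: str, pid: str, index: str) -> None:
        # Overwritten whenever a document ends up in a different Index.
        self._entries[mid] = Location(pid, index)

    def locate(self, mid: str) -> Optional[Location]:
        return self._entries.get(mid)


# Hypothetical signature: get(index, doc_id, routing) -> document or None.
Getter = Callable[[str, str, str], Optional[dict]]


def fetch(mid: str, directory: MidDirectory, get: Getter) -> Optional[dict]:
    # Step 1: find where the MID lives. Step 2: one routed GET in that Index.
    loc = directory.locate(mid)
    if loc is None:
        return None
    return get(loc.index, mid, loc.pid)


directory = MidDirectory()
directory.record("mid-1", "pid-A", "docs-2014q1")
calls = []


def fake_get(index, doc_id, routing):
    # Records each call instead of talking to a cluster.
    calls.append((index, doc_id, routing))
    return {"mid": doc_id, "pid": routing}


print(fetch("mid-1", directory, fake_get))  # one routed GET against docs-2014q1
```

The appeal of this design is that the cost per lookup stays fixed at two single-shard requests, however many quarterly Indexes accumulate.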
Perhaps this won’t really matter, because it is only ever looked up by
key, which should be fast since the request would go straight to the
correct Shard. And each entry would only be a few bytes. But still, it
seems like I would be creating an eventual problem.
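For a rough sense of scale: the metadata Index gains one entry per document, so it grows at the same rate as the data. A small Python calculation from the 1M/day figure in the question (a lower bound, since the rate is "over" 1M/day); the per-entry size is left as an explicit parameter because it is not known here, and any value plugged in is a hypothetical assumption:

```python
DOCS_PER_DAY = 1_000_000  # "over 1M/day", so treat every figure below as a lower bound


def entries_after(days: float) -> float:
    """Metadata entries accumulated after `days` days (one entry per document)."""
    return DOCS_PER_DAY * days


def raw_bytes_after(days: float, bytes_per_entry: int) -> float:
    """Raw payload size; bytes_per_entry is an assumption to be measured, not a known value."""
    return entries_after(days) * bytes_per_entry


per_quarter = entries_after(365 / 4)  # 91,250,000 entries per quarter
per_year = entries_after(365)         # 365,000,000 entries per year
years_to_billion = 1e9 / per_year     # about 2.74 years to the first billion
print(f"{per_quarter:,.0f} per quarter, {per_year:,.0f} per year, "
      f"{years_to_billion:.2f} years to 1B")
```

At that rate each quarterly Index also holds at least about 91M documents, and the metadata Index passes a billion entries in under three years.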
That said, the alternative, looking in every time-based Index, seems much
worse.
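The alternative, looking in every time-based Index, can be sketched as a probe over the quarterly Index names. This assumes a hypothetical "docs-2014q2" naming scheme and a hypothetical lookup(index, mid) callable standing in for a per-Index search by MID; the newest-first order is a heuristic choice, not something the question specifies.

```python
from datetime import date
from typing import Callable, List, Optional, Tuple


def quarterly_indices(start: date, end: date, prefix: str = "docs") -> List[str]:
    """Names of the per-quarter Indexes from start to end, newest first.

    The "docs-2014q2" naming scheme is a hypothetical convention.
    """
    year, q = start.year, (start.month - 1) // 3 + 1
    last = (end.year, (end.month - 1) // 3 + 1)
    names = []
    while (year, q) <= last:
        names.append(f"{prefix}-{year}q{q}")
        year, q = (year + 1, 1) if q == 4 else (year, q + 1)
    return names[::-1]


# Hypothetical per-Index search by MID; without the PID, each call is itself
# sent to every shard of that Index.
Lookup = Callable[[str, str], Optional[dict]]


def find_by_fan_out(mid: str, indices: List[str], lookup: Lookup) -> Tuple[Optional[str], int]:
    """Probe each Index in order; return (index_found_in, indexes_probed)."""
    for probed, index in enumerate(indices, start=1):
        if lookup(index, mid) is not None:
            return index, probed
    return None, len(indices)


indices = quarterly_indices(date(2013, 1, 1), date(2014, 6, 6))
print(indices)  # six quarters, newest ("docs-2014q2") first
```

In the worst case (an old or missing MID) every quarterly Index is probed, and each probe without a PID reaches all of that Index's shards, so the cost grows as Indexes × shards per Index. Elasticsearch can also search several Indexes in one request (a comma-separated list or a wildcard), which saves round trips but still touches every shard.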
Any advice would be greatly appreciated.
Thanks,
— Chris
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/83c68139-773d-4000-8eee-2e619c6302b5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I apologize for having to reply to my own message.
But please replace every CID with MID in my original message. I started
editing the text and inadvertently hit Send.
Thanks,
-- Chris
(The original message was sent by Chris Berry on Friday, June 6, 2014, at 1:03:20 PM UTC-5.)