Answers inline:
What I mean is: how is _uid indexed? An inverted index?
_uids are indexed into a Bloom filter. A Get works like this:
- Find the appropriate shard via routing and forward the request to a node
that holds the shard
- Iterate sequentially over each segment in the shard
- Check the segment's Bloom filter for the doc ID. If it says no, move on to
the next segment (a Bloom filter never gives a false negative)
- If it says yes, search the segment for the ID, because Bloom filters have
some rate of false positives
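The lookup steps above can be sketched in Python as a toy model. This is a minimal sketch, not Elasticsearch's or Lucene's code: `BloomFilter`, `Segment`, and `get` are hypothetical names, the hash scheme (salted SHA-256) is an assumption, and routing is assumed to have already picked the shard.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-slot bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Assumption: derive k positions by salting one hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


class Segment:
    """Hypothetical segment: stored docs plus a Bloom filter over their IDs."""

    def __init__(self, docs: dict) -> None:
        self.docs = docs
        self.bloom = BloomFilter()
        for doc_id in docs:
            self.bloom.add(doc_id)


def get(shard_segments: list, doc_id: str):
    """Scan the shard's segments, using each Bloom filter to skip misses."""
    for segment in shard_segments:
        if not segment.bloom.might_contain(doc_id):
            continue  # a negative is definitive: go to the next segment
        # Possibly a false positive, so confirm with an exact lookup.
        if doc_id in segment.docs:
            return segment.docs[doc_id]
    return None


shard = [Segment({"1": {"title": "a"}}), Segment({"2": {"title": "b"}})]
print(get(shard, "2"))  # {'title': 'b'}
print(get(shard, "3"))  # None
```

The exact check after a positive Bloom answer is what keeps false positives from returning the wrong document; the filter only saves work on segments that definitely lack the ID.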
In MongoDB TTL, the timestamp is indexed and the db runs a query every 60
seconds to find the expired documents and delete them.
In Cassandra, by contrast, expired cells are only deleted during compaction.
There is no separate index that gets scanned like in MongoDB.
Which TTL type does Elasticsearch support?
See my answer in the first email. Elasticsearch's background process runs
every 60s, compares timestamps and then marks documents as deleted. The
physical deletion of data doesn't happen until a merge later removes it.
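A toy model of the two-phase expiry described above: a periodic pass flags expired documents as deleted, and only a later merge physically drops them. `Doc`, `purge_expired`, and `merge` are hypothetical names; this sketches the idea, not the actual Elasticsearch purger.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    expires_at: float  # absolute expiry timestamp, in seconds
    deleted: bool = False  # marked deleted, but data is still on disk


@dataclass
class Segment:
    docs: list = field(default_factory=list)


def purge_expired(segments, now):
    """Hypothetical periodic job (every 60s per the thread): mark expired
    docs as deleted without removing them. Returns how many were marked."""
    marked = 0
    for seg in segments:
        for doc in seg.docs:
            if not doc.deleted and doc.expires_at <= now:
                doc.deleted = True
                marked += 1
    return marked


def merge(segments):
    """Merge segments into one, physically dropping docs marked deleted."""
    survivors = [d for seg in segments for d in seg.docs if not d.deleted]
    return [Segment(survivors)]


segs = [Segment([Doc("a", 100.0), Doc("b", 500.0)]), Segment([Doc("c", 50.0)])]
print(purge_expired(segs, now=200.0))  # 2
print(sum(len(s.docs) for s in segs))  # 3 (marked docs still occupy space)
print([d.doc_id for d in merge(segs)[0].docs])  # ['b']
```

The cost of marking is small because nothing is rewritten; disk space is only reclaimed when the merge copies the surviving documents.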
It looks like you can't run a range query on the _id field unless it is indexed?
(Whereas get and multiget don't require indexing.)
Correct. Queries only operate on inverted index data, so they require the
_id to be indexed along with the other fields. Get/Multiget use a
different mechanism (the Bloom filter lookup detailed above); they aren't
queries, so they don't require an inverted index.
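To illustrate why a range query needs the _id terms indexed while a get does not, here is a hypothetical toy segment. It assumes a get is a direct lookup by ID and a query walks a sorted term dictionary; `ToySegment` and its methods are illustrative names, not Elasticsearch APIs.

```python
import bisect


class ToySegment:
    """Hypothetical segment with stored docs and an optional _id term index."""

    def __init__(self, docs: dict, index_id: bool) -> None:
        self.stored = dict(docs)  # used by get: direct lookup by ID
        # Sorted term dictionary for _id, built only if the field is indexed.
        self.id_terms = sorted(docs) if index_id else None

    def get(self, doc_id: str):
        # Works whether or not _id is indexed.
        return self.stored.get(doc_id)

    def range_query(self, lo: str, hi: str) -> list:
        # Queries walk the term dictionary, so they need the index.
        if self.id_terms is None:
            raise ValueError("_id is not indexed; range query unavailable")
        start = bisect.bisect_left(self.id_terms, lo)
        end = bisect.bisect_right(self.id_terms, hi)
        return self.id_terms[start:end]  # inclusive lexicographic range


plain = ToySegment({"a": 1, "c": 3, "e": 5}, index_id=False)
print(plain.get("c"))  # 3
try:
    plain.range_query("b", "e")
except ValueError as err:
    print(err)

indexed = ToySegment({"a": 1, "c": 3, "e": 5}, index_id=True)
print(indexed.range_query("b", "e"))  # ['c', 'e']
```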
Also, when documents are filtered and no sorting is specified, are the
documents sorted by _id?
If docs are filtered and have no score, the score is automatically set to 1.
Since all documents have the same score, the resulting order is effectively
random. In actuality it is the order of the documents in the segments, but
the merge process tends to shuffle them enough that you can consider it
random (and in particular, not sorted by _id).
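A small sketch of why filtered, unsorted results are not ordered by _id: every hit gets the same constant score, so ties fall back to index order. `search_filtered` is a hypothetical helper, and tie-breaking by position in the segments is an assumption modelled on the description above.

```python
def search_filtered(segments, predicate):
    """Return matching docs with a constant score, mimicking a filter.

    Hypothetical helper: every hit scores 1.0, and ties keep the order
    in which docs appear across segments (Python's sort is stable).
    """
    hits = []
    for segment in segments:
        for doc in segment:
            if predicate(doc):
                hits.append((1.0, doc))
    # Sort by descending score; with equal scores nothing moves.
    hits.sort(key=lambda hit: -hit[0])
    return [doc for _, doc in hits]


def matches(doc):
    return True  # a filter that accepts every doc


# The same three docs, laid out before and after a hypothetical merge.
before_merge = [[{"_id": "1"}, {"_id": "5"}], [{"_id": "3"}]]
after_merge = [[{"_id": "3"}, {"_id": "1"}, {"_id": "5"}]]

print([d["_id"] for d in search_filtered(before_merge, matches)])  # ['1', '5', '3']
print([d["_id"] for d in search_filtered(after_merge, matches)])  # ['3', '1', '5']
```

Neither order is sorted by _id, and the order changes when the segment layout changes, which is why an explicit sort is needed if order matters.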
Hope that helps!
-Zach
Thanks
On Tuesday, September 17, 2013 7:05:56 PM UTC+2, ddorian43 wrote:
How are Get and Multiget queries fetched? What data structure is used to
hold this data?
Are expired documents indexed and deleted on a cron (like MongoDB), or are
they deleted when segments are merged (Cassandra/Bigtable)?
And what should I use if I just want to filter documents (no scoring): a
filter, a filtered query, or something else?
Thanks