Hey Clinton,
Thanks for the quick reply.
On Saturday, May 18, 2013 6:02:19 PM UTC-4, Clinton Gormley wrote:
Hi Mike
I was reading the new features list in 0.90 and saw social search. The
terms lookup mechanism seems to have some promise, but I have a few
questions/issues:
- It doesn't seem to work for the _id field (i.e. {"_id": {"terms": { ... } } })
you want:
{ terms: { _id: { index... etc }}}
Sorry, that wasn't a valid test case. Here's one that doesn't work:
$ curl -XPUT http://localhost:9200/index1/t1/123 -d '{ "name": "123" }'
{"ok":true,"_index":"index1","_type":"t1","_id":"123","_version":1}
$ curl -XPUT http://localhost:9200/index1/t1/456 -d '{ "name": "456" }'
{"ok":true,"_index":"index1","_type":"t1","_id":"456","_version":1}
$ curl -XPUT http://localhost:9200/index1/t2/1 -d '{ "ids": ["123", "456"]
}'
{"ok":true,"_index":"index1","_type":"t2","_id":"1","_version":1}
$ curl http://localhost:9200/index1/t1/_search -d '{ "query": { "filtered":
{ "filter": { "terms": { "_id": { "index": "index1", "type": "t2", "id":
"1", "path": "ids" } } } } } }'
{"took":48,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
$ curl http://localhost:9200/index1/t1/_search -d '{ "query": { "filtered":
{ "filter": { "terms": { "_id": ["123", "456"] } } } } }'
{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"index1","_type":"t1","_id":"456","_score":1.0,
"_source" : { "name": "456"
}},{"_index":"index1","_type":"t1","_id":"123","_score":1.0, "_source" : {
"name": "123" }}]}}
- The design means that you need to store the entire set of followers in
a single doc array. Would that mean reindexing the entire list (which for
us can be 300K+ longs) whenever the list changes?
Yes, although you could break them down into smaller chunks and use a bool
filter to combine them
Hmm good point.
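A rough sketch of that chunking idea, in case it helps anyone else reading the thread. It only builds the request bodies and does not talk to a cluster. The helper names, the "<owner>-<n>" chunk doc ids and the chunk size are all made up for illustration; the terms lookup and bool filter shapes follow the 0.90 syntax used above. It looks up against the "name" field rather than _id, since _id is the case that isn't working.

```python
import json


def chunked(ids, size):
    """Split ids into consecutive lists of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def chunk_docs(owner_id, ids, size):
    """Return (doc_id, body) pairs, one per chunk.

    The "<owner>-<n>" doc id scheme is hypothetical, not an ES convention.
    """
    return [
        ("%s-%d" % (owner_id, n), {"ids": chunk})
        for n, chunk in enumerate(chunked(ids, size))
    ]


def lookup_query(field, index, doc_type, doc_ids, path="ids"):
    """Combine one terms lookup filter per chunk doc under a bool/should filter."""
    lookups = [
        {"terms": {field: {"index": index, "type": doc_type,
                           "id": doc_id, "path": path}}}
        for doc_id in doc_ids
    ]
    return {"query": {"filtered": {"filter": {"bool": {"should": lookups}}}}}


# Three ids split into chunks of two gives two lookup docs.
docs = chunk_docs("1", ["123", "456", "789"], size=2)
query = lookup_query("name", "index1", "t2", [doc_id for doc_id, _ in docs])
print(json.dumps(query, indent=2))
```

With this layout a change to the list only means reindexing the chunk doc that holds the affected id, at the cost of one lookup per chunk at query time.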
- if I wanted to denormalize the data instead and use a has_child filter
to check the relationship, do you have any hints on how to create the
minimal possible child doc so 100M+ of these don't kill the index size? I
would be fine with losing the ability to do any other type of query (well
except for having a stable id for these docs). Here is what I have so far:
{"mapping": {"follower": {
"_parent": {"type": "user"},
"_source": {"enabled": false},
"_all": {"enabled": false},
"properties": {
"followerId": { "type": "long", "precision_step": 0 }
}
} } }
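For reference, the has_child check against a mapping like this might be built as follows. This is only a sketch of the request body: the helper name and the follower id 42 are placeholders, and it assumes the body is sent to the _search endpoint of the parent "user" type.

```python
import json


def has_child_query(child_type, field, value):
    """Build a filtered query matching parents that have a child with field == value."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "has_child": {
                        "type": child_type,
                        "query": {"term": {field: value}},
                    }
                }
            }
        }
    }


# 42 is a placeholder follower id.
body = has_child_query("follower", "followerId", 42)
print(json.dumps(body))
```

Because followerId is still indexed, the term query works even with _source disabled.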
I wouldn't disable the _source field - you'll regret it later on, eg when
you want to rebuild your index, or debug why a particular query isn't
working as expected. And I wouldn't worry about the precision_step either.
ES isn't the main datastore here, so reindexing from the database isn't an
issue. I ran into an issue when doing this with the above mapping - the
index got too big for the FS cache and query & indexing performance went
through the floor. This was with 3 nodes with 15G ram and an EBS RAID0.
Before adding the children the index was ~ 8G in size; afterwards it was
80G which is ~ 680 bytes for a doc that's 2 ints.
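As a sanity check on those numbers, here is the arithmetic spelled out. It assumes decimal gigabytes and that all of the growth comes from the child docs; the variable names are only for illustration.

```python
# Figures quoted in this message.
before_gb = 8        # index size before adding the children
after_gb = 80        # index size afterwards
bytes_per_doc = 680  # approximate per-child-doc footprint

# Assumption: 1G = 10**9 bytes, and every added byte belongs to a child doc.
added_bytes = (after_gb - before_gb) * 10**9
implied_docs = added_bytes / bytes_per_doc

print("implied child docs: %.1f million" % (implied_docs / 1e6))
```

That works out to roughly 106 million child docs, which lines up with the 100M+ mentioned earlier.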
Also, in master, there is a big memory improvement on parent/child
queries. Now only parent IDs are loaded into memory. Previously it used to
load child IDs too
I saw that. I'm quite looking forward to 0.90.1, mostly because of the bulk
update support.
clint