Social search

Mike_Kaplinskiy · May 18, 2013, 9:52pm

Hey folks,

I was reading the new features list in 0.90 and saw social search. The
terms lookup mechanism seems to have some promise, but I have a few
questions/issues:

1. It doesn't seem to work for the _id field (i.e. {"_id": {"terms":{ ... } } }).
2. The design means that you need to store the entire set of followers in a
   single doc array. Would that mean reindexing the entire list (which for
   us can be 300K+ longs) whenever the list changes?
3. If I wanted to denormalize the data instead and use a has_child filter to
   check the relationship, do you have any hints on how to create the minimal
   possible child doc so 100M+ of these don't kill the index size? I would be
   fine with losing the ability to do any other type of query (well, except for
   having a stable id for these docs). Here is what I have so far:

{"mapping": {"follower": {
  "_parent": {"type": "user"},
  "_source": {"enabled": false},
  "_all": {"enabled": false},
  "properties": {
    "followerId": { "type": "long", "precision_step": 0 }
  }
} } }
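
For illustration, a has_child filter against a child type like the one mapped above might be built as follows. This is only a sketch: the type name `follower` and field `followerId` come from the mapping, while the helper names `followed_by_filter` and `search_body` and the example id 42 are made up here, and the request shape assumes the 0.90-era `filtered` query syntax used elsewhere in this thread.

```python
import json


def followed_by_filter(follower_id, child_type="follower", field="followerId"):
    """Build a has_child filter that matches parent docs having at least
    one child of `child_type` whose `field` equals `follower_id`."""
    return {
        "has_child": {
            "type": child_type,
            "query": {"term": {field: follower_id}},
        }
    }


def search_body(follower_id):
    """Wrap the filter in a filtered query over all parent docs
    (hypothetical helper, not part of any client library)."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": followed_by_filter(follower_id),
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the parent type's _search endpoint.
    print(json.dumps(search_body(42), indent=2))
```

The printed body would be posted to the `user` type's `_search` endpoint; nothing here talks to a cluster.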

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Clinton_Gormley · May 18, 2013, 10:02pm

Hi Mike

> I was reading the new features list in 0.90 and saw social search. The
> terms lookup mechanism seems to have some promise, but I have a few
> questions/issues:
>
> It doesn't seem to work for the _id field (I.e. {"_id": {"terms":{ ... }
> } })

you want:

{ terms: { _id: { index... etc }}}

> The design means that you need to store the entire set of followers in a
> single doc array. Would that mean reindexing the entire list (which for
> us can be 300K+ longs) whenever the list changes?

Yes, although you could break them down into smaller chunks and use a bool
filter to combine them.
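
One way to read the chunking suggestion, sketched in Python: split the follower list across several lookup docs and OR one terms lookup per chunk inside a bool filter, so that a change to the list only requires reindexing the affected chunk doc. The doc id scheme (`<user>_<n>`), the `followers` path, and the index, type and field names in the usage lines are assumptions made for this example, not something specified in the thread.

```python
import json


def chunk_ids(ids, chunk_size):
    """Split a list of ids into consecutive chunks of at most chunk_size."""
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def chunk_docs(user_id, follower_ids, chunk_size):
    """Return {doc_id: source} for the lookup docs, one doc per chunk.
    The '<user>_<n>' id scheme is a hypothetical convention."""
    return {
        "%s_%d" % (user_id, n): {"followers": chunk}
        for n, chunk in enumerate(chunk_ids(follower_ids, chunk_size))
    }


def combined_filter(field, index, doc_type, doc_ids, path="followers"):
    """Build a bool filter with one terms-lookup 'should' clause per chunk doc."""
    return {
        "bool": {
            "should": [
                {"terms": {field: {"index": index, "type": doc_type,
                                   "id": doc_id, "path": path}}}
                for doc_id in doc_ids
            ]
        }
    }


if __name__ == "__main__":
    docs = chunk_docs("u1", list(range(10)), chunk_size=4)
    flt = combined_filter("user", "users", "followers", sorted(docs))
    print(json.dumps(flt, indent=2))
```

The trade-off is one lookup per chunk at query time in exchange for smaller documents to rewrite on each change.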

> if I wanted to denormalize the data instead and use a has_child filter
> to check the relationship, do you have any hints on how to create the
> minimal possible child doc so 100M+ of these don't kill the index size? I
> would be fine with losing the ability to do any other type of query (well
> except for having a stable id for these docs). Here is what I have so far:
>
> {"mapping": {"follower": {
>   "_parent": {"type": "user"},
>   "_source": {"enabled": false},
>   "_all": {"enabled": false},
>   "properties": {
>     "followerId": { "type": "long", "precision_step": 0 }
>   }
> } } }

I wouldn't disable the _source field - you'll regret it later on, e.g. when
you want to rebuild your index, or debug why a particular query isn't
working as expected. And I wouldn't worry about the precision_step either.

Also, in master, there is a big memory improvement on parent/child queries.
Now only parent IDs are loaded into memory. Previously it used to load
child IDs too.

clint


Mike_Kaplinskiy · May 20, 2013, 3:59am

Hey Clinton,

Thanks for the quick reply.

On Saturday, May 18, 2013 6:02:19 PM UTC-4, Clinton Gormley wrote:

> Hi Mike
>
>> I was reading the new features list in 0.90 and saw social search. The
>> terms lookup mechanism seems to have some promise, but I have a few
>> questions/issues:
>>
>> It doesn't seem to work for the _id field (I.e. {"_id": {"terms":{ ...
>> } } })
>
> you want:
>
> { terms: { _id: { index... etc }}}

Sorry, that wasn't a valid test case. Here's one that doesn't work:

$ curl -XPUT http://localhost:9200/index1/t1/123 -d '{ "name": "123" }'
{"ok":true,"_index":"index1","_type":"t1","_id":"123","_version":1}
$ curl -XPUT http://localhost:9200/index1/t1/456 -d '{ "name": "456" }'
{"ok":true,"_index":"index1","_type":"t1","_id":"456","_version":1}
$ curl -XPUT http://localhost:9200/index1/t2/1 -d '{ "ids": ["123", "456"] }'
{"ok":true,"_index":"index1","_type":"t2","_id":"1","_version":1}
$ curl http://localhost:9200/index1/t1/_search -d '{ "query": { "filtered": { "filter": { "terms": { "_id": { "index": "index1", "type": "t2", "id": "1", "path": "ids" } } } } } }'
{"took":48,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
$ curl http://localhost:9200/index1/t1/_search -d '{ "query": { "filtered": { "filter": { "terms": { "_id": ["123", "456"] } } } } }'
{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"index1","_type":"t1","_id":"456","_score":1.0, "_source" : { "name": "456" }},{"_index":"index1","_type":"t1","_id":"123","_score":1.0, "_source" : { "name": "123" }}]}}

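The expectation behind this test case is that the lookup form of the terms filter behaves like the inline form once the referenced document's `ids` array has been fetched. The Python sketch below models only that expected rewrite; it is not how Elasticsearch implements it, it ignores any extra filter options, and `get_doc` is a hypothetical callable standing in for a GET of the lookup document's _source.

```python
def resolve_path(source, path):
    """Follow a dotted path into a document source; always return a list."""
    value = source
    for key in path.split("."):
        value = value[key]
    return value if isinstance(value, list) else [value]


def expand_terms_lookup(terms_filter, get_doc):
    """Rewrite {"terms": {field: {index, type, id, path}}} into the inline
    form {"terms": {field: [...]}}. Inline filters are returned unchanged."""
    field, spec = next(iter(terms_filter["terms"].items()))
    if isinstance(spec, list):
        return terms_filter
    source = get_doc(spec["index"], spec["type"], spec["id"])
    return {"terms": {field: resolve_path(source, spec["path"])}}


if __name__ == "__main__":
    # In-memory stand-in for the lookup document indexed at index1/t2/1.
    store = {("index1", "t2", "1"): {"ids": ["123", "456"]}}
    lookup = {"terms": {"_id": {"index": "index1", "type": "t2",
                                "id": "1", "path": "ids"}}}
    print(expand_terms_lookup(lookup, lambda i, t, d: store[(i, t, d)]))
```

Under this model, the first search (which returned no hits) would be expected to return the same two hits as the second.
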
>> The design means that you need to store the entire set of followers in
>> a single doc array. Would that mean reindexing the entire list (which for
>> us can be 300K+ longs) whenever the list changes?
>
> Yes, although you could break them down into smaller chunks and use a bool
> filter to combine them

Hmm, good point.

>> if I wanted to denormalize the data instead and use a has_child filter
>> to check the relationship, do you have any hints on how to create the
>> minimal possible child doc so 100M+ of these don't kill the index size? I
>> would be fine with losing the ability to do any other type of query (well
>> except for having a stable id for these docs). Here is what I have so far:
>>
>> {"mapping": {"follower": {
>>   "_parent": {"type": "user"},
>>   "_source": {"enabled": false},
>>   "_all": {"enabled": false},
>>   "properties": {
>>     "followerId": { "type": "long", "precision_step": 0 }
>>   }
>> } } }
>
> I wouldn't disable the _source field - you'll regret it later on, eg when
> you want to rebuild your index, or debug why a particular query isn't
> working as expected. And I wouldn't worry about the precision_step either.

ES isn't the main datastore here, so reindexing from the database isn't an
issue. I ran into a problem when doing this with the above mapping - the
index got too big for the FS cache, and query and indexing performance went
through the floor. This was on 3 nodes with 15G RAM and an EBS RAID0.
Before adding the children the index was ~ 8G in size; afterwards it was
80G, which is ~ 680 bytes for a doc that's 2 ints.
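
As a quick check of those figures, the Python sketch below derives the child-doc count implied by the 8G, 80G and ~680-byte numbers. It assumes "G" means decimal gigabytes and that all of the growth is attributable to the child docs; the exact doc count is not given in the thread beyond "100M+", and the function names are made up for this example.

```python
def implied_doc_count(size_before_gb, size_after_gb, bytes_per_doc, gb=10**9):
    """Number of docs implied by the index growth and a per-doc size."""
    added_bytes = (size_after_gb - size_before_gb) * gb
    return added_bytes / bytes_per_doc


def bytes_per_doc(size_before_gb, size_after_gb, doc_count, gb=10**9):
    """Average on-disk bytes per added doc for a given doc count."""
    return (size_after_gb - size_before_gb) * gb / doc_count


if __name__ == "__main__":
    # 72 GB of growth at ~680 bytes per doc implies roughly 106 million child docs.
    print(round(implied_doc_count(8, 80, 680)))
    # At exactly 100 million docs the same growth would be 720 bytes per doc.
    print(bytes_per_doc(8, 80, 100000000))
```

Both results are consistent with the "100M+" child docs mentioned earlier in the thread.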

> Also, in master, there is a big memory improvement on parent/child
> queries. Now only parent IDs are loaded into memory. Previously it used to
> load child IDs too

I saw that. I'm quite looking forward to 0.90.1 - mostly because of the bulk
update support.

> clint


Clinton_Gormley · May 20, 2013, 11:32am

On 20 May 2013 05:59, Mike Kaplinskiy mike.kaplinskiy@gmail.com wrote:

> $ curl -XPUT http://localhost:9200/index1/t1/123 -d '{ "name": "123" }'
> {"ok":true,"_index":"index1","_type":"t1","_id":"123","_version":1}
> $ curl -XPUT http://localhost:9200/index1/t1/456 -d '{ "name": "456" }'
> {"ok":true,"_index":"index1","_type":"t1","_id":"456","_version":1}
> $ curl -XPUT http://localhost:9200/index1/t2/1 -d '{ "ids": ["123", "456"] }'
> {"ok":true,"_index":"index1","_type":"t2","_id":"1","_version":1}
> $ curl http://localhost:9200/index1/t1/_search -d '{ "query": { "filtered": { "filter": { "terms": { "_id": { "index": "index1", "type": "t2", "id": "1", "path": "ids" } } } } } }'
> {"took":48,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
> $ curl http://localhost:9200/index1/t1/_search -d '{ "query": { "filtered": { "filter": { "terms": { "_id": ["123", "456"] } } } } }'
> {"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"index1","_type":"t1","_id":"456","_score":1.0, "_source" : { "name": "456" }},{"_index":"index1","_type":"t1","_id":"123","_score":1.0, "_source" : { "name": "123" }}]}}

You're right, this doesn't work. I've opened this issue:

github.com/elastic/elasticsearch

Query DSL: External terms doesn't work with _id field

opened 11:32AM - 20 May 13 UTC
closed 01:17PM - 20 May 13 UTC
clintongormley
Labels: >bug v0.90.1 v1.0.0.Beta1

```
curl -XPUT http://localhost:9200/index1/t1/123 -d '{ "name": "123" }'
curl -XPUT http://localhost:9200/index1/t1/456 -d '{ "name": "456" }'
curl -XPUT http://localhost:9200/index1/t2/1 -d '{ "ids": ["123", "456"] }'
```

Query with external terms returns no results:

```
curl http://localhost:9200/index1/t1/_search?pretty -d '{ "query": { "filtered": { "filter": { "terms": { "_id": { "index": "index1", "type": "t2", "id": "1", "path": "ids" } } } } } }'
```

Query with listed terms works:

```
curl http://localhost:9200/index1/t1/_search?pretty -d '{ "query": { "filtered": { "filter": { "terms": { "_id": ["123", "456"] } } } } }'
```

External terms on `name` field works:

```
curl http://localhost:9200/index1/t1/_search?pretty -d '{ "query": { "filtered": { "filter": { "terms": { "name": { "index": "index1", "type": "t2", "id": "1", "path": "ids" } } } } } }'
```

Side issue: unmapped field throws NPE:

```
curl http://localhost:9200/index1/t1/_search?pretty -d '{ "query": { "filtered": { "filter": { "terms": { "XXX": { "index": "index1", "type": "t2", "id": "1", "path": "ids" } } } } } }'
```

clint

