GIS queries not weighted like tf-idf


(Ricky Cook) #1

While queries involving terms are weighted with tf-idf, no such weighting
appears to happen for geospatial queries. This becomes a problem when
"terms" queries are weighted much higher than GIS queries (in most cases,
this is likely to be the opposite of what you want).

Another problem is that searches are not weighted by how large an area is,
or by how many documents sit inside it.

Attached are data sets to be used with the curl example below, as well as
a plot of the results I'm getting (green crosses match, red ones do not;
the text is " (, )").

export ES_URI='http://localhost:9200'
# Start from a clean index
curl -XDELETE "$ES_URI/geotest"
# Create the index with mappings for the "entities" and "areas" types
curl -XPUT "$ES_URI/geotest" -d '{
  "mappings": {
    "entities": {"properties": {
      "location_p": {"type": "geo_point"},
      "name": {"type": "string"},
      "location": {"tree": "quadtree", "type": "geo_shape", "precision": "1m"}}},
    "areas": {"properties": {
      "poly": {"tree": "quadtree", "type": "geo_shape", "precision": "1m"}}}}}'
# Load the attached data through the bulk endpoint
curl -XPOST --data-binary @1-areas.txt "$ES_URI/_bulk"
curl -XPOST --data-binary @2-points.txt "$ES_URI/_bulk"
# Give the index time to refresh
sleep 5
# Match entities that fall inside either of two indexed areas
curl -XPOST "$ES_URI/geotest/entities/_search?pretty" -d '{
  "query": {"bool": {"should": [
    {"geo_shape": {"location": {"indexed_shape":
      {"path": "poly", "type": "areas", "id": "a137.0 -38.0", "index": "geotest"}}}},
    {"geo_shape": {"location": {"indexed_shape":
      {"path": "poly", "type": "areas", "id": "a136.5 -37.5", "index": "geotest"}}}}]}},
  "explain": true, "size": 1000}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/acd6483a-7679-4b11-b6bc-d7f7dd7784cb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

You can take a look at the function_score query; it should let you boost
matches (via a filter) on any geo condition. As for the size of the geo area,
you'll probably have to hard-code the boost inside the function_score query
for now, i.e., you know which area you are filtering on and how big it is, so
you boost explicitly based on that knowledge. The number of documents in the
area is an interesting requirement that I'd have to think about for a bit,
but I suppose you could run a prior query first (maybe an aggregation) to
determine the counts, then dynamically construct a function_score query with
appropriate boosts based on those counts and run it as the final query.
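
As a rough illustration of that two-step idea, here is a minimal Python sketch that only builds the request body; it does not talk to a cluster and has not been run against one. Several things in it are assumptions rather than anything from this thread: the helper names (area_filter, idf_like_weight, build_query) are hypothetical, the idf-style formula is just one possible way to turn counts into boosts, and the "weight" key inside "functions" may need to be "boost_factor" on older Elasticsearch releases. The per-area document counts are expected to come from a prior count or aggregation query.

```python
import json
import math


def area_filter(area_id, index="geotest", doc_type="areas",
                path="poly", field="location"):
    """geo_shape clause matching documents inside a pre-indexed area."""
    return {"geo_shape": {field: {"indexed_shape": {
        "id": area_id, "type": doc_type, "index": index, "path": path}}}}


def idf_like_weight(doc_count, total_docs):
    """Hypothetical weighting: areas holding fewer documents score higher."""
    return math.log(1.0 + total_docs / (doc_count + 1.0))


def build_query(area_counts, total_docs):
    """Build a function_score search body from a {area_id: doc_count} dict."""
    functions = [
        {"filter": area_filter(area_id),
         "weight": idf_like_weight(count, total_docs)}
        for area_id, count in sorted(area_counts.items())
    ]
    return {
        "query": {
            "function_score": {
                # Match documents that fall inside any of the areas.
                "query": {"bool": {"should": [f["filter"] for f in functions]}},
                "functions": functions,
                # Sum the weights of every area a document falls in,
                # and use that sum in place of the original query score.
                "score_mode": "sum",
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    # The counts below are made-up placeholders, not real results.
    body = build_query({"a137.0 -38.0": 40, "a136.5 -37.5": 4}, total_docs=1000)
    print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of the _search request in place of the plain bool query. To hard-code boosts by area size instead, the idf_like_weight call can be swapped for fixed numbers per area.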



(system) #3