Newbie: How to search for an indexed phrase toward the end of a query

Hi all,

I'm trying to determine whether a single-line query contains a
location. For example: "new york style pizza in san francisco ca".

I have created a huge index of all location phrases I want to detect, with
allowable permutations. In the above example, I would have "san francisco
ca", "san francisco california", and possibly others such as "sf ca", "bay
area ca", and so forth, all as separate documents within the index. Casing
and punctuation would be discarded, so the query "New York style PIZZA, in
san francisco, ca" would become "new york style pizza in san francisco ca".
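
To make the normalization step concrete, here is a minimal sketch of what I
mean by discarding casing and punctuation. The function name
normalize_query is just something I made up for illustration, and this is
only one reasonable way to do it, not necessarily what an Elasticsearch
analyzer would produce.

```python
import re


def normalize_query(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    lowered = text.lower()
    # Replace every character that is neither a word character nor
    # whitespace with a space, so "francisco, ca" does not become one token.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # split() with no arguments collapses runs of whitespace.
    return " ".join(no_punct.split())


print(normalize_query("New York style PIZZA, in san francisco, ca"))
# prints: new york style pizza in san francisco ca
```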

So far, a plain match query works best. However, it ignores term order:
"san francisco ca" should match, whereas "ca francisco san" should not,
and a match query accepts both. I've tried phrase matching, but I get no
matches because of the extra terms ("new york style pizza in") in the input
query. I also tried a multi-field match with the cross_fields option,
indexing the different portions of the location in different fields (city,
state).

I read about shingles, but I'm not sure they would work, because the number
of terms in a given location is variable (e.g. "dallas texas" vs "san
francisco california" vs "new york city new york").

My last attempt involved percolating, which I could not get to work at all.
I'm not sure whether I'm indexing correctly, but when I call GET
.../_percolate I get ALL documents in the index back. Also, building the
.percolator type with the bulk API was painfully slow and eventually
crashed my instance (JVM memory at 99%).
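
For reference, the match and phrase attempts described above would look
roughly like this as request bodies, written here as Python dicts. The
field name "phrase" is a placeholder for illustration, not my real mapping.
As far as I understand it, the phrase form requires every query term to
appear in order in the document, which is why an eight-term query cannot
match a three-term location document.

```python
import json

# The normalized user query from the example above.
user_query = "new york style pizza in san francisco ca"

# Plain match: any shared term can produce a hit, and term order is ignored.
match_body = {"query": {"match": {"phrase": user_query}}}

# Phrase match: all query terms must occur, in order, in the document.
phrase_body = {"query": {"match_phrase": {"phrase": user_query}}}

# "phrase" is a hypothetical field name; substitute the real one.
print(json.dumps(match_body, indent=2))
print(json.dumps(phrase_body, indent=2))
```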

I've spent 3-4 days banging away at this problem and would really
appreciate some gentle guidance. Sample query/index/mappings would be
great, but even just letting me know what type of query (and indexing and
mapping) I should use would be tremendously helpful, so I can at least
"bark up the right tree"!

Bonus points: I'm making the assumption that the location phrase, if any,
will be at the end of the query. Some way to sense that or boost results
accordingly would be great.
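
To illustrate the end-of-query assumption, here is a rough client-side
sketch: take the trailing word sequences of the normalized query, longest
first, and check each against the known location phrases. The function
names, the max_terms limit, and the in-memory set standing in for the index
are all assumptions made for illustration; this is not an Elasticsearch
feature, just a way of showing the behaviour I'm after.

```python
def trailing_phrases(query, max_terms=4):
    """Return the trailing word sequences of `query`, longest first."""
    tokens = query.split()
    longest = min(max_terms, len(tokens))
    return [" ".join(tokens[-n:]) for n in range(longest, 0, -1)]


def find_trailing_location(query, known_locations, max_terms=4):
    """Return the longest known location that ends the query, or None."""
    for candidate in trailing_phrases(query, max_terms):
        if candidate in known_locations:
            return candidate
    return None


# Hypothetical stand-in for the location index.
known = {"san francisco ca", "san francisco california", "sf ca", "bay area ca"}

print(find_trailing_location("new york style pizza in san francisco ca", known))
# prints: san francisco ca
print(find_trailing_location("new york style pizza in ca francisco san", known))
# prints: None
```

Because each candidate is compared as a whole string, word order is
respected, and "new york" at the start of the query is never considered.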

More bonus points: I have the population information for each location.
Some way to boost results slightly for higher populations would be great
too. I am using ES 1.3.2; function_score with the field_value_factor
function and the ln1p modifier appears to work well, but I'm not sure how
that would work when percolating.
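
For context, the population boost I have in mind looks roughly like the
sketch below. The field names "phrase" and "population" and the helper name
are placeholders for illustration. The second part only shows why a
logarithmic modifier gives a slight boost: with the default factor, ln1p
should amount to the natural log of one plus the field value, so a
thousandfold difference in population becomes a much smaller difference in
score multiplier. The two population figures are made-up round numbers.

```python
import math


def population_boost_query(text, field="phrase", population_field="population"):
    """Build a function_score body that scales the match score by ln1p(population)."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: text}},
                "field_value_factor": {
                    "field": population_field,  # hypothetical numeric field
                    "modifier": "ln1p",
                },
            }
        }
    }


body = population_boost_query("san francisco ca")

# Illustrative populations only: a small town versus a large city.
small = math.log1p(1000)
large = math.log1p(1000000)
print(large / small)  # the ratio of the two multipliers
```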

THANK YOU!!
