How do I build a query such that each token in a document field is matched?

Hello all, I am having an issue similar to the one Brian described.

Brian, did you end up including the token count in your index for
filtering? Did it work well? I am thinking about doing the same, but I
have one more issue to solve too.

Besides the "Square Steakhouse" doc, my index has another doc that contains
just "Square", so if I search only on Square, I want to match only the
"Square" doc, not the Square Steakhouse doc.

Query: Square Steakhouse Result: Match to Square Steakhouse doc
Query: Square Steakhouses Result: Match to Square Steakhouse doc
Query: Squared Steakhouse Result: Match to Square Steakhouse doc
Query: Steakhouse Result: No Match
Query: Square Result: Match to Steakhouse doc
Query: Squared Result: Match to Steakhouse doc

Any suggestions?

Thanks.

On Wednesday, January 30, 2013 2:30:52 PM UTC-5, Brian Webster wrote:

I'm going to move forward with your idea:

The only thing I can think of doing is to:

  • index the number of tokens in that field
  • count the number of tokens in your query string
  • use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords etc.).

I'm going to write a function that uses the analyze API to count the
tokens in a given piece of text: int GetTokenCount(string
Field_Or_Search_Text). This function will use the same analyzer as the
field being searched.
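
For anyone following along, here is a rough Python sketch of such a token-count helper. It is only an illustration, not Brian's code. It relies on the analyze API returning a JSON object with a "tokens" list. The index name "places", the base URL, and the JSON request body with "analyzer" and "text" are assumptions; older versions take the analyzer as a URL parameter and the text as the raw body. The HTTP call is passed in as a parameter so the counting logic can be tried without a running cluster.

```python
import json
import urllib.request


def count_tokens(analyze_response):
    """Count the tokens in a parsed _analyze response ({"tokens": [...]})."""
    return len(analyze_response.get("tokens", []))


def call_analyze(text, analyzer, base_url="http://localhost:9200", index="places"):
    """Send text to the index's _analyze endpoint and return the parsed JSON.

    Assumptions: the index name, the base URL, and a version that accepts
    a JSON body carrying "analyzer" and "text".
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_token_count(text, analyzer="standard", analyze=call_analyze):
    """Return how many tokens the analyzer produces for the given text."""
    return count_tokens(analyze(text, analyzer))
```

Using the same helper for both the indexed field and the search text keeps the two counts comparable, provided the analyzer name passed in is the one the field actually uses.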

Then, upon indexing the document type in question, I will store the token
count of the relevant field.
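
As a small sketch of what the stored document might look like, the count can sit in its own numeric field next to the text. The field names "name" and "name_token_count" are made up for illustration, and the whitespace counter below is only a stand-in for a count taken from the analyzer.

```python
def build_document(name, count_tokens):
    """Return the document to index, with the token count stored beside the text."""
    return {"name": name, "name_token_count": count_tokens(name)}


def whitespace_count(text):
    """Stand-in counter for illustration; a real one should ask the analyzer."""
    return len(text.split())


doc = build_document("Square Steakhouse", whitespace_count)
```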

Upon searching, I will use the same GetTokenCount() function to count the
user's search tokens.

Finally, I will structure the search JSON to utilize the filters as you
have suggested.
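
One possible shape for that search JSON, sketched as a Python dict builder. It is an assumption-laden illustration rather than the exact request discussed in this thread: the field names "name" and "name_token_count" are hypothetical, and the bool query with a "filter" clause is the form used by newer Elasticsearch versions (older versions expressed the same idea differently, for example with a "filtered" query). The match clause uses operator "and" so every query token must appear in the field, and the term filter keeps only docs whose stored token count equals the query's.

```python
def build_exact_token_query(query_text, token_count,
                            text_field="name",
                            count_field="name_token_count"):
    """Build a search body: all query tokens must match, and counts must be equal."""
    return {
        "query": {
            "bool": {
                # Every analyzed query token has to be present in the field.
                "must": [
                    {"match": {text_field: {"query": query_text,
                                            "operator": "and"}}}
                ],
                # Drop docs whose field holds more (or fewer) tokens than the query.
                "filter": [
                    {"term": {count_field: token_count}}
                ],
            }
        }
    }
```

With this shape, a search for "Square" with a count of 1 would be limited to docs whose stored count is 1, so a two-token "Square Steakhouse" doc would be filtered out even though it contains the token.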

Obviously this solution is poor for some applications, but I anticipate
fewer than 10,000 searches per day and fewer than 10 index inserts per day
of the type that is involved. Besides, I'd imagine the analyze API is
rather speedy compared to running actual queries.

Thanks for the advice. This will be a little bit tedious, but not so bad.
