How do I build a query such that each token in a document field is matched?

Brian_Webster · January 30, 2013, 4:53pm

I need to make sure that each token of a field is matched by at least one
token in a user's search.

This is a generalized example for the sake of simplification.

Let Store_Name = "Square Steakhouse"

It is simple to build a query that matches this document when the user
searches for Square or Steakhouse. Furthermore, with the kstem filter attached
to the default analyzer, Steakhouses is also likely to match.

{
  "size": 30,
  "query": {
    "match": {
      "Store_Name": {
        "query": "Square",
        "operator": "AND"
      }
    }
  }
}

Unfortunately, I need each token of the Store_Name field to be matched. I
need the following behavior:

Query: Square Steakhouse Result: Match
Query: Square Steakhouses Result: Match
Query: Squared Steakhouse Result: Match
Query: Square Result: No Match
Query: Steakhouse Result: No Match

In summary:

- It is not an option to use not_analyzed, as I do need to take
  advantage of analyzer features. I intend to use kstem, custom synonyms,
  a custom char_filter, a lowercase filter, as well as a standard tokenizer.
- However, I need to make sure that each token of a field is matched.

Is this possible in Elasticsearch?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

egaumer · January 30, 2013, 6:11pm

There are many shades of grey between not_analyzed and analyzed. You could
still use an analyzer but configure it with a different tokenizer. For
instance, the keyword tokenizer:

http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-tokenizer.html

This would essentially cause the entire string to be considered one token
(as opposed to splitting on whitespace). I'm not sure how this would affect
the stemmer, so you'd have to experiment a bit, but you can always write a
custom token filter.

Brian_Webster · January 30, 2013, 6:20pm

I've thought about using that, but two problems come to mind.

1. As you suggested, there will almost surely be a problem with stemming
2. I am limited, essentially, to a phrase search

For example, for the field: Blue Green Red

I'd like the following searches to match:

Red Blue Green
Green Blue Red

I'd like the following searches not to match:

Blue
Green
Red
Red Blue
Blue Green
etc

It feels like I may have bumped into a Lucene/Elasticsearch wall here.
Hopefully one of you veterans has some magic!

Clinton_Gormley · January 30, 2013, 6:45pm

Hiya

The only thing I can think of doing is to:

 1. index the number of tokens in that field
 2. count the number of tokens in your query string
 3. use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords etc.)
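
The three steps above can be sketched in plain Python. This is only a toy model of the idea, not Elasticsearch code: `analyze` is a hypothetical stand-in (lowercase plus whitespace split) for the real analyzer, and `index_doc` and `matches` are illustrative names.

```python
def analyze(text):
    # Toy analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def index_doc(text):
    # Step 1: store the token count alongside the indexed tokens.
    tokens = analyze(text)
    return {"tokens": set(tokens), "token_count": len(tokens)}

def matches(doc, query):
    # Step 2: count the tokens in the query string.
    query_tokens = analyze(query)
    # Equivalent of a match query with operator AND.
    all_query_terms_found = all(t in doc["tokens"] for t in query_tokens)
    # Step 3: the filter, requiring both counts to be equal.
    same_count = doc["token_count"] == len(query_tokens)
    return all_query_terms_found and same_count

doc = index_doc("Blue Green Red")
for query in ["Red Blue Green", "Green Blue Red", "Blue", "Red Blue"]:
    print(query, "->", matches(doc, query))
```

Note that a query repeating a token (for example "Blue Blue Red") would still pass this check, so comparing distinct tokens on both sides may be a safer variant.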

clint

egaumer · January 30, 2013, 6:54pm

You could possibly extend the stemmer to meet your needs (splitting on
whitespace internally and then reassembling the token).

In terms of being limited to a phrase search, that's true, but you could do
something similar to creating shingles, where you're indexing the various
permutations of the string. This, of course, is only feasible if your
input strings are reasonably small (which I assume is the case if you're
required to match every token).

http://www.elasticsearch.org/guide/reference/index-modules/analysis/shingle-tokenfilter.html
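
To give a feel for what "indexing the various permutations" would mean, here is a small Python sketch. It is not a token filter, just an illustration; `token_permutations` is a hypothetical helper and the lowercase/whitespace split is an assumed stand-in for the real analyzer.

```python
from itertools import permutations

def token_permutations(text):
    # Assumption: tokens are produced by lowercasing and splitting on whitespace.
    tokens = text.lower().split()
    # Every ordering of the tokens, each rejoined into a single string.
    return [" ".join(p) for p in permutations(tokens)]

variants = token_permutations("Blue Green Red")
print(len(variants))
print(variants)
```

The number of variants grows factorially with the token count (3 tokens give 6 strings, 5 tokens give 120), which is why this only seems workable for short fields.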

polyfractal · January 30, 2013, 7:04pm

Have you tried adding a "minimum_should_match" field to your match query?
Try setting it to 90-100%. I believe that will force all the tokens in
the query to match something (be it a stemmed token or a synonym, etc).

-Zach

Brian_Webster · January 30, 2013, 7:09pm

Hi Zach, I have thought about using that, but unfortunately my situation is
backwards.

Instead of making sure more, or all, search tokens are matched, I need to
make sure that most, or all, of the field tokens are matched.
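
The difference between the two directions can be shown with plain set logic. This is a toy Python illustration, not Elasticsearch code; the function names are made up for the example and the tokenization (lowercase, whitespace split) is an assumption.

```python
def tokens(text):
    # Toy tokenization (assumption): lowercase and split on whitespace.
    return set(text.lower().split())

def all_query_tokens_match(field, query):
    # Roughly what operator AND or minimum_should_match at 100% enforces:
    # every query token must appear in the field.
    return tokens(query) <= tokens(field)

def all_field_tokens_match(field, query):
    # The "backwards" requirement: every field token must appear in the query.
    return tokens(field) <= tokens(query)

field = "Square Steakhouse"
print(all_query_tokens_match(field, "Square"))
print(all_field_tokens_match(field, "Square"))
print(all_field_tokens_match(field, "Square Steakhouse"))
```

With the field "Square Steakhouse", the query "Square" satisfies the first check but not the second, which is exactly the case that should not match.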

I'm starting to think that Clinton's idea may be ideal. It is not simple,
but it might just do the trick here:

 1. index the number of tokens in that field
 2. count the number of tokens in your query string
 3. use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords etc.)

There might be also something in egaumer's ideas, but I'm having a hard
time converting those ideas into a workable solution.

Brian_Webster · January 30, 2013, 7:30pm

I'm going to move forward with your idea:

The only thing I can think of doing is to:

 1. index the number of tokens in that field
 2. count the number of tokens in your query string
 3. use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords etc.)

I'm going to write a function that uses the analyze API to extract the
number of tokens given a field: String GetTokenCount(string
Field_Or_Search_Text). This function will use the correct analyzer.

Then, upon indexing the document type in question, I will store the token
count of the relevant field.

Upon searching, I will use the same GetTokenCount() function to count the
user's search tokens.

Finally, I will structure the search JSON to utilize the filters as you
have suggested.
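
A minimal Python sketch of this plan, for illustration only. It assumes the analyze API answers with a JSON object containing a `tokens` array; the HTTP call itself is replaced by an injected `analyze` callable, and `fake_analyze`, `get_token_count` and `with_token_count` are hypothetical names. The `TokenCount` field name is an assumption for the stored count.

```python
from typing import Callable, Dict, List

# Assumed shape of an analyze response: {"tokens": [{"token": "..."}, ...]}
AnalyzeFn = Callable[[str], Dict[str, List[dict]]]

def get_token_count(text: str, analyze: AnalyzeFn) -> int:
    # Count the tokens the analyzer produced for the given text.
    return len(analyze(text).get("tokens", []))

def with_token_count(doc: dict, field: str, analyze: AnalyzeFn) -> dict:
    # Index time: return a copy of the document with the count stored next to the field.
    enriched = dict(doc)
    enriched["TokenCount"] = get_token_count(doc[field], analyze)
    return enriched

def fake_analyze(text: str) -> dict:
    # Stand-in for the real analyze call: lowercase and split on whitespace.
    return {"tokens": [{"token": t} for t in text.lower().split()]}

doc = with_token_count({"Store_Name": "Square Steakhouse"}, "Store_Name", fake_analyze)
print(doc)
print(get_token_count("Square", fake_analyze))
```

Using the same function at index time and at query time keeps the two counts comparable, provided both go through the same analyzer.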

Obviously this solution is poor for some applications, but I anticipate
fewer than 10,000 searches per day and fewer than 10 index inserts per day
of the type that is involved. Besides, I'd imagine the analyze API is
rather speedy compared to running actual queries.

Thanks for the advice. This will be a little bit tedious, but not so bad.

thale_jacobs · July 31, 2013, 1:52pm

Hello all. I am having a similar issue to the one Brian described. Brian,
did you end up including the token count in your index for filtering? Did
it work well? I am thinking about doing the same, but I have one more issue
to solve too.

I have another doc in the index that just contains "Square" (as well as
"Square Steakhouse"), so if I search only on Square, I only want to get a
match on the "Square" document, not the "Square Steakhouse" doc...

Query: Square Steakhouse Result: Match to Square Steakhouse doc
Query: Square Steakhouses Result: Match to Square Steakhouse doc
Query: Squared Steakhouse Result: Match to Square Steakhouse doc
Query: Steakhouse Result: No Match
Query: Square Result: Match to Square doc
Query: Squared Result: Match to Square doc

Any suggestions?

Thanks.

Brian_Webster · August 12, 2013, 8:32pm

I think this will work great for you. Just compare the number of analyzed
tokens at index and query time. I think we were solving identical
problems.

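    ' Count the tokens that the given analyzer produces for RawString,
    ' using the analyze API on the "chc" index.
    Dim client = New ElasticConnection()
    Dim result = client.Post("http://localhost:9200/chc/_analyze?analyzer=" & Analyzer, RawString)
    ' JObject is Newtonsoft.Json.Linq.JObject
    Dim J = JObject.Parse(result.ToString())
    Return (From X In J("tokens")).Count()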

Here is my query code, where "MyQuery" is my index field name. This might
be "title" or "name" or something for you.

{
  "size": 30,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "MyQuery": {
            "query": "[query]",
            "operator": "AND"
          }
        }
      },
      "filter": {
        "term": {
          "TokenCount": "[tokencount]"
        }
      }
    }
  }
}
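
For readers on newer Elasticsearch versions: as far as I know, the `filtered` query was deprecated in 2.0 and removed in 5.0, and a `bool` query with a `filter` clause is the usual replacement. The Python sketch below only builds the request body under that assumption; `build_query` is a hypothetical helper, and the field names are taken from the query above.

```python
import json

def build_query(field, text, token_count, size=30):
    # All query tokens must match (operator AND), and the stored token
    # count of the document must equal the token count of the query.
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "match": {field: {"query": text, "operator": "AND"}}
                },
                "filter": {"term": {"TokenCount": token_count}},
            }
        },
    }

body = build_query("MyQuery", "Square Steakhouse", 2)
print(json.dumps(body, indent=2))
```

The token count passed in should come from the same analyzer that was used when the `TokenCount` field was indexed.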

Sorry for the late reply.

sina_tamanna · August 13, 2013, 8:01am

Hey,

Late reply, but I think a solution might be to use an analyzer that
combines a whitespace tokenizer, a stemmer filter, and a concatenation
filter, and then map the field's analyzer (both index and search analyzers)
to it.

A simple concatenation filter is included in this post (I haven't tried it,
though):
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-concatenation-filter-td3711094.html

The setting might look like this:

"index": {
  "analysis": {
    "analyzer": {
      "myAnalyzer": {
        "tokenizer": "whitespace",
        "filter": ["lowercase", "english_snowball", "stop", "filter_concatenate"]
      }
    },
    "filter": {
      "filter_concatenate": {
        "type": "com.monpetitguide.elasticsearch.analysis.ConcatenateTokenFilterFactory",
        "token_separator": ""
      },
      "english_snowball": {
        "type": "snowball",
        "language": "English"
      }
    }
  }
}

So in your scenario, "Square Steakhouse", "Squared Steakhouses", ..., will
be converted to "squaresteakhouse". Therefore, "Squared" will not match,
whereas "Squared Steakhouse" will match.
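
The effect of such an analyzer chain can be imitated in a few lines of Python. This is only a toy model: `TOY_STEMS` is a made-up lookup table standing in for a real stemmer (whose actual output may differ), the stop filter is left out, and `analyze` is a hypothetical name.

```python
# Hypothetical stem table standing in for a real stemmer.
TOY_STEMS = {"squared": "square", "steakhouses": "steakhouse"}

def analyze(text: str) -> str:
    # Whitespace tokenizer plus lowercase filter.
    tokens = text.lower().split()
    # Toy stemming via the lookup table; unknown tokens pass through unchanged.
    stemmed = [TOY_STEMS.get(t, t) for t in tokens]
    # Concatenation filter with an empty separator: one token per field.
    return "".join(stemmed)

indexed = analyze("Square Steakhouse")
for query in ["Squared Steakhouses", "Squared Steakhouse", "Squared", "Steakhouse Square"]:
    print(query, "->", analyze(query) == indexed)
```

One thing this makes visible is that plain concatenation is order-sensitive: "Steakhouse Square" does not match. If order should not matter, sorting the tokens before concatenating might be one way around that.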

Brian_Webster · August 13, 2013, 7:50pm

You can see my post here for a more detailed solution:
