How do I build a query such that each token in a document field is matched?

I need to make sure that each token of a field is matched by at least one
token in a user's search.

This is a generalized example, simplified for the sake of clarity.

Let Store_Name = "Square Steakhouse"

It is simple to build a query that matches this document when the user
searches for Square or Steakhouse. Furthermore, with the kstem filter
attached to the default analyzer, Steakhouses is also likely to match.

{
  "size": 30,
  "query": {
    "match": {
      "Store_Name": {
        "query": "Square",
        "operator": "AND"
      }
    }
  }
}

Unfortunately, I need each token of the Store_Name field to be matched. I
need the following behavior:

Query: Square Steakhouse Result: Match
Query: Square Steakhouses Result: Match
Query: Squared Steakhouse Result: Match
Query: Square Result: No Match
Query: Steakhouse Result: No Match

In summary:

  • It is not an option to use not_analyzed, as I do need to take
    advantage of analyzer features
    • I intend to use kstem, custom synonyms, a custom char_filter, a
      lowercase filter, as well as a standard tokenizer
  • However, I need to make sure that each token of the field is matched

Is this possible in Elasticsearch?


There are many shades of grey between not_analyzed and analyzed. You could
still use an analyzer but configure it with a different tokenizer. For
instance, the keyword tokenizer:

http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-tokenizer.html

This would essentially cause the entire string to be considered one token
(as opposed to splitting on whitespace). I'm not sure how this would affect
the stemmer, so you'd have to experiment a bit, but you can always write a
custom token filter.
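
As a sketch, the settings might look something like this (the analyzer
name "whole_name" is just a placeholder):

{
  "settings": {
    "analysis": {
      "analyzer": {
        "whole_name": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

With that analyzer on Store_Name, "Square Steakhouse" is indexed as the
single token "square steakhouse", so only a search reproducing the whole
(lowercased) string can match. Any stemming filter would then see the
whole string as one token, which is where I'd expect surprises.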


I've thought about using that, but two problems come to mind.

  1. As you suggested, there will almost surely be a problem with stemming
  2. I am limited, essentially, to a phrase search

For example, take a field whose value is: Blue Green Red

I'd like the following searches to match:

  • Red Blue Green
  • Green Blue Red

I'd like the following searches not to match:

  • Blue
  • Green
  • Red
  • Red Blue
  • Blue Green
  • etc

It feels like I may have bumped into a Lucene/Elasticsearch wall here.
Hopefully one of you veterans has some magic!


Hiya


The only thing I can think of doing is to:

  • index the number of tokens in that field
  • count the number of tokens in your query string
  • use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords, etc).
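
For example, the analyze API will show you exactly which tokens an
analyzer produces for a given string, so you can count them the same way
at index and search time (index and analyzer names here are placeholders):

GET http://localhost:9200/my_index/_analyze?analyzer=my_analyzer
Square Steakhouses

The response contains a "tokens" array (likely ["square", "steakhouse"]
once kstem has stripped the plural), and the length of that array is the
count you'd store and compare.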

clint


You could possibly extend the stemmer to meet your needs (splitting on
whitespace internally and then reassembling the token).

In terms of being limited to a phrase search, that's true, but you could do
something similar to creating shingles, where you're indexing the various
permutations of the string. This, of course, is only feasible if your
input strings are reasonably small (which I assume is the case if you're
required to match every token):

http://www.elasticsearch.org/guide/reference/index-modules/analysis/shingle-tokenfilter.html
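
Note that a shingle filter by itself only emits adjacent word combinations
in their original order, not true permutations, so you'd have to generate
the reorderings yourself. But as a sketch, the filter side is configured
like this (names are placeholders):

{
  "settings": {
    "analysis": {
      "filter": {
        "name_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "shingled_name": {
          "tokenizer": "standard",
          "filter": ["lowercase", "kstem", "name_shingles"]
        }
      }
    }
  }
}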


Have you tried adding a "minimum_should_match" field to your match query?
Try setting it to 90-100%. I believe that will force all the tokens in
the query to match something (be it a stemmed token or a synonym, etc).
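
For example, reusing the field from your original query:

{
  "size": 30,
  "query": {
    "match": {
      "Store_Name": {
        "query": "Square Steakhouses",
        "minimum_should_match": "100%"
      }
    }
  }
}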

-Zach


Hi Zach, I have thought about using that, but unfortunately my situation is
backwards.

Instead of making sure more, or all, search tokens are matched, I need to
make sure that most, or all, of the field tokens are matched.

I'm starting to think that Clinton's idea may be ideal. It is not simple,
but it might just do the trick here:

The only thing I can think of doing is to:

  • index the number of tokens in that field
  • count the number of tokens in your query string
  • use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords, etc).

There might also be something in egaumer's ideas, but I'm having a hard
time converting those ideas into a workable solution.

Brian Webster | 918 633 6863


I'm going to move forward with your idea:

The only thing I can think of doing is to:

  • index the number of tokens in that field
  • count the number of tokens in your query string
  • use a filter to make sure they are the same

Of course, that means ensuring that you're counting the same number of
tokens that would be generated by the analyzer (e.g. being aware of
stopwords, etc).

I'm going to write a function that uses the analyze API to extract the
number of tokens in a given string: int GetTokenCount(string
Field_Or_Search_Text). This function will use the correct analyzer.

Then, upon indexing the document type in question, I will store the token
count of the relevant field.

Upon searching, I will use the same GetTokenCount() function to count the
user's search tokens.

Finally, I will structure the search JSON to utilize the filters as you
have suggested.
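
For example, the indexed document might then carry the precomputed count
alongside the name (TokenCount being the extra field), something like:

{
  "Store_Name": "Square Steakhouse",
  "TokenCount": 2
}

The search can then pair the usual match query with a term filter on
TokenCount.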

Obviously this solution is poor for some applications, but I anticipate
fewer than 10,000 searches per day and fewer than 10 index inserts per day
of the type that is involved. Besides, I'd imagine the analyze API is
rather speedy compared to running actual queries.

Thanks for the advice. This will be a little bit tedious, but not so bad.


Hello All- I am having a similar issue to the one Brian described. Brian,
did you end up including the token count in your index for filtering? Did
it work well? I am thinking about doing the same, but I have one more
issue to solve too.

I have another doc in the index that just contains "Square" (as well as
"Square Steakhouse"), so if I search only on Square, I only want to get a
match on the "Square" document, not the Square Steakehouse doc...

Query: Square Steakhouse Result: Match to Square Steakehouse doc
Query: Square Steakhouses Result: Match to Square Steakehouse doc
Query: Squared Steakhouse Result: Match to Square Steakehouse doc
Query: Steakhouse Result: No Match
Query: Square Result: Match to Steakehouse doc
Query: Squared Result: Match to Steakehouse doc

Any suggestions?

Thanks.


I think this will work great for you. Just compare the number of analyzed
tokens at index and query time. I think we were solving identical
problems.

' Count the tokens the analyzer produces for a given string by calling
' the analyze API and parsing the response with Newtonsoft.Json.Linq.
Dim client = New ElasticConnection()
Dim result = client.Post("http://localhost:9200/chc/_analyze?analyzer=" & Analyzer, RawString)
Dim J = JObject.Parse(result.ToString())
Return (From X In J("tokens")).Count()

Here is my query code, where "MyQuery" is my index field name. This might
be "title" or "name" or something for you.

{
  "size": 30,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "MyQuery": {
            "query": "[query]",
            "operator": "AND"
          }
        }
      },
      "filter": {
        "term": {
          "TokenCount": "[tokencount]"
        }
      }
    }
  }
}

Sorry for the late reply.

Brian Webster | 918 633 6863


Hey,

Late reply, but I think a solution might be an analyzer combining a
whitespace tokenizer, a stemmer filter, and a concatenation filter, then
mapping the field to it (as both index and search analyzer).

A simple concatenation filter is included in this post (I haven't tried
it, though):
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-concatenation-filter-td3711094.html

The setting might look like this:

"index": {
"analysis": {
"analyzer": {
"myAnalyzer": {
"tokenizer": "whitespace",
"filter": ["lowercase", "english_snowball", "stop",
"filter_concatenate"]
}
},
"filter": {
"filter_concatenate": {
"type":
"com.monpetitguide.elasticsearch.analysis.ConcatenateTokenFilterFactory",
"token_separator": ""
},
"english_snowball" : {
"type": "snowball",
"language": "English"
},
}
}
}

So in your scenario, "Square Steakhouse", "Squared Steakhouses", etc. will
all be converted to "squaresteakhouse"; therefore "Squared" alone will
mismatch, whereas "Squared Steakhouse" will match.


You can see my post here for a more detailed solution:

Brian Webster | 918 633 6863
