Hi,
We have the following search scenarios, and to achieve them we are using the
analyzer settings given below. With these settings, the index on disk ends up
almost six times the size of the raw data: indexing 1 GB of raw data produces
roughly a 6 GB index in Elasticsearch. Memory consumption also shoots up to
4-5 GB while indexing.
I want to know:
- Is the high memory usage and disk space a result of using Shingle and
NGram together?
- Is there any other combination of analyzers that gives us the same
behaviour but uses less disk space and memory?
- Note that we have set "term_vector" to "with_positions_offsets" as we need
highlighting for the content.
Version being used: 0.20.6
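For reference, this is roughly the mapping we apply for the highlighted
field (index, type and field names here are placeholders, not our real
ones):

curl -XPUT 'localhost:9200/myindex/doc/_mapping' -d '{
  "doc": {
    "properties": {
      "content": {
        "type": "string",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}'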
Partial Searches
e.g. if we have documents with content like:
DOC1 -> "Search isn’t just free text search anymore – it’s about
exploring your data. Understanding it. Gaining insights that will make your
business better or improve your product"
DOC2 -> "Store complex real world entities in Elasticsearch as structured
JSON documents. All fields are indexed by default, and all the indices can
be used in a single query, to return results at breathtaking speed."
DOC3 -> "Operators like *, +, -, % are used for
performing arithmetic operations"
Search Queries:
Query String (search) - Should result in both DOC1 and DOC2
Query String (doc) - Should result in DOC2 (as it partially matches
"documents")
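For illustration, such a partial-match search looks roughly like the
following (index name "myindex" is a placeholder; "content" is the field we
highlight):

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": { "query_string": { "query": "doc" } },
  "highlight": { "fields": { "content": {} } }
}'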
As-Is Searches
Should search for phrases as-is, without tokenizing, when given in double
quotes. This includes the special characters. For this, we are specifying
"keyword" as the explicit analyzer in our search queries, in addition to the
node-level analyzer settings.
e.g. For the same set of DOCs
Search Queries:
Query String ("search") - Should result in only DOC1 and NOT DOC2 (because
in DOC2 its a partial match)
Query String ("like *") - Should result in only DOC3, * should NOT be
treated as wildcard.
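For these we send something along these lines (again, the index name is a
placeholder; the "analyzer" parameter of query_string forces the keyword
analyzer for the quoted phrase):

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "query_string": {
      "query": "\"like *\"",
      "analyzer": "keyword"
    }
  }
}'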
Analyzer settings @ node level
index:
analysis:
analyzer:
default_search:
type: custom
tokenizer: whitespace
filter: [lowercase,stop,asciifolding,kstem]
default_index:
type: custom
tokenizer: whitespace
                filter: [lowercase, asciifolding, my_shingle, kstem, my_ngram, stop]
filter:
my_ngram:
max_gram: 50
type: nGram
min_gram: 2
my_shingle:
type: shingle
max_shingle_size: 5
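For what it's worth, the token output of the two filters combined can be
inspected with the analyze API (index name again a placeholder). A 5-word
shingle is easily 30+ characters long, and nGram with min_gram 2 /
max_gram 50 then emits sum(len - n + 1) for n = 2..len grams per shingle,
i.e. hundreds of terms for a single shingle:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=default_index' -d 'exploring your data'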
Appreciate any help!!
-katta