How to search exact text?

Amy · December 5, 2012, 10:46am

Hi,
I've added the following 2 docs to my index:
curl -XPUT localhost:9200/testindex/doc/3 -d '{"language":"it"}'
curl -XPUT localhost:9200/testindex/doc/4 -d '{"language":"pp"}'

I'd like to search for the docs by language.

The following query returns no documents:
curl -XPOST localhost:9200/testindex/_search -d \
'{"query":{"bool":{"must":[{"term":{"language":"it"}}]}}}'

Whereas searching for the other "language" (pp) does return documents:
curl -XPOST localhost:9200/testindex/_search -d \
'{"query":{"bool":{"must":[{"term":{"language":"pp"}}]}}}'

Why is "it" a special case? How do I search for the exact text and get back
results every time?
Regards,
Amy.

--

dadoonet · December 5, 2012, 11:14am

By default, Elasticsearch applies the standard analyzer, which removes
English stop words.
The immediate consequence is that common words are dropped during analysis.

"it" is a common English word, so it was never indexed.

Your use case suggests a coded field: "it" rather than "italian", I suppose.

So you can either define a mapping for the language field and set it to
"index":"not_analyzed".

See doc here:
http://www.elasticsearch.org/guide/reference/mapping/core-types.html

Or you can define your own analyzer. For example, I often use a custom analyzer
with a keyword tokenizer and a lowercase filter, and apply it to the field.
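
A minimal sketch of the custom analyzer option, assuming the pre-5.x string mapping syntax used elsewhere in this thread. The analyzer name lowercase_keyword and the index name testindex_kw are made up for illustration; the keyword tokenizer and lowercase filter are the ones named above.

```shell
#!/bin/sh
# Assumption: a local node on localhost:9200, as in the original question.
ES=localhost:9200
# Hypothetical index name, so the existing testindex is left untouched.
INDEX=testindex_kw

# Index settings plus mapping. The keyword tokenizer emits the whole field
# value as one token and the lowercase filter normalizes its case.
# "lowercase_keyword" is a hypothetical analyzer name.
SETTINGS='{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "language": {"type": "string", "analyzer": "lowercase_keyword"}
      }
    }
  }
}'

create_index() {
  # Create the index with the analyzer and mapping in a single request.
  curl -XPUT "$ES/$INDEX" -d "$SETTINGS"
}
```

Call create_index before indexing any documents. A term query is not analyzed, so the value searched for still has to be lowercase to match.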

Does it help?
David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

radu_gheorghe · December 5, 2012, 11:14am

Hello Amy,

> Why is "it" a special case?

Because by default, fields are analyzed using the standard analyzer, and
that analyzer drops English stop words from the list of terms. "it" is an
English stop word.
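
One way to check this yourself is the analyze API, which returns the tokens an analyzer produces for a piece of text. The sketch below assumes the request form of Elasticsearch versions from that period (text in the request body, analyzer in the query string); show_tokens is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: a local node on localhost:9200, as in the original question.
ES=localhost:9200

show_tokens() {
  # $1 is the text to run through the standard analyzer.
  curl -XGET "$ES/_analyze?analyzer=standard" -d "$1"
}
```

Running show_tokens 'it pp' on a node where the standard analyzer removes English stop words should list a token for "pp" but none for "it".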

> How do I search for the exact text and get back results every time?

If you want exact matches on your documents, you can tell ES not to analyze
your field at all, by setting "index":"not_analyzed" for it in the mapping.

That would also improve performance on indexing new docs.
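
A minimal sketch of such a mapping, assuming the pre-5.x string type that was current when this thread was written. The index name testindex2 is made up so the example does not collide with the existing testindex; the function names are hypothetical helpers.

```shell
#!/bin/sh
# Assumption: a local node on localhost:9200, as in the original question.
ES=localhost:9200
# Hypothetical index name, so the existing testindex is left untouched.
INDEX=testindex2

# Mapping that stores "language" as a single unanalyzed term.
MAPPING='{"mappings":{"doc":{"properties":{"language":{"type":"string","index":"not_analyzed"}}}}}'

create_index() {
  # The mapping has to be in place before any documents are indexed.
  curl -XPUT "$ES/$INDEX" -d "$MAPPING"
}

index_docs() {
  # The same two documents as in the original question.
  curl -XPUT "$ES/$INDEX/doc/3" -d '{"language":"it"}'
  curl -XPUT "$ES/$INDEX/doc/4" -d '{"language":"pp"}'
}

search_it() {
  # Exact-match lookup of the value that was previously dropped.
  curl -XPOST "$ES/$INDEX/_search" -d '{"query":{"term":{"language":"it"}}}'
}
```

Calling create_index, index_docs and search_it in that order should return document 3 once the index has refreshed. The sketch uses a fresh index because a field that is already mapped as analyzed cannot simply be switched to not_analyzed.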

But you can also customize the analysis process, as ES exposes lots of
options.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

radu_gheorghe · December 5, 2012, 11:16am

David, there should be some mid-air collision detection on this group

--

dadoonet · December 5, 2012, 11:20am

LOL! Right

What about a _version field on each thread?

Cheers

--

Amy · December 5, 2012, 12:38pm

Hi,
Wow, that was quick! Thanks! That helped.
I set the standard analyzer with no stop words as the default by adding the
following to the elasticsearch.yml config file:

# Index settings
index:
  analysis:
    analyzer:
      # set standard analyzer with no stop words as the default for both
      # indexing and searching
      default:
        type: standard
        stopwords: _none_

--
