How to search exact text?

Hi,
I've added the following 2 docs to my index:
curl -XPUT localhost:9200/testindex/doc/3 -d '{"language":"it"}'
curl -XPUT localhost:9200/testindex/doc/4 -d '{"language":"pp"}'

I'd like to search for the docs by language.

The following query returns no documents:
curl -XPOST localhost:9200/testindex/_search -d '{"query":{"bool":{"must":[{"term":{"language":"it"}}]}}}'

Whereas searching for the other "language" (pp) does return documents:
curl -XPOST localhost:9200/testindex/_search -d '{"query":{"bool":{"must":[{"term":{"language":"pp"}}]}}}'

Why is "it" a special case? How do I search for the exact text and get back
results every time?
Regards,
Amy.

--

By default, Elasticsearch applies the standard analyzer, which removes English
stop words.
The immediate consequence is that common words are dropped during analysis.

"it" is a common word in English, so it was not indexed.
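
One way to check this yourself, as a rough sketch: the analyze API returns the tokens an analyzer produces for a piece of text. This assumes a node on localhost:9200 (as in the question) and the `analyzer` parameter of the `_analyze` endpoint; the sample text and the variable names are made up for illustration. The command is only printed here, so it can be inspected before it is sent.

```shell
# Assumption: Elasticsearch is listening on localhost:9200, as in the question.
ES_URL="localhost:9200"
# Sample text containing both language codes (made up for illustration).
TEXT="it pp"

# Build the analyze request. It is printed so it can be inspected first;
# paste the printed line into a terminal to actually send it.
ANALYZE_CMD="curl -XGET '$ES_URL/_analyze?analyzer=standard' -d '$TEXT'"
echo "$ANALYZE_CMD"
```

If English stop words are in effect, the response should list a token for "pp" and none for "it".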

Your use case suggests that the field holds a code ("it" for Italian, I
suppose).

So you can either define a mapping for the language field and set it to
"index":"not_analyzed".

See the docs here:
http://www.elasticsearch.org/guide/reference/mapping/core-types.html

Or you can define your own analyzer and apply it to the field. For example, I
often use a custom analyzer with a keyword tokenizer and a lowercase filter.
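
A minimal sketch of that second option, under some assumptions: a node on localhost:9200, the `string` field type of Elasticsearch releases from the time of this thread, and a new index, since the analyzer of an already mapped field cannot be changed in place. The names `testindex2` and `lowercase_keyword` are hypothetical. The command is printed rather than sent.

```shell
# Assumptions: a node on localhost:9200 and a NEW index; "testindex2" and
# "lowercase_keyword" are hypothetical names chosen for this sketch.
ES_URL="localhost:9200"
INDEX="testindex2"

# Index settings plus mapping: a custom analyzer made of the keyword tokenizer
# (the whole value becomes one token) and the lowercase filter, applied to the
# "language" field of the "doc" type.
BODY='{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "language": {"type": "string", "analyzer": "lowercase_keyword"}
      }
    }
  }
}'

# Print the create-index command for inspection; paste it to send it.
printf "curl -XPUT %s/%s -d '%s'\n" "$ES_URL" "$INDEX" "$BODY"
```

Since a term query is not analyzed, the value you search for has to match the indexed token, so it must be lowercase (for example "it").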

Does that help?
David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Hello Amy,

On Wed, Dec 5, 2012 at 12:46 PM, Amy amyblarney@gmail.com wrote:

Why is "it" a special case?

That's because, by default, fields are analyzed with the standard analyzer,
which also drops English stop words from the list of terms. And "it" is an
English stop word.

How do I search for the exact text and get back results every time?

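If you want exact matches on your documents, you can tell ES not to analyze
your field at all, by putting a mapping like this on a fresh index before
indexing any documents (the mapping of an already indexed field cannot be
changed in place):
curl -XPUT localhost:9200/testindex/doc/_mapping -d '{"doc":{"properties":{"language":{"type":"string","index":"not_analyzed"}}}}'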

That would also improve performance on indexing new docs.

But you can also customize the analysis process, as ES exposes lots of
options.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

David, there should be some mid-air collision detection on this group :)

--

LOL! Right ;)

What about a _version field on each thread ;)

Cheers

--

Hi,
Wow, that was quick! Thanks! That helped.
I set the default analyzer to the standard analyzer with no stop words by
adding the following to the elasticsearch.yml config file:

# Index settings
index:
  analysis:
    analyzer:
      # Set the standard analyzer with no stop words as the default for
      # both indexing and searching
      default:
        type: standard
        stopwords: _none_
