How can I do an exact search on "not_analyzed" fields?


(Alexey Sidelnikov) #1

Hi all!

I'm tired of trying to find a solution. I have some fields in my index that are
not_analyzed. Suppose the mappings look something like this:
{
"test-index-entities": {
"mappings": {
"testobject": {
"properties": {
"__location": {
"type": "geo_point"
},
"activity_end": {
"type": "long"
},
"activity_start": {
"type": "long"
},
"attributes": {},
"createdBy": {
"type": "string",
"index": "not_analyzed",
"include_in_all": true
}
}
}
}
}
}

Suppose the "createdBy" attribute has the value "John Smith", and I want to
query ALL documents in which ANY field equals "John Smith". Since the
"createdBy" field is not_analyzed, I would expect a single term "John Smith"
in the _all field.

But executing this match query:
{
"match" : {
"_all" : "John Smith"
}
}

returns no results. I think that is because the match query tokenizes the
query string itself and searches for the terms "john" and "smith" instead of
the untokenized "John Smith". In the case of
{
"match" : {
"createdBy" : "John Smith"
}
}
everything works fine, as ES sees in the mappings that the field is not tokenized.

Is there any way to tell the match query not to analyze the query string? I
see the "analyzer" parameter in the documentation
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html,
and tried "analyzer" : "none" and "analyzer" : "not_analyzed", but neither works.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c1d9ca4f-a0d3-4a50-a6a5-e3a59f972f5c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

The _all field has its own analyzer, so the analyzer that is defined on the
createdBy field is not applied.
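
To illustrate the point about the _all analyzer, here is a rough sketch in Python. It does not talk to Elasticsearch. The `toy_standard_analyze` function is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters; the real analyzer does more, so treat this as an approximation of the idea rather than actual Elasticsearch behavior.

```python
import re

def toy_standard_analyze(text):
    """Hypothetical approximation of an analyzed field:
    lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def not_analyzed(text):
    """A not_analyzed field keeps the whole value as a single term."""
    return [text]

value = "John Smith"
created_by_terms = not_analyzed(value)    # terms indexed for createdBy
all_terms = toy_standard_analyze(value)   # terms indexed for _all (approximation)

print(created_by_terms)    # ['John Smith']
print(all_terms)           # ['john', 'smith']
print(value in all_terms)  # False: the exact value is not a term in _all
```

Under these assumptions the single term "John Smith" exists only in createdBy, so an exact lookup of that term against _all has nothing to match.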

I have never tried it, but I believe the best solution is to use "copy_to" to
copy the value into a custom field:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-all.html

It should copy the tokens post-analysis.

Cheers,

Ivan



(Ivan Brusic) #3

If I had actually read the documentation I provided, I would have seen that my
assumption was wrong. :) copy-to fields also define their own analyzer and
do not use post-analysis tokens.

Not sure if there is a clean way to achieve this goal in Elasticsearch. I
handle the use case by custom logic on the client side. My application
knows which fields are analyzed and which are not and creates queries
accordingly.
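
The client-side approach described above might look roughly like the following Python sketch. It is not the actual application code: the helper names `exact_fields` and `build_exact_query` are made up for illustration, and the mapping is the one from the first post. The sketch reads the mapping, picks the string fields marked not_analyzed, and builds a bool query with one term clause per field, since term queries do not analyze the search value.

```python
def exact_fields(properties):
    """Return the names of string fields mapped with index: not_analyzed."""
    return sorted(
        name
        for name, spec in properties.items()
        if spec.get("type") == "string" and spec.get("index") == "not_analyzed"
    )

def build_exact_query(properties, value):
    """Build a bool query with one unanalyzed term clause per not_analyzed field."""
    clauses = [{"term": {name: value}} for name in exact_fields(properties)]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}

# "properties" section of the mapping from the first post
properties = {
    "__location": {"type": "geo_point"},
    "activity_end": {"type": "long"},
    "activity_start": {"type": "long"},
    "attributes": {},
    "createdBy": {
        "type": "string",
        "index": "not_analyzed",
        "include_in_all": True,
    },
}

query = build_exact_query(properties, "John Smith")
print(query)
```

The resulting dictionary can be serialized to JSON and sent as a search request body. The number of clauses grows with the number of not_analyzed fields in the mapping.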

--
Ivan



(Alexey Sidelnikov) #4

Thank you for the reply.

What I really want is to search across all document fields, and enumerating
every possible attribute in a bool query is not a good option for me, as my
documents have a large number of different attribute names.

Previously we did not use not_analyzed; instead we used our own encoder to
encode field values into strings without token delimiters. But we do not
think that is a good approach, and we would like to find an Elasticsearch
solution, since ES stores not_analyzed values as terms, and "find all
documents having a term like 'John Smith'" should be the fastest query I can
imagine for ES.

Of course, we could create our own _all field to replace the built-in one,
something like "allnottokenized", containing all the values from the
document's not_analyzed attributes. But that is a lot of work: every time we
index a document, we would need to check the mapping for every attribute to
see whether it is tokenized and then fill in our own "all" value. It seems
strange that ES does not have such simple functionality out of the box.
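
One way the "allnottokenized" idea could be sketched is to let the mapping do the copying instead of the indexing code. This has not been tested against a cluster in this thread. It assumes that "copy_to" passes the original field value to the target field and that the target field's own mapping decides how the value is indexed, so a not_analyzed target would keep each copied value as a single term. The helper name `add_exact_catch_all` is hypothetical.

```python
import copy

def add_exact_catch_all(properties, target="allnottokenized"):
    """Return a copy of the mapping properties in which every not_analyzed
    string field copies its value into one not_analyzed catch-all field."""
    result = copy.deepcopy(properties)
    for name, spec in result.items():
        if name == target:
            continue  # never make the catch-all field copy into itself
        if spec.get("type") == "string" and spec.get("index") == "not_analyzed":
            spec["copy_to"] = target
    # The catch-all field is itself not_analyzed (assumption: copied values
    # are then indexed as whole terms).
    result[target] = {"type": "string", "index": "not_analyzed"}
    return result

properties = {
    "activity_start": {"type": "long"},
    "createdBy": {"type": "string", "index": "not_analyzed"},
}

new_properties = add_exact_catch_all(properties)

# A single term query against the catch-all field, with no analysis of the value
query = {"query": {"term": {"allnottokenized": "John Smith"}}}
print(new_properties)
print(query)
```

The mapping only needs to be rewritten once, not on every indexed document, but fields added to the mapping later would need the same treatment.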


