Tokenizing my field even with index not analyzed


(Josep Floriach Ventosinos) #1

Hi everyone,

I have an index in which one of the data fields is of type long and another
is of type string. They are PID and Batch_Name. I'm trying to write a query
that returns the entry with PID = 25747 and batch name = ZEJININSP05. This is
the query I'm running:

http://myserver.com:9200/batches-*/_search?q=PID:25747&Batch_Name:ZEJININSP05&pretty=true

This query returns two entries. One of them is correct: it has this PID
and this batch name. The other one has the correct PID but the
following batch name: ZEJINLECT32

I guess that happens because ES is tokenizing my field, but I don't
understand why, since I specify in my template that this field should not
be tokenized.

This is what my template looks like:

#!/bin/sh
curl -XPUT 'localhost:9200/_template/batches_online' -d '{
  "template": "batches-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "logs": {
      "properties": {
        "Batch_name": {
          "type": "string",
          "index": "not_analyzed"
        }
        ... (more fields)

What am I doing wrong?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/077f7d0a-67b6-42fc-a056-f64d37ede808%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Tomislav Poljak) #2

Hi Josep,
the problem of matching too many entries is not related to the analysis
('tokenization') of the "Batch_name" field, but to the way you
constructed the URI search params. In your URL
.../_search?q=PID:25747&Batch_Name:ZEJININSP05... the '&' separates
HTTP params (like &pretty=true etc.) and is not used as AND in
the Lucene query (which is what I think you expect). So you actually get
all matches for q=PID:25747 regardless of the 'Batch_Name' value.

Try this instead:
http://myserver.com:9200/batches-*/_search?q=PID:25747%20AND%20Batch_Name:ZEJININSP05&pretty=true

(I'm not sure whether the field is "Batch_Name", as in the query, or "Batch_name", as in the mapping; field names are case-sensitive, so the two must match.)

If you check http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-uri-request.html
for constructing URI search queries, you'll see that you need to use Lucene
syntax (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax)
when building a query as a URI param.
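
To make the difference concrete, here is a minimal sketch in plain sh. It makes no request; it only prints strings. The first part splits the original query string on '&' to show which HTTP params the server sees, and the second part builds a single Lucene query and percent-encodes its spaces. The variable names (raw, params, lucene, encoded, url) are made up for illustration; the host, index pattern and field names are the ones from this thread, and the sed call only handles spaces, which is all this particular query needs.

```shell
#!/bin/sh
# Illustrative sketch only: prints strings, performs no HTTP request.

# 1. The original query string, split on '&' the way HTTP params are split.
raw='q=PID:25747&Batch_Name:ZEJININSP05&pretty=true'
params=$(printf '%s\n' "$raw" | tr '&' '\n')
printf '%s\n' "$params"
# The "q" param contains only PID:25747; Batch_Name:... is a separate,
# unrelated param and is not part of the Lucene query.

# 2. One Lucene query with an explicit AND, spaces encoded as %20.
lucene='PID:25747 AND Batch_Name:ZEJININSP05'
encoded=$(printf '%s' "$lucene" | sed 's/ /%20/g')
url="http://myserver.com:9200/batches-*/_search?q=${encoded}&pretty=true"
printf '%s\n' "$url"
```

If you would rather not encode by hand, curl's -G option together with --data-urlencode 'q=PID:25747 AND Batch_Name:ZEJININSP05' should append the encoded query to the URL for you.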

Hope this helps,
Tomislav

2014-05-27 9:52 GMT+02:00 Josep Floriach Ventosinos
josep.floriach.ventosinos@gmail.com:



(Josep Floriach Ventosinos) #3

Oh. I'm so dumb... :stuck_out_tongue: That helped me a lot.

Thanks!!


