Issue with searching string field


(Matthias Johnson) #1

Good day to all!

We are struggling with what seems like a potential problem with ES when
searching for data in a field. To simplify, I reduced our documents to only
the salient points. Consider the following two simple documents:

test1:

{
  "field1":"LANG000000904",
  "field2":"LANG000000904"
}

test2:

{
  "field1":"monkey",
  "field2":"LANG000000904"
}

I put the documents into /test/type (index "test", type "type") with IDs
1 and 2, respectively.
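The post does not show how the documents were indexed. As a hedged sketch, assuming a node at localhost:9200 (as in the curl commands in this post) and the index API's PUT /{index}/{type}/{id} form, the two documents could be sent with Python's standard library. The helper name `build_index_request` is hypothetical, and the code only builds the requests; the actual network call is left commented out.

```python
import json
import urllib.request

# Assumption: a node listening on localhost:9200, as in the curl examples.
BASE_URL = "http://localhost:9200"

# The two test documents, keyed by the IDs used in the post.
DOCS = {
    "1": {"field1": "LANG000000904", "field2": "LANG000000904"},
    "2": {"field1": "monkey", "field2": "LANG000000904"},
}


def build_index_request(index, doc_type, doc_id, doc):
    """Hypothetical helper: build (but do not send) PUT /{index}/{type}/{id}."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


index_requests = [build_index_request("test", "type", i, d) for i, d in DOCS.items()]
for req in index_requests:
    print(req.get_method(), req.full_url)
    # To actually index, uncomment (requires a running node):
    # urllib.request.urlopen(req).read()
```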

The resulting mapping has type string for both fields:

{
  "test": {
    "type": {
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        }
      }
    }
  }
}

Note that field1 contains LANG000000904 in the first document and monkey in the second.

Now when I search field1 for LANG000000904, I get 0 hits:

curl -s 'http://localhost:9200/test/type/_search?pretty' -d '{ "query" : { "term" : { "field1":"LANG000000904" } } }'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

However, when searching for monkey, I get one result, as expected:

curl -s 'http://localhost:9200/test/type/_search?pretty' -d '{ "query" : { "term" : { "field1":"monkey" } } }'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "type",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "field1":"monkey",
        "field2":"LANG000000904"
      }
    } ]
  }
}

It seems to me that the first search for LANG000000904 should return 1
hit for document 1, but the alphanumeric string is somehow not found,
while the purely alphabetic string is. Are we missing something that
would make this work correctly?

Secondly, we tried the GET URI-based search as well, which seemed to work;
however, there seems to be an issue with that too:

curl -s 'http://localhost:9200/test/type/_search?q=field1=LANG000000904&pretty'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.087051645,
    "hits" : [ {
      "_index" : "test",
      "_type" : "type",
      "_id" : "1",
      "_score" : 0.087051645,
      "_source" : {
        "field1":"LANG000000904",
        "field2":"LANG000000904"
      }
    }, {
      "_index" : "test",
      "_type" : "type",
      "_id" : "2",
      "_score" : 0.061554804,
      "_source" : {
        "field1":"monkey",
        "field2":"LANG000000904"
      }
    } ]
  }
}

Note that only document 1 has the string LANG000000904 in field1, yet both
documents are returned.

Additionally, we tested the following GET URI search, and it appears to
work as expected:

curl -s 'http://localhost:9200/test/type/_search?q=field1:LANG000000904&pretty'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "type",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "field1":"LANG000000904",
        "field2":"LANG000000904"
      }
    } ]
  }
}

Perhaps something is not working right with the POST/JSON query, or
perhaps we are not doing it right.

Any comments and ideas would be much appreciated.

Thanks,

@matthias

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Britta Weber) #2

Hi Matthias,

the mapping as is causes the field to be analyzed with the standard
analyzer, which includes case folding (lowercasing).
As for the first query, the "term" query does not perform any analysis
on the query string, which is why the document is not found with the
"term" query: it searches for "LANG..." but the analyzed field
contains "lang...". You can either use "match" instead of "term" or
change your mapping so that the field is not analyzed.

The third query works because "q=field1:LANG000000904" is interpreted
as a query on the field "field1", and a query string passed as a URI
parameter is analyzed by default.
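A simplified Python sketch of how the `q` parameter is read: the value is split into an optional `field:` prefix and the text, and the text is then analyzed. `interpret_q` and the regex analyzer are hypothetical simplifications; the real query_string parser supports far more syntax, and treating `_all` as the default field is assumed here to match the behavior described in this reply.

```python
import re
from urllib.parse import urlsplit, parse_qs


def analyze(text):
    # Assumption: crude stand-in for the standard analyzer.
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def interpret_q(url, default_field="_all"):
    """Hypothetical, very simplified reading of `q`: an optional 'field:'
    prefix, otherwise the default field; the text is analyzed either way."""
    q = parse_qs(urlsplit(url).query)["q"][0]
    field, sep, text = q.partition(":")
    if not sep:  # no 'field:' prefix, so the whole value targets the default field
        field, text = default_field, q
    return field, analyze(text)


print(interpret_q("http://localhost:9200/test/type/_search?q=field1:LANG000000904&pretty"))
# ('field1', ['lang000000904'])
```

With `=` instead of `:`, the same function finds no field prefix, so the whole value `field1=LANG000000904` goes to the default field.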

As for the second query, you wrote "q=field1=LANG000000904". This is
interpreted as a search for the string "field1=LANG000000904" on all
fields. The query string is analyzed as before, resulting in the two terms
"field1" and "lang000000904", which are then searched for on all
fields. Since field2 contains LANG000000904 in both documents, both
documents match.
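A toy Python model can show why the second query matched both documents: with no field prefix, the analyzed terms are looked up in `_all`, which this sketch models as the union of every field's terms, and a single matching term is enough. The helper names, the regex analyzer and the OR semantics (the query_string default operator) are assumptions for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    # Assumption: crude stand-in for the standard analyzer.
    return [tok.lower() for tok in re.findall(r"\w+", text)]


DOCS = {
    "1": {"field1": "LANG000000904", "field2": "LANG000000904"},
    "2": {"field1": "monkey", "field2": "LANG000000904"},
}

# Toy model of the _all field: every field's analyzed terms go into one index.
ALL_INDEX = defaultdict(set)
for doc_id, doc in DOCS.items():
    for value in doc.values():
        for term in analyze(value):
            ALL_INDEX[term].add(doc_id)


def search_all(query_text):
    # Analyze the query, then OR together the doc sets of all resulting terms.
    terms = analyze(query_text)
    hits = set()
    for term in terms:
        hits |= ALL_INDEX.get(term, set())
    return terms, hits


terms, hits = search_all("field1=LANG000000904")
print(terms)         # ['field1', 'lang000000904']
print(sorted(hits))  # ['1', '2']
```

The term "field1" matches nothing, but "lang000000904" occurs in field2 of both documents, which is enough for both to be returned.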

Hope that helps.

Cheers,
Britta

On Fri, Sep 6, 2013 at 5:57 PM, Matthias Johnson opennomad@gmail.com wrote:


