One particular value in a field isn't indexed


(felix.kofink) #1

Hi,

This might be a rookie problem, since I'm very new to Elasticsearch.

I'm trying to index JSON documents into Elasticsearch with a field "lang".
However, if "lang" is set to "it", Elasticsearch doesn't seem to recognize
the field: the document is only returned when I filter for documents that
are missing it. The problem is very simple to reproduce:

curl -XPUT 'http://localhost:9200/test/test/1' -d '{"lang":"de"}'
curl -XPUT 'http://localhost:9200/test/test/2' -d '{"lang":"it"}'

If I try to search for lang:de:
curl -XGET 'http://s445.gfsrv.net:9200/test/test/_search?pretty' -d '
{
"query": {
"query_string": {
"query": "lang:de"
}
}
}'

I get a result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 0.30685282, "_source" : {"lang":"de"}
} ]
}
}

However, if I search for lang:it:
curl -XGET 'http://s445.gfsrv.net:9200/test/test/_search?pretty' -d '
{
"query": {
"query_string": {
"query": "lang:it"
}
}
}'

There is no result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}

If I search for documents that are missing the lang field:
curl -XGET 'http://s445.gfsrv.net:9200/test2/test/_search?pretty' -d '{"query":{"filtered":{"filter":{"missing":{"field":"lang"}}}}}'

There it is:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test2",
"_type" : "test",
"_id" : "2",
"_score" : 1.0, "_source" : {"lang":"it"}
} ]
}
}

Thanks for your time. Any help would be appreciated.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1bfddb52-2bbb-4e24-ab71-facbe365f45c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

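Very rookie problem. :)

The default (aka standard) analyzer applies an English stopword filter, and
"it" is a stopword. The token is dropped at index time, so the field ends up
with no indexed terms and the document looks as if "lang" were missing. Try
configuring your field with a custom analyzer that uses either no stopwords
or a custom set of stopwords.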
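
The stopword behaviour described above can be checked and worked around roughly as follows. This is only a sketch for an Elasticsearch release of this thread's era: the ES_URL address, the index name "test_nostop" and the function names are assumptions made for illustration, not something taken from the original setup.

```shell
#!/bin/sh
# Assumption: a local node is reachable here; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index settings that make a stopword-free "standard" analyzer the
# default analyzer of the index. "_none_" stands for an empty stopword list.
NOSTOP_SETTINGS='{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "standard", "stopwords": "_none_" }
      }
    }
  }
}'

# Ask the stock standard analyzer which tokens it produces for "it".
# With the English stopword list active, the token list should be empty.
analyze_it() {
  curl -s -XGET "$ES_URL/_analyze?analyzer=standard&pretty" -d 'it'
}

# Create a fresh (hypothetical) index whose default analyzer keeps every token.
create_nostop_index() {
  curl -s -XPUT "$ES_URL/test_nostop" -d "$NOSTOP_SETTINGS"
}
```

Running analyze_it first shows whether the stopword filter is really what removes the value; create_nostop_index then gives an index in which {"lang":"it"} should be searchable with the original query_string query.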

Cheers,

Ivan



(Ivan Brusic) #3

Actually, in your case the field values probably do not need to be analyzed
at all, since you are not executing full-text searches on that field. Try
mapping the field as not_analyzed and use a term query (which does not
analyze search terms). Better yet, use a term filter, since filters are
faster, can be cached, and do not influence scoring.
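
A minimal sketch of that suggestion, assuming an Elasticsearch release of this thread's era (string fields with "index": "not_analyzed" and the filtered query). The index name "test_exact", the ES_URL address and the function names are hypothetical.

```shell
#!/bin/sh
# Assumption: a local node is reachable here; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Mapping for type "test": store "lang" as one exact, unanalyzed term,
# so no stopword filter ever sees the value.
EXACT_MAPPING='{"mappings":{"test":{"properties":{"lang":{"type":"string","index":"not_analyzed"}}}}}'

# Term filter wrapped in a filtered query, the same request shape as the
# missing-filter search earlier in the thread.
TERM_FILTER='{"query":{"filtered":{"filter":{"term":{"lang":"it"}}}}}'

# Create the index with the explicit mapping before any document is indexed.
create_exact_index() {
  curl -s -XPUT "$ES_URL/test_exact" -d "$EXACT_MAPPING"
}

# Index the document that could not be found before.
index_it_doc() {
  curl -s -XPUT "$ES_URL/test_exact/test/2" -d '{"lang":"it"}'
}

# Look it up by exact term; the search term is not analyzed either.
search_it() {
  curl -s -XGET "$ES_URL/test_exact/test/_search?pretty" -d "$TERM_FILTER"
}
```

Calling create_exact_index, index_it_doc and search_it in that order should return document 2. The mapping has to exist before the first document arrives, because the analysis setting of an already indexed field cannot be changed in place.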

Cheers,

Ivan



(felix.kofink) #4

Hi Ivan,

Thanks for your help.
So, as far as I understand, I have to do some reindexing now, which will
take a while.

Thanks again. I will do some RTFMing to repent of not RTFMing enough ;)

Cheers

Felix


