Getting RemoteTransportException/QueryPhaseExecutionException


(mgornik) #1

Hi,

We have been using ES for half a year, and we are very pleased with how it's
been working. We have 3 servers running a sharded ES configuration. We have two
mappings for two different document types. Recently, we dropped one of the
mappings and later added it back. We think the problem we are facing started
when that mapping was recreated (although we aren't 100% sure). We were hoping
you could help us with this error. When running a facet query in the
following form:

'{"facets":{"bounce_types":{"terms":{"field":"BT","size":100},"facet_filter":{"term":{"SID":50197}}}}}'

We get shard failures that contain two types of exceptions:
RemoteTransportException and QueryPhaseExecutionException. Here is how they
look:

{
"took" : 1120,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 3,
"failed" : 7,
"failures" : [ {
"index" : "myapp_sharded_prod",
"shard" : 1,
"status" : 500,
"reason" :
"RemoteTransportException[[Glitch][inet[/172.30.0.214:9300]][search/phase/query]];
nested: QueryPhaseExecutionException[[myapp_sharded_prod][1]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
}, {
"index" : "myapp_sharded_prod",
"shard" : 0,
"status" : 500,
"reason" :
"RemoteTransportException[[Glitch][inet[/172.30.0.214:9300]][search/phase/query]];
nested: QueryPhaseExecutionException[[myapp_sharded_prod][0]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
}, {
"index" : "myapp_sharded_prod",
"shard" : 3,
"status" : 500,
"reason" : "RemoteTransportException[[Justin
Hammer][inet[/172.30.0.133:9300]][search/phase/query]]; nested:
QueryPhaseExecutionException[[myapp_sharded_prod][3]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
}, {
"index" : "myapp_sharded_prod",
"shard" : 9,
"status" : 500,
"reason" : "QueryPhaseExecutionException[[myapp_sharded_prod][9]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
}, {
"index" : "myapp_sharded_prod",
"shard" : 8,
"status" : 500,
"reason" : "QueryPhaseExecutionException[[myapp_sharded_prod][8]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
}, {
"index" : "myapp_sharded_prod",
"shard" : 4,
"status" : 500,
"reason" : "RemoteTransportException[[Justin
Hammer][inet[/172.30.0.133:9300]][search/phase/query]]; nested:
QueryPhaseExecutionException[[myapp_sharded_prod][4]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
}, {
"index" : "myapp_sharded_prod",
"shard" : 7,
"status" : 500,
"reason" : "QueryPhaseExecutionException[[myapp_sharded_prod][7]:
query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],from[0],size[0]:
Query Failed [Failed to execute main query]]; nested:
NumberFormatException[Invalid shift value in prefixCoded string (is encoded
value really an INT?)]; "
} ]
},
"hits" : {
"total" : 37387748,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"bounce_types" : {
"_type" : "terms",
"missing" : 2600,
"total" : 95,
"other" : 0,
"terms" : [ {
"term" : 1,
"count" : 92
}, {
"term" : 64,
"count" : 2
}, {
"term" : 256,
"count" : 1
} ]
}
}
}

Can you provide us with any details on why these might be happening? Have
we put a bad mapping back into ES, or is this something server-related (ES
version, JVM version)? I don't know if it is relevant, but we have two
mappings there; one was left intact and the other was recreated. All the
fields are present under the same names in both mappings. If any other info
would help, please let me know and I'll post details right away.
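
For readers skimming a response like the one above, a minimal Python sketch of how the shard failures could be grouped by their innermost exception. The helper names (root_cause, summarize_failures) are made up for illustration, and the sample dict is a trimmed copy of two failure entries from the response, not the full output.

```python
import re
from collections import Counter

def root_cause(reason: str) -> str:
    """Return the innermost exception name in an ES failure reason string."""
    # Exception names appear as "SomethingException["; the last one is innermost.
    names = re.findall(r"(\w+Exception)\[", reason)
    return names[-1] if names else "unknown"

def summarize_failures(response: dict) -> Counter:
    """Count shard failures in a search response by innermost exception."""
    failures = response.get("_shards", {}).get("failures", [])
    return Counter(root_cause(f.get("reason", "")) for f in failures)

# Trimmed sample built from two of the failure entries shown above.
NESTED = (
    "QueryPhaseExecutionException[[myapp_sharded_prod][1]: "
    "query[ConstantScore(NotDeleted(cache(_type:DeliveryEvent)))],"
    "from[0],size[0]: Query Failed [Failed to execute main query]]; nested: "
    "NumberFormatException[Invalid shift value in prefixCoded string "
    "(is encoded value really an INT?)]; "
)
sample = {
    "_shards": {
        "failures": [
            {"shard": 1, "reason": "RemoteTransportException[[Glitch]"
                                   "[inet[/172.30.0.214:9300]]"
                                   "[search/phase/query]]; nested: " + NESTED},
            {"shard": 9, "reason": NESTED},
        ]
    }
}

summary = summarize_failures(sample)
print(summary)
```

Grouped this way, both the remote and the local failures reduce to the same NumberFormatException, which suggests the transport wrapper is incidental and the numeric decoding error is the thing to chase.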

Thanks a lot for your time!
Milan Gornik

--


(Igor Motov) #2

Hi,

I suspect that this error occurs because the "BT" field is defined as
"string" in the newly added mapping and as "integer" in the old one. If
this is not the case, could you post the mappings for both types here?

Igor
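
A rough Python sketch of the check suggested above: compare the field types across the mappings of both document types and report any field declared with different types. The input shape (type name mapped to a dict with a "properties" key) and the type name "OtherEvent" are assumptions for illustration; only "DeliveryEvent" appears in the thread, and the real mappings were not posted.

```python
def conflicting_fields(mappings: dict) -> dict:
    """Return fields whose declared type differs between document types.

    `mappings` is assumed to look like:
        {"TypeA": {"properties": {"BT": {"type": "integer"}}}, ...}
    """
    seen = {}
    for type_name, mapping in mappings.items():
        for field, spec in mapping.get("properties", {}).items():
            # Fields without an explicit type are treated as "object" here.
            seen.setdefault(field, {})[type_name] = spec.get("type", "object")
    return {
        field: types
        for field, types in seen.items()
        if len(set(types.values())) > 1
    }

# Hypothetical mappings reproducing the suspected situation.
example = {
    "DeliveryEvent": {
        "properties": {"BT": {"type": "integer"}, "SID": {"type": "integer"}}
    },
    "OtherEvent": {
        "properties": {"BT": {"type": "string"}, "SID": {"type": "integer"}}
    },
}

conflicts = conflicting_fields(example)
print(conflicts)
```

An empty result would rule out a mismatch in the current mappings, though it says nothing about data indexed under an earlier mapping.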

--


(JP Toto) #3

Igor,

Funny thing is that BT is mapped as "integer" and ES reports that back
correctly. We imported a bunch of data to the cluster with a Python script,
but I don't think ES would have accepted incorrect data for that field. It
would have rejected the document.

There are lots of instances where BT is null, by design. Would that be an
issue? (I'm working on this issue with Milan, above)

--


(Igor Motov) #4

So, here is another theory. The second type had a "string" mapping and some
data was indexed into it as strings. Then this type was deleted and
recreated properly, but the string data is still in the index. We can verify
this theory by trying to get rid of these records. Could you run this
command to see if it helps:

curl -XPOST 'http://localhost:9200/myapp_sharded_prod/_optimize?only_expunge_deletes=true'
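
For anyone scripting this rather than using curl, a minimal Python sketch that builds the same POST request with the standard library. The function name build_expunge_request is made up for illustration; the host, index name, endpoint and parameter are taken from the command above. The request is only constructed here, and the line that would actually send it is left commented out.

```python
import urllib.parse
import urllib.request

def build_expunge_request(base_url: str, index: str) -> urllib.request.Request:
    """Build a POST to <index>/_optimize with only_expunge_deletes=true."""
    query = urllib.parse.urlencode({"only_expunge_deletes": "true"})
    url = f"{base_url.rstrip('/')}/{index}/_optimize?{query}"
    # An empty body is enough; the options travel in the query string.
    return urllib.request.Request(url, data=b"", method="POST")

req = build_expunge_request("http://localhost:9200", "myapp_sharded_prod")
print(req.get_method(), req.full_url)

# To actually send it against a running cluster (not done in this sketch):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```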

--


(mgornik) #5

Hi Igor,

Thank you very much for your reply. I don't know ES very well, but looking
at what happened here, I would say your theory is correct. Please tell me:
if we run _optimize on a live system, will it cause ES to stop responding,
or just degrade performance, or neither? Our current
configuration is 3 nodes with size: 79.9gb and

docs: {
num_docs: 91815179
max_doc: 108254990
deleted_docs: 16439811
}

Regards,
Milan

--


(Igor Motov) #6

Running _optimize shouldn't cause ES to stop responding, but it will put
some additional load on the system, since it forces merging of the
segments that have deleted records. The merge process happens automatically
on portions of the index on a regular basis as the index grows. Calling
_optimize just forces it to occur immediately and for all segments with
deleted records.
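
To get a feel for how much of the index is affected, here is a small Python sketch using the docs stats Milan posted earlier in the thread. The function name deleted_fraction is made up for illustration, and how the fraction translates into merge time or load is not something this calculation can tell you.

```python
def deleted_fraction(num_docs: int, deleted_docs: int) -> float:
    """Fraction of document slots in the index held by deleted documents."""
    total = num_docs + deleted_docs
    return deleted_docs / total if total else 0.0

# Stats as posted in the thread.
stats = {"num_docs": 91815179, "max_doc": 108254990, "deleted_docs": 16439811}

# In the posted numbers, live plus deleted documents add up to max_doc.
assert stats["num_docs"] + stats["deleted_docs"] == stats["max_doc"]

frac = deleted_fraction(stats["num_docs"], stats["deleted_docs"])
print(f"deleted: {frac:.1%} of max_doc")
```

With these numbers roughly 15% of the document slots are deleted records, so the expunge has a fair amount to rewrite.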

--


(mgornik) #7

We just ran this and it seems to have repaired part of the data. For some
servers (the SID parameter in that query), the query works now. However, the
same query still throws errors for some other SIDs.

We tracked the deleted_docs count. It dropped from the initial count
(16439811) to 839871, and we don't see it going any lower.
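
For reference, the counters quoted in this thread relate to each other as in the small Python sketch below. The function names are hypothetical; only the numbers (num_docs, max_doc, and the deleted_docs count after the expunge) come from the thread, and the sketch assumes deleted_docs is simply max_doc minus num_docs, which those figures satisfy.

```python
def deleted_docs(num_docs: int, max_doc: int) -> int:
    """Documents marked deleted but not yet merged away (max_doc - num_docs)."""
    return max_doc - num_docs


def expunged_fraction(before: int, after: int) -> float:
    """Share of the deleted documents that a merge actually removed."""
    return (before - after) / before


# Figures reported earlier in the thread.
initial = deleted_docs(num_docs=91815179, max_doc=108254990)
remaining = 839871  # deleted_docs reported after the expunge run

print(initial)
print(f"{expunged_fraction(initial, remaining):.1%} of deleted docs expunged")
```

So the expunge pass removed most, but not all, of the deleted documents, which is consistent with the query now working for only some SIDs.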

Is there anything else we might try? Is there a way to check whether the
actual mappings of the first and second types differ? ES reports the same
mappings when we query the _mapping endpoint. Is there a query that would show
whether the data does not match the expected mapping, e.g. whether some records
have the BT field with invalid (non-integer) values?
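
One way to compare two type mappings mechanically is sketched below in Python. It assumes each mapping has already been fetched from the _mapping endpoint and parsed into a dict with a top-level "properties" key; the helper names and the second type name ("OtherType") are hypothetical, and the sample mappings are made up to illustrate the integer/string mismatch suspected in this thread.

```python
def field_types(type_mapping: dict) -> dict:
    """Return {field name: declared type} for one document type's mapping."""
    props = type_mapping.get("properties", {})
    return {name: spec.get("type", "object") for name, spec in props.items()}


def type_conflicts(mapping_a: dict, mapping_b: dict) -> dict:
    """Fields present in both mappings whose declared types differ."""
    a, b = field_types(mapping_a), field_types(mapping_b)
    return {f: (a[f], b[f]) for f in sorted(a.keys() & b.keys()) if a[f] != b[f]}


# Illustrative mappings only, not the real ones from this cluster.
delivery_event = {"properties": {"BT": {"type": "integer"}, "SID": {"type": "integer"}}}
other_type = {"properties": {"BT": {"type": "string"}, "SID": {"type": "integer"}}}

print(type_conflicts(delivery_event, other_type))
```

This only compares what _mapping reports now. It cannot see terms left in existing segments by documents that were indexed under an earlier mapping, so an empty result does not rule out the problem described here.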

Thanks,
Milan

On Saturday, October 27, 2012 1:51:35 PM UTC+2, Igor Motov wrote:

Running _optimize shouldn't cause ES to stop responding, but it will put
some additional load on the system since it will force merging of the
segments that have deleted records. The merge process happens automatically
on portions of the index on a regular basis as index grows, calling
_optimize just forces it to occur immediately and for all segments with
deleted records.

--


(Igor Motov) #8

I saw something like this before. I am not quite sure what's causing it.
You can try running _optimize several times to see if it helps. The next
thing to try would be full index optimization:

curl -XPOST 'http://localhost:9200/myapp_sharded_prod/_optimize?max_num_segments=1'
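
The two _optimize variants mentioned in this thread can also be issued from a script. The following is a minimal Python sketch, assuming the same host and index name as the curl commands; optimize_url and post are hypothetical helper names, and post is defined but not called here because it needs a running cluster.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def optimize_url(index: str, host: str = "http://localhost:9200", **params) -> str:
    """Build the _optimize URL for an index with optional query parameters."""
    url = f"{host}/{index}/_optimize"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def post(url: str) -> str:
    """Send an empty POST request and return the response body as text."""
    with urlopen(Request(url, method="POST")) as resp:
        return resp.read().decode("utf-8")


# Lighter step first: only merge segments that contain deleted records.
expunge = optimize_url("myapp_sharded_prod", only_expunge_deletes="true")
# Heavier step: merge the whole index down to a single segment.
full = optimize_url("myapp_sharded_prod", max_num_segments=1)

print(expunge)
print(full)
```

Calling post(expunge) a few times before falling back to post(full) mirrors the order suggested above.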

I don't think there is a way to see a deleted record using Elasticsearch.
You will have to use a Lucene-level tool such as Luke to find and retrieve
deleted records.

On Saturday, October 27, 2012 3:52:31 PM UTC-4, mgornik wrote:

We just ran this and it seems to have repaired part of the data. For some
servers (the SID parameter in that query), the query works now. However, the
same query still throws errors for some other SIDs.

We tracked the deleted_docs count. It dropped from the initial count
(16439811) to 839871, and we don't see it going any lower.

Is there anything else we might try? Is there a way to check whether the
actual mappings of the first and second types differ? ES reports the same
mappings when we query the _mapping endpoint. Is there a query that would show
whether the data does not match the expected mapping, e.g. whether some records
have the BT field with invalid (non-integer) values?

Thanks,
Milan

--


(mgornik) #9

Hi Igor,

We resolved the issue by rebuilding that index from scratch. _optimize
couldn't purge the last batch of 100K documents, and it seems that those
deleted documents caused the issue for us. Your help was very valuable,
thanks! With it we were able to figure out what caused the issue: we had put
a mapping that was different from the one already in the database.

Regards,
Milan

On Sunday, October 28, 2012 1:56:25 AM UTC+2, Igor Motov wrote:

I saw something like this before. I am not quite sure what's causing it.
You can try running _optimize several times to see if it helps. The next
thing to try would be full index optimization:

curl -XPOST 'http://localhost:9200/myapp_sharded_prod/_optimize?max_num_segments=1'

I don't think there is a way to see a deleted record using Elasticsearch.
You will have to use some lucene-level tools like Luke to find and retrieve
deleted records.

--


(system) #10