Missing 75k documents due to exceptions

Here's what I did:

  • Set up two EC2 instances with S3 as the gateway
  • Created an index with 10 shards and 2 replicas (rough curl sketch below)
  • Queued up 100k documents to be sent to Elasticsearch
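
Creating an index like that looks roughly like this (a minimal sketch, assuming a node listening on the default port 9200; the exact settings syntax may vary between ES versions):

curl -XPUT 'http://localhost:9200/spokeprofile' -d '{
  "index": {
    "number_of_shards": 10,
    "number_of_replicas": 2
  }
}'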

This resulted in about 41k (or more) exceptions that all look a bit
like this ...

[00:50:21,110][DEBUG][action.index] [Urich, Ben] [spokeprofile][1], Node[fedc7167-6969-450f-ae2a-6c1cd5d5ea34], [P], S[STARTED]: Failed to execute [[spokeprofile][person][103655324], source[{
  "name": "John Doe",
  "title": "Purchasing Manager",
  "companyName": "Applied Signal Technology, Inc.",
  "companySize": 750,
  "revenue": 170375000,
  "urls": [
    "http://www.appsig.com/",
    "http://www.appsig.com"
  ],
  "sicCodes": [
    "3812",
    "4813",
    "5065",
    "8742",
    "8748",
    "3669"
  ],
  "cities": [
    "Sunnyvale"
  ],
  "states": [
    "CA"
  ],
  "businessSummary": "communications equipment nec (mfrs)"
}]]
org.elasticsearch.ElasticSearchIllegalStateException: Can't handle serializing a dynamic type with content token [END_ARRAY] and field name [null]
    at org.elasticsearch.index.mapper.xcontent.XContentObjectMapper.serializeValue(XContentObjectMapper.java:456)
    at org.elasticsearch.index.mapper.xcontent.XContentObjectMapper.parse(XContentObjectMapper.java:328)
    at org.elasticsearch.index.mapper.xcontent.XContentObjectMapper.serializeValue(XContentObjectMapper.java:396)
    at org.elasticsearch.index.mapper.xcontent.XContentObjectMapper.serializeArray(XContentObjectMapper.java:388)
    at org.elasticsearch.index.mapper.xcontent.XContentObjectMapper.parse(XContentObjectMapper.java:322)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:320)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:272)
    at org.elasticsearch.index.shard.service.InternalIndexShard.innerIndex(InternalIndexShard.java:236)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:228)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:125)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:56)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:328)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.access$400(TransportShardReplicationOperationAction.java:198)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:252)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

I took the JSON mentioned in the exception, and I do manage to index it
without an exception. It might be a problem with the dynamic introduction of
types in Elasticsearch. In general, Elasticsearch introduces types
dynamically (if no explicit mappings are set for them), and once a type is
set (arrays are not considered types; they are handled automatically), it
can't be changed (for example, a field can't change from integer to string).
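
To illustrate with a made-up example (hypothetical index, type, and values; only the second request would be expected to fail):

# The first document dynamically maps "companySize" as a numeric field.
curl -XPUT 'http://localhost:9200/test/person/1' -d '{"companySize": 750}'

# Re-sending the same field as a non-numeric string is then rejected,
# because the dynamically created mapping can no longer change.
curl -XPUT 'http://localhost:9200/test/person/2' -d '{"companySize": "unknown"}'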

It seems likely that the set of JSON documents indexed drove Elasticsearch
into a state where it throws this exception. Do all types share the same
structure for the same JSON fields? Is there any chance you can recreate this
problem (using simple curl)? You would probably need to index several JSON
documents until you reach it (since, for me, indexing just this one is fine).
Don't worry about trying to nail down which document is the problematic one;
just recreate it with a set of JSON documents to index, and I will find the problem.
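
Something along these lines would be enough (a rough sketch; the file list and index/type names are placeholders for whatever sample documents you can share):

# Index a batch of sample documents one at a time and note which requests fail.
for f in docs/*.json; do
  echo "indexing $f"
  curl -s -XPOST 'http://localhost:9200/spokeprofile/person' --data-binary "@$f"
  echo
done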

-shay.banon

So it seems that this particular record is missing a field (a date field)
that the other records have. I was under the impression that the schema-free
nature of ES would allow this kind of record to go through. Or is my
speculation not correct?
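
For example (hypothetical documents; "lastUpdated" just stands in for the date field in question, and the index and type names follow the log above):

# Most records carry the date field ...
curl -XPUT 'http://localhost:9200/spokeprofile/person/1' -d '{"name": "Jane Roe", "lastUpdated": "2010-07-01"}'

# ... while this one simply omits it, which I assumed schema-free indexing would accept.
curl -XPUT 'http://localhost:9200/spokeprofile/person/2' -d '{"name": "John Doe"}'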

You are correct, a missing field is not something that should break indexing.
Is there any chance of a recreation?

I can try.

I need to finish off the test I'm working on now; I'm loading millions
of records over the next few days. When I successfully finish that test,
I can spin up some new instances and try to reproduce again.
It's going to be difficult to find the record that possibly screwed
everything up. :-\

Yeah, I can imagine it's difficult; if you do manage it, that would be great.

-shay.banon
