Parser error from ElasticSearch


(vineeth mohan) #1

Hi,

I am seeing the following error on the Elasticsearch side.
I took the same source shown in the ES log and tried it from the command line, and it
went through. We are using the Apache HttpClient library to push data to ES.
The mapping for the type is:
curl -X PUT "localhost:9200/algotree/public/_mapping" -d '{
  "public" : {
    "properties" : {
      "PDF" : { "type" : "string", "store" : "yes", "index" : "no" },
      "Title" : { "type" : "string", "store" : "yes" },
      "SourceName" : { "type" : "string", "store" : "yes", "include_in_all" : "no" },
      "link" : { "type" : "string", "store" : "yes", "index" : "no" },
      "Author" : { "type" : "string", "store" : "yes", "include_in_all" : "no" },
      "fetchTimeStamp" : { "type" : "date", "format" : "dd-MM-yyyy hh:mm:ss", "include_in_all" : "no" },
      "actualTimeStamp" : { "type" : "date", "format" : "dd-MM-yyyy hh:mm:ss", "include_in_all" : "no" },
      "Content" : { "type" : "string", "store" : "yes" }
    }
  }
}'

I don't understand where the error is.

[2011-10-20 15:32:22,388][DEBUG][action.index ] [Crimson Dynamo]
[algotree][1], node[ow58UVQRTJ-BNffAJzHroA], [P], s[STARTED]: Failed to
execute [index {[algotree][public][BbNjGVU0SKuR51VuSfKYLg],
source[{"fetchTimeStamp":"20-10-2011
03:32:22","SourceName":"accountancyAge","link":"
http://feeds.accountancyage.com/c/551/f/540070/s/196b7d32/l/0L0Saccountancyage0N0Caa0Cnews0C21185550Cfinancialforcecom0E50A0A0Egrowth0DWT0Brss0If0FNews0GWT0Brss0Ia0FFinancialForce0N0Ksees0K50A0A0J250Kgrowth/story01.htm","Author":"","actualTimeStamp":"20-10-2011
02:14:00","Content":"AN INTERNET SOFTWAREbusiness has posted massive
increases in monthly users in latest financial
results.\n\nFinancialForce.com reported a 500% increase in monthly users
compared with the same period last year.\n\nThe online software provider is
part of UNIT 4 which announced its third quarter results highlighting
revenue across all its divisions had increased 3% to \u20ac102.5m (£89.53m),
compared with the same period a year ago.\n\nMore than 53% of revenues
across the group were recurring, with Germany, Scandinavia and Asia showing
strong growth. The UK, Poland and Benlux performed in line with
expectations.","Title":"FinancialForce.com sees 500% growth"}]}]
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [Content]
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:312)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:577)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:443)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:567)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:491)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:269)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:193)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:464)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:377)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.common.jackson.JsonParseException: Invalid UTF-8 start byte 0xa3
 at [Source: [B@1fe0d66; line: 1, column: 744]
    at org.elasticsearch.common.jackson.JsonParser._constructError(JsonParser.java:1291)
    at org.elasticsearch.common.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
    at org.elasticsearch.common.jackson.impl.Utf8StreamParser._reportInvalidInitial(Utf8StreamParser.java:2236)
    at org.elasticsearch.common.jackson.impl.Utf8StreamParser._reportInvalidChar(Utf8StreamParser.java:2230)
    at org.elasticsearch.common.jackson.impl.Utf8StreamParser._finishString2(Utf8StreamParser.java:1467)
    at org.elasticsearch.common.jackson.impl.Utf8StreamParser._finishString(Utf8StreamParser.java:1394)
    at org.elasticsearch.common.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:113)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:74)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:99)
    at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateField(StringFieldMapper.java:163)
    at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateField(StringFieldMapper.java:44)
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:299)
    ... 11 more

Thanks
Vineeth


(avasilenko) #2

Hi Vineeth,

You have an invalid UTF-8 character :).

Caused by: org.elasticsearch.common.jackson.JsonParseException: Invalid
UTF-8 start byte 0xa3
at [Source: [B@1fe0d66; line: 1, column: 744]

Regards,
Alexandr Vasilenko
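
For context on the byte in that message: 0xa3 is how the pound sign (£, U+00A3) is encoded in ISO-8859-1, and the Content field in the log contains "(£89.53m)". In UTF-8 the same character is two bytes, c2 a3, and a lone a3 is a continuation byte that cannot start a character. The following is a minimal sketch of that difference; the class and helper names are made up for illustration, and it only assumes that the client encoded the request body as ISO-8859-1, which the thread does not confirm.

```java
import java.nio.charset.StandardCharsets;

public class PoundSignBytes {
    // Render bytes as space-separated lowercase hex, e.g. "c2 a3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pound sign that appears in the Content field of the log.
        String pound = "\u00a3";
        // One byte in ISO-8859-1: a3
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
        // Two bytes in UTF-8: c2 a3
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the sending code turns the JSON string into bytes without naming a charset, checking which charset the HTTP library falls back to would be a reasonable first step.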



(vineeth mohan) #3

Hello Alex,

Thanks, but yes, I noticed that.

The feed details are given in that log, but I am not able to tell which
character is invalid UTF-8.
The interesting part is that the same feed data sent from the command line using
curl works fine.

Thanks
Vineeth
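
One way to find out which character is invalid is to run the exact bytes the client is about to send through a strict UTF-8 decoder and note where it stops. The sketch below does that with the JDK's CharsetDecoder; the class name, method name, and sample body are hypothetical, and it assumes the request body is available as a byte array before it goes on the wire. It may also hint at why curl succeeds: a shell running in a UTF-8 locale would pass the pound sign to curl as c2 a3, which is valid.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns the offset of the first byte that is not valid UTF-8,
    // or -1 if the whole array decodes cleanly.
    static int firstInvalidOffset(byte[] body) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(body);
        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(body.length);
        // endOfInput = true so a truncated trailing sequence is also reported.
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened body containing the pound sign from the log.
        String json = "{\"Content\":\"(\u00a389.53m)\"}";
        System.out.println("ISO-8859-1 bytes: "
                + firstInvalidOffset(json.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8 bytes:      "
                + firstInvalidOffset(json.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For the ISO-8859-1 bytes this should report the offset of the a3 byte, and for the UTF-8 bytes it should report -1.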



(vineeth mohan) #4

It would be a great help if someone could give pointers here ...

Thanks
Vineeth



(avasilenko) #5

How do you serialize the document before sending it?

Alexandr Vasilenko



(vineeth mohan) #6

I am using org.json.JSONObject to build the JSON and extract the string.
I simply pass it a map that has all the required entries.

The string ES received is shown in the log. I don't see any non-ASCII
character in it.

Thanks
Vineeth
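
The logged source does contain non-ASCII text: the euro sign arrives already escaped as \u20ac, but the pound sign in "(£89.53m)" is a literal character. One charset-independent workaround is to escape every non-ASCII character in the serialized JSON before sending it, so the body is pure ASCII and encodes to the same bytes in ISO-8859-1 and UTF-8. This is only a sketch with a hypothetical helper name, not something the thread confirms as the fix; setting the charset explicitly to UTF-8 on the HTTP entity would be the more direct route.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiJson {
    // Rewrites every character above 0x7e as a JSON unicode escape.
    // In valid serialized JSON such characters can only occur inside
    // string values or keys, so the document's meaning is unchanged.
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c > 0x7e) {
                // Each UTF-16 code unit is escaped on its own, which is how
                // JSON represents characters outside the basic plane as well.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened document containing the pound sign.
        String json = "{\"Content\":\"(\u00a389.53m)\"}";
        String escaped = escapeNonAscii(json);
        System.out.println(escaped);
        // After escaping, both encodings produce identical bytes.
        System.out.println(Arrays.equals(
                escaped.getBytes(StandardCharsets.ISO_8859_1),
                escaped.getBytes(StandardCharsets.UTF_8)));
    }
}
```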



(avasilenko) #7

Retry, using jackson http://jackson.codehaus.org/

2011/10/20 Vineeth Mohan vineethmohan@algotree.com

I am using org.json.JSONObject to build the JSON and extract the string.
I simply pass it a map containing all the required entries.

The string ES received is shown in the log. I don't see any non-ASCII
character in it.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 4:48 PM, Alex Vasilenko aa.vasilenko@gmail.com wrote:

How do you serialize document before sending?

Alexandr Vasilenko

2011/10/20 Vineeth Mohan vineethmohan@algotree.com

It would be a great help if someone could give pointers here ...

Thanks
Vineeth

On Thu, Oct 20, 2011 at 4:35 PM, Vineeth Mohan <
vineethmohan@algotree.com> wrote:

Hello Alex,

Thanks, but yes, I noticed that.

The feed details are given in that log, but I cannot work out
which character is invalid UTF-8.
The interesting part is that the same feed data sent from the command line
using curl works fine.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 3:43 PM, Alex Vasilenko <aa.vasilenko@gmail.com> wrote:

Hi Vineeth,

You have an invalid UTF-8 character :).

Caused by: org.elasticsearch.common.jackson.JsonParseException:
Invalid UTF-8 start byte 0xa3
at [Source: [B@1fe0d66; line: 1, column: 744]

Regards,
Alexandr Vasilenko



(vineeth mohan) #8

I tried with Jackson.
It does not help either.

This is a sample JSON document that failed.

{"SourceName":"someAge","fetchTimeStamp":"20-10-2011 06:31:16","link":"
http://feeds.accountancyage.com/c/551/f/540070/s/19644c19/l/0L0Saccountancyage0N0Caa0Cnews0C21181580Cbackgrounder0Emonths0Epaye0DWT0Brss0If0FNews0GWT0Brss0Ia0FBackgrounder0J3A0KMonths0Kof0KPAYE0Kproblems/story01.htm","Author":"","Content":"MILLIONS
of people have paid the wrong tax, with many to be notified by HM Revenue &
Customs this weekend.\n\nThe revelation is the latest chapter in what has
been a tale of long-running and fundamental problems in the tax
system.\n\nSix million cases have been identified where overpayments have
been made, with the taxpayers set toreceive £400 per case. However, an
announcement earlier this year confirmed that 1.2 million people will owe an
average £600.\n\nThe discrepancies have come to light as the taxman beds in
its new PAYE IT system, and relate to the 2007/2008 tax year, and previous
years.\n\nUnderpayers will have the opportunity to pay back the money
through another alteration in their tax code, rather than being forced to
hand over a lump sum.\n\n"Money that is owed going back many years is now
going to be automatically paid back as we get the tax system up to
scratch," said an HMRC spokesman.\n\n"We are getting cases that were left
unreconciled up to date as quickly as possible. Anyone owed money will be
paid back with interest without the need to contact us.\n\n"The fact is
there will always be some cases at the end of every tax year that require an
under or overpayment to balance but these cases will reduce as the new
system beds in."\n\nLast year, around six million were told they had paid
the wrong amount of tax.\n\nMargaret Hodge, chairman of the Public Accounts
Committee, had said that last year's reconciliations showed HMRC had
"failed in its duty to process PAYE accurately and on time".\n\nIn
November, Bernadette Kenny, who had presided over PAYE during last year's
crisis, announced she was leaving the department after five years.\n\nThe
taxman had agreed to write off the debts for those who owed less than £300,
but this was reduced to £50 for the estimated 1.2 million underpayers for
the 2010/2011 tax year.\n\nHMRC chief executive Dame Lesley Strathie had
warned that resource cuts would hamper attempts to resolve what were
estimated at nearly 18 million pre-2008 PAYE reconciliation cases, by
2012.\n\nAccountants also spoke of their struggles to help clients' have
their PAYE debts written off by using an extra statutory concession.\n\nLast
week 146,000 pensioners were told that they face a tax underpayment for the
2010/2011 tax year.\n\nAnother big concern is that HMRC's move to real-time
PAYE information, for employers by October 2013, is too tight considering
the scale of the exercise.\n\nIn a recent consultation which saw 187
respondents, 75% said the plan was unachievable in the
timeframe.","actualTimeStamp":"19-10-2011 01:47:00","Title":"Backgrounder:
Months of PAYE problems"}

And the response is
{"error":"MapperParsingException[Failed to parse [Content]]; nested:
JsonParseException[Invalid UTF-8 start byte 0xa3\n at [Source: [B@1b15e2; line: 1, column: 701]]; ","status":400}

Can someone please take a look?

Thanks
Vineeth
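
One way to find out which character the parser is rejecting is to run the exact request bytes through a strict UTF-8 decoder and report where it stops. The sketch below is only an illustration: the class and method names are made up, and it assumes the client encoded the body as ISO-8859-1, which the thread does not confirm. Under that assumption a literal "£" becomes the single byte 0xa3, the byte named in the error.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ElasticSearch or Jackson.
final class Utf8Locator {

    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if everything decodes. */
    static int firstInvalidOffset(byte[] body) {
        // A fresh decoder reports malformed input instead of silently replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(body);
        // UTF-8 never yields more chars than bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(body.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // \u00a3 is the pound sign, written as an escape so the source file encoding does not matter.
        String json = "{\"Content\":\"\u00a3400 per case\"}";

        byte[] latin1 = json.getBytes(StandardCharsets.ISO_8859_1); // assumed client encoding
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 body, first invalid offset: " + firstInvalidOffset(latin1));
        System.out.println("UTF-8 body, first invalid offset:      " + firstInvalidOffset(utf8));
    }
}
```

Running the same check on the bytes the client actually puts on the wire (rather than on the Java String) would show whether the pound sign is the culprit.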


(vineeth mohan) #9

As far as I can see, there is no non-UTF-8 character in it.
Worse, the same text goes in fine from the command line. (I cannot
reproduce it anywhere else.)

It would really help if someone could take a look.
I have been stuck on this for some days. :(

Thanks
Vineeth
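
If the text is accepted from curl but rejected from the Java client, one workaround worth trying is to make the payload pure ASCII, so that it no longer matters which charset the HTTP library uses. In the logged source the euro sign already arrives as the escape \u20ac while the pound sign is a literal character; the sketch below escapes every non-ASCII character the same way. The class and method names are hypothetical, and it assumes the input is already a valid serialized JSON document (where non-ASCII characters can only occur inside strings).

```java
// Hypothetical helper: rewrites a serialized JSON document so it contains only ASCII.
final class AsciiJson {

    /** Replaces every non-ASCII UTF-16 unit with its \\uXXXX JSON escape. */
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c >= 0x80) {
                // Surrogate pairs are escaped unit by unit, which JSON allows.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00a3 is the pound sign.
        String json = "{\"Content\":\"increased 3% to (\u00a389.53m)\"}";
        System.out.println(escapeNonAscii(json));
    }
}
```

The escaped document encodes to the same bytes in ISO-8859-1 and in UTF-8, so it can serve as a quick test of whether the failure is an encoding mismatch. Sending the body with an explicit UTF-8 charset would be the more direct fix if that turns out to be the cause.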


(David Pilato) #10

Try removing the pound sign "£" to see whether that is where the problem lies.

David :wink:
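
The pound sign is a plausible suspect because of how it is encoded: it is U+00A3, which is the single byte 0xa3 in ISO-8859-1 but the two bytes 0xc2 0xa3 in UTF-8. A lone 0xa3 is not a legal UTF-8 start byte, which matches the error message. A minimal sketch to see this (the class name and the hex helper are illustrative only):

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: prints how the pound sign is encoded in two charsets.
final class PoundBytes {

    /** Formats bytes as space-separated lowercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00a3"; // the pound sign, escaped to avoid source-encoding issues
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1))); // a3
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));      // c2 a3
    }
}
```

If removing the "£" makes the request succeed, the next thing to check is which charset the client uses when it turns the JSON string into bytes.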

Le 20 oct. 2011 à 15:08, Vineeth Mohan vineethmohan@algotree.com a écrit :

As far as i can see , i dont see any non UTF character.
And the worse the same text from command line is going in fine. ( I cannot reproduce it anywhere else )

It will really help if someone can help here.
Am stuck with this for some days. :frowning:

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:34 PM, Vineeth Mohan vineethmohan@algotree.com wrote:
Tried with Jackson.
Its also not helping.

This is a sample json which failed.

{"SourceName":"someAge","fetchTimeStamp":"20-10-2011 06:31:16","link":"http://feeds.accountancyage.com/c/551/f/540070/s/19644c19/l/0L0Saccountancyage0N0Caa0Cnews0C21181580Cbackgrounder0Emonths0Epaye0DWT0Brss0If0FNews0GWT0Brss0Ia0FBackgrounder0J3A0KMonths0Kof0KPAYE0Kproblems/story01.htm","Author":"","Content":"MILLIONS of people have paid the wrong tax, with many to be notified by HM Revenue & Customs this weekend.\n\nThe revelation is the latest chapter in what has been a tale of long-running and fundamental problems in the tax system.\n\nSix million cases have been identified where overpayments have been made, with the taxpayers set toreceive £400 per case. However, an announcement earlier this year confirmed that 1.2 million people will owe an average £600.\n\nThe discrepancies have come to light as the taxman beds in its new PAYE IT system, and relate to the 2007/2008 tax year, and previous years.\n\nUnderpayers will have the opportunity to pay back the money through another alteration in their tax code, rather than being forced to hand over a lump sum.\n\n"Money that is owed going back many years is now going to be automatically paid back as we get the tax system up to scratch," said an HMRC spokesman.\n\n"We are getting cases that were left unreconciled up to date as quickly as possible. 
Anyone owed money will be paid back with interest without the need to contact us.\n\n"The fact is there will always be some cases at the end of every tax year that require an under or overpayment to balance but these cases will reduce as the new system beds in."\n\nLast year, around six million were told they had paid the wrong amount of tax.\n\nMargaret Hodge, chairman of the Public Accounts Committee, had said that last year's reconciliations showed HMRC had "failed in its duty to process PAYE accurately and on time".\n\nIn November, Bernadette Kenny, who had presided over PAYE during last year's crisis, announced she was leaving the department after five years.\n\nThe taxman had agreed to write off the debts for those who owed less than £300, but this was reduced to £50 for the estimated 1.2 million underpayers for the 2010/2011 tax year.\n\nHMRC chief executive Dame Lesley Strathie had warned that resource cuts would hamper attempts to resolve what were estimated at nearly 18 million pre-2008 PAYE reconciliation cases, by 2012.\n\nAccountants also spoke of their struggles to help clients' have their PAYE debts written off by using an extra statutory concession.\n\nLast week 146,000 pensioners were told that they face a tax underpayment for the 2010/2011 tax year.\n\nAnother big concern is that HMRC's move to real-time PAYE information, for employers by October 2013, is too tight considering the scale of the exercise.\n\nIn a recent consultation which saw 187 respondents, 75% said the plan was unachievable in the timeframe.","actualTimeStamp":"19-10-2011 01:47:00","Title":"Backgrounder: Months of PAYE problems"}

And the response is
{"error":"MapperParsingException[Failed to parse [Content]]; nested: JsonParseException[Invalid UTF-8 start byte 0xa3\n at [Source: [B@1b15e2; line: 1, column: 701]]; ","status":400}

Can someone pls take a look.

Thanks
Vineeth


(vineeth mohan) #11

I tried that.
The thing is, the same text sent from a simple curl command line goes in just
fine.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:42 PM, David Pilato david@pilato.fr wrote:

Try to remove the pound sign "£" just to see if it's the problem there.

David :wink:


(vineeth mohan) #12

David, thanks a ton.

That seems to work perfectly.
Now I need to understand why the curl command didn't have an issue with this
and why my Java call was rejected.

Thanks
Vineeth
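
A likely explanation, offered as an assumption rather than something confirmed in this thread: curl passes along the bytes it receives from the shell, which on a UTF-8 terminal encode £ as 0xC2 0xA3, while a Java HTTP client that is not told which charset to use may encode the request body as ISO-8859-1, producing the lone 0xA3 byte that the parser rejects. The log earlier in the thread also shows the euro sign arriving as the escape \u20ac while £ arrives raw, so the serializer evidently escapes only some non-ASCII characters. One workaround, sketched below in Java with hypothetical class and method names, is to escape every non-ASCII character in the serialized JSON so that the body is plain ASCII and therefore identical under either charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class JsonAsciiSafe {
    /**
     * Replaces every char above 0x7F with a JSON \\uXXXX escape.
     * In valid JSON such chars can only occur inside string values,
     * so escaping them leaves the document's meaning unchanged.
     */
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c > 0x7F) {
                sb.append("\\u").append(String.format("%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00a3 is the pound sign, written as an escape so the source file stays ASCII.
        String json = "{\"Content\":\"(\u00a3" + "89.53m)\"}";
        String safe = escapeNonAscii(json);
        System.out.println(safe);
        // Pure ASCII text produces the same bytes in ISO-8859-1 and UTF-8.
        System.out.println(Arrays.equals(
            safe.getBytes(StandardCharsets.ISO_8859_1),
            safe.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Alternatively, specifying UTF-8 explicitly when building the request entity in the HTTP client should avoid the problem without any escaping.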


(Tomasz Kloc) #13

You didn't remove all of the pound characters. 0xa3 is exactly the pound
character (£) in ISO-8859-1 encoding.
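
The point can be checked with a few lines of Java. This is only an illustrative sketch with hypothetical class and method names, and it uses `StandardCharsets`, which requires Java 7 or later. It prints the bytes produced for the pound sign under each charset and asks a strict UTF-8 decoder whether those bytes are well formed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class PoundBytes {
    /** Formats bytes as lowercase hex, two digits per byte. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Returns true only if the bytes decode as UTF-8 without any error. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pound = "\u00a3"; // the pound sign, escaped to keep the source ASCII
        byte[] latin1 = pound.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = pound.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(latin1) + " valid UTF-8: " + isValidUtf8(latin1));
        System.out.println(hex(utf8) + " valid UTF-8: " + isValidUtf8(utf8));
    }
}
```

In ISO-8859-1 the pound sign is the single byte a3, whose bit pattern marks it as a UTF-8 continuation byte, so it cannot start a character. In UTF-8 the same character is the two bytes c2 a3.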

On 20.10.2011 15:14, Vineeth Mohan wrote:

I tired that.
Thing is the same text sent from simple curl command line is going in
just fine

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:42 PM, David Pilato <david@pilato.fr
mailto:david@pilato.fr> wrote:

Try to remove the pound sign "£" just to see if it's the problem
there.

David ;-)

Le 20 oct. 2011 Ã  15:08, Vineeth Mohan <vineethmohan@algotree.com
<mailto:vineethmohan@algotree.com>> a écrit :
As far as i can see  , i dont see any non UTF character.
And the worse the same text from command line is going in fine. (
I cannot reproduce it anywhere else )

It will really help if someone can help here.
Am stuck with this for some days. :(

Thanks
         Vineeth

On Thu, Oct 20, 2011 at 6:34 PM, Vineeth Mohan
<vineethmohan@algotree.com <mailto:vineethmohan@algotree.com>> wrote:

    Tried with Jackson.
    Its also not helping.

    This is a sample json which failed.

    {"SourceName":"someAge","fetchTimeStamp":"20-10-2011
    06:31:16","link":"http://feeds.accountancyage.com/c/551/f/540070/s/19644c19/l/0L0Saccountancyage0N0Caa0Cnews0C21181580Cbackgrounder0Emonths0Epaye0DWT0Brss0If0FNews0GWT0Brss0Ia0FBackgrounder0J3A0KMonths0Kof0KPAYE0Kproblems/story01.htm","Author":"","Content":"MILLIONS
    of people have paid the wrong tax, with many to be notified
    by HM Revenue & Customs this weekend.\n\nThe revelation is
    the latest chapter in what has been a tale of long-running
    and fundamental problems in the tax system.\n\nSix million
    cases have been identified where overpayments have been made,
    with the taxpayers set toreceive £400 per case. However, an
    announcement earlier this year confirmed that 1.2 million
    people will owe an average £600.\n\nThe discrepancies have
    come to light as the taxman beds in its new PAYE IT system,
    and relate to the 2007/2008 tax year, and previous
    years.\n\nUnderpayers will have the opportunity to pay back
    the money through another alteration in their tax code,
    rather than being forced to hand over a lump sum.\n\n\"Money
    that is owed going back many years is now going to be
    automatically paid back as we get the tax system up to
    scratch,\" said an HMRC spokesman.\n\n\"We are getting cases
    that were left unreconciled up to date as quickly as
    possible. Anyone owed money will be paid back with interest
    without the need to contact us.\n\n\"The fact is there will
    always be some cases at the end of every tax year that
    require an under or overpayment to balance but these cases
    will reduce as the new system beds in.\"\n\nLast year, around
    six million were told they had paid the wrong amount of
    tax.\n\nMargaret Hodge, chairman of the Public Accounts
    Committee, had said that last year's reconciliations showed
    HMRC had \"failed in its duty to process PAYE accurately and
    on time\".\n\nIn November, Bernadette Kenny, who had presided
    over PAYE during last year's crisis, announced she was
    leaving the department after five years.\n\nThe taxman had
    agreed to write off the debts for those who owed less than
    £300, but this was reduced to £50 for the estimated 1.2
    million underpayers for the 2010/2011 tax year.\n\nHMRC chief
    executive Dame Lesley Strathie had warned that resource cuts
    would hamper attempts to resolve what were estimated at
    nearly 18 million pre-2008 PAYE reconciliation cases, by
    2012.\n\nAccountants also spoke of their struggles to help
    clients' have their PAYE debts written off by using an extra
    statutory concession.\n\nLast week 146,000 pensioners were
    told that they face a tax underpayment for the 2010/2011 tax
    year.\n\nAnother big concern is that HMRC's move to real-time
    PAYE information, for employers by October 2013, is too tight
    considering the scale of the exercise.\n\nIn a recent
    consultation which saw 187 respondents, 75% said the plan was
    unachievable in the timeframe.","actualTimeStamp":"19-10-2011
    01:47:00","Title":"Backgrounder: Months of PAYE problems"}

    And the response is
    {"error":"MapperParsingException[Failed to parse [Content]];
    nested: JsonParseException[Invalid UTF-8 start byte 0xa3\n at
    [Source: [B@1b15e2; line: 1, column: 701]]; ","status":400}

    Can someone pls take a look.

    Thanks
               Vineeth


(vineeth mohan) #14

Can you explain how this worked when I used curl from the command line?

This was the bit that caught me out.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:52 PM, Tomasz Kloc tomek.kloc.iit@gmail.com wrote:

You didn't remove all the pound characters. 0xa3 is exactly the pound character (£)
in ISO-8859-1.
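
The 0xa3 claim is easy to check. Below is a minimal Java sketch; the class and method names are hypothetical and chosen only for illustration. It prints the bytes produced for the pound sign under ISO-8859-1 and under UTF-8, then confirms that a lone 0xa3 byte is rejected by a strict UTF-8 decoder, which matches the "Invalid UTF-8 start byte 0xa3" message from the parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PoundBytes {
    // Hex dump of a byte array, bytes separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode the text with the named charset and return the hex dump.
    public static String encodedHex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    // True if the bytes decode as UTF-8 with no malformed input (strict mode).
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The pound sign, written as an escape so the source file encoding does not matter.
        String pound = "\u00a3";
        System.out.println("ISO-8859-1: " + encodedHex(pound, "ISO-8859-1"));
        System.out.println("UTF-8:      " + encodedHex(pound, "UTF-8"));
        System.out.println("Latin-1 bytes valid as UTF-8? "
                + isValidUtf8(pound.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

In UTF-8 the pound sign is the two-byte sequence c2 a3, so a bare a3 can only appear if the text was encoded with a single-byte charset such as ISO-8859-1 before it was sent.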



(Tomasz Kloc) #15

Maybe curl depends on your locale settings, so it knows which encoding to use
when sending the data?
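
If the encoding is the cause, one way to take the guesswork out of the Java side is to control the bytes explicitly. The sketch below is only an illustration under assumptions: the thread does not show the client code, and the class and method names are hypothetical. It shows two options that use nothing beyond the JDK: encoding the JSON string as UTF-8 yourself, or escaping every non-ASCII character so the body is pure ASCII and reads the same in ISO-8859-1 and UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class JsonBody {
    // Replace every non-ASCII char in already-serialized JSON with a
    // backslash-u escape. In valid JSON such chars only occur inside
    // strings, so the result is equivalent JSON made of ASCII only.
    public static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Encode the request body as UTF-8 explicitly instead of relying on
    // whatever default charset the HTTP library or platform picks.
    public static byte[] toUtf8(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A shortened stand-in for the failing document; the pound sign is
        // written as an escape so the source file encoding does not matter.
        String json = "{\"Content\":\"less than \u00a3300\"}";

        byte[] body = toUtf8(json);
        System.out.println("chars: " + json.length() + ", UTF-8 bytes: " + body.length);
        System.out.println(escapeNonAscii(json));
    }
}
```

With an HTTP client library, the equivalent of `toUtf8` is usually to pass the charset explicitly when building the request entity and to send a matching `Content-Type: application/json; charset=UTF-8` header. The exact call depends on the library version, so check its documentation rather than relying on the default.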

On 20.10.2011 15:29, Vineeth Mohan wrote:

Can you explain how this worked when I used curl from the command line?

This was the bit that caught me out.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:52 PM, Tomasz Kloc <tomek.kloc.iit@gmail.com
mailto:tomek.kloc.iit@gmail.com> wrote:

You didn't remove all pound characters. 0xa3 means exactly pound
character in iso-8859-1 format.

On 20.10.2011 15:14, Vineeth Mohan wrote:
I tired that.
Thing is the same text sent from simple curl command line is
going in just fine

Thanks
           Vineeth

On Thu, Oct 20, 2011 at 6:42 PM, David Pilato <david@pilato.fr
<mailto:david@pilato.fr>> wrote:

    Try to remove the pound sign "£" just to see if it's the
    problem there.

    David ;-)

    Le 20 oct. 2011 Ã  15:08, Vineeth Mohan
    <vineethmohan@algotree.com
    <mailto:vineethmohan@algotree.com>> a écrit :
    As far as i can see  , i dont see any non UTF character.
    And the worse the same text from command line is going in
    fine. ( I cannot reproduce it anywhere else )

    It will really help if someone can help here.
    Am stuck with this for some days. :(

    Thanks
             Vineeth

    On Thu, Oct 20, 2011 at 6:34 PM, Vineeth Mohan
    <vineethmohan@algotree.com
    <mailto:vineethmohan@algotree.com>> wrote:

        Tried with Jackson.
        Its also not helping.

        This is a sample json which failed.

        {"SourceName":"someAge","fetchTimeStamp":"20-10-2011
        06:31:16","link":"http://feeds.accountancyage.com/c/551/f/540070/s/19644c19/l/0L0Saccountancyage0N0Caa0Cnews0C21181580Cbackgrounder0Emonths0Epaye0DWT0Brss0If0FNews0GWT0Brss0Ia0FBackgrounder0J3A0KMonths0Kof0KPAYE0Kproblems/story01.htm","Author":"","Content":"MILLIONS
        of people have paid the wrong tax, with many to be
        notified by HM Revenue & Customs this weekend.\n\nThe
        revelation is the latest chapter in what has been a tale
        of long-running and fundamental problems in the tax
        system.\n\nSix million cases have been identified where
        overpayments have been made, with the taxpayers set
        toreceive £400 per case. However, an announcement
        earlier this year confirmed that 1.2 million people will
        owe an average £600.\n\nThe discrepancies have come to
        light as the taxman beds in its new PAYE IT system, and
        relate to the 2007/2008 tax year, and previous
        years.\n\nUnderpayers will have the opportunity to pay
        back the money through another alteration in their tax
        code, rather than being forced to hand over a lump
        sum.\n\n\"Money that is owed going back many years is
        now going to be automatically paid back as we get the
        tax system up to scratch,\" said an HMRC
        spokesman.\n\n\"We are getting cases that were left
        unreconciled up to date as quickly as possible. Anyone
        owed money will be paid back with interest without the
        need to contact us.\n\n\"The fact is there will always
        be some cases at the end of every tax year that require
        an under or overpayment to balance but these cases will
        reduce as the new system beds in.\"\n\nLast year, around
        six million were told they had paid the wrong amount of
        tax.\n\nMargaret Hodge, chairman of the Public Accounts
        Committee, had said that last year's reconciliations
        showed HMRC had \"failed in its duty to process PAYE
        accurately and on time\".\n\nIn November, Bernadette
        Kenny, who had presided over PAYE during last year's
        crisis, announced she was leaving the department after
        five years.\n\nThe taxman had agreed to write off the
        debts for those who owed less than £300, but this was
        reduced to £50 for the estimated 1.2 million underpayers
        for the 2010/2011 tax year.\n\nHMRC chief executive Dame
        Lesley Strathie had warned that resource cuts would
        hamper attempts to resolve what were estimated at nearly
        18 million pre-2008 PAYE reconciliation cases, by
        2012.\n\nAccountants also spoke of their struggles to
        help clients' have their PAYE debts written off by using
        an extra statutory concession.\n\nLast week 146,000
        pensioners were told that they face a tax underpayment
        for the 2010/2011 tax year.\n\nAnother big concern is
        that HMRC's move to real-time PAYE information, for
        employers by October 2013, is too tight considering the
        scale of the exercise.\n\nIn a recent consultation which
        saw 187 respondents, 75% said the plan was unachievable
        in the timeframe.","actualTimeStamp":"19-10-2011
        01:47:00","Title":"Backgrounder: Months of PAYE problems"}

        And the response is
        {"error":"MapperParsingException[Failed to parse
        [Content]]; nested: JsonParseException[Invalid UTF-8
        start byte 0xa3\n at [Source: [B@1b15e2; line: 1,
        column: 701]]; ","status":400}

        Can someone pls take a look.

        Thanks
                   Vineeth


        On Thu, Oct 20, 2011 at 4:54 PM, Alex Vasilenko
        <aa.vasilenko@gmail.com <mailto:aa.vasilenko@gmail.com>>
        wrote:

            Retry, using jackson http://jackson.codehaus.org/

            2011/10/20 Vineeth Mohan <vineethmohan@algotree.com
            <mailto:vineethmohan@algotree.com>>

                I am using org.json.JSONOBJECT to build the json
                and extract the string.
                I simple pass a map to it which have all the
                required maps.

                The string ES received is shown in the log. I
                dont see any non ASCII character it it.

                Thanks
                          Vineeth


                On Thu, Oct 20, 2011 at 4:48 PM, Alex Vasilenko
                <aa.vasilenko@gmail.com
                <mailto:aa.vasilenko@gmail.com>> wrote:

                    How do you serialize document before sending?

                    Alexandr Vasilenko

                    2011/10/20 Vineeth Mohan
                    <vineethmohan@algotree.com
                    <mailto:vineethmohan@algotree.com>>

                        It would be a great help if someone
                        could give pointers here ...

                        Thanks
                                 Vineeth


                        On Thu, Oct 20, 2011 at 4:35 PM, Vineeth
                        Mohan <vineethmohan@algotree.com
                        <mailto:vineethmohan@algotree.com>> wrote:

                            Hello Alex ,

                            Thanks but yes , i noticed that.

                            The feed details are given in that
                            log but i am not able to understand
                            which character is invalid UTF-8.
                            The interesting side is that the
                            same feed data sent from command
                            line using curl works fine.

                            Thanks
                                       Vineeth


                            On Thu, Oct 20, 2011 at 3:43 PM,
                            Alex Vasilenko
                            <aa.vasilenko@gmail.com
                            <mailto:aa.vasilenko@gmail.com>> wrote:

                                Hi Vineeth,

                                You have invalid UTF-8 character
                                :).

                                    Caused by:
                                    org.elasticsearch.common.jackson.JsonParseException:
                                    Invalid UTF-8 start byte 0xa3
                                     at [Source: [B@1fe0d66;
                                    line: 1, column: 744]


                                Regards,
                                Alexandr Vasilenko



(vineeth mohan) #16

One quick question.
In light of the current issue, I am considering moving to the Java API that ES
provides.
I dropped it in the first place because using the standard ES API pulls a lot
of dependencies, including Lucene, into the project.

Will using the standard ES API make any difference in speed?

Thanks
Vineeth


(David Pilato) #17

The command line does not use the same character encoding as your web
application.
When you cut and paste from one document to the command line, you assume the
encoding is the same because you see the same thing on your screen, but that is
not true.

What you see looks identical, but the bytes behind it are different.

The pound character is the single byte 0xa3 in ISO-8859-1 [1].
In UTF-8, however, 0xa3 on its own is not a valid character: it is a
continuation byte, and the pound sign is encoded as the two bytes 0xc2 0xa3 [2].

That's my opinion.

[1] http://en.wikipedia.org/wiki/ISO_8859-1 (see Codepage layout section)
[2] http://en.wikipedia.org/wiki/UTF-8 (see Codepage layout section)
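
To make the difference concrete, here is a minimal Java sketch that prints the bytes of the pound sign in both encodings. The class name `PoundEncodingDemo` and the helper `hex` are made up for illustration and are not part of anyone's code in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PoundEncodingDemo {

    // Render the bytes of `text` in the given charset as space-separated hex.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pound sign, written as an escape so the source file encoding does not matter.
        String pound = "\u00a3";
        System.out.println("ISO-8859-1: " + hex(pound, StandardCharsets.ISO_8859_1)); // a3
        System.out.println("UTF-8:      " + hex(pound, StandardCharsets.UTF_8));      // c2 a3
    }
}
```

A body that contains the lone byte a3 is what the parser rejects with "Invalid UTF-8 start byte 0xa3"; the same text sent as c2 a3 should be accepted.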

HTH
David.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(vineeth mohan) #18

Thanks, guys.

Let me try a simple UTF-8 conversion before sending the data to ES.

Thanks
Vineeth
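
A conversion like this could be sketched as follows. This is only an illustration under assumptions: the class `Utf8Payload` and its methods are hypothetical names, and the JSON string is a shortened stand-in for the real document. The idea is to encode the JSON text as UTF-8 explicitly instead of relying on a default charset, and to check the resulting bytes before they go over the wire.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Payload {

    // Encode the JSON text explicitly as UTF-8 rather than using a default charset.
    public static byte[] toUtf8(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Strictly decode the bytes; returns false if they are not valid UTF-8.
    public static boolean isValidUtf8(byte[] body) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the failing document; \u00a3 is the pound sign.
        String json = "{\"Content\":\"owed less than \u00a3300\"}";

        byte[] good = toUtf8(json);                              // pound becomes c2 a3
        byte[] bad = json.getBytes(StandardCharsets.ISO_8859_1); // pound becomes a lone a3

        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

The thread does not say which Apache HttpClient version is in use, so how the bytes are handed over is a guess. With HttpClient 4.x, wrapping them in `new ByteArrayEntity(body)` or building the entity with `new StringEntity(json, "UTF-8")` should work. As far as I know, `StringEntity` without an explicit charset falls back to ISO-8859-1 in that version, which would explain the lone 0xa3 byte.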

On Thu, Oct 20, 2011 at 7:15 PM, david@pilato.fr david@pilato.fr wrote:

**

The command line does not use the same encoding for characters than your
web application.

When you cut&paste from one document to the command line, you think you
have the same encoding as you see the same thing on your screen but this is
not true.

What you see is different in term of character encoding.

The pound character is 0xa3 in iso-8859-1 [1]

But in UTF-8, 0xa3 is not really a readable character [2]

That's my opinion.

[1] http://en.wikipedia.org/wiki/ISO_8859-1 (see Codepage layout section)

[2] http://en.wikipedia.org/wiki/UTF-8 (see Codepage layout section)

HTH

David.

Le 20 octobre 2011 à 15:29, Vineeth Mohan vineethmohan@algotree.com a
écrit :

Can you explain how this worked when i used curl from command line !!!!

This was the bit which caught me.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:52 PM, Tomasz Kloc tomek.kloc.iit@gmail.com wrote:

You didn't remove all the pound characters. 0xa3 is exactly the pound
character in ISO-8859-1.

On 20.10.2011 15:14, Vineeth Mohan wrote:

I tried that.
The thing is, the same text sent from a simple curl command line goes in
just fine.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:42 PM, David Pilato david@pilato.fr wrote:

Try removing the pound sign "£" just to see if that is the problem.

David :wink:

On 20 Oct 2011 at 15:08, Vineeth Mohan vineethmohan@algotree.com
wrote:

As far as I can see, there is no non-UTF character.
Worse, the same text sent from the command line goes in fine. (I cannot
reproduce it anywhere else.)

It would really help if someone could help here.
I have been stuck on this for some days. :frowning:

Thanks
Vineeth

On Thu, Oct 20, 2011 at 6:34 PM, Vineeth Mohan <vineethmohan@algotree.com>
wrote:

Tried with Jackson.
It's also not helping.

This is a sample JSON document that failed.

{"SourceName":"someAge","fetchTimeStamp":"20-10-2011 06:31:16","link":"
http://feeds.accountancyage.com/c/551/f/540070/s/19644c19/l/0L0Saccountancyage0N0Caa0Cnews0C21181580Cbackgrounder0Emonths0Epaye0DWT0Brss0If0FNews0GWT0Brss0Ia0FBackgrounder0J3A0KMonths0Kof0KPAYE0Kproblems/story01.htm","Author":"","Content":"MILLIONS
of people have paid the wrong tax, with many to be notified by HM Revenue &
Customs this weekend.\n\nThe revelation is the latest chapter in what has
been a tale of long-running and fundamental problems in the tax
system.\n\nSix million cases have been identified where overpayments have
been made, with the taxpayers set toreceive £400 per case. However, an
announcement earlier this year confirmed that 1.2 million people will owe an
average £600.\n\nThe discrepancies have come to light as the taxman beds in
its new PAYE IT system, and relate to the 2007/2008 tax year, and previous
years.\n\nUnderpayers will have the opportunity to pay back the money
through another alteration in their tax code, rather than being forced to
hand over a lump sum.\n\n"Money that is owed going back many years is now
going to be automatically paid back as we get the tax system up to
scratch," said an HMRC spokesman.\n\n"We are getting cases that were left
unreconciled up to date as quickly as possible. Anyone owed money will be
paid back with interest without the need to contact us.\n\n"The fact is
there will always be some cases at the end of every tax year that require an
under or overpayment to balance but these cases will reduce as the new
system beds in."\n\nLast year, around six million were told they had paid
the wrong amount of tax.\n\nMargaret Hodge, chairman of the Public Accounts
Committee, had said that last year's reconciliations showed HMRC had
"failed in its duty to process PAYE accurately and on time".\n\nIn
November, Bernadette Kenny, who had presided over PAYE during last year's
crisis, announced she was leaving the department after five years.\n\nThe
taxman had agreed to write off the debts for those who owed less than £300,
but this was reduced to £50 for the estimated 1.2 million underpayers for
the 2010/2011 tax year.\n\nHMRC chief executive Dame Lesley Strathie had
warned that resource cuts would hamper attempts to resolve what were
estimated at nearly 18 million pre-2008 PAYE reconciliation cases, by
2012.\n\nAccountants also spoke of their struggles to help clients' have
their PAYE debts written off by using an extra statutory concession.\n\nLast
week 146,000 pensioners were told that they face a tax underpayment for the
2010/2011 tax year.\n\nAnother big concern is that HMRC's move to real-time
PAYE information, for employers by October 2013, is too tight considering
the scale of the exercise.\n\nIn a recent consultation which saw 187
respondents, 75% said the plan was unachievable in the
timeframe.","actualTimeStamp":"19-10-2011 01:47:00","Title":"Backgrounder:
Months of PAYE problems"}

And the response is
{"error":"MapperParsingException[Failed to parse [Content]]; nested:
JsonParseException[Invalid UTF-8 start byte 0xa3\n at [Source: [B@1b15e2;
line: 1, column: 701]]; ","status":400}
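One way to find which byte a UTF-8 parser would reject is to run the exact request bytes through a strict decoder before sending them. The following is a sketch using only the JDK; the class name `Utf8Check` and the method `firstInvalidUtf8Offset` are hypothetical, and the sketch assumes you can get hold of the raw bytes your client is about to send.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: locate the first byte that is not valid UTF-8.
final class Utf8Check {

    // Returns the zero-based offset of the first malformed byte, or -1 if
    // the whole array is valid UTF-8.
    static int firstInvalidUtf8Offset(byte[] body) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(body);
        // UTF-8 never decodes to more chars than it has bytes, so this is large enough.
        CharBuffer out = CharBuffer.allocate(body.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        String text = "owe \u00a3600"; // \u00a3 is the pound sign
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 bytes: " + firstInvalidUtf8Offset(latin1));
        System.out.println("UTF-8 bytes: " + firstInvalidUtf8Offset(utf8));
    }
}
```

If the offset is not -1 for the bytes the application sends, the body is being encoded with something other than UTF-8, even though the same text looks identical when printed in a log or pasted into a terminal.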

Can someone please take a look?

Thanks
Vineeth

On Thu, Oct 20, 2011 at 4:54 PM, Alex Vasilenko <aa.vasilenko@gmail.com>
wrote:

Retry using Jackson: http://jackson.codehaus.org/

2011/10/20 Vineeth Mohan vineethmohan@algotree.com

I am using org.json.JSONObject to build the JSON and extract the
string.
I simply pass it a map which has all the required entries.

The string ES received is shown in the log. I don't see any non-ASCII
character in it.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 4:48 PM, Alex Vasilenko <
aa.vasilenko@gmail.com> wrote:

How do you serialize document before sending?

Alexandr Vasilenko

2011/10/20 Vineeth Mohan vineethmohan@algotree.com

It would be a great help if someone could give pointers here ...

Thanks
Vineeth

On Thu, Oct 20, 2011 at 4:35 PM, Vineeth Mohan <
vineethmohan@algotree.com> wrote:

Hello Alex,

Thanks, but yes, I noticed that.

The feed details are given in that log, but I am not able to
understand which character is invalid UTF-8.
The interesting part is that the same feed data sent from the command
line using curl works fine.

Thanks
Vineeth

On Thu, Oct 20, 2011 at 3:43 PM, Alex Vasilenko <
aa.vasilenko@gmail.com> wrote:

Hi Vineeth,

You have an invalid UTF-8 character :).

Caused by: org.elasticsearch.common.jackson.JsonParseException:
Invalid UTF-8 start byte 0xa3
at [Source: [B@1fe0d66; line: 1, column: 744]

Regards,
Alexandr Vasilenko


--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(system) #19