Illegal unicode escape sequence error

Preeti_Jain · April 3, 2014, 12:43pm

Hi,

We are using Elasticsearch version 1.0.1.
To update one of our ES documents, we pass the following script to
UpdateRequestBuilder via the setScript method.

ctx._source.operation= {
"operationID": 290,
"opsThreatLevel": "Low",
"opsName": "OPERATION_SIN",
"opsStartDate": "2014-04-01T00:00:00",
"opsRefNumber": "10245678",
"opsEndDate": "2014-04-23T00:00:00",
"opsDescription": "\u003cp\u003eFake Operation\u003c/p\u003e",
"opsComments": "\u003cp\u003efake clues\u003c/p\u003e",
"dateCreated": "2014-04-02T00:00:00"
}

When the update request is executed, we get the following exception:

Exception in thread "main" org.elasticsearch.ElasticsearchIllegalArgumentException: failed to execute script
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:153)
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:80)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:189)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:185)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:64)
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$1.run(TransportInstanceSingleOperationAction.java:192)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: [Error: illegal unicode escape sequence]
[Near : {... Description": "\u003cp\u003eFake Operation\u003c/p ....}]

The Unicode escape sequences are not being recognized. How can we resolve this?

Regards,

Preeti

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/341a0af5-3df9-46bd-bd43-bb18aec8aafb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

brian_yoder · April 3, 2014, 9:54pm

Preeti,

Well, I tried to look at the setScript method code but that trail was
getting a bit too long.

However, I do have a JSON parser (wrapped inside a Bash script) that uses
the stream parser in the version of Jackson supplied with ES, and I can get
it to parse your JSON (stored in the script.json file) and then emit the
proper Unicode (UTF-8 to the console, even on Mac OS if there are Chinese
characters. Yay!) but without the escape sequences (they have been resolved
to their Unicode character values):

$ parse-json.sh -j script.json
{
"operationID" : 290,
"opsThreatLevel" : "Low",
"opsName" : "OPERATION_SIN",
"opsStartDate" : "2014-04-01T00:00:00",
"opsRefNumber" : "10245678",
"opsEndDate" : "2014-04-23T00:00:00",
"opsDescription" : "<p>Fake Operation</p>",
"opsComments" : "<p>fake clues</p>",
"dateCreated" : "2014-04-02T00:00:00"
}

So I'm sure that your JSON is valid and that your Unicode escape sequences
are correct. Perhaps you might try adding this document and not the
original one with the escape sequences, and see what kind of exception
message is generated.

Regards,
Brian


brian_yoder · April 7, 2014, 2:52pm

Preeti,

I believe that your problem is in calling the setScript method. My Java
code uses the IndexRequestBuilder.setSource method. Try that instead. (I
have no idea what setScript does, or is for. The Javadocs need a lot more
care and feeding.)

Via the HTTP interface, I verified that Elasticsearch 1.1 does the right
thing. And I don't think that any previous version of ES would have had any
issues at all.

For this test, I created a document source within the
test/schema/ucode.json file. Note that I changed the fields to match my
current [sgen] (schema generation) test index, but the data in the text
field is yours:

{
"uid" : 777,
"cn" : "Aurelio Phzee",
"sex" : "M",
"married" : true,
"date" : "1952-04-15T15:22:51Z",
"location" : [ -117.172581, 32.67819 ],
"text" : [ "\u003cp\u003eFake Operation\u003c/p\u003e",
"\u003cp\u003efake clues\u003c/p\u003e" ]
}

Add the document:

$ curl -XPOST 'http://localhost:9200/sgen/person/777' --data-binary @test/schema/ucode.json

Query it to verify that it was stored, and pretty-print the response.
Elasticsearch returns the original source with the \unnnn escape sequences:

$ curl -XGET 'http://localhost:9200/sgen/person/777?pretty=true'

{
"_index" : "sgen",
"_type" : "person",
"_id" : "777",
"_version" : 1,
"found" : true,
"_source" : {
"uid" : 777,
"cn" : "Aurelio Phzee",
"sex" : "M",
"married" : true,
"date" : "1952-04-15T15:22:51Z",
"location" : [ -117.172581, 32.67819 ],
"text" : [ "\u003cp\u003eFake Operation\u003c/p\u003e",
"\u003cp\u003efake clues\u003c/p\u003e" ]
}
}

Query and parse the response into pretty JSON. The parse-json.sh Bash
script wraps my own JSON parser, and the -j option selects pretty-printed
JSON (other options show the low-level tokens found by the Jackson stream
parser that is embedded within ES):

$ curl -XGET 'http://localhost:9200/sgen/person/777' | parse-json.sh -j

{
"_index" : "sgen",
"_type" : "person",
"_id" : "777",
"_version" : 1,
"found" : true,
"_source" : {
"uid" : 777,
"cn" : "Aurelio Phzee",
"sex" : "M",
"married" : true,
"date" : "1952-04-15T15:22:51Z",
"location" : [ -117.172581, 32.67819 ],
"text" : [ "<p>Fake Operation</p>",
"<p>fake clues</p>" ]
}
}

So I see no errors in handling this on the Elasticsearch side. The problem
must be somewhere else, and my guess is setScript vs. setSource. I hope this helps!

Brian


Preeti_Jain · April 8, 2014, 1:22am

Thank you so much Brian for all your efforts.
Indexing data that contains Unicode characters for the first time works
perfectly fine for us.
We are able to index data and query it without any issues.

The issue occurs when we try to update data. The strange thing is that we
use the same routine to convert our data to JSON both for first-time
indexing and for updates.
You seem to be right in saying that the issue is with the setScript method.

For now, we have added one more step that converts the Unicode escape
sequences to characters (for example, \u003c to <) before passing the
script to setScript for the update.
The updated content is then posted without any issue.
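
For readers who want to try the same workaround, here is a minimal sketch of such a conversion step. It is not the code used in this thread: the class and method names are hypothetical, it assumes every escape has exactly four hex digits, and it does not treat an escaped backslash in front of a "u" specially. If a decoded character is a double quote or a backslash, it would still need to be escaped again before being placed inside a script string literal.

```java
class UnicodeEscapes {
    /** True if the four characters starting at index from are all hex digits. */
    private static boolean isHexQuad(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Replaces each backslash-u escape (four hex digits) with the character
     * it denotes. All other text, including malformed escapes, is copied as-is.
     */
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean escape = c == '\\'
                    && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHexQuad(input, i + 2);
            if (escape) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes below are Java literal syntax for one backslash.
        String raw = "\\u003cp\\u003eFake Operation\\u003c/p\\u003e";
        System.out.println(decode(raw));
    }
}
```

The decoded string can then be embedded in the script text in place of the escaped one.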

Thanks,
Preeti

brian_yoder · April 8, 2014, 8:58pm

Preeti,

I just updated my Java update command to allow the source JSON to be
specified (it typically created the source from one or more name=value
pairs). It reads the JSON from a file so that there is no Bash nor non-ES
Java interpretation of the \unnnn sequences.
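
To illustrate the point about reading the JSON from a file: a sketch, assuming a UTF-8 file and using a hypothetical class name and a temporary file. Reading the bytes directly leaves each \unnnn sequence as six literal characters, so nothing on the client side resolves them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RawJsonFile {
    /** Returns the file's bytes decoded as UTF-8, with no escape processing. */
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ucode", ".json");
        // Two backslashes in a Java literal produce one backslash in the file.
        String json = "{ \"text\" : \"\\u003cp\\u003eFake Operation\\u003c/p\\u003e\" }";
        Files.write(file, json.getBytes(StandardCharsets.UTF_8));

        String source = read(file);
        // The escapes survive untouched; no angle bracket has been produced.
        System.out.println(source.equals(json));
        System.out.println(source.contains("<"));
        Files.delete(file);
    }
}
```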

I updated the same record several times in a row without any failures or
exceptions of any kind. Queries always return the most recently updated
version of the document and the responses reflect all changes that were
made as well as the expected version number.

So, I am sure that using the setScript method is incorrect, and you
should be using the setSource method for your use case.

Regards,
Brian


Preeti_Jain · April 10, 2014, 5:14pm

Hi Brian,

Could you please share your Java code?
Did you update the entire document or just one field? Our requirement is
to update specific fields as well, so I am wondering how setSource would
work there.

Regards,
Preeti


brian_yoder · April 10, 2014, 6:40pm

Preeti,

Could you please share your java code?

Can't. But I can describe it.

Did you update the entire document or just one field? Our requirement is
update specific fields as well so just wondering how setSource would work
there?

I updated the entire document.

But, for updating a subset of just one or more fields in an existing
document, I first get the document (and remember its version). I then make
my changes to the source and index it against that version. So it works as
if I'm updating just a subset of the document.

If the update fails due to a version error, then I re-get the document and
repeat.

With a small number of retries before giving up, this has always worked
for ad-hoc updates. And it has the nice benefit that if one user updates
field A and another updates field B, the result will always contain the
most recent versions of field A and B regardless of the order and overlap
of the initial requests. Very cool of ES!

Yes, it's some extra work on my part. But it was worth it to us.
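
The get, modify, index-against-version loop described above can be sketched as follows. This is not the code discussed in this thread and it does not call the Elasticsearch client: the Store class is a hypothetical in-memory stand-in that accepts a write only when the caller's version is still current, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedUpdate {
    /** Signals that a write was attempted against a stale version. */
    static class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) {
            super(message);
        }
    }

    /** A copy of a document's fields plus the version they were read at. */
    static class Versioned {
        final Map<String, Object> source;
        final long version;

        Versioned(Map<String, Object> source, long version) {
            this.source = source;
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for a versioned document store. */
    static class Store {
        private final Map<String, Versioned> docs = new HashMap<String, Versioned>();

        /** Returns a snapshot; a missing document reads as empty at version 0. */
        synchronized Versioned get(String id) {
            Versioned v = docs.get(id);
            if (v == null) {
                return new Versioned(new HashMap<String, Object>(), 0L);
            }
            return new Versioned(new HashMap<String, Object>(v.source), v.version);
        }

        /** Writes only if expectedVersion is still current; returns the new version. */
        synchronized long index(String id, Map<String, Object> source, long expectedVersion) {
            Versioned current = docs.get(id);
            long currentVersion = current == null ? 0L : current.version;
            if (currentVersion != expectedVersion) {
                throw new VersionConflictException(
                        "expected version " + expectedVersion + " but found " + currentVersion);
            }
            docs.put(id, new Versioned(new HashMap<String, Object>(source), currentVersion + 1));
            return currentVersion + 1;
        }
    }

    /** Merges changes into the latest copy and retries on version conflicts. */
    static long updateFields(Store store, String id, Map<String, Object> changes, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            Versioned current = store.get(id);          // re-read on every attempt
            Map<String, Object> merged = current.source; // get() already returned a copy
            merged.putAll(changes);
            try {
                return store.index(id, merged, current.version);
            } catch (VersionConflictException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up after the configured number of retries
                }
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Map<String, Object> changes = new HashMap<String, Object>();
        changes.put("opsThreatLevel", "High");
        System.out.println(updateFields(store, "290", changes, 3));
    }
}
```

Because each retry re-reads the document before merging, a concurrent change to a different field is preserved rather than overwritten.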

Brian
