How to load a json file that contains special characters


(Aussie Pete2015) #1

Hi all,

I have used Talend to read the text of some SAS code and transform it into a JSON file.
I have also created an index; however, the import fails because the SAS code contains many different symbols.

e.g.
{"index":{"_index":"sascode_idx", "_type":"content", "_id": "1"}}
{"BuildAllTriangles":[{"content":"/**************************************************************************\r\n* PROGRAM NAME : BuildAllTriangles.sas\r\n* PROGRAMMER : Peter Birk\r\n* DATE WRITTEN : 20120912\r\n* DESCRIPTION : \r\n\r\n\tMake lots of liability triangles in an improbably short amount of\r\n\tdevelopment time available.\r\n\r\n* DEPENDENCIES :\r\n\r\n\tRawFiles\Reference\CC\Reserving Triangle Delivery.xls\r\n\r\n\tMacro variables from SplitTransByReservingClass:\r\n\r\n\t&&SplitData&n..\r\n\t&SplitDataCount.\r\n\r\n* OUTPUTS"}]}

As you can see, this is a JSON array, which I've verified via http://jsonviewer.stack.hu/

I can load this JSON file into MongoDB, but there is obviously an issue with Elasticsearch and the characters in the content.

How can I modify the content to be accepted into Elasticsearch without altering the content too drastically?

Cheers


(Guilherme Maranhao) #2

Hi Aussie,

We faced a similar issue in our indexing process. What we did was remove all the special characters with a chain of gsub calls (our interface language to Elasticsearch is Ruby):

content = content.gsub(/[\“\”\"\'\\\']/m, ' ').gsub(/[\n\t\r]/m, ' ').gsub(/\s+/m, ' ').strip
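
For anyone who wants to try this on a sample first, here is a minimal sketch of the same chain wrapped in a helper. The name sanitize_content and the sample string are made up for illustration; the sample just borrows a fragment of the SAS header from the question. Note that this approach replaces quotes, backslashes and line breaks with spaces, so the stored text no longer matches the original layout.

```ruby
# Hypothetical helper (name is illustrative) wrapping the gsub chain above.
def sanitize_content(text)
  text
    .gsub(/[“”"'\\]/, ' ')  # curly quotes, straight quotes and backslashes become spaces
    .gsub(/[\n\t\r]/, ' ')  # line breaks and tabs become spaces
    .gsub(/\s+/, ' ')       # collapse runs of whitespace into one space
    .strip
end

# Sample input: a fragment of the SAS header, written as a Ruby string literal.
raw = "\tRawFiles\\Reference\\CC\\Reserving Triangle Delivery.xls\r\n\t&SplitDataCount."
clean = sanitize_content(raw)
puts clean
```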

Hope it works for you!

Guilherme
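
If altering the content is a concern, a different approach from the one described above is to leave the text alone and let a JSON library escape it. The sketch below is only an illustration under a few assumptions: it uses Ruby's standard json library, reuses the index name, type and id from the question, and simplifies the document to a single content field. It builds the two newline-delimited lines a bulk request expects and checks that the text survives a round trip.

```ruby
require 'json'

# Raw text with backslashes, a tab and a CR/LF pair, left unmodified.
raw = "* DEPENDENCIES :\r\n\tRawFiles\\Reference\\CC\\Reserving Triangle Delivery.xls"

# A bulk body is newline-delimited JSON: an action line, then the document line.
action   = { 'index' => { '_index' => 'sascode_idx', '_type' => 'content', '_id' => '1' } }
document = { 'content' => raw } # simplified document shape, an assumption

# JSON.generate escapes backslashes and control characters, so each hash stays on one line.
bulk_body = [action, document].map { |h| JSON.generate(h) }.join("\n") + "\n"
puts bulk_body

# Parsing the document line back returns the original text unchanged.
restored = JSON.parse(bulk_body.lines[1])['content']
puts restored == raw
```

The point of the design is that the escaping happens at serialization time, so the indexed text stays identical to the source file.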

