Invalid UTF-8 middle byte 0x22\n

I am trying to create a Flask API that receives the name of the field to search and the search term. The script then writes this information into a file and uses subprocess to run the search:

file = open("filtro_match.json")
filtro = file.read()
file.close()
lista = word.split("|")
filtro = filtro.replace("$campo",str(lista[0]))
filtro = filtro.replace("$busca",str(lista[1]))
#return "Ok"
file = open("filtro_match_temp.json", "w")
file.write(filtro)
file.close()
resultado = subprocess.check_output('curl -XGET "localhost:9200/pessoas/_search?pretty" -H "Content-Type: application/json" --data @filtro_match_temp.json')
temp = resultado.decode('utf-8')
obj = json.loads(temp)

However, every time I try to run it, I receive this error:

"type" : "json_parse_exception",
"reason" : "Invalid UTF-8 middle byte 0x22\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@f868020; line: 1, column: 39]"

I have no idea where this "Invalid UTF-8 middle byte" comes from, since I write the file with Python. Can someone help me out?

My guess is that your JSON file is encoded in something other than UTF-8.
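To illustrate that guess, here is a minimal Python sketch. The query body, the field name `nome`, the search term and the choice of Latin-1 as the legacy encoding are all assumptions made for illustration; the actual file contents and system encoding are not shown in this thread. It encodes a small JSON string with a legacy code page and then checks where a UTF-8 decoder would stumble.

```python
# Hypothetical query body; the field name and search term are made up for illustration.
query = '{"match": {"nome": "José"}}'

# Assumption: the file was written with a legacy code page such as Latin-1,
# which stores "é" as the single byte 0xE9.
legacy_bytes = query.encode("latin-1")


def first_utf8_error(data):
    """Return (offending_byte, following_byte) for the first UTF-8
    decoding error in data, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        nxt = exc.start + 1
        following = data[nxt] if nxt < len(data) else None
        return data[exc.start], following
    return None


# Latin-1 bytes: the decoder fails at the accented character.
print(first_utf8_error(legacy_bytes))
# UTF-8 bytes of the same string decode cleanly, so this prints None.
print(first_utf8_error(query.encode("utf-8")))
```

In UTF-8, the byte 0xE9 announces a multi-byte sequence, so the decoder expects continuation bytes after it. Here the next byte is 0x22, the closing double quote, which would be consistent with the "middle byte 0x22" in the error message if the search term ends with an accented character.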

Yeah, I realized that a little later...

But what can I do? I am Brazilian, and we have a lot of non-UTF-8 characters in our names (and my project deals with both people and place names). Is there some way to make Elasticsearch or JSON accept the non-UTF-8 characters in a search? It does allow them to be stored, but not to be searched.

If there is no way, could you help me with updating the database? I mean, how can I update one field (let's say "name") of an entry in the database without updating all the rest?

PS: thanks for the answer!

I don't think so.

I don't know how to do this in Python, but I guess you could do something like this (Java code):

// rawBytes holds YOUR CONTENT ENCODED WITH YOUR ENCODING: decode it first, then re-encode as UTF-8
String json = new String(rawBytes, "YOUR_ENCODING");
byte[] jsonUtf8 = json.getBytes("UTF-8");
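In Python, the same idea might look like the sketch below. It is only an illustration under stated assumptions: the contents of `filtro_match.json` are not shown in this thread, so `TEMPLATE` is a hypothetical stand-in, and `build_query` and `search` are helper names invented here. The sketch also swaps the curl subprocess for the standard library's `urllib.request` and sends the search as a POST, so the body goes out as explicit UTF-8 bytes with no temporary file in between.

```python
import json
import urllib.request

# Hypothetical template; the real filtro_match.json is not shown in the thread.
TEMPLATE = '{"query": {"match": {"$campo": "$busca"}}}'


def build_query(template, word):
    """Fill the $campo/$busca placeholders and return the body as UTF-8 bytes."""
    campo, busca = word.split("|", 1)
    filled = template.replace("$campo", campo).replace("$busca", busca)
    # Encode explicitly instead of relying on the platform's default encoding.
    return filled.encode("utf-8")


def search(body, url="http://localhost:9200/pessoas/_search"):
    """Send the query to Elasticsearch and return the parsed JSON response.

    Assumes a node is listening on localhost:9200, as in the original curl call.
    """
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_query(TEMPLATE, "nome|José")
print(body)
# With a running node, the search itself would be: obj = search(body)
```

If you would rather keep the temporary file and the curl call, opening the file with `open("filtro_match_temp.json", "w", encoding="utf-8")` should have the same effect, because the accented characters are then written as UTF-8 regardless of the system's default encoding.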
