What encoding to use in XContentBuilder?

In the Java API, the class org.elasticsearch.common.xcontent.XContentBuilder
has this method:

public XContentBuilder rawField(String fieldName, InputStream content) throws IOException {

    generator.writeRawField(fieldName, content, bos);
    return this;
}

I open an InputStream from a filesystem (GridFS) and pass it as content to this method. Which string encoding does ES expect in the InputStream that I pass to this method?

I am trying to do something like this...

String myString = "abcd";
InputStream myStream = new ByteArrayInputStream(myString.getBytes("UTF-8"));
//Creates file with name tempText in Gridfs with content passed in myStream
GridFSFile storedFile = gridops.store(myStream, "tempText");

//Later on at a different place in code I do...

GridFsResource resource = gridops.getResource("tempText");

InputStream fetchedInputStream = resource.getInputStream();
...

//Preparing document for indexing
XContentBuilder doc = jsonBuilder().startObject().rawField("content", fetchedInputStream).endObject();

// and then indexing this doc with esTransportClient...

Now, when I run a match query on the content field through the REST API (using curl in a terminal), I do not get any hits. I think the problem is the encoding of the InputStream that I am passing. Please help!
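A minimal sketch of why the text encoding may not be the real issue, assuming rawField copies the stream's bytes into the document unchanged (the working code later in this thread wraps its payload in quote characters itself, which suggests this). The method simulateRawField is a hypothetical stand-in written for illustration, not the Elasticsearch implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RawFieldSketch {

    // Hypothetical stand-in for rawField (an assumption, NOT Elasticsearch code):
    // writes {"<fieldName>": then the stream's bytes unchanged, then }.
    public static String simulateRawField(String fieldName, InputStream content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(("{\"" + fieldName + "\":").getBytes(StandardCharsets.UTF_8));
        byte[] buffer = new byte[4096];
        int read;
        while ((read = content.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        out.write('}');
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Helper: wraps a string's UTF-8 bytes in an InputStream.
    public static InputStream utf8(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Bare text: the resulting document is not valid JSON.
        System.out.println(simulateRawField("content", utf8("abcd")));
        // A complete JSON value (here a quoted string): the document is valid.
        System.out.println(simulateRawField("content", utf8("\"abcd\"")));
    }
}
```

Under that assumption, the stream has to contain the bytes of a complete JSON value (a quoted string, a number, an object, and so on), not just UTF-8 text.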

--

Did you install the mapper attachment plugin and define a mapping with type: attachment?

David

--
David ;)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to the message from Atharva Patel patelatharva@gmail.com of 28 Nov 2012, 08:36.)

--

Thanks for the quick reply!

No, I haven't. Is that essential for working with rawField when I have an
InputStream of string data and am using the Java API? Also, does that
mean the InputStream is expected to be base64-encoded?

Thanks!

(In reply to David Pilato's message of Wednesday, 28 November 2012 13:28:56 UTC+5:30.)

--

Okay, I have finally got it working :) Thanks to David Pilato's suggestion,
I installed the attachment mapper plugin in ES and restarted ES for the
plugin to take effect. I then deleted the old mapping for the content field
and set it to type: attachment.

Here is the sample code:

package stories.testinges;

import static org.elasticsearch.common.xcontent.XContentFactory.*;

import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.Base64;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.json.simple.JSONObject;

/**
 * @author Atharva
 */
public class TestingInputStreamES {

    public static void main(String[] args) {
        ByteArrayInputStream byteArrayInputStream = null;
        InputStream b64stream = null;
        InputStream seq = null;
        FileInputStream fin = null;
        try {
            TransportClient esclient = new TransportClient();
            esclient.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
            String myString = "apple tastes great";

            // Write the text to a temporary file so it can be read back as a stream.
            File tempFile = File.createTempFile("mystringfile", ".es");
            BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
            String myStringEscaped = JSONObject.escape(myString);
            System.out.println(myStringEscaped);
            bw.write(myString);
            bw.flush();
            bw.close();

            // The raw field value must be a quoted JSON string, so the
            // base64 stream is wrapped between two quote characters.
            String prefix = "\"";
            ByteArrayInputStream prefixStream = new ByteArrayInputStream(prefix.getBytes("UTF-8"));
            ByteArrayInputStream suffixStream = new ByteArrayInputStream(prefix.getBytes("UTF-8"));
            byteArrayInputStream = new ByteArrayInputStream(myStringEscaped.getBytes("UTF-8"));
            fin = new FileInputStream(tempFile);
            // This will also work:
            // b64stream = new Base64.InputStream(byteArrayInputStream, Base64.ENCODE);
            b64stream = new Base64.InputStream(fin, Base64.ENCODE);

            ArrayList<InputStream> insList = new ArrayList<InputStream>(3);
            insList.add(prefixStream);
            insList.add(b64stream);
            insList.add(suffixStream);
            Enumeration<InputStream> enumeration = Collections.enumeration(insList);
            seq = new SequenceInputStream(enumeration);

            // XContentBuilder doc = jsonBuilder().startObject().rawField("content", byteArrayInputStream).endObject();
            XContentBuilder doc = jsonBuilder().startObject()
                    .field("junkField", "garbage")
                    .rawField("content", seq)
                    .endObject();
            esclient.prepareIndex("test", "testtype", "001").setSource(doc).execute().actionGet();
            esclient.close();
            tempFile.deleteOnExit();
        } catch (UnsupportedEncodingException ex) {
            Logger.getLogger(TestingInputStreamES.class.getName()).log(Level.SEVERE, null, ex);
        } catch (Exception ex) {
            Logger.getLogger(TestingInputStreamES.class.getName()).log(Level.SEVERE, null, ex);
        } finally {
            // Close only the streams that were actually opened.
            try {
                if (byteArrayInputStream != null) byteArrayInputStream.close();
                if (b64stream != null) b64stream.close();
                if (fin != null) fin.close();
            } catch (IOException ex) {
                Logger.getLogger(TestingInputStreamES.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
}
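The stream composition used above (quote character, base64 payload, quote character) can also be sketched with the JDK alone. This is only an illustration under assumptions: it uses java.util.Base64 (Java 8 or later) instead of the Elasticsearch Base64 class, it encodes the payload in memory rather than streaming it, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

public class QuotedBase64Stream {

    // Returns a stream whose bytes form a JSON string literal: "<base64 of payload>".
    // The standard base64 alphabet contains no quote or backslash characters,
    // so the encoded text needs no further JSON escaping.
    public static InputStream asJsonBase64Value(byte[] payload) {
        byte[] quote = "\"".getBytes(StandardCharsets.US_ASCII);
        byte[] encoded = Base64.getEncoder().encode(payload);
        List<InputStream> parts = Arrays.<InputStream>asList(
                new ByteArrayInputStream(quote),
                new ByteArrayInputStream(encoded),
                new ByteArrayInputStream(quote));
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Reads a stream fully and returns its contents as an ASCII string.
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "apple tastes great".getBytes(StandardCharsets.UTF_8);
        // Prints the quoted base64 value that would be handed to rawField.
        System.out.println(readAll(asJsonBase64Value(payload)));
    }
}
```

For large files the streaming approach in the sample code above is preferable, since this sketch holds the whole encoded payload in memory.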

(In reply to Atharva Patel's message of Wednesday, 28 November 2012 13:06:37 UTC+5:30.)

--

I would also like to point out that if _source is enabled for this type of
document, the REST search response will show a base64-encoded string as the
value of the attachment-type field (here, content). The field is still
searchable by queries, because ES decodes and indexes it. (Just to save you
from some panic ;))
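If the original text is needed on the client side, the base64 value returned in _source can be decoded again. A minimal sketch, assuming the payload was UTF-8 text and using java.util.Base64 (Java 8 or later); the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SourceFieldDecoder {

    // Decodes a base64 field value (as it appears in _source) back to text,
    // assuming the original payload was UTF-8 encoded.
    public static String decode(String base64Value) {
        byte[] bytes = Base64.getDecoder().decode(base64Value);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the stored value by encoding the sample text from this thread.
        String stored = Base64.getEncoder()
                .encodeToString("apple tastes great".getBytes(StandardCharsets.UTF_8));
        System.out.println(stored);
        System.out.println(decode(stored));
    }
}
```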

(In reply to Atharva Patel's message of Wednesday, 28 November 2012 22:11:01 UTC+5:30.)

--