Indexing Binary vs text


(IronMan2014) #1

I have a couple of simple questions that I would like to clear up:

#1: For a TransportClient and a cluster of two hosts: do I have to add both
hosts to the client, or is it enough to add just one of them and let the
yml file(s) take care of the clustering?

.addTransportAddress(new InetSocketTransportAddress(host[0], port))

.addTransportAddress(new InetSocketTransportAddress(host[1], port));

#2: Assume I have the following document structure:

jdoc {
  "title": "my title",
  "uid": "ux1234",
  "tags": "ES",
  "date": "1/1/2011",
  "content": "Content of doc goes here"
}

// This is the mapping for my binary attachments (PDF)

putMappingResponse = new PutMappingRequestBuilder(client.admin().indices())
    .setIndices(INDEX_NAME)
    .setType(INDEX_TYPE)
    .setSource(
        XContentFactory.jsonBuilder()
            .startObject()
                .startObject(INDEX_TYPE)
                    .startObject("properties")
                        // pdf
                        .startObject("file")
                            .field("type", "attachment")
                            .startObject("fields")
                                .startObject("title")
                                    .field("store", "yes")
                                .endObject()
                                .startObject("file")
                                    .field("store", "yes")
                                    .field("term_vector", "with_positions_offsets")
                                .endObject()
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
    ).execute().actionGet();

void indexDocument(JSONObject jDoc) {
    bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE)
        .id(jDoc.getString("uid")).source(jDoc.toString()));
}

void indexBinaryDocument(JSONObject jDoc) {
    XContentBuilder source = jsonBuilder().startObject()
        .field("file", jDoc.getString(CONTENT)) // Base64-encoded binary, from Tika
        .field("uid", jDoc.getString(UID))
        .field("date", jDoc.getString(DATE))
        ....
        .endObject();

    bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE).source(source));
}

My question:

Depending on the document, I call either indexDocument (for normal text
docs) or indexBinaryDocument. This is confusing. I want to be able to call
one index function, like "indexDocument" above, without having to build the
source again for binaries. In other words, if the document is binary, why do
I have to tell it about the "file" field again? Couldn't I just replace the
"content" field with the Base64-encoded text? Everything else in the
document is the same; only the content field is different. I feel the two
functions should be one and the same.
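
To make the question concrete, here is a rough sketch of the single entry point I have in mind. It uses only the JDK and has not been run against Elasticsearch. The class name SourceBuilder, the helper names buildSource and toBase64, and the isBinary flag are all hypothetical. It assumes the mapping above stays as it is, so the attachment field is still named "file" and plain-text docs keep using "content".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any Elasticsearch API.
final class SourceBuilder {

    // Builds the source for one document. For binary docs the Base64 payload
    // is moved from "content" to "file" (the attachment-mapped field in the
    // mapping above); every other field is copied unchanged.
    static Map<String, Object> buildSource(Map<String, String> doc, boolean isBinary) {
        Map<String, Object> source = new LinkedHashMap<>(doc);
        if (isBinary) {
            Object payload = source.remove("content");
            source.put("file", payload);
        }
        return source;
    }

    // Encodes raw bytes (e.g. a PDF read from disk) as a Base64 string.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("uid", "ux1234");
        doc.put("content", toBase64("hello".getBytes(StandardCharsets.UTF_8)));
        // The binary doc ends up with a "file" field and no "content" field.
        System.out.println(buildSource(doc, true).keySet());
    }
}
```

If something like this works, the resulting map could presumably be handed to the same Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE).source(...) call for both kinds of document, so only the field name would differ.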



(system) #2