I index PDFs using Apache Tika (the attachment type) with the following mapping:
.field( "type", "attachment" )
.field("fields")
.startObject()
.startObject("file")
.field("store", "yes")
.endObject()
I also want to index photos, and I am able to extract their text using OCR. I am
confused about how to index that text, though. Do I treat it like any other
document and not as an attachment? The extracted text is a plain String, not
base64 as in the case of PDFs.
I am also unsure how it gets stored and how it works if I need to make
it available during search. Can someone explain how to do this?
XContentFactory.jsonBuilder().startObject()
.startObject(INDEX_TYPE)
// This line disables storing the whole _source (including the base64 content)
.startObject("_source").field("enabled","no").endObject()
.startObject("properties")
So my photo object becomes something like this. But what about the source (the
image itself)?
jsonObject
{
"content":"text extracted from image",
"name":"my_photo.png"
}
//add to the bulk indexer for indexing
bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE).id(
jsonObject.getString("name")).source(jsonObject.toString()));
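To make the question concrete, here is a minimal sketch of how the photo document above could be built with only the JDK, so the OCR text goes in as a plain string field. The class name PhotoDoc, the helper toJson, and the optional "image" field (base64 of the raw image bytes) are hypothetical names used for illustration; whether the image itself should be indexed at all is exactly the open question.

```java
import java.util.Base64;

public class PhotoDoc {
    // Escape the characters that must be escaped inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build the document: OCR text as a plain string in "content", file name in "name".
    // If imageBytes is non-null, the raw image is added as base64 in a
    // hypothetical "image" field (an assumption, not part of the mapping above).
    static String toJson(String name, String ocrText, byte[] imageBytes) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"content\":\"").append(escape(ocrText)).append("\",");
        sb.append("\"name\":\"").append(escape(name)).append("\"");
        if (imageBytes != null) {
            sb.append(",\"image\":\"")
              .append(Base64.getEncoder().encodeToString(imageBytes))
              .append("\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("my_photo.png", "text extracted from image", null));
    }
}
```

The resulting string could be passed to .source(...) in the bulk request in place of jsonObject.toString().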