I index PDFs using Apache with the following mapping:
.field( "type", "attachment" )
I also want to index photos, and I am able to extract their text using OCR.
I am unsure how to index that text, though: do I treat it like any other
document rather than as an attachment? The extracted text is a plain String,
not base64 as in the case of PDFs.
I am also unsure how it gets stored and how it works if I need to make it
available during search. Can someone explain how to do this?
// This line prevents the whole base64 _source from being stored
.startObject(INDEX_TYPE).startObject("_source").field("enabled","no").endObject()
So my photo object becomes something like this. But what about the source
(the image itself)?
"content":"text extracted from image"
//add to the bulk indexer for indexing
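To make the question concrete, here is a minimal sketch of the photo document I have in mind. It uses only the Java standard library so it runs on its own. The class name PhotoDocument, the helper methods, and the "imagePath" field are hypothetical names of mine, and keeping the image outside the index and storing only a reference to it is an assumption, not something I know to be the recommended approach.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the JSON body for a photo whose text was
// already extracted by OCR, so "content" is a plain string, not base64.
class PhotoDocument {

    // Escape a value so it is safe inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Assumption: the image stays outside the index and the document
    // only carries a reference to it in a hypothetical "imagePath" field.
    static Map<String, String> build(String ocrText, String imagePath) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("content", ocrText);
        doc.put("imagePath", imagePath);
        return doc;
    }

    // Serialize the fields in insertion order as a flat JSON object.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // This JSON string is what I would hand to the bulk indexer.
        System.out.println(toJson(build("text extracted from image", "/photos/1.jpg")));
    }
}
```

If this is roughly right, then "content" would just be mapped as an ordinary string field instead of "attachment", which is the part I would like confirmed.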