Thanks, David, for your time.
I will work on it.
Yes. The mapper attachment plugin needs Tika and already provides it.
As far as I remember CouchDB, you should encode your file in Base64 and
send it as an attachment.
See http://wiki.apache.org/couchdb/HTTP_Document_API#Attachments for
details.
Look at "Inline Attachments". I think that's the one I tested.
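A minimal shell sketch of the "Inline Attachments" approach, assuming CouchDB on localhost:5984: the file is Base64-encoded and embedded in the document's `_attachments` field, then the whole document is sent with `PUT /{db}/{docid}`. The helper names `inline_attachment_doc` and `put_inline_doc` are hypothetical, and the database and document names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build a CouchDB document body with one inline (Base64) attachment.
# `tr -d '\n'` strips the line wrapping that GNU base64 adds, so the same
# pipeline works with both GNU and BSD/macOS base64.
# Assumption: the file name and content type contain no double quotes,
# because they are inserted into the JSON without escaping.
inline_attachment_doc() {
  local file="$1" content_type="$2"
  local name data
  name=$(basename "$file")
  data=$(base64 < "$file" | tr -d '\n')
  printf '{"_attachments":{"%s":{"content_type":"%s","data":"%s"}}}' \
    "$name" "$content_type" "$data"
}

# Send that body to CouchDB with PUT /{db}/{docid} (hypothetical helper).
# Assumption: CouchDB on localhost:5984; add -u user:password if needed.
put_inline_doc() {
  local db="$1" docid="$2" file="$3" content_type="$4"
  inline_attachment_doc "$file" "$content_type" |
    curl -s -X PUT "http://localhost:5984/$db/$docid" \
         -H 'Content-Type: application/json' --data-binary @-
}
```

For example, `put_inline_doc yourdb yourjson ./mydoc.pdf application/pdf` should create a document whose attachment can then be fetched with `curl -o copy.pdf http://localhost:5984/yourdb/yourjson/mydoc.pdf`.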
In CouchDB, you should be able to retrieve your document by getting
http://localhost:5984/yourdb/yourjson/mydoc.pdf
If not, check the CouchDB documentation (it's outside the scope of this
mailing list).
I will try to check on my side in the next few days whether the plugin
works as I think it should.
BTW, please reply to the mailing list, as someone else could also help you.
David.
On 26 July 2012 at 16:09, odarboe <mrcprolifica@gmail.com> wrote:
Hi David,
Ooh, from what I understood up to here, I thought the mapper needs Tika to
be able to search through different types of attachment files (PDF,
etc.).
I did not modify anything.
Currently I am able to convert any attachment to Base64. What is not clear
to me is how to use the Base64 file after converting. Should I attach it
to a document in CouchDB, or something else? (I have hundreds of files to
attach.)
My aim is to be able to index and search all attachments in my CouchDB
database. The attachment types include pdf, jpg, doc, docx, xls.
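For hundreds of files, one possible alternative (a sketch, not tested against this setup) is CouchDB's standalone attachment API, `PUT /{db}/{docid}/{attname}`, which takes the raw file bytes plus a `Content-Type` header, so no manual Base64 step is needed. The helpers `mime_for`, `doc_id_for` and `upload_dir`, the one-document-per-file layout, the extension table and the localhost URL are all assumptions. As far as I know, CouchDB creates the document if it does not exist yet; uploading to an existing document needs its current `?rev=` and is otherwise rejected with a conflict.

```shell
#!/usr/bin/env bash
# Map a file extension (case-insensitive) to a MIME type (hypothetical helper).
mime_for() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)      echo application/pdf ;;
    jpg|jpeg) echo image/jpeg ;;
    doc)      echo application/msword ;;
    docx)     echo application/vnd.openxmlformats-officedocument.wordprocessingml.document ;;
    xls)      echo application/vnd.ms-excel ;;
    txt)      echo text/plain ;;
    *)        echo application/octet-stream ;;
  esac
}

# Turn a file path into a URL-safe id: keep the base name and replace every
# character outside [A-Za-z0-9._-] with "_" (hypothetical helper).
doc_id_for() {
  printf '%s' "$(basename "$1")" | tr -c 'A-Za-z0-9._-' '_'
}

# Upload each regular file in a directory as the attachment of its own new
# document. Assumption: CouchDB on localhost:5984 and no existing documents
# with these ids.
upload_dir() {
  local db="$1" dir="$2" f id
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    id=$(doc_id_for "$f")
    curl -s -X PUT "http://localhost:5984/$db/$id/$id" \
         -H "Content-Type: $(mime_for "$f")" --data-binary @"$f"
    echo   # newline after each JSON response
  done
}
```

For example, `upload_dir mrctestdb ./files` uploads every regular file in `./files`; add `-u admin:admin` to the curl call if the database requires authentication.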
On Thursday, July 26, 2012 1:41:43 PM UTC, David Pilato wrote:
Hmmmm...
Just wondering why you are talking about Tika.
Mapper-attachment already provides Tika. Did you modify something on
your side?
With the CouchDB river, I only extract the binary content from the CouchDB
attachment and then encode it in Base64 before sending it to ES.
So if your attachment in CouchDB is PDF content, it should be available
for search in ES.
So, could you explain a bit more what you mean when you say that
you use Tika 1.1?
David.
On 26 July 2012 at 15:27, odarboe <mrcprolifica@gmail.com> wrote:
Hi David,
Thanks, it works with the CouchDB river version; now I am able to search
text file attachments. Good.
But currently I don't get any hits on the attachments that are PDF or
Base64. What do you think I am missing? I am using Tika 1.1.
Thanks
On Wednesday, July 25, 2012 7:39:22 PM UTC, David Pilato wrote:
I uploaded a new version here: https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/elasticsearch-river-couchdb-1.2.0-SNAPSHOT.zip
Do you want to test it before I submit a pull request?
BTW, I suggest that you use mapper attachment plugin 1.4.0:
https://github.com/elasticsearch/elasticsearch-mapper-attachments
David.
From: elasticsearch@googlegroups.com
[mailto:elasticsearch@googlegroups.com] On Behalf Of David Pilato
Sent: Wednesday, 25 July 2012 21:13
To: elasticsearch@googlegroups.com
Subject: RE: ES full text search on couchdb attachments documents
Hi,
Attachments from CouchDB are not indexed as attachments.
I started something about it some months ago but I don’t remember why I
did not submit a pull request:
https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments
If you need it, I can try to reopen it and see if I can submit a pull
request.
David.
From: elasticsearch@googlegroups.com
[mailto:elasticsearch@googlegroups.com] On Behalf Of MRC
Sent: Wednesday, 25 July 2012 17:03
To: elasticsearch@googlegroups.com
Subject: ES full text search on couchdb attachments documents
Dear All,
I am new to Elasticsearch. For weeks I have tried to follow the different
tutorials and posts on indexing and mapping attached documents in a CouchDB
database, without success.
After running the code below, I don't get any hits for words that exist
in the CouchDB attached files.
Software:
ES version 0.19.2
Plugin:
attachment mapper (ver1.0),
river-couchdb,
head
Steps
I have 3 attached documents in CouchDB (1 PDF, 1 TXT and 1 JSON file
holding the Base64 of the PDF file).
Database name: mrctestdb
Code to create the river
1 - curl -XPUT 'http://localhost:9200/_river/mrcriver/_meta' -d '
{
"type": "couchdb",
"couch-db": {
"host": "localhost",
"port": 5984,
"user": "admin",
"password": "admin",
"db": "mrctestdb",
"filter": null
},
"index": {
"index": "mrctestdb",
"type": "mrctestdb"
}
}'
Attachment mapping
2 - curl -XPUT 'http://127.0.0.1:9200/mrctestdb/mrctestdb/_mapping' -d '
{
"mrctestdb": {
"properties": {
"_attachments": {
"properties": {
"a.txt": {
"type": "attachment",
"index": "analyzed"
},
"b.json": {
"type": "attachment",
"index": "analyzed"
},
"x.pdf": {
"type": "attachment",
"index": "analyzed"
}
}
},
"name": {
"type": "string"
}
}
}
}'
Search code: search for MRC, which is a word in the PDF file and the JSON file
3 - curl -XGET 'http://localhost:9200/mrctestdb/mrctestdb/_search' -d '{"query" : {"text" : { "_all" : "MRC" } }}'
When I search for text in the attachment files, I get 0 hits.
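One way to narrow down a 0-hit result like this (a sketch, assuming Elasticsearch 0.19.x on localhost:9200 with the mapper attachments plugin installed) is to test the plugin without CouchDB: map a field as `attachment`, index a Base64-encoded file into it directly, refresh, and search it with the same `text` query used above. The index `attachtest`, type `doc`, field `file` and the helper names are hypothetical; querying the attachment field by name is, I believe, how the extracted content is searched, but check the plugin README. If this returns a hit, Tika extraction likely works and the problem is more probably on the river side.

```shell
#!/usr/bin/env bash
ES=http://localhost:9200   # assumption: local single-node cluster

# JSON body whose "file" field carries the Base64-encoded file content.
attachment_body() {
  printf '{"file":"%s"}' "$(base64 < "$1" | tr -d '\n')"
}

# Create a throwaway index, map "file" as an attachment, index one file,
# refresh so it is searchable, then search for a term.
check_plugin() {
  local file="$1" term="$2"
  curl -s -XPUT "$ES/attachtest"
  curl -s -XPUT "$ES/attachtest/doc/_mapping" -d '{
    "doc": { "properties": { "file": { "type": "attachment" } } }
  }'
  attachment_body "$file" |
    curl -s -XPUT "$ES/attachtest/doc/1" --data-binary @-
  curl -s -XPOST "$ES/attachtest/_refresh"
  curl -s -XGET "$ES/attachtest/doc/_search" \
       -d "{\"query\":{\"text\":{\"file\":\"$term\"}}}"
  echo
}
```

For example, `check_plugin ./x.pdf MRC`; delete the index afterwards with `curl -XDELETE http://localhost:9200/attachtest`.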
Thank you in advance.
--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs