ES full text search on couchdb attachments documents


(odarboe) #1

Dear All,
I am new to Elasticsearch. For weeks I have tried to follow the various tutorials
and posts on indexing and mapping documents attached in a CouchDB database,
without success.
After running the commands below, I get no hits for words that exist
in the CouchDB attachments.

Software:
ES version 0.19.2

Plugin:
attachment mapper (ver1.0),
river-couchdb,
head

Steps
I have 3 attachments in CouchDB (1 PDF, 1 txt, and 1 JSON file containing the
base64 encoding of the PDF).

Database name: mrctestdb

Code to create the river
1 - curl -XPUT 'http://localhost:9200/_river/mrcriver/_meta' -d '
{
  "type": "couchdb",
  "couch-db": {
    "host": "localhost",
    "port": 5984,
    "user": "admin",
    "password": "admin",
    "db": "mrctestdb",
    "filter": null
  },
  "index": {
    "index": "mrctestdb",
    "type": "mrctestdb"
  }
}'

Attachment mapping
2 - curl -XPUT 'http://127.0.0.1:9200/mrctestdb/mrctestdb/_mapping' -d '
{
  "mrctestdb": {
    "properties": {
      "_attachments": {
        "properties": {
          "a.txt": {
            "type": "attachment",
            "index": "analyzed"
          },
          "b.json": {
            "type": "attachment",
            "index": "analyzed"
          },
          "x.pdf": {
            "type": "attachment",
            "index": "analyzed"
          }
        }
      },
      "name": {
        "type": "string"
      }
    }
  }
}'

Search code: search for "MRC", a word that appears in the PDF and JSON files
3 - curl -XGET 'http://localhost:9200/mrctestdb/mrctestdb/_search' -d '{"query" : {"text" : { "_all" : "MRC" } }}'

When I search for text in the attachments, I get 0 hits.

Thank you in advance.


(David Pilato) #2

Hi,

Attachments from CouchDB are not indexed as attachments.

I started working on this some months ago, but I don't remember why I never submitted a pull request: https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments

If you need it, I can try to reopen it and see if I can submit a pull request.

David.



(David Pilato) #3

I uploaded a new version here : https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/elasticsearch-river-couchdb-1.2.0-SNAPSHOT.zip

Do you want to test it before I submit a pull request?

BTW, I suggest that you use mapper attachment plugin 1.4.0:

https://github.com/elasticsearch/elasticsearch-mapper-attachments

David.



(odarboe) #4

Thank you, David, for your response.
I will test it before you submit the pull request and get back to you.
Thanks.



(David Pilato) #5

Hmmmm...

Just wondering why you are talking about Tika.
The mapper attachment plugin already provides Tika. Did you modify something on your
side?

With the CouchDB river, I only extract the binary content from the CouchDB
attachment and then encode it in base64 before sending it to ES.

So if your attachment in CouchDB is a PDF, its content should be available for
search in ES.
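
To check the attachment mapper on its own, independently of the river, the step described above (base64-encode the binary, then send it to ES) can be reproduced by hand. This is only a minimal sketch and not the river's actual code: the index `attachtest`, the type `doc`, the field `file`, and both helper functions are made-up names for illustration.

```shell
#!/bin/sh
# Assumption: Elasticsearch runs locally with the mapper attachment plugin installed.
ES=${ES:-http://localhost:9200}

# Base64-encode a file onto a single line (tr removes the line wrapping
# that some base64 implementations add).
b64_file() {
  base64 < "$1" | tr -d '\n'
}

# Build the JSON document for one file: the encoded bytes go into the
# hypothetical "file" field, which is mapped with type "attachment".
attachment_doc() {
  printf '{"file":"%s"}' "$(b64_file "$1")"
}

# Usage (requires a running Elasticsearch, so it is left commented out):
#   curl -XPUT "$ES/attachtest"
#   curl -XPUT "$ES/attachtest/doc/_mapping" \
#        -d '{"doc":{"properties":{"file":{"type":"attachment"}}}}'
#   attachment_doc x.pdf | curl -XPUT "$ES/attachtest/doc/1" -d @-
#   curl -XGET "$ES/attachtest/doc/_search?q=file:MRC"
```

If a PDF indexed this way is searchable but the same PDF coming through the river is not, the problem is more likely on the CouchDB or river side than in the plugin.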

So, could you explain a bit more what you mean when you say that you use
Tika 1.1?

David.

On 26 July 2012 at 15:27, odarboe mrcprolifica@gmail.com wrote:

Hi David,

Thanks. It works with the new couchdb river version: I am now able to search
text file attachments. Good.
But currently I don't get any hits on the attachments that are PDF or
base64. What do you think I am missing? I am using Tika 1.1.

Thanks


--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


(David Pilato) #6

Yes. The mapper attachment plugin needs Tika and already provides it.

As far as I remember, with CouchDB you should encode your file in base64 and send it
as an attachment.
See http://wiki.apache.org/couchdb/HTTP_Document_API#Attachments for details.
Look at Inline Attachments. I think that's the one I tested.

You should be able to retrieve your document from CouchDB by getting
http://localhost:5984/yourdb/yourjson/mydoc.pdf
If not, check the CouchDB documentation (it's outside the scope of this mailing
list).
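
For illustration, an inline attachment as described on that wiki page could be built and checked roughly like this. It is a sketch under assumptions: the helper `inline_attachment_doc` is hypothetical, the names `yourdb`, `yourjson`, and `mydoc.pdf` simply follow the URL above, and your CouchDB may also require credentials.

```shell
#!/bin/sh
# Assumption: CouchDB listens on localhost:5984 and the database already exists.
COUCH=${COUCH:-http://localhost:5984}

# Print a CouchDB document whose only content is one inline attachment.
# $1 = path to the file, $2 = its MIME type. The attachment is named after the file.
inline_attachment_doc() {
  name=$(basename "$1")
  # Encode the raw bytes and strip any line wrapping added by base64.
  data=$(base64 < "$1" | tr -d '\n')
  printf '{"_attachments":{"%s":{"content_type":"%s","data":"%s"}}}' \
    "$name" "$2" "$data"
}

# Usage (requires a running CouchDB, so it is left commented out):
#   inline_attachment_doc mydoc.pdf application/pdf |
#     curl -X PUT "$COUCH/yourdb/yourjson" -H 'Content-Type: application/json' -d @-
#   # Fetch the attachment back; the downloaded file should open as a normal PDF.
#   curl -s "$COUCH/yourdb/yourjson/mydoc.pdf" -o check.pdf
```

If the downloaded file is not a valid PDF (for example, if it contains base64 text), the attachment was stored in encoded form rather than as binary, which could explain why its content is not searchable.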

I will try to check on my side in the next few days to see whether the plugin works as I
think it should.

BTW, please reply to the mailing list, as someone else could also help you.

David.

On 26 July 2012 at 16:09, odarboe mrcprolifica@gmail.com wrote:

Hi David,
Oh, from what I had understood so far, I thought the mapper needed Tika to be
able to search through different types of attachment files (PDF, etc.).
I did not modify anything.

Currently I am able to convert any attachment to base64. What is
not clear to me is how to use the base64 file after converting it. Should I
attach it as a document in CouchDB, or something else? (I have hundreds of files to attach.)

My aim is to be able to index and search all attachments in my CouchDB
database. The attachment types include pdf, jpg, doc, docx, xls.
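
For hundreds of files, one possible approach is to let CouchDB store the raw binary through its standalone attachment API (PUT to /db/docid/attachmentname with the file as the request body), so no manual base64 step is needed. The following is only a sketch under assumptions: the database name `yourdb`, the admin/admin credentials from the first post, one document per file, and the helper names are all illustrative, and whether the river handles these attachments the same way as inline ones is not confirmed in this thread.

```shell
#!/bin/sh
# Assumptions: CouchDB at localhost:5984, an existing database "yourdb",
# and the admin/admin credentials from the first post.
COUCH=${COUCH:-http://admin:admin@localhost:5984}
DB=${DB:-yourdb}

# Map a file extension to a MIME type (only the types mentioned above).
mime_for() {
  case "$1" in
    *.pdf)  echo "application/pdf" ;;
    *.jpg)  echo "image/jpeg" ;;
    *.doc)  echo "application/msword" ;;
    *.docx) echo "application/vnd.openxmlformats-officedocument.wordprocessingml.document" ;;
    *.xls)  echo "application/vnd.ms-excel" ;;
    *.txt)  echo "text/plain" ;;
    *)      echo "application/octet-stream" ;;
  esac
}

# Derive a document id from a path by dropping the directory and the extension.
doc_id_for() {
  name=$(basename "$1")
  echo "${name%.*}"
}

# Upload every regular file in directory $1 as an attachment on its own document.
# Limitations: file names with spaces or URL-unsafe characters are not escaped,
# and two files sharing a base name (x.pdf, x.doc) would conflict on the second PUT.
upload_dir() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    curl -s -X PUT "$COUCH/$DB/$(doc_id_for "$f")/$(basename "$f")" \
         -H "Content-Type: $(mime_for "$f")" \
         --data-binary @"$f"
    echo
  done
}

# Usage (requires a running CouchDB): upload_dir /path/to/files
```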



(odarboe) #7

Thanks David for your time.
I will work on it.



(system) #8