How to create an index for an attachment of a doc in CouchDB with ES?

Hi,

Just wondering why you are trying to send a mapping to ES?

BTW, your mapping is incorrect. You don’t have to define each attachment.

In the README file, I wrote:

$ curl -X PUT http://127.0.0.1:9200/my_db/my_db/_mapping -d '{
  "my_db": {
    "properties": {
      "_attachments": {
        "properties": {
          "attachment": {
            "type": "attachment"
          }
        }
      },
      "yourfield": {
        "type": "string"
      }
    }
  }
}'
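The generic mapping from the README can also be built programmatically, which avoids quoting mistakes in a hand-written body. This is a minimal sketch in Python. The helper name `build_attachment_mapping` and the choice of Python are assumptions for illustration; the field names (`_attachments`, `attachment`, `yourfield`) come from the README excerpt above.

```python
import json


def build_attachment_mapping(type_name, extra_fields=None):
    """Return the mapping body from the README excerpt as a dict.

    Hypothetical helper: only the JSON structure is taken from the thread.
    """
    properties = {
        "_attachments": {
            "properties": {
                # One generic entry; individual file names are not listed.
                "attachment": {"type": "attachment"}
            }
        }
    }
    # Optional ordinary fields, e.g. {"yourfield": {"type": "string"}}.
    properties.update(extra_fields or {})
    return {type_name: {"properties": properties}}


mapping = build_attachment_mapping("my_db", {"yourfield": {"type": "string"}})
body = json.dumps(mapping, indent=2)
print(body)  # this string can be passed to curl with -d
```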

I suggest that you first simply try to get the attachments from CouchDB with the example I wrote, and only then, if that works and you really need to, play with mappings.

Also try simpler searches to start with. Something like:

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_all" : "temperature" } } }'

(note the lowercase t in temperature).
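To rule out malformed request bodies while testing, the search body can be generated and checked before it is sent. This is a minimal sketch, assuming Python. `build_all_query` is a hypothetical helper, and the `text` query on `_all` is the one suggested above.

```python
import json


def build_all_query(term):
    """Build the body of a `text` query against the `_all` field.

    Hypothetical helper; the term is lowercased to follow the hint above.
    """
    return {"query": {"text": {"_all": term.lower()}}}


body = json.dumps(build_all_query("Temperature"))
# Round-trip through the parser to confirm the body is well-formed JSON.
assert json.loads(body)["query"]["text"]["_all"] == "temperature"
print(body)
```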

Documentation is here: https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments

HTH

David.

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On behalf of Chi Dung Tran
Sent: Sunday 8 January 2012 02:16
To: "elasticsearch@googlegroups.com"
Subject: Re: how to create index for a attachment of a doc in couchDB with ES?

Hi

But it doesn't work for me.

I receive the same results as in the mail I have already sent (below).

Please explain it to me.
As you can see, I do not get any hits.

Did I make a mistake or not?

Thanks

----- Forwarded message -----
From: Chi Dung Tran dungtctin4@yahoo.com
To: "elasticsearch@googlegroups.com" elasticsearch@googlegroups.com
Sent: Thursday 29 December 2011 16:42
Subject: Re: how to create index for a attachment of a doc in couchDB with ES?

Hello

I have tested it, but it does not work well.

I attached 4 files to 2 CouchDB documents like this:

{
  "_id": "Doc1",
  "_rev": "5-4d607b7d88985097462ae9b2f67bc5ac",
  "message": "Elastic Search",
  "_attachments": {
    "exam.docx": {
      "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "revpos": 4,
      "digest": "md5-ecdBcsbc6w7mC1EOgd5SIg==",
      "length": 10007,
      "stub": true
    },
    "2230681.pdf": {
      "content_type": "application/pdf",
      "revpos": 2,
      "digest": "md5-BUhqhHiVqKybxrfGsQTixQ==",
      "length": 956146,
      "stub": true
    }
  }
}

{
  "_id": "Doc2",
  "_rev": "7-dd58025abc2002566b6f458ad3d83d4d",
  "message": "test attachments",
  "_attachments": {
    "TestAttachments.txt": {
      "content_type": "text/plain",
      "revpos": 6,
      "digest": "md5-aLTD+adMRHPw2+WMIN/42Q==",
      "length": 89,
      "stub": true
    },
    "DynamicPublishingUseCases.doc": {
      "content_type": "application/msword",
      "revpos": 2,
      "digest": "md5-FRdhydLr57C+q3ff6xLEmA==",
      "length": 22528,
      "stub": true
    }
  }
}
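Both documents list their attachments as stubs (`"stub": true`), so only metadata is shown and the file contents are not part of this JSON. As a sketch of how the contents could be inspected: CouchDB can return attachment bodies base64-encoded in a `data` field when a document is requested as JSON with `?attachments=true`. The helper names and the sample document below are made up for illustration.

```python
import base64


def attachments_url(host, port, db, doc_id):
    """URL that asks CouchDB to inline attachment bodies (base64) in the doc."""
    return "http://%s:%d/%s/%s?attachments=true" % (host, port, db, doc_id)


def decode_attachments(doc):
    """Map attachment name -> raw bytes for every attachment with inline data."""
    decoded = {}
    for name, meta in doc.get("_attachments", {}).items():
        if "data" in meta:  # stubs carry no "data" field
            decoded[name] = base64.b64decode(meta["data"])
    return decoded


# Hypothetical document, shaped like a response to the URL above.
sample = {
    "_id": "Doc2",
    "_attachments": {
        "TestAttachments.txt": {
            "content_type": "text/plain",
            "data": base64.b64encode(b"test attachments").decode("ascii"),
        },
        "stub_only.pdf": {"content_type": "application/pdf", "stub": True},
    },
}

url = attachments_url("localhost", 5984, "my_test_couchdb_attachments", "Doc2")
files = decode_attachments(sample)
print(url)
print(sorted(files))
```

If the decoded bytes look right here, the attachments themselves are intact and the problem is more likely on the river or mapping side.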

Here is my test with Elasticsearch:

curl -X PUT "localhost:9200/test_idx_couchdb_attachments"
{"ok":true,"acknowledged":true}

curl -XPUT 'http://localhost:9200/_river/test_river_couchdb_attachments/_meta' -d '{"type" : "couchdb", "couchdb" : {"host" : "localhost","port" : 5984,"db" : "my_test_couchdb_attachments","filter" : null,"ignore_attachments":false}},"index" : {"index" : "test_idx_couchdb_attachments", "type" : "test_mapping_couchdb_attachments" } }'
{"ok":true,"_index":"_river","_type":"test_river_couchdb_attachments","_id":"_meta","_version":1}

At first, I typed:

curl -X PUT http://127.0.0.1:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_mapping -d '{
  "my_test_couchdb_attachments": {
    "properties": {
      "_attachments": {
        "properties": {
          "2230681.pdf": {
            "type": "attachment", "index" : "analyzed"
          },
          "DynamicPublishingUseCases.doc": {
            "type": "attachment", "index" : "analyzed"
          },
          "TestAttachments.txt": {
            "type": "attachment", "index" : "analyzed"
          },
          "exam.docx": {
            "type": "attachment", "index" : "analyzed"
          }
        }
      },
      "message" : {
        "type": "string", "index" : "analyzed"
      }
    }
  }
}'

and I received this result:

{"error":"MergeMappingException[Merge failed with failures {[Can't merge a non object mapping [TestAttachments.txt] with an object mapping [TestAttachments.txt], Can't merge a non object mapping [2230681.pdf] with an object mapping [2230681.pdf], Can't merge a non object mapping [exam.docx] with an object mapping [exam.docx], Can't merge a non object mapping [DynamicPublishingUseCases.doc] with an object mapping [DynamicPublishingUseCases.doc]]}]

So I changed it, and it succeeded:

curl -X PUT http://127.0.0.1:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_mapping -d '{
  "my_test_couchdb_attachments": {
    "properties": {
      "_attachments": {
        "properties": {
          ""2230681.pdf"": {
            "type": "attachment", "index" : "analyzed"
          },
          ""DynamicPublishingUseCases.doc"": {
            "type": "attachment", "index" : "analyzed"
          },
          ""TestAttachments.txt"": {
            "type": "attachment", "index" : "analyzed"
          },
          ""exam.docx"": {
            "type": "attachment", "index" : "analyzed"
          }
        }
      },
      "message" : {
        "type": "string", "index" : "analyzed"
      }
    }
  }
}'
{"ok":true,"acknowledged":true}

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"wildcard" : { "_all" : "*" } } }'

This query works well and returns two documents.

These queries do not work: they return errors or no results.

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_attachments."2230681.pdf".content" : "Temperature" } } }'
{"took":0,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }'
{"error":"SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[CHYAFYCERMGHlBvKHiEagA][my_test_couchdb_attachments][3]: SearchParseException[[my_test_couchdb_attachments][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }]]]; nested: QueryParsingException[[my_test_couchdb_attachments] Failed to parse]; nested: JsonParseException[Unexpected character ('T' (code 84)): was expecting a colon to separate field name and value\n at [Source: [B@582a85; line: 1, column: 56]]; }]

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"text" : { "_attachments."DynamicPublishingUseCases.doc"" : "Rendering" } } }'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }}

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"text_phrase" : { "_attachments."TestAttachments.txt"" : "Couchdb" } } }'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }}

I have used Elasticsearch 0.18.4 with the CouchDB river 1.1.0 and the mapper-attachments plugin 1.1.0.

Please explain to me how to index CouchDB attachments and make them searchable.

Maybe I should submit the PDF files with their content encoded in base64? None of the files in my test are encoded.

Thanks a lot
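The JsonParseException in the second query points at the request body itself: the field name is missing its closing quote, so the parser reads `"_attachments.[2230681.pdf] : "` as the key and then finds `T` where it expects a colon. The sketch below reproduces this with Python's JSON parser (Python is used here only for illustration) and shows a body in which the dotted field name is an ordinary JSON string. Whether `_attachments.2230681.pdf` is the right field path for this index is a separate question that the sketch does not settle.

```python
import json

# Body copied from the failing request above.
broken = '{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }'

try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("invalid JSON:", err.msg)


def text_query(field, term):
    """Serialize a `text` query; json.dumps takes care of all quoting."""
    return json.dumps({"query": {"text": {field: term}}})


# Hypothetical field path, used only to show the quoting.
fixed = text_query("_attachments.2230681.pdf", "temperature")
print(fixed)
```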


From: David Pilato david@pilato.fr
To: elasticsearch elasticsearch@googlegroups.com
Sent: Wednesday 28 December 2011 14:12
Subject: Re: how to create index for a attachment of a doc in couchDB with ES?

Did anyone test it?

BTW, I updated the README file: https://github.com/dadoonet/elasticsearch-river-couchdb/blob/attachments/README.md

Please let me know (CouchDB river users) if there is any regression or if I can submit the pull request.

Thanks,
David.

On 22 Dec, 00:11, "David Pilato" da...@pilato.fr wrote:
> Hi there,
>
> I just finished something to deal with CouchDB attachments using elasticsearch-mapper-attachments.
>
> Before going further, is it possible for you to fork my code [1], compile it, launch the main test class CouchdbRiverBinaryAttachementTest, send some docs with one or more attachments and see if you can search for them?
>
> I tried with a very simple PDF file and it seems to work fine.
>
> I have started to write a little documentation about it [2] (see at the end).
>
> You can also download the plugin [3] and install it instead of the previous CouchDB plugin.
>
> BTW, you should install the elasticsearch-mapper-attachments plugin [4] first.
>
> Please let me know if it’s working or not for you.
>
> Cheers,
>
> David.
>
> [1] https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments
> [2] https://github.com/dadoonet/elasticsearch.github.com/blob/b77ebec4e44...
> [3] https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/ela...
> [4] https://github.com/elasticsearch/elasticsearch-mapper-attachments

Thanks for the feedback. Good to know.

David.

-----Original Message-----
From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On behalf of Goog Cheng
Sent: Sunday 8 January 2012 02:07
To: elasticsearch
Subject: Re: how to create index for a attachment of a doc in couchDB with ES?

Works for me! I'm a Chinese student and my English is so-so, sorry!

Thanks for your answer.

I have tested exactly as described in your documentation and in your suggestion below, but I always get zero hits. I really believe that the river did not retrieve and analyze the attached files in order to index them, although I am using version 1.0.0 of the couchdb-river.
Maybe the difference is the attached PDF file. Could you send me your file (if it is not secret or private)?
Thanks a lot
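When every query returns zero hits, it can help to separate "nothing matched" from "some shards failed". This is a small sketch, assuming Python, that summarizes a search response of the shape shown earlier in the thread. `summarize_search` is a hypothetical helper.

```python
import json


def summarize_search(response_text):
    """Return (total_hits, failed_shards) from a search response body."""
    response = json.loads(response_text)
    return response["hits"]["total"], response["_shards"]["failed"]


# Response copied from the thread: the query ran on all shards, no match.
raw = ('{"took":0,"timed_out":false,'
       '"_shards":{"total":5,"successful":5,"failed":0},'
       '"hits":{"total":0,"max_score":null,"hits":[]}}')

total, failed = summarize_search(raw)
print("hits=%d failed_shards=%d" % (total, failed))
```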


From: David Pilato david@pilato.fr
To: elasticsearch@googlegroups.com
Sent: Sunday 8 January 2012 23:37
Subject: RE: how to create index for a attachment of a doc in couchDB with ES?

Hi,

Just wondering why you are trying to send a mapping to ES?
BTW, your mapping is incorrect. You don’t have to define each attachment.


You have to download my jar from my GitHub repo, because 1.0.0 doesn't have the attachment feature.

HTH
David
@dadoonet

On 10 Jan 2012, at 17:32, Chi Dung Tran dungtctin4@yahoo.com wrote:

Thanks for your answer.
I have tested exactly as described in your documentation and in your suggestion below, but I always get zero hits. I really believe that the river did not retrieve and analyze the attached files in order to index them, although I am using version 1.0.0 of the couchdb-river.
Maybe the difference is the attached PDF file. Could you send me your file (if it is not secret or private)?
Thanks a lot


It works well with documents that have only one attachment,
but it ignores documents that have two attachments.
I hope this will be improved in future versions.

Thanks



From: David Pilato david@pilato.fr
To: "elasticsearch@googlegroups.com" elasticsearch@googlegroups.com
Sent: Tuesday 10 January 2012 17:39
Subject: Re: Re : how to create index for a attachment of a doc in couchDB with ES?

You have to download my jar from my GitHub repo, because 1.0.0 doesn't have the attachment feature.

HTH
David
@dadoonet

Le 10 janv. 2012 à 17:32, Chi Dung Tran dungtctin4@yahoo.com a écrit :

Thanks for your answer.

I have tested exactly the same as your documentation and your suggestion below. But I always receive zero hits. I really believe that the river did not retrieve and analyze attached files to index it later although I am using 1.0.0 couchdb-river version.
Myabe the difference is the attached pdf file. Could you send me your file (if it is not secret or private)
Thanks a lot


De : David Pilato david@pilato.fr
À : elasticsearch@googlegroups.com
Envoyé le : Dimanche 8 Janvier 2012 23h37
Objet : RE: how to create index for a attachment of a doc in couchDB with ES?

Hi,

Just wondering why you try to send a mapping to ES ?
BTW, your mapping is incorrect. You don’t have to define each attachment.

In the README file, I wrote:
$ curl -X PUT http://127.0.0.1:9200/my_db/my_db/_mapping -d '{
"my_db": {
"properties": {
"_attachments": {
"properties": {
"attachment": {
"type": "attachment"
}
}
},
"yourfield" : {
"type": "string"
}
}
}
}'

I suggest that you try to simply get attachments from couchDb with the example I wrote and then, if working and if you really need to, play with mappings.
Try also to make simplier searches to start. Something like:
curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_all" : "temperature" } }

(note the lowercase t on temperature).

Documentation is here : https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments

HTH
David.

De :elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] De la part de Chi Dung Tran
Envoyé : dimanche 8 janvier 2012 02:16
À : "elasticsearch@googlegroups.com"
Objet : Re : how to create index for a attachment of a doc in couchDB with ES?

Hi
But it doesn'e work for me.
I receive the same results as the mail I have already sent (below)
Please explain it to me.
You can see, I donot have hits.
Did I make a mistake or not?
Thanks

----- Mail transféré -----
De : Chi Dung Tran dungtctin4@yahoo.com
À : "elasticsearch@googlegroups.com" elasticsearch@googlegroups.com
Envoyé le : Jeudi 29 Décembre 2011 16h42
Objet : Re : how to create index for a attachment of a doc in couchDB with ES?

Hello
I have tested it but it doesnot work well.
I attach 4 files to the 2 couchdb documents like that:
{
"_id": "Doc1",
"_rev": "5-4d607b7d88985097462ae9b2f67bc5ac",
"message": "Elastic Search",
"_attachments": {
"exam.docx":
{
"content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"revpos": 4,
"digest": "md5-ecdBcsbc6w7mC1EOgd5SIg==",
"length": 10007,
"stub": true
},
"2230681.pdf": {
"content_type": "application/pdf",
"revpos": 2,
"digest": "md5-BUhqhHiVqKybxrfGsQTixQ==",
"length":
956146,
"stub": true
}
}
}
{
"_id": "Doc2",
"_rev": "7-dd58025abc2002566b6f458ad3d83d4d",
"message": test attachments",
"_attachments": {
"TestAttachments.txt": {
"content_type": "text/plain",
"revpos": 6,
"digest": "md5-aLTD+adMRHPw2+WMIN/42Q==",
"length": 89,

"stub": true

  },
  "DynamicPublishingUseCases.doc": {
      "content_type": "application/msword",
      "revpos": 2,
      "digest": "md5-FRdhydLr57C+q3ff6xLEmA==",
      "length": 22528,
      "stub": true
  }

}
}

Here is my test with Elasticsearch:
curl -X PUT "localhost:9200/test_idx_couchdb_attachments"
{"ok":true,"acknowledged":true}

curl -XPUT 'http://localhost:9200/_river/test_river_couchdb_attachments/_meta' -d '{"type" : "couchdb", "couchdb" : {"host" : "localhost","port" : 5984,"db" : "my_test_couchdb_attachments","filter" : null,"ignore_attachments":false}},"index" : {"index" : "test_idx_couchdb_attachments", "type" : "test_mapping_couchdb_attachments" } }'
{"ok":true,"_index":"_river","_type":"test_river_couchdb_attachments","_id":"_meta","_version":1}
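
One thing worth checking before sending a body like the river definition above is that it parses as JSON. As posted, the root object is closed right after `"ignore_attachments":false}}`, so the `"index"` section sits outside it. Elasticsearch still answered `ok`, but a strict parser rejects the body. The sketch below uses `python3 -m json.tool` for the check, which assumes Python 3 is installed; the helper name `json_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# json_ok BODY: succeed only if BODY is well-formed JSON.
# Hypothetical helper; relies on python3 being on the PATH.
json_ok() {
  printf '%s' "$1" | python3 -m json.tool > /dev/null 2>&1
}

# Same river settings as in the post, with the braces balanced so that
# "index" stays inside the root object.
river='{"type": "couchdb", "couchdb": {"host": "localhost", "port": 5984, "db": "my_test_couchdb_attachments", "filter": null, "ignore_attachments": false}, "index": {"index": "test_idx_couchdb_attachments", "type": "test_mapping_couchdb_attachments"}}'

if json_ok "$river"; then
  echo "river body parses"
else
  echo "river body is not valid JSON" >&2
fi
```

Once the body parses it can be sent with the same `curl -XPUT .../_river/.../_meta -d "$river"` call as in the post. With the `"index"` section inside the root object, documents would presumably land in `test_idx_couchdb_attachments` with type `test_mapping_couchdb_attachments`, so searches would need to target that index and type.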
At first, I typed:
curl -X PUT http://127.0.0.1:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_mapping -d '{
  "my_test_couchdb_attachments": {
    "properties": {
      "_attachments": {
        "properties": {
          "2230681.pdf": {
            "type": "attachment", "index" : "analyzed"
          },
          "DynamicPublishingUseCases.doc": {
            "type": "attachment", "index" : "analyzed"
          },
          "TestAttachments.txt": {
            "type": "attachment", "index" : "analyzed"
          },
          "exam.docx": {
            "type": "attachment", "index" : "analyzed"
          }
        }
      },
      "message" : {
        "type": "string", "index" : "analyzed"
      }
    }
  }
}'
and I received this result:
{"error":"MergeMappingException[Merge failed with failures {[Can't merge a non object mapping [TestAttachments.txt] with an object mapping [TestAttachments.txt], Can't merge a non object mapping [2230681.pdf] with an object mapping [2230681.pdf], Can't merge a non object mapping [exam.docx] with an object mapping [exam.docx], Can't merge a non object mapping [DynamicPublishingUseCases.doc] with an object mapping [DynamicPublishingUseCases.doc]]}]
So I changed it and it succeeded:
curl -X PUT http://127.0.0.1:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_mapping -d '{
  "my_test_couchdb_attachments": {
    "properties": {
      "_attachments": {
        "properties": {
          ""2230681.pdf"": {
            "type": "attachment", "index" : "analyzed"
          },
          ""DynamicPublishingUseCases.doc"": {
            "type": "attachment", "index" : "analyzed"
          },
          ""TestAttachments.txt"": {
            "type": "attachment", "index" : "analyzed"
          },
          ""exam.docx"": {
            "type": "attachment", "index" : "analyzed"
          }
        }
      },
      "message" : {
        "type": "string", "index" : "analyzed"
      }
    }
  }
}'
{"ok":true,"acknowledged":true}
curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"wildcard" : { "_all" : "*" } } }'
This query works well and returns two documents.
These queries do not work; they return errors or no hits:

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_attachments."2230681.pdf".content" : "Temperature" } } }'
{"took":0,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }'
{"error":"SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[CHYAFYCERMGHlBvKHiEagA][my_test_couchdb_attachments][3]: SearchParseException[[my_test_couchdb_attachments][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }]]]; nested: QueryParsingException[[my_test_couchdb_attachments] Failed to parse]; nested: JsonParseException[Unexpected character ('T' (code 84)): was expecting a colon to separate field name and value\n at [Source: [B@582a85; line: 1, column: 56]]; }]

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"text" : { "_attachments."DynamicPublishingUseCases.doc"" : "Rendering" } } }'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [] }}

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"text_phrase" : { "_attachments."TestAttachments.txt"" : "Couchdb" } } }'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [] }}

I used Elasticsearch 0.18.4 with the CouchDB river 1.1.0 and the mapper-attachments plugin 1.1.0.
Please explain to me how to index CouchDB attachments and make them searchable.
Maybe I should submit PDF files whose content is base64-encoded? None of the files in my test are encoded.
Thanks a lot
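
On the base64 question: the thread does not settle whether the river needs pre-encoded files, and the `"stub": true` entries in the documents above only show that CouchDB already holds the attachments. For reference, CouchDB itself accepts an attachment inline in a document when its `data` field is base64 text. A minimal shell sketch of building such a document follows; the function name `inline_attachment_doc` and its arguments are hypothetical, and it does no JSON escaping, so it assumes plain file names and messages.

```shell
#!/usr/bin/env bash
# inline_attachment_doc FILE CONTENT_TYPE MESSAGE
# Prints a CouchDB document body carrying FILE as one inline attachment.
# Hypothetical helper: no JSON escaping is done on MESSAGE or the file name.
inline_attachment_doc() {
  local file="$1" content_type="$2" message="$3"
  local name data
  name="$(basename "$file")"
  # base64 may wrap its output; strip newlines so the payload stays on one line.
  data="$(base64 < "$file" | tr -d '\n')"
  printf '{"message": "%s", "_attachments": {"%s": {"content_type": "%s", "data": "%s"}}}\n' \
    "$message" "$name" "$content_type" "$data"
}
```

The output can be piped to CouchDB, for example `inline_attachment_doc TestAttachments.txt text/plain "test attachments" | curl -X PUT http://localhost:5984/my_test_couchdb_attachments/Doc2 -H 'Content-Type: application/json' -d @-`, where the database and document names are the ones used earlier in this mail.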


From: David Pilato david@pilato.fr
To: elasticsearch elasticsearch@googlegroups.com
Sent: Wednesday, 28 December 2011, 14:12
Subject: Re: how to create index for a attachment of a doc in couchDB with ES?

Did anyone test it?
BTW, I updated the README file: https://github.com/dadoonet/elasticsearch-river-couchdb/blob/attachments/README.md
Please let me know (CouchDB river users) if there is any regression or if I can submit the pull request.
Thanks,
David.

On 22 Dec, 00:11, "David Pilato" da...@pilato.fr wrote:
> Hi there,
>
> I just finished something to deal with couchDb attachments using elasticsearch-mapper-attachments.
>
> Before going further, is it possible for you to fork my code [1], compile it and launch the main test class CouchdbRiverBinaryAttachementTest and send some docs with one or more attachments and see if you can search for it?
>
> I tried with a very simple PDF file and it seems to work fine.
>
> I started to write a little documentation about it [2] (see at the end).
>
> You can also download the plugin [3] and install it instead of the previous couchDb plugin.
>
> BTW, you should first have installed the elasticsearch-mapper-attachments plugin [4].
>
> Please let me know if it's working or not for you.
>
> Cheers,
>
> David.
>
> [1] https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments
> [2] GitHub - dadoonet/elasticsearch.github.com at b77ebec4e44c5d794d68cfd3c79fd2b3db2b120c...
> [3] https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/ela...
> [4] GitHub - elastic/elasticsearch-mapper-attachments: Mapper Attachments Type plugin for Elasticsearch

Strange. As far as I remember my tests, it should work with multiple attachments.
I will have a look.

David :wink:
@dadoonet

On 12 Jan 2012, at 00:48, Chi Dung Tran dungtctin4@yahoo.com wrote:

It works well with documents that have only one attachment
and ignores documents that have two attachments.
I hope this will be improved in future versions.
Thanks


From: David Pilato david@pilato.fr
To: "elasticsearch@googlegroups.com" elasticsearch@googlegroups.com
Sent: Tuesday, 10 January 2012, 17:39
Subject: Re: Re: how to create index for a attachment of a doc in couchDB with ES?

You have to download my jar from my GitHub repo because 1.0.0 doesn't have the attachment feature.

HTH
David
@dadoonet

On 10 Jan 2012, at 17:32, Chi Dung Tran dungtctin4@yahoo.com wrote:

Thanks for your answer.
I tested exactly as described in your documentation and in your suggestion below, but I always get zero hits. I really believe that the river did not retrieve and analyze the attached files for indexing, although I am using couchdb-river version 1.0.0.
Maybe the difference is the attached PDF file. Could you send me your file (if it is not secret or private)?
Thanks a lot


I don't remember the error message, but it involved indexer...pool...concurrent threads or something like that.


From: David Pilato david@pilato.fr
To: "elasticsearch@googlegroups.com" elasticsearch@googlegroups.com
Sent: Thursday, 12 January 2012, 1:10
Subject: Re: Re: Re: how to create index for a attachment of a doc in couchDB with ES?

Strange. As far as I remember my tests, it should work with multiple attachments.
I will have a look.

David :wink:
@dadoonet


So it doesn't really ignore the second attachment, but it fails to index it?
Could you gist a curl recreation?

David :wink:
@dadoonet
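
A curl recreation of the two-attachment case could look like the sketch below. It is not taken from the thread: the database name is reused from the earlier posts, while the attachment names `a.txt` and `b.txt`, their contents, and the 5-second wait are placeholders. The river body repeats the settings posted earlier, with index and type set explicitly to the database name. By default the script only prints the commands; set `RUN=1` to execute them against local CouchDB and Elasticsearch instances.

```shell
#!/usr/bin/env bash
# Sketch of a curl recreation: one CouchDB document with two inline
# attachments, a CouchDB river, then a search on _all.
# Hosts, names and file contents are placeholders.
COUCH="${COUCH:-http://localhost:5984}"
ES="${ES:-http://localhost:9200}"
DB="${DB:-my_test_couchdb_attachments}"

# Print each command by default; set RUN=1 to execute it.
# The printed form drops shell quoting, so it is for reading, not pasting.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# base64-encode a string on one line, as CouchDB inline attachments expect.
b64() { printf '%s' "$1" | base64 | tr -d '\n'; }

# Document body with two small text attachments.
two_attachment_doc() {
  printf '{"message": "two attachments", "_attachments": {'
  printf '"a.txt": {"content_type": "text/plain", "data": "%s"}, ' "$(b64 'temperature one')"
  printf '"b.txt": {"content_type": "text/plain", "data": "%s"}}}\n' "$(b64 'temperature two')"
}

recreate() {
  # 1. Create the database and the document.
  run curl -X PUT "$COUCH/$DB"
  run curl -X PUT "$COUCH/$DB/Doc1" -H 'Content-Type: application/json' -d "$(two_attachment_doc)"
  # 2. Register the river with attachments enabled.
  run curl -X PUT "$ES/_river/$DB/_meta" -d "{\"type\": \"couchdb\", \"couchdb\": {\"host\": \"localhost\", \"port\": 5984, \"db\": \"$DB\", \"filter\": null, \"ignore_attachments\": false}, \"index\": {\"index\": \"$DB\", \"type\": \"$DB\"}}"
  # 3. Give the river time to pick up the change (the delay is a guess).
  run sleep 5
  # 4. Simple search on _all, lowercase term.
  run curl -X GET "$ES/$DB/$DB/_search?pretty=true" -d '{"query": {"text": {"_all": "temperature"}}}'
}
```

Calling `recreate` prints the four curl commands for review; `RUN=1 recreate` runs them. Comparing the result with a one-attachment variant of the same document would show whether the second attachment is the trigger.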

On 12 Jan 2012, at 01:40, Chi Dung Tran dungtctin4@yahoo.com wrote:

I don't remember the error message, but it involved indexer...pool...concurrent threads or something like that.

----- Mail transféré -----
De : Chi Dung Tran dungtctin4@yahoo.com
À : "elasticsearch@googlegroups.com" elasticsearch@googlegroups.com
Envoyé le : Jeudi 29 Décembre 2011 16h42
Objet : Re : how to create index for a attachment of a doc in couchDB with ES?

Hello
I have tested it but it doesnot work well.
I attach 4 files to the 2 couchdb documents like that:
{
"_id": "Doc1",
"_rev": "5-4d607b7d88985097462ae9b2f67bc5ac",
"message": "Elastic Search",
"_attachments": {
"exam.docx": {
"content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"revpos": 4,
"digest": "md5-ecdBcsbc6w7mC1EOgd5SIg==",
"length": 10007,
"stub": true
},
"2230681.pdf": {
"content_type": "application/pdf",
"revpos": 2,
"digest": "md5-BUhqhHiVqKybxrfGsQTixQ==",
"length": 956146,
"stub": true
}
}
}
{
"_id": "Doc2",
"_rev": "7-dd58025abc2002566b6f458ad3d83d4d",
"message": test attachments",
"_attachments": {
"TestAttachments.txt": {
"content_type": "text/plain",
"revpos": 6,
"digest": "md5-aLTD+adMRHPw2+WMIN/42Q==",
"length": 89,
"stub": true
},
"DynamicPublishingUseCases.doc": {
"content_type": "application/msword",
"revpos": 2,
"digest": "md5-FRdhydLr57C+q3ff6xLEmA==",
"length": 22528,
"stub": true
}
}
}

Here is my test with Elasticsearch:
curl -X PUT "localhost:9200/test_idx_couchdb_attachments"
{"ok":true,"acknowledged":true}

curl -XPUT 'http://localhost:9200/_river/test_river_couchdb_attachments/_meta' -d '{"type" : "couchdb", "couchdb" : {"host" : "localhost","port" : 5984,"db" : "my_test_couchdb_attachments","filter" : null,"ignore_attachments":false}},"index" : {"index" : "test_idx_couchdb_attachments", "type" : "test_mapping_couchdb_attachments" } }'
{"ok":true,"_index":"_river","_type":"test_river_couchdb_attachments","_id":"_meta","_version":1}
At first, I type:
curl -X PUT http://127.0.0.1:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_mapping -d '{
"my_test_couchdb_attachments": {
"properties": {
"_attachments": {
"properties": {
"2230681.pdf": {
"type": "attachment", "index" : "analyzed"
},
"DynamicPublishingUseCases.doc": {
"type": "attachment", "index" : "analyzed"
},
"TestAttachments.txt": {
"type": "attachment", "index" : "analyzed"
},
"exam.docx": {
"type": "attachment", "index" : "analyzed"
}
}
},
"message" : {
"type": "string", "index" : "analyzed"
}
}
}
}'
and I receive results:
{"error":"MergeMappingException[Merge failed with failures {[Can't merge a non object mapping [TestAttachments.txt] with an object mapping [TestAttachments.txt], Can't merge a non object mapping [2230681.pdf] with an object mapping [2230681.pdf], Can't merge a non object mapping [exam.docx] with an object mapping [exam.docx], Can't merge a non object mapping [DynamicPublishingUseCases.doc] with an object mapping [DynamicPublishingUseCases.doc]]}]
So I changed the mapping, and this succeeded:
curl -X PUT http://127.0.0.1:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_mapping -d '{
"my_test_couchdb_attachments": {
"properties": {
"_attachments": {
"properties": {
""2230681.pdf"": {
"type": "attachment", "index" : "analyzed"
},
""DynamicPublishingUseCases.doc"": {
"type": "attachment", "index" : "analyzed"
},
""TestAttachments.txt"": {
"type": "attachment", "index" : "analyzed"
},
""exam.docx"": {
"type": "attachment", "index" : "analyzed"
}
}
},
"message" : {
"type": "string", "index" : "analyzed"
}
}
}
}'
{"ok":true,"acknowledged":true}
curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"wildcard" : { "_all" : "*" } } }'
This query works well by returning two documents

The following queries do not work; they return errors or no hits:

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_attachments."2230681.pdf".content" : "Temperature" } } }'
{"took":0,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

curl -XGET 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search' -d '{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }'
{"error":"SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[CHYAFYCERMGHlBvKHiEagA][my_test_couchdb_attachments][3]: SearchParseException[[my_test_couchdb_attachments][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query" : {"text" : { "_attachments.[2230681.pdf] : "Temperature" } } }]]]; nested: QueryParsingException[[my_test_couchdb_attachments] Failed to parse]; nested: JsonParseException[Unexpected character ('T' (code 84)): was expecting a colon to separate field name and value\n at [Source: [B@582a85; line: 1, column: 56]]; }]

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"text" : { "_attachments."DynamicPublishingUseCases.doc"" : "Rendering" } } }'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }}

curl -XPOST 'http://localhost:9200/my_test_couchdb_attachments/my_test_couchdb_attachments/_search?pretty=true' -d '{"query" : {"text_phrase" : { "_attachments."TestAttachments.txt"" : "Couchdb" } } }'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }}

I used Elasticsearch 0.18.4 with the CouchDB river 1.1.0 and the mapper-attachments plugin 1.1.0.
Please explain how to index CouchDB attachments and make them searchable.
Maybe I should submit the PDF files with their content encoded in base64? None of the files in my test are encoded.
Thanks a lot
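
(Editor's sketch, not part of the original message.) The second query above is rejected before it ever reaches the index: the field name is missing its closing quote, which is what the `JsonParseException` ("was expecting a colon") reports. The Python below reproduces that parse failure with the standard `json` module and builds the simpler `_all` search with a lowercase term that David recommends in his reply. The helper name `text_query` is hypothetical, chosen only for this illustration.

```python
import json


def text_query(field, term):
    """Return a JSON body for a `text` query (hypothetical helper)."""
    return json.dumps({"query": {"text": {field: term}}})


# Second failing query from the message: the field name never closes.
broken = ('{"query" : {"text" : { "_attachments.[2230681.pdf] : '
          '"Temperature" } } }')
try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError as err:
    broken_is_valid = False
    print("rejected:", err.msg)

# The simpler search suggested in the thread: all fields, lowercase term.
simple = text_query("_all", "temperature")
print(simple)
```

Generating the body with `json.dumps` also sidesteps the nested, unescaped quotes seen in the other field names above.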


From: David Pilato david@pilato.fr
To: elasticsearch elasticsearch@googlegroups.com
Sent: Wednesday, 28 December 2011, 14:12
Subject: Re: how to create index for a attachment of a doc in couchDB with ES?

Did anyone test it?
BTW, I updated the README file: https://github.com/dadoonet/elasticsearch-river-couchdb/blob/attachments/README.md
Please let me know (CouchDB river users) if there is any regression or if I can submit the pull request.
Thanks,
David.

On 22 Dec, 00:11, "David Pilato" da...@pilato.fr wrote:
> Hi there,
>
> I just finished something to deal with couchDb attachments using elasticsearch-mapper-attachments.
>
> Before going further, is it possible for you to fork my code [1], compile it and launch the main test class CouchdbRiverBinaryAttachementTest and send some docs with one or more attachments and see if you can search for it?
>
> I tried with a very simple PDF file and it seems to work fine.
>
> I started to write a little documentation about it [2] (see at the end).
>
> You can also download the plugin [3] and install it instead of the previous couchDb plugin.
>
> BTW, you should have installed the elasticsearch-mapper-attachments plugin [4] beforehand.
>
> Please let me know if it's working or not for you.
>
> Cheers,
>
> David.
>
> [1] https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments
> [2] GitHub - dadoonet/elasticsearch.github.com at b77ebec4e44c5d794d68cfd3c79fd2b3db2b120c...
> [3] https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/ela...
> [4] GitHub - elastic/elasticsearch-mapper-attachments: Mapper Attachments Type plugin for Elasticsearch

David, I am using es1.18.7. It didn't work when I followed your README. It
seems that one has to download and use your jar. I tried for two days and
got no hits. I hope you will update the README. Thanks very much!

On Jan 11, 12:39 am, David Pilato da...@pilato.fr wrote:

You have to download my jar in my Github repo because 1.0.0 doesn't have the attachment function.

HTH
David
@dadoonet

On 10 Jan 2012, at 17:32, Chi Dung Tran dungtct...@yahoo.com wrote:

Thanks for your answer.
I tested exactly as described in your documentation and in your suggestion, but I always get zero hits. I really believe that the river did not retrieve and analyze the attached files for indexing, although I am using version 1.0.0 of the couchdb-river.
Maybe the difference is the attached PDF file. Could you send me your file (if it is not secret or private)?
Thanks a lot


In the case of multiple attachments, indexing of the attachments halts.

Error:
Exception in thread "elasticsearch[Raman]couchdb_river_indexer-pool-21-thread-1" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at org.elasticsearch.river.couchdb.CouchdbRiver.processLine(CouchdbRiver.java:307)
at org.elasticsearch.river.couchdb.CouchdbRiver.access$3(CouchdbRiver.java:249)
at org.elasticsearch.river.couchdb.CouchdbRiver$Indexer.run(CouchdbRiver.java:426)
at java.lang.Thread.run(Thread.java:662)

On Jan 12, 8:10 am, David Pilato da...@pilato.fr wrote:

Strange. As far as I remember my tests, it should work with multiple attachments.
I will have a look.

David :wink:
@dadoonet

On 12 Jan 2012, at 00:48, Chi Dung Tran dungtct...@yahoo.com wrote:

It works well with documents that have only one attachment
and ignores documents that have two attachments.
I hope this will be fixed in future versions.
Thanks



Heya,

Thanks!
I pushed an update here:
https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/elasticsearch-river-couchdb-1.2.0-SNAPSHOT.zip

You must use it with the latest Elasticsearch version :

Please let me know if it works fine now.

David.

-----Original message-----
From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On behalf of Goog Jobs
Sent: Saturday, 25 February 2012 05:05
To: elasticsearch
Subject: Re: Re : Re : how to create index for a attachment of a doc in couchDB with ES?

In the case of multiple attachments, indexing of the attachments halts.

Error:
Exception in thread "elasticsearch[Raman]couchdb_river_indexer-pool-21-thread-1" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at org.elasticsearch.river.couchdb.CouchdbRiver.processLine(CouchdbRiver.java:307)
at org.elasticsearch.river.couchdb.CouchdbRiver.access$3(CouchdbRiver.java:249)
at org.elasticsearch.river.couchdb.CouchdbRiver$Indexer.run(CouchdbRiver.java:426)
at java.lang.Thread.run(Thread.java:662)
