How to index PDF attachments using elasticsearch-river-couchdb (1.2.0)? (no hits)

Jordon · June 18, 2013, 8:36am

Dear All,
I am new to Elasticsearch. For days I have tried, without success, to follow
the various tutorials and posts on indexing and mapping documents attached in
a CouchDB database. After running the commands below, I don't get any hits for
words that exist in the CouchDB attachments.

Software versions:
CouchDB: 1.2.1
elasticsearch-river-couchdb: 1.2.0
elasticsearch: 0.90
elasticsearch-mapper-attachments: 1.7.0

1. Create an index
curl -XPUT "http://127.0.0.1:9200/pdfcouchindex"

2. Create a mapping
curl -XPUT 'http://127.0.0.1:9200/pdfcouchindex/pdftype/_mapping' -d '{
  "pdftype": {
    "properties": {
      "_attachments": {
        "properties": {
          "attachment": {
            "type": "attachment",
            "index": "analyzed"
          }
        }
      },
      "title": {
        "type": "string"
      }
    }
  }
}'

3. Create the river
curl -XPUT "http://127.0.0.1:9200/_river/pdfcouchindex/_meta" -d '
{
  "type": "couchdb",
  "couchdb": {
    "host": "127.0.0.1",
    "port": "5984",
    "db": "pdfcouch",
    "filter": null
  },
  "index": {
    "index": "pdfcouchindex",
    "type": "pdftype"
  }
}'

4. Search the indexed documents for a keyword
(1) curl -XPOST 'http://localhost:9200/pdfcouchindex/pdftype/_search?pretty=true' -d '{"query" : {"text" : { "_all" : "Abstraction" } } }'
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

(2) http://localhost:9200/pdfcouchindex/pdftype/_search?q=*&pretty=true
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "pdfcouchindex",
      "_type" : "pdftype",
      "_id" : "60699284e3faaeef260fc43e7e00678a",
      "_score" : 1.0, "_source" :
      {"_rev":"1-6dc6e91540ad3ccc4a0d54f66c7574ac","authors":"","organization":"","title":"mongodb","keywords":"","issn":"","abstracts":"","_id":"60699284e3faaeef260fc43e7e00678a","address":"","_attachments":{"attachment":{"stub":true,"length":366987,"digest":"md5-G52n3UVybIbCOuQe5eX2dg==","revpos":1,"content_type":"application/pdf"}},"foundation":"","media":""}
    }, {
      "_index" : "pdfcouchindex",
      "_type" : "pdftype",
      "_id" : "60699284e3faaeef260fc43e7e006d81",
      "_score" : 1.0, "_source" :
      {"_rev":"1-2de491d3982a9f7ae0fb9b928dfc6c2e","authors":"","organization":"","title":"mongodb","keywords":"","issn":"","abstracts":"","_id":"60699284e3faaeef260fc43e7e006d81","address":"","_attachments":{"attachment":{"stub":true,"length":2357103,"digest":"md5-29s/N7EvL0E97d2uDqOkGw==","revpos":1,"content_type":"application/pdf"}},"foundation":"","media":""}
    } ]
  }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Fatima_Castiglione_M · June 18, 2013, 5:35pm

The same thing happened to me. I just could not make it work.
My CouchDB was working OK, the _rivers were created, all of it. But nothing.
And I could not find anybody to point me in the right direction.

Then I switched to using Elasticsearch + the ScrutMyDocs app. Now I have
some things working; at least I can index documents.

--

Fátima Castiglione Maldonado
castiglionemaldonado@gmail.com

dadoonet · June 19, 2013, 8:32am

The CouchDB river does not read attachment content for now. See https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/25
Why? Because the CouchDB _changes API does not provide attachments, so the river would need extra round trips to CouchDB to index attachments as well.
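Until the river handles this, one workaround is to make that extra round trip yourself. The shell sketch below fetches an attachment through CouchDB's standard attachment URL (GET /db/docid/attachment-name), base64-encodes it and PUTs it to Elasticsearch so the mapper-attachments plugin can extract the text. It is only a sketch under stated assumptions: the target index `pdfindex`, type `pdf` and field `file` (mapped as `{"type": "attachment"}`) are hypothetical, and the helper names are illustrative, not part of the river.

```shell
#!/bin/sh
# Sketch: fetch a CouchDB attachment explicitly (the river does not) and
# re-index it into Elasticsearch via the mapper-attachments plugin.
# Hypothetical target: index "pdfindex", type "pdf", field "file" mapped
# as {"type": "attachment"}.

COUCH="http://127.0.0.1:5984/pdfcouch"
ES="http://127.0.0.1:9200/pdfindex/pdf"

# Read raw bytes on stdin and print one JSON document whose "file" field
# holds the base64-encoded content with line wraps removed.
# The title is inserted verbatim: it must not contain quotes or backslashes.
to_es_doc() {
  content=$(base64 | tr -d '\n')
  printf '{"title":"%s","file":"%s"}\n' "$1" "$content"
}

# Fetch one attachment and index it under the same id in Elasticsearch.
push_attachment() {
  doc_id="$1"
  att_name="$2"
  title="$3"
  curl -s "$COUCH/$doc_id/$att_name" \
    | to_es_doc "$title" \
    | curl -s -XPUT "$ES/$doc_id" --data-binary @-
}
```

For example, `push_attachment 60699284e3faaeef260fc43e7e00678a attachment mongodb` would copy the first PDF listed in the original search output. Because this bypasses the river, it has to be re-run whenever an attachment changes.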

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

dadoonet · June 19, 2013, 8:34am

Fatima,

Scrutmydocs only indexes documents from your local file system, using FSRiver.
It's not exactly the same use case as Jordon's: as far as I understand it, Jordon already has his attachments in CouchDB.

Jordon · June 19, 2013, 3:01pm

Dear David Pilato:
Great work, you saved us a lot of time. Thank you very much for your
generous help!

Fatima_Castiglione_M · June 20, 2013, 4:15am

Sorry, maybe I did not explain myself well.

I first tried pushing my documents into CouchDB and indexing them with
Elasticsearch.
I could see them in CouchDB, but could not get the CouchDB _river to
index them (the CouchDB _river, not the file system _river).

While I was looking for a solution, I came across ScrutMyDocs, and the app
was much closer to what I needed.

So I stopped using CouchDB and the CouchDB _river, and started using
ScrutMyDocs and the file system _river. Now I have part of it working, and
your help has been much appreciated.

Next, I would like to modify ScrutMyDocs (I'm working on that) and share
the new version on GitHub.

dadoonet · June 20, 2013, 10:53am

I understood that.

My remark was more about the fact that you wanted to index binary documents stored on your file system.

For that need, you decided to push them into CouchDB and add a CouchDB river (which does not work because of issue #25). But you could have done it:

- manually: read the file system, create JSON with base64-encoded content and push it to Elasticsearch
- using FS River: FS River indexes documents from your local file system
- using scrutmydocs: basically, we put all of that together in a web app (Elasticsearch, FS River, mapper attachments plugin)
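The first, manual option could look roughly like the shell sketch below: walk a directory, base64-encode each PDF, and POST it so Elasticsearch generates the id. It assumes the mapper-attachments plugin is installed and uses a hypothetical index `docs`, type `doc` and field `file` mapped as `{"type": "attachment"}`; the helper names and the `*.pdf` filter are illustrative and are not part of FS River or scrutmydocs.

```shell
#!/bin/sh
# Sketch of the "manual" route: read files from a local directory,
# base64-encode each one and push it to Elasticsearch.
# Hypothetical target: index "docs", type "doc", field "file" mapped
# as {"type": "attachment"}.

ES="http://127.0.0.1:9200/docs/doc"

# Print the JSON body for one file: its base name as "name" and its
# content, base64-encoded without line wraps, as "file".
# File names must not contain quotes or backslashes.
file_to_json() {
  name=$(basename "$1")
  content=$(base64 < "$1" | tr -d '\n')
  printf '{"name":"%s","file":"%s"}\n' "$name" "$content"
}

# Index every *.pdf in a directory, letting Elasticsearch generate ids.
index_dir() {
  for f in "$1"/*.pdf; do
    [ -f "$f" ] || continue   # skip the literal pattern when nothing matches
    file_to_json "$f" | curl -s -XPOST "$ES" --data-binary @-
  done
}
```

Before the first run, the index and mapping would need to exist, e.g. `curl -XPUT 'http://127.0.0.1:9200/docs'` followed by `curl -XPUT 'http://127.0.0.1:9200/docs/doc/_mapping' -d '{"doc":{"properties":{"file":{"type":"attachment"}}}}'`; then `index_dir /path/to/pdfs` indexes the folder. FS River and scrutmydocs automate this same loop and also pick up later changes.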

About scrutmydocs, just be aware that it's a moving target right now. I will probably release it (0.3.0) in the next days or weeks.
So please update your own fork regularly, as it may still change a little.

Thanks for your upcoming contributions!

Related topics

Topic	Category	Replies	Views	Activity
ES full text search on couchdb attachments documents	Elasticsearch	7	1353	July 6, 2017
Problem CouchDB and attachment	Elasticsearch	3	350	July 6, 2017
How to create index for a attachment of pdf by using elasticsearch-river-mongodb: 1.6.9 (don't have any hits,or missing fields)	Elasticsearch	2	308	July 6, 2017
Attachments questions	Elasticsearch	2	252	July 6, 2017
How to create index for a attachment of a doc in couchDB with ES?	Elasticsearch	31	1003	July 6, 2017