IMAPRiver plugin attachment index issue


(Gabriel Kapitany) #1

Hi all,

I have installed Elasticsearch 1.2.1 and the IMAPRiver
plugin elasticsearch-river-imap-0.0.7-b20, together with
elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.

[2014-07-23 11:56:44,304][INFO ][node ] [Shiver Man] version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-23 11:56:44,305][INFO ][node ] [Shiver Man] initializing ...
[2014-07-23 11:56:44,329][INFO ][plugins ] [Shiver Man] loaded [mapper-attachments, river-imap-0.0.7-b20-${build], sites []
[2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] initialized
[2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] starting ...
[2014-07-23 11:56:48,052][INFO ][transport ] [Shiver Man] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.125.71.146:9300]}
[2014-07-23 11:56:51,106][INFO ][cluster.service ] [Shiver Man] new_master [Shiver Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]], reason: zen-disco-join (elected_as_master)
[2014-07-23 11:56:51,148][INFO ][discovery ] [Shiver Man] elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
[2014-07-23 11:56:51,178][INFO ][http ] [Shiver Man] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.125.71.146:9200]}
[2014-07-23 11:56:52,278][INFO ][gateway ] [Shiver Man] recovered [2] indices into cluster_state
[2014-07-23 11:56:52,279][INFO ][node ] [Shiver Man] started
[2014-07-23 11:56:54,760][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] IMAPRiver created, river name: river-imap
[2014-07-23 11:56:54,761][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] Start IMAPRiver ...
[2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] Using default implementation for ThreadExecutor
[2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job execution threads will use class loader of thread: elasticsearch[Shiver Man][generic][T#4]
[2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
[2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz Scheduler v.2.2.1 created.
[2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore initialized.
[2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] Scheduler meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with instanceId 'NON_CLUSTERED'

The river was created with:

curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{

"type":"imap",
"mail.store.protocol":"imap",
"mail.imap.host":"cto-sdx01",
"mail.imap.port":993,
"mail.imap.ssl.enable":true,
"mail.imap.connectionpoolsize":"3",
"mail.debug":"false",
"mail.imap.timeout":10000,
"user":"gkapitan",
"password":"xxxxxxx",
"schedule":null,
"interval":"60s",
"threads":5,
"folderpattern":null,
"bulk_size":100,
"max_bulk_requests":"30",
"bulk_flush_interval":"5s",
"mail_index_name":"imapriverdata",
"mail_type_name":"mail",
"with_striptags_from_textcontent":true,
"with_attachments":true,
"with_text_content":true,
"with_flag_sync":true,
"index_settings" : null,
"type_mapping" : null

}'

The documents are loaded, but the attachments are not indexed for any of
these types: pdf, doc, docx, csv, xls...

Any idea of what I might have missed?

Below is a sample response:

{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
"attachmentCount" : 1,
"attachments" : [ {
"content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
"contentType" : "text/csv; charset=us-ascii",
"size" : 33,
"filename" : "Book1.csv",
"name" : "Book1.csv"
} ],
"bcc" : null,
"cc" : null,
"contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
"flaghashcode" : 48,
"flags" : [ "Recent", "Seen" ],
"folderFullName" : "mail/gkapitan/Sent",
"folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
"from" : {
"email" : "gkapitan@cto-sdx01.idm.symcto.com",
"personal" : "Gabriel Kapitany"
},
"headers" : [ {
"name" : "Content-Disposition",
"value" : "inline"
}, {
"name" : "Subject",
"value" : "csv"
}, {
"name" : "To",
"value" : "gkapitan@cto-sdx01.idm.symcto.com"
}, {
"name" : "Date",
"value" : "Wed, 23 Jul 2014 12:24:05 -0400"
}, {
"name" : "MIME-Version",
"value" : "1.0"
}, {
"name" : "Message-ID",
"value" : "20140723162358.GA3992@cto-sdx01.idm.symcto.com"
}, {
"name" : "User-Agent",
"value" : "Mutt/1.5.20 (2009-12-10)"
}, {
"name" : "Content-Type",
"value" : "multipart/mixed; boundary=\"wac7ysb48OaltWcw\""
}, {
"name" : "From",
"value" : "Gabriel Kapitany gkapitan@cto-sdx01.idm.symcto.com"
} ],
"mailboxType" : "IMAP",
"popId" : null,
"receivedDate" : 1406132638000,
"sentDate" : 1406132645000,
"size" : 639,
"subject" : "csv",
"textContent" : "\r\n",
"to" : [ {
"email" : "gkapitan@cto-sdx01.idm.symcto.com",
"personal" : null
} ],
"uid" : 16
}}]}}

Thanks,
Gabriel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7c1de60b-1b32-47be-a64a-e8784a104c2f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

How do you know that it has not been indexed?

You can't rely on the _source field for that. The attachments do appear to be present in the attachments field.
It looks good to me.

Maybe you should check that the mapping is correct and that the attachments field has the attachment type?
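
To make the mapping check concrete, here is a minimal Python sketch. It assumes the response of GET /imapriverdata/_mapping has the Elasticsearch 1.x shape {index: {"mappings": {type: {"properties": ...}}}}. The helper name field_type and the sample mapping are hypothetical and are not taken from this cluster; the sample only shows what a dynamically created mapping could look like.

```python
def field_type(mapping, index, doc_type, path):
    """Return the mapped type of a dotted field path, or None if unmapped.

    Assumes the ES 1.x get-mapping layout:
    {index: {"mappings": {doc_type: {"properties": {...}}}}}
    """
    node = mapping[index]["mappings"][doc_type]
    for part in path.split("."):
        props = node.get("properties", {})
        if part not in props:
            return None
        node = props[part]
    # Plain object fields carry "properties" but no explicit "type".
    return node.get("type", "object")


# Hypothetical mapping: "content" was mapped dynamically as a plain string,
# so the attachment mapper would never parse it.
sample = {
    "imapriverdata": {"mappings": {"mail": {"properties": {
        "subject": {"type": "string"},
        "attachments": {"properties": {
            "content": {"type": "string"},
            "filename": {"type": "string"},
        }},
    }}}}
}

print(field_type(sample, "imapriverdata", "mail", "attachments.content"))
```

If the field that holds the base64 data reports a type other than "attachment", the text inside the file is not extracted; the base64 value is then indexed like any other string.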

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


(Gabriel Kapitany) #3

Hi David,

I have checked whether the attachment type matches:
"contentType" : "text/csv; charset=us-ascii", and it does.

A search for a keyword that occurs in the attachment returns nothing. However,
a base64 decode of the attachment (the content field) shows that the
attachment is stored correctly.
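
For reference, the decode check can be reproduced with a few lines of Python. This is only a sketch; the encoded string is the "content" value copied from the sample response earlier in this thread, and the variable names are arbitrary.

```python
import base64

# "content" value copied from the sample response in this thread.
encoded = "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K"

raw = base64.b64decode(encoded)
text = raw.decode("ascii")  # contentType reports charset=us-ascii

print(len(raw))         # 33, the same as the "size" field
print(repr(text))       # 'alpha,beta,gamma\r\nTeta,sigma,pi\r\n'
print("alpha" in text)  # True
```

So the keyword "alpha" is present in the stored attachment, even though the search does not find it.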

Thanks,
Gabriel


(David Pilato) #4

What kind of query are you running?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


(Gabriel Kapitany) #5

query:

curl -XGET 'http://localhost:9200/imapriverdata/_search' -d '{

"query" : {
"match" : { "_all" : "alpha" }
}
}
'

response:

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Thanks,
Gabriel


(David Pilato) #6

Could you try:

curl -XGET 'http://localhost:9200/imapriverdata/_search' -d '{
"query" : {
"match" : { "attachments" : "alpha" }
}
}
'

Also, what does this return:

GET /imapriverdata/_mapping?pretty
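
The two searches in this thread differ only in the field they target. As a small illustration, the request bodies can be generated from one helper; match_query is a hypothetical name used for this sketch, not part of any client library.

```python
import json


def match_query(field, text):
    """Build the body of a match query against a single field."""
    return {"query": {"match": {field: text}}}


# "_all" is the field searched first; "attachments" is the one suggested here.
for field in ("_all", "attachments"):
    print(json.dumps(match_query(field, "alpha")))
```

Each printed line can be passed as the -d argument of the curl command above.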

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

Le 23 juillet 2014 à 19:12:06, Gabriel Kapitany (gkapitany@gmail.com) a écrit:

query:

curl -XGET 'http://localhost:9200/imapriverdata/_search' -d '{

"query" : {
"match" : { "_all" : "alpha" }
}
}
'

response:

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Thanks,
Gabriel



(Gabriel Kapitany) #7

Changing the query from _all to attachments doesn't change the result,
and the second query returns:

{
  "imapriverdata" : {
    "mappings" : {
      "imapriverstate" : {
        "properties" : {
          "errormsg" : {
            "type" : "string"
          },
          "exists" : {
            "type" : "boolean"
          },
          "folderUrl" : {
            "type" : "string"
          },
          "lastCount" : {
            "type" : "long"
          },
          "lastIndexed" : {
            "type" : "long"
          },
          "lastSchedule" : {
            "type" : "long"
          },
          "lastTook" : {
            "type" : "long"
          },
          "lastUid" : {
            "type" : "long"
          },
          "messageid" : {
            "type" : "string"
          },
          "uidValidity" : {
            "type" : "long"
          }
        }
      },
      "mail" : {
        "properties" : {
          "attachmentCount" : {
            "type" : "long"
          },
          "attachments" : {
            "properties" : {
              "content" : {
                "type" : "string"
              },
              "contentType" : {
                "type" : "string"
              },
              "filename" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "size" : {
                "type" : "long"
              }
            }
          },
          "contentType" : {
            "type" : "string"
          },
          "flaghashcode" : {
            "type" : "integer"
          },
          "flags" : {
            "type" : "string"
          },
          "folderFullName" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "folderUri" : {
            "type" : "string"
          },
          "from" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "headers" : {
            "properties" : {
              "name" : {
                "type" : "string"
              },
              "value" : {
                "type" : "string"
              }
            }
          },
          "mailboxType" : {
            "type" : "string"
          },
          "receivedDate" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "sentDate" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "size" : {
            "type" : "long"
          },
          "subject" : {
            "type" : "string"
          },
          "textContent" : {
            "type" : "string"
          },
          "to" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "uid" : {
            "type" : "long"
          }
        }
      }
    }
  }
}



(David Pilato) #8

That's what I thought.
The attachments field is not mapped with the attachment type:

      "attachments" : {
        "properties" : {
          "content" : {
            "type" : "string"
          },
          "contentType" : {
            "type" : "string"
          },
          "filename" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string"
          },
          "size" : {
            "type" : "long"
          }
        }
      },
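
To see why a match on alpha finds nothing with this mapping, it helps to decode the content value of the sample hit. This is a minimal sketch, assuming a base64 utility that accepts -d (GNU coreutils does); the variable names are only for illustration:

```shell
#!/bin/sh
# Base64 value of attachments[0].content, copied from the sample hit in this thread.
CONTENT='YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K'

# Decode it and drop the carriage returns of the CRLF line endings.
DECODED=$(printf '%s' "$CONTENT" | base64 -d | tr -d '\r')

# Print the decoded CSV text.
printf '%s\n' "$DECODED"
```

The decoded CSV does contain alpha, but with a plain string mapping Elasticsearch indexes the base64 text itself rather than the decoded file, so that word never becomes a searchable term.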

You need to fix that. I think you need to provide your own mapping.
Sadly, this is not documented in the river project. Maybe you should open an issue and a PR to document how to deal with attachments?
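
One way to provide such a mapping might be to create the index yourself before the river first writes to it. The following is only a sketch and not something confirmed in this thread. It assumes that the river leaves an existing mapping for the mail type alone, that the old imapriverdata index has been deleted first (the type of an existing field cannot be changed in place), and that the attachment type from mapper-attachments accepts the base64 string the river stores in attachments.content. ES_URL, APPLY and MAIL_MAPPING are names made up for the example:

```shell
#!/bin/sh
# Assumption: Elasticsearch listens on the same address used elsewhere in the thread.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index definition for the "mail" type. Only attachments.content changes:
# it uses the "attachment" type provided by the mapper-attachments plugin
# instead of "string". The other fields keep the types from the current mapping.
MAIL_MAPPING='{
  "mappings": {
    "mail": {
      "properties": {
        "attachments": {
          "properties": {
            "content":     { "type": "attachment" },
            "contentType": { "type": "string" },
            "filename":    { "type": "string" },
            "name":        { "type": "string" },
            "size":        { "type": "long" }
          }
        }
      }
    }
  }
}'

# Set APPLY=1 to send the create-index request; otherwise only print the payload.
if [ "${APPLY:-0}" = "1" ]; then
  curl -XPUT "$ES_URL/imapriverdata" -d "$MAIL_MAPPING"
else
  printf '%s\n' "$MAIL_MAPPING"
fi
```

Run it once without APPLY to check the payload, then with APPLY=1 before registering the river again. Whether the river's own type_mapping setting accepts the same structure is not stated here, so that would need checking against the river project.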

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(Gabriel Kapitany) #9

Hey Dave,

Thanks a lot for all your help,
Gabriel



(system) #10