Searching pdf files by content with Mongodb-river

sAs59 · March 17, 2014, 8:01am

Hi,
I am new to Elasticsearch and I want to search PDF files by content, but the search result does not show the content of the PDF file in readable form. It looks like this:

http://localhost:9200/mongoindex/_search?pretty=true

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532595b8f37d5cc2d64a517d",
      "_score" : 1.0,
      "_source" : {"content":{"content_type":"application/pdf", "title":"D:/sample.pdf",
      "content":"JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0

      "filename":"D:/sample.pdf","contentType":"application/pdf","md5":"afe70f97bce7876e39aa43f71dc7266f","length":82441,"chunkSize":262144,"uploadDate":"2014-03-16T12:14:48.542Z","metadata":{}}
    } ]
  }
}

Could someone please help me with this? Thank you!

Here is the link I used: http://v.bartko.info/?p=463

dadoonet · March 18, 2014, 5:22pm

I'm not sure, but I think I already answered this question. Was that on Stack Overflow?
Could you describe what is wrong with the result here?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/searching-pdf-files-by-content-with-Mongodb-river-tp4051989.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1395043287994-4051989.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · March 19, 2014, 4:29pm

As I told you on Stack Overflow, you need to Base64-decode your content.

For example, what you sent decodes correctly to a PDF
(tested with http://www.motobit.com/util/base64-decoder-encoder.asp):

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-US) >>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 7 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S>>
endobj
4 0 obj

…
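
The same check can be reproduced locally instead of with an online tool. A minimal sketch in Python, using only the first 12 characters of the `content` value from the search result; the variable names are illustrative and not part of any Elasticsearch API:

```python
import base64

# First 12 characters of the Base64 "content" value returned by Elasticsearch.
encoded_prefix = "JVBERi0xLjUN"

# Base64 decoding gives back the raw bytes of the file, not its extracted text.
decoded = base64.b64decode(encoded_prefix)

print(decoded)  # b'%PDF-1.5\r'
```

The `%PDF-1.5` header shows that the value is the original file, byte for byte.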

On March 19, 2014 at 17:21:54, sAs59 (mr.akmurat@gmail.com) wrote:

Yes, I also posted this question on Stack Overflow.
I tried to decode my content with a Base64 decoder, but the result was not my actual content.
The content of sample.pdf:

The world is changing, and changing priorities for the development of society. Use of information technology accelerates this process. Information treated as a commodity, and its role as a commodity increases, while the value of information depends on the timing and the cost of its treatment. The growth of complexity and amount of information makes the question of finding new approaches to access it as the use of traditional technology results in longer and the cost of developing software tools to access information. Existing systems provide search information sources of the same type, while ignoring others. For example, systems such as Yandex, Rambler, Google, Yahoo provide information search in the databases of keywords corresponding to certain HTML - pages, the rest of the same information (audio, video and other data other than HTML - pages) located on servers in the Internet remains unaddressed. This determines the relevance of the work to develop more efficient methods for constructing systems of access to distributed heterogeneous information.

It's the only file in my files collection, and I used the following query:
http://localhost:9200/mongoindex/_search?pretty=true

As a result, I got the following content:

JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0KPDwvVHlwZS9QYWdlcy9Db3VudCAxL0tpZHNbIDMgMCBSXSA+Pg0KZW5kb2JqDQozIDAgb2JqDQo8PC9UeXBlL1BhZ2UvUGFyZW50IDIgMCBSL1Jlc291cmNlczw8L0ZvbnQ8PC9GMSA1IDAgUi9GMiA3IDAgUj4+L1Byb2NTZXRbL1BERi9UZXh0L0ltYWdlQi9JbWFnZUMvSW1hZ2VJXSA+Pi9NZWRpYUJveFsgMCAwIDU5NS4zMiA4NDEuOTJdIC9Db250ZW50cyA0IDAgUi9Hcm91cDw8L1R5cGUvR3JvdXAvUy9UcmFuc3BhcmVuY3kvQ1MvRGV2aWNlUkdCPj4vVGFicy9TPj4NCmVuZG9iag0KNCAwIG9iag0KPDwvRmlsdGVyL0ZsYXRlRGVjb2RlL0xlbmd0aCAxNTQ5Pj4NCnN0cmVhbQ0KeJytWEtv20YQvhvwf9CRBOKNllwuyWuAOE3RXgr3ECQ90BIlC5Ytl5Ts9N93ZnZnd5aRIhUoAjjc9zevb2b04e766v2tnmmj5mZ2t7q+0rM5/NOzplIapuq6UJWd3T1dX81na/zz6frqa3aXm+whvymzfpbfaJO9wXiH4yFvsi1+LPEPLFbZBr/GMFzAOZN1sO8ZDq1xfgMfYfAObyyyDif4EnhhAQfoRXcSPtawwZ2kAd//QjA2EhAh2NN2hAwzY7x4tUM8Awws7ZFiLWHcv+J6T3L5O1/g4ylvs/4Zl/bxLlpd0VUjb14glJ42/ZPrMlNOvj9hdmQ4s/yv2d2v11cfwRwf7n40QlWq1pARvsY3SltF+Vdwi5P3Kdd11u2dXuHPjrfQEUMy9mCE0s3ichBsHVD6zd1i0W9Rq4hy6OjsiELjeiUUtmFx/EFU0UCaRcP1Iy8rWi+yz2her/oEsTkC2Juxd+8v+X3r3Eg+212gzLJSxnplLggB2pJA7ILnepdhdbw7caeuClWmV556XldW2SLd25EDhVCpo6eOcc6pcctuCTNSahy6/y6VRbxEal4MfZfX0RlHjsKqorgO9pUQkkiB8SuMO9pwQBu5yeCq9N6FnmrOGlBb1TZeg0tGTWHfO9IQrqjdW89xnCCnMb9PeLxeTGAUM4+UsyR0OJXcAuMFSeUf3k9eX7lvVMFekqFHM6Bjk+ngfcEpFCu14FpybnxtzSFB17/lxuNhdOHVgIzFIzU5Q373/Jl4BkrrnfKsIWzbqJIN0eU3jXC8A6uRdQFkFVDV5r8SV+210z0SGTMF1SY1BIz/hvEhceW9l3PHkNxGiWbFGWj5QzpyGwhH/xYOd+xxxHMsQheSlMwxHuQuDpFVKeKM3LTxZjyn9bpVLSudFaFNMUlepVfDKGeCzDhwvIqSBKknHHwcR6FL1bQpji3nQL63d6ow55IMAh+cPQ9+k6Q/XN4IP8DxVtqSLusH//SkaDiiE0eSo3DMVC9wJlDKK38kiV8QxHlbWa1M4ZU0Ok14n99z5HYDYrPWO4l/aSuUAKEjHQiG4EDgP05tIhFETr8gqpBdbJl9hPF3NvsofSCtq3yCILtFkuvpegHUVV+k5FdW1tJnBGu9N3YDwn+Ip/6/5GBNEUulkQ9SJAw+6LzZbSPMXk8yGk7Qzs6RctjDKnDphmrVOkmTgV79GxunRQfeiTblF11bQkm3Y8z0A1mWKkUghltI6e6kh0HU3U35nMGctxM+OAaWDqTlV7rxAjUXRpWs5i+5kZUMRgx5FAJq6+wPh/Q+Ih1opck+IcgdG2kdSgy/TPc+yKjA+ZZry8TDjqMti0LpKkVLhpnPhc/hVUf8LWYLeDNUR6nvtloEnRsnfgRj0glVrt09dz5preX2BWY4p/p5pZrSC/NIF4GxC9GCLRNaiAXAMHDWeZERtUxJ
jY9JyoFL+iH3/t9xR/UcVn+BvVim/P7bT1ynLKwydYr/5tTeUpPh5N6T95ZG1ZO9JGC3Du0KelSZmsZC3oklAkqyQypoQinV+z4uYQErSfZI+3I5U1VNrcrK4/2WdYe8jIl4J+In5dGdz/jgk9xDc8JrW08jLOXgXR0POoCd8LV0X1tE/nHB7LZeZtqqRd0nIp00rTWq0unek/faVpnJvWxaVwea7FvupQxZetFxfg29Iskcy4gQ0ENI84OMxjYpOmAYdXU+QCvboD86tJ99OY+k/gwW7kNvUM7BAZ3vhOgbxeLBu1G3ZBMPsY3uuWqjHqFofJNg0h9bgP4iJTsMssfBhdDMl5OeBiLQBYij5BMiG90qPRH5lUkC0S/4rtgIFcHTxFNvIRWgIR6FHgQP/Vzt1VwV9bQtdMbd+iTywmmuqESvQgWYLmzWr6ARoF9s/OnnSA7iUB8CZcdhOYo9sYbBCU2NuYnmpY7vwL+MiGpLFATFRVm8EC0VvoM6h85izKMH4PSeScmfWUY3ET+u0Mx97srwffSx8zwGVF1zI+Bs6lxtl/vyMWiy901ybExmudYnytbm2E9CaoLl/W0x03r622VRtmrepsikFP8C8sKIqQ0KZW5kc3RyZWFtDQplbmRvYmoNCjUgMCBvYmoNCjw8L1R5cGUvRm9udC9TdWJ0eXBlL1RydWVUeXBlL05hbWUvRjEvQmFzZUZvbnQvVGltZXMjMjBOZXcjMjBSb21hbi9FbmNvZGluZy9XaW5BbnNpRW5jb2RpbmcvRm9udERlc2NyaXB0b3IgNiAwIFIvRmlyc3RDaGFyIDMyL0xhc3RDaGFyIDEyMS9XaWR0aHMgMTAgMCBSPj4NCmVuZG9iag0KNiAwIG9iag0KPDwvVHlwZS9Gb250RGVzY3JpcHRvci9Gb250TmFtZS9UaW1lcyMyME5ldyMyMFJvbWFuL0ZsYWdzIDMyL0l0YWxpY0FuZ2xlIDAvQXNjZW50IDg5MS9EZXNjZW50IC0yMTYvQ2FwSGVpZ2h0IDY5My9BdmdXaWR0aCA0MDEvTWF4V2lkdGggMjYxNC9Gb250V2VpZ2h0IDQwMC9YSGVpZ2h0IDI1MC9MZWFkaW5nIDQyL1N0ZW1WIDQwL0ZvbnRCQm94WyAtNTY4IC0yMTYgMjA0NiA2OTNdID4+DQplbmRvYmoNCjcgMCBvYmoNCjw8L1R5cGUvRm9udC9TdWJ0eXBlL1RydWVUeXBlL05hbWUvRjIvQmFzZUZvbnQvQUJDREVFK0NhbGlicmkvRW5jb2RpbmcvV2luQW5zaUVuY29kaW5nL0ZvbnREZXNjcmlwdG9yIDggMCBSL0ZpcnN0Q2hhciAzMi9MYXN0Q2hhciAzMi9XaWR0aHMgMTEgMCBSPj4NCmVuZG9iag0KOCAwIG9iag0KPDwvVHlwZS9Gb250RGVzY3JpcHRvci9Gb250TmFtZS9BQkNERUUrQ2FsaWJyaS9GbGFncyAzMi9JdGFsaWNBbmdsZSAwL0FzY2VudCA3NTAvRGVzY2VudCAtMjUwL0NhcEhlaWdodCA3NTAvQXZnV2lkdGggNTIxL01heFdpZHRoIDE3NDMvRm9udFdlaWdodCA0MDAvWEhlaWdodCAyNTAvU3RlbVYgNTIvRm9udEJCb3hbIC01MDMgLTI1MCAxMjQwIDc1MF0gL0ZvbnRGaWxlMiAxMiAwIFI+Pg0KZW5kb2JqDQo5IDAgb2JqDQo8PC9Qcm9kdWNlcihjb252ZXJ0b25saW5lZnJlZS5jb20pL0NyZWF0b3IoY29udmVydG9ubGluZWZyZWUuY29tKS9DcmVhdGlvbkRhdGUoRDoyMDE0MDMxNjEyMTQxMykgL01vZERhdGUoRDoyMDE0MDMxNjEyMTQxMykgPj4NCmVuZG9iag0KMTAgMCBvYmoNClsgMjUwIDAgMCAwIDAgMCAwIDAgMzMzIDMzMyAwIDAgMjUwIDMzMyAy
NTAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCA2MTEgNTU2IDcyMiA3MjIgMzMzIDAgMCA2MTEgODg5IDAgMCAwIDAgNjY3IDAgNjExIDcyMiAwIDAgMCA3MjIgMCAwIDAgMCAwIDAgMCA0NDQgNTAwIDQ0NCA1MDAgNDQ0IDMzMyA1MDAgNTAwIDI3OCAwIDUwMCAyNzggNzc4IDUwMCA1MDAgNTAwIDUwMCAzMzMgMzg5IDI3OCA1MDAgNTAwIDcyMiA1MDAgNTAwXSANCmVuZG9iag0KMTEgMCBvYmoNClsgMjI2XSANCmVuZG9

This is only a small part that I copied.
Maybe the problem is in the mapping?

P.S. Sorry for my bad English.

sAs59 · March 20, 2014, 9:11am

It's still unclear. I've decoded my whole text, and instead I'm getting this
kind of text.
Where should I see my actual text?
I also tried using a different charset, but it's still unreadable.

<</Filter/FlateDecode/Length 1549>>
stream
xœXKoÛF¾eðÐ‘â–.Ék€8MÑ^
÷$=Ð%
–-—”ìôßwfvgw–‘"(8Ü÷7¯ofôáîúêýži£æfv·º¾Ò³9üÓ³¦R¦êºP•Ý=]_Ígküóéúêkv—›ì!¿)³~–ßh“½Áx‡ã!o²-~,ñ,VÙ¿Æ0\À9“u°ïq~aðo,²'øxaaèEw>Ö°Á¤ßÿB06!ØÓv„
3c¼xµC< ,í‘b-aÜ¿âzOrù;_àã)o³þ—öñ.Z]ÑU#o^
”ž6ý“ë2SN¾?avd8³ü¯ÙÝ¯×WÁî~4BUªÖ¾Æ7J[EùWp‹“÷)×uÖí^áÏŽ·ÐC2ö`„ÒÍârlPúÍÝbÑoQ«ˆrèèìˆBãz%¶aqüATÑ@šEÃõ#/+Z/²Ïh^¯ú±9Ø›±wï/ù}ëÜH>Û]
Ì²RÆze.Ú’@ì‚çz—au¼;q§®
U¦Wžz^WVÙ"ÝÛ‘…P©£§ŽqÎ©qËn 3Rjºÿ.•E¼Dj^
}—×ÑGŽÂª¢¸ö•’Hñ+Œ;Úp@e¹ÉàªôÞ…žjÎP[Õ6^ƒKFMaß;Ò®¨Ý[Ïqœ
§1¿Ox¼^L`3”³$t8•ÜãIåÞO^_¹oTÁ^’¡G3
c“éà}Á)+µàZrn|mÍ!A×¿åÆãatáÕ€ŒÅ#59C~÷ü™xJëò¬!lÛ¨’
Ñå7p¼«‘udPÕæ¿WíµÓ=3Õ&5Œÿ†ñ!qå½—sÇÜF‰fÅhùC:reGÿwìqÄs,B’”Ì1ä.‘U)âŒÜ´ñf<§õºU-+¡M1I^¥WÃ(g‚Ì8p¼Š’©'|G¡KÕ´)Ž-ç@¾·wª0ç’
œ=~“¤?\Þ?ÀñVÚ’.ëaÿô¤h8¢G’£pÌT/p&PÊ+$‰_Äy[YLá•4:MxŸßsävb³Ö;‰i+”¡#†à@à?Nm"DN¿
ª]l™}„ñw6û(} «|‚ »E’ëézÔU_¤äWVÖÒgk½7vÂˆ§þ¿ä`MK¥‘R$

è¼Ùm#Ì^O2NÐÎÎ‘rØÃpé†jÕ:I“^ýee§EaÞ‰6å][BI·cÌôY–E
†[Héî¤‡AÔÝMùœÁœ·>8–¤åWºñ5F•¬æ/¹‘• F yjëì‡ô>"h¥É>!ÈeiJ
¿L÷>È¨Àù–kËÄÃŽ£-‹BéEK†™Ï…ÏáUGü-f x3TG©ï¶Ze'~cÒ U®Ý=w>iåöf8§úy¥šÒ
óH± Ñ‚-ZˆÀ0pÖy‘µLIIÊKú!÷þßqGõV½X¦üþÛO\§,¬2uŠÿæÔÞR“áäÞ“÷–FÕ“½$`·í
zT™šÆBÞ‰%J²ChB)Õû>.a+IöHûr9SUMÊÊãý–u‡¼Œ‰x'â'åÑÏøà“ÜCsÂk[O#,åà]:€ðµt_[DþqÁì¶^fÚªEÝ'"45ªÒéÞ“÷ÚV™É½lZWašì[î¥
YzÑq~
½"ÉËˆÐCHóƒŒÆ6):`uu>@+Û ?:´Ÿ}9¤þ
îCoPÎÁï„èeÅâÁ»Q·d±î¹j£¡h|“`Ò[€þ"%;
²ÇÁ…ÐÌ—“ž"Ðˆ£ä"eÝ*=ù•IÑ/ø®ØÁÓÄSo!
!…ý\íÕ\õ´-tÆÝú$òÂi®¨D¯B˜.lÖ¯ _lüéçHâPeÇa9Š=±†Á
M¹‰æ¥ŽïÀ¿ŒˆjKÅEY¼-¾ƒ:‡ÎbÌ£aàôžIÉŸYF7?®ÐÌ}îÊð}ô±ó<T]s#àlê\m—ûò1h²÷MrlLf¹Ö'ÊÖæØOBj‚åým1ÓzúÛeQ¶jÞ¦È¤ÿòÂˆ©
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/Times#20New#20Roman/Encoding/WinA
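
The unreadable part between `stream` and `endstream` is expected: `/Filter/FlateDecode` means the page content is zlib-compressed, so no charset will make it readable. A minimal Python sketch of the idea, using a made-up sample string as a stand-in for the real stream from this PDF:

```python
import zlib

# Hypothetical page text, standing in for the real content stream.
page_text = b"The world is changing, and changing priorities for the development of society."

# What a PDF writer stores for a /FlateDecode stream: zlib-compressed bytes.
stored_stream = zlib.compress(page_text)

# Viewed as text, the stored bytes look like garbage in any charset.
print(stored_stream[:16])

# Inflating the bytes recovers the original data.
restored = zlib.decompress(stored_stream)
print(restored.decode("ascii"))
```

Even after inflating, a real content stream holds PDF drawing operators and font-specific character codes, not plain sentences. That is why a text extractor (the attachment type relies on Apache Tika) is needed to get the readable text.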

dadoonet · March 20, 2014, 9:44am

I think I'm starting to understand what you are trying to get…
You don't want the original content but only the extracted content, right?

I think that if you store the content, it should work.

Something like this (in the mapping):

{
  "person" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "file" : {"index" : "no", "store" : "yes"}
        }
      }
    }
  }
}

Then, when searching, ask for the field "file.file" instead of _source (the default):
curl -XGET 'http://localhost:9200/index/person/_search?q=whatever&fields=file.file'

Should work I guess.
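
The request above can also be assembled programmatically. The sketch below is only an illustration in Python: `build_search_url` is a hypothetical helper, and the host, index, and type names are taken from the first post in this thread, not from any official client.

```python
from urllib.parse import urlencode


def build_search_url(base, index, doc_type, query, fields):
    """Build a URI search request that asks for stored fields instead of _source."""
    params = urlencode({"q": query, "fields": ",".join(fields), "pretty": "true"})
    return f"{base}/{index}/{doc_type}/_search?{params}"


# Assumed values: local node, "mongoindex" index, "files" type.
url = build_search_url(
    "http://localhost:9200", "mongoindex", "files", "akmurat", ["file.file"]
)
print(url)
```

The resulting URL can be opened in a browser or passed to curl, as in the example above.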

sAs59 · March 22, 2014, 1:15pm

Hi,
I followed your instructions and it seems to work.
My files collection has two files that contain the word "akmurat".
When I search using the following request:
http://localhost:9200/mongoindex/files/_search?q=akmurat&fields=file.file&pretty=true
I get:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.081366636,
    "hits" : [ {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532d89c4119bcc028e8001da",
      "_score" : 0.081366636
    }, {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532d89b94f7399ab6975977a",
      "_score" : 0.057534903
    } ]
  }
}

It returns the file IDs, which is good.

Is there a way to show my files' content in a readable form?

Normally it returns:

{
  "_index" : "mongoindex",
  "_type" : "files",
  "_id" : "532d89b94f7399ab6975977a",
  "_version" : 1,
  "found" : true,
  "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}
}

I want:

{
  "_index" : "mongoindex",
  "_type" : "files",
  "_id" : "532d89b94f7399ab6975977a",
  "_version" : 1,
  "found" : true,
  "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"My name is Akmurat Saktagan. I am 21 years old."},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}
}

Thank you!
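
`_source` is the JSON document exactly as it was indexed, so its `content` value stays Base64. If the readable form is only needed for display, one option is to decode it on the client. A minimal Python sketch using the value from the text.txt document, assuming the file is UTF-8 text (this works for a plain text file, not for a PDF):

```python
import base64

# Trimmed copy of the _source of the text.txt document.
source = {
    "content": {
        "title": "D:/text.txt",
        "content": "TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4=",
    },
    "length": 47,
}

# Decode the Base64 payload back to bytes, then interpret the bytes as text.
raw = base64.b64decode(source["content"]["content"])
text = raw.decode("utf-8")

print(text)  # My name is Akmurat Saktagan. I am 21 years old.
```

The decoded byte count matches the `length` value of 47 stored with the file.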

dadoonet · March 22, 2014, 1:32pm

Could you paste your mapping?

http://localhost:9200/mongoindex/files/_mapping?pretty

sAs59 · March 22, 2014, 1:36pm

http://localhost:9200/mongoindex/files/_mapping?pretty=true

{
  "mongoindex" : {
    "mappings" : {
      "files" : {
        "properties" : {
          "chunkSize" : {
            "type" : "long"
          },
          "content" : {
            "type" : "attachment",
            "path" : "full",
            "fields" : {
              "content" : {
                "type" : "string"
              },
              "author" : {
                "type" : "string"
              },
              "title" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              }
            }
          },
          "contentType" : {
            "type" : "string"
          },
          "file" : {
            "type" : "attachment",
            "path" : "full",
            "fields" : {
              "file" : {
                "type" : "string",
                "index" : "no",
                "store" : true
              },
              "author" : {
                "type" : "string"
              },
              "title" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              }
            }
          },
          "filename" : {
            "type" : "string"
          },
          "length" : {
            "type" : "long"
          },
          "md5" : {
            "type" : "string"
          },
          "metadata" : {
            "type" : "object"
          },
          "uploadDate" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          }
        }
      }
    }
  }
}
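
One thing that may be worth checking in this mapping: the stored sub-field sits under a separate `file` attachment field, while the documents shown earlier in the thread carry their data in `content`. The sketch below is only an illustration in Python, with a trimmed copy of the mapping and a hypothetical helper `stored_subfields`, that lists which attachment sub-fields are marked as stored:

```python
def stored_subfields(properties):
    """Return 'field.subfield' names of attachment sub-fields with store enabled."""
    found = []
    for name, spec in properties.items():
        if spec.get("type") != "attachment":
            continue
        for sub_name, sub_spec in spec.get("fields", {}).items():
            # Older mappings write "yes", newer output shows true.
            if sub_spec.get("store") in (True, "yes", "true"):
                found.append(f"{name}.{sub_name}")
    return sorted(found)


# Trimmed copy of the "files" properties from the mapping.
properties = {
    "content": {"type": "attachment", "fields": {"content": {"type": "string"}}},
    "file": {
        "type": "attachment",
        "fields": {"file": {"type": "string", "index": "no", "store": True}},
    },
    "filename": {"type": "string"},
}

print(stored_subfields(properties))  # ['file.file']
```

Only `file.file` is stored here, so a request for `fields=file.file` has nothing to return unless the documents actually contain a `file` field. Storing `content.content` instead might be what is needed, though that is a guess based on the posted output.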


dadoonet · March 22, 2014, 3:30pm

Your mapping does not look correct.

You have two attachment fields, "content" and "file", and the one you actually use does not store the file content.
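
A small Python sketch of that check (an illustration only, not something the attachment plugin provides): it walks a trimmed copy of the mapping posted above and reports, for each attachment field, whether the same-named sub-field that holds the extracted text is stored. The helper name attachment_store_report is made up for this example.

```python
# Trimmed copy of the "files" mapping posted above (only the parts that matter here).
mapping = {
    "mongoindex": {"mappings": {"files": {"properties": {
        "content": {
            "type": "attachment",
            "path": "full",
            "fields": {"content": {"type": "string"}},
        },
        "file": {
            "type": "attachment",
            "path": "full",
            "fields": {"file": {"type": "string", "index": "no", "store": True}},
        },
        "filename": {"type": "string"},
    }}}}
}


def attachment_store_report(properties):
    """Map each attachment field name to whether its same-named sub-field is stored."""
    report = {}
    for name, spec in properties.items():
        if spec.get("type") != "attachment":
            continue
        text_field = spec.get("fields", {}).get(name, {})
        # "store" may be written as true or "yes" depending on how the mapping was created.
        report[name] = text_field.get("store") in (True, "yes", "true")
    return report


props = mapping["mongoindex"]["mappings"]["files"]["properties"]
report = attachment_store_report(props)
print(report)
```

Here "file" stores its text but "content" does not. Since the indexed documents carry the file under "content", that would explain why a search asking for fields=file.file returns hits with no field values.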

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


sAs59 · March 22, 2014, 3:37pm

Is it about my mapping?


dadoonet · March 22, 2014, 6:02pm

Yes.
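
One possible correction, sketched in Python under assumptions rather than tested against this cluster: declare a single attachment field named "content" (the name the river's documents use) and store its extracted-text sub-field, following the pattern suggested earlier in the thread for "file". The search then asks for "content.content" instead of "file.file". The store setting of an existing field generally cannot be changed in place, so this would likely mean creating a new index with the mapping and indexing the files again.

```python
import json
from urllib.parse import urlencode

# Assumed corrected mapping for the "files" type: one attachment field, "content",
# whose extracted text is stored so it can be returned via the fields parameter.
corrected_mapping = {
    "files": {
        "properties": {
            "content": {
                "type": "attachment",
                "path": "full",
                "fields": {
                    # Left indexed (no "index": "no") so the text stays searchable.
                    "content": {"type": "string", "store": True},
                },
            }
        }
    }
}

# Same search as before, but requesting the stored sub-field of "content".
params = urlencode({"q": "akmurat", "fields": "content.content", "pretty": "true"})
search_url = "http://localhost:9200/mongoindex/files/_search?" + params

print(json.dumps(corrected_mapping, indent=2))
print(search_url)
```

The printed JSON is the body one would send when creating the mapping, and the printed URL is the query to try afterwards. Both are starting points to adapt, not a confirmed fix.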

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 22 mars 2014 à 16:38, sAs59 mr.akmurat@gmail.com a écrit :

Is it about my mapping?

On Sat, Mar 22, 2014 at 9:31 PM, dadoonet [via Elasticsearch Users] <[hidden email]> wrote:
Sounds like it's not correct.

You have 2 attachments and the one you actualy use does not store file.

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On March 22, 2014, at 14:36, sAs59 <[hidden email]> wrote:

http://localhost:9200/mongoindex/files/_mapping?pretty=true
{
  "mongoindex" : {
    "mappings" : {
      "files" : {
        "properties" : {
          "chunkSize" : {
            "type" : "long"
          },
          "content" : {
            "type" : "attachment",
            "path" : "full",
            "fields" : {
              "content" : {
                "type" : "string"
              },
              "author" : {
                "type" : "string"
              },
              "title" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              }
            }
          },
          "contentType" : {
            "type" : "string"
          },
          "file" : {
            "type" : "attachment",
            "path" : "full",
            "fields" : {
              "file" : {
                "type" : "string",
                "index" : "no",
                "store" : true
              },
              "author" : {
                "type" : "string"
              },
              "title" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              }
            }
          },
          "filename" : {
            "type" : "string"
          },
          "length" : {
            "type" : "long"
          },
          "md5" : {
            "type" : "string"
          },
          "metadata" : {
            "type" : "object"
          },
          "uploadDate" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          }
        }
      }
    }
  }
}
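
A quick way to see which attachment field keeps its extracted text is to walk the mapping and check the `store` flag on the sub-field that carries the same name as the attachment. The following is only a sketch: the helper name `attachment_storage` is made up for illustration, the `properties` dict is a trimmed copy of the mapping above, and it assumes (as in the attachment mapper used in this thread) that the same-named sub-field is the one holding the extracted text.

```python
def attachment_storage(properties):
    """Map each attachment field name to whether its extracted text is stored."""
    report = {}
    for name, spec in properties.items():
        if spec.get("type") != "attachment":
            continue
        # Assumption: the sub-field named like the attachment holds the text.
        text_field = spec.get("fields", {}).get(name, {})
        report[name] = text_field.get("store") in (True, "yes", "true")
    return report


# Trimmed copy of the mapping above (only the parts that matter here).
properties = {
    "content": {"type": "attachment", "path": "full",
                "fields": {"content": {"type": "string"}}},
    "file": {"type": "attachment", "path": "full",
             "fields": {"file": {"type": "string", "index": "no", "store": True}}},
    "filename": {"type": "string"},
}

print(attachment_storage(properties))
```

With this mapping, only `file` reports stored text, while the documents in the index carry their data under `content`.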

On Sat, Mar 22, 2014 at 7:33 PM, dadoonet [via Elasticsearch Users] <[hidden email]> wrote:
Could you paste your mapping?

http://localhost:9200/mongoindex/files/_mapping?pretty

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On March 22, 2014, at 14:15, sAs59 <[hidden email]> wrote:

Hi,
I followed your instructions and it seems to work.
In my files collection I have two files that contain the word "akmurat".
When I search using the following command:
http://localhost:9200/mongoindex/files/_search?q=akmurat&fields=file.file&pretty=true
I get:
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.081366636,
    "hits" : [ {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532d89c4119bcc028e8001da",
      "_score" : 0.081366636
    }, {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532d89b94f7399ab6975977a",
      "_score" : 0.057534903
    } ]
  }
}
It returns the file IDs, which is good.
Is there a way to show my files' content in a readable form?
Usually it returns:
{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}
}
I want:
{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"My name is Akmurat Saktagan. I am 21 years old."},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}

}
Thank you!
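
For a plain-text file like the one above, the `content` value in `_source` is simply the Base64 encoding of the original bytes, so it can be decoded on the client side. A minimal Python sketch follows; it assumes the file was UTF-8 text, and it does not apply to binary formats such as PDF, where decoding only gives back the raw file.

```python
import base64

# The "content" value copied from the _source shown above.
encoded = "TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="

# Base64 -> original bytes -> text (UTF-8 is an assumption about the file).
decoded = base64.b64decode(encoded).decode("utf-8")

print(decoded)       # My name is Akmurat Saktagan. I am 21 years old.
print(len(decoded))  # 47, the same as the "length" field of the document
```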

On Thu, Mar 20, 2014 at 3:45 PM, dadoonet [via Elasticsearch Users] <[hidden email]> wrote:
I think I'm starting to understand what you are trying to get…
You don't want the original content, only the extracted content, right?

I think that if you store the extracted content, it should work.

Something like this (in the mapping):

{
  "person" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "file" : {"index" : "no", "store" : "yes"}
        }
      }
    }
  }
}

Then, when searching, ask for the field "file.file" instead of _source (the default):
curl -XGET 'http://localhost:9200/index/person/_search?q=whatever&fields=file.file'

That should work, I guess.
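
The two steps above (a mapping that stores the extracted text, then a search that asks for the stored field) can be sketched in Python without sending anything over the network. The helper names `attachment_mapping` and `search_url` are invented for this illustration, and the host, index and type values passed in at the bottom are assumptions taken from elsewhere in this thread; the code only builds the request body and URL so they can be inspected before use.

```python
import json
from urllib.parse import urlencode


def attachment_mapping(doc_type, field):
    """Mapping body with an attachment field whose extracted text is stored."""
    return {doc_type: {"properties": {field: {
        "type": "attachment",
        "fields": {field: {"index": "no", "store": "yes"}},
    }}}}


def search_url(host, index, doc_type, query, field):
    """Search URL that requests the stored sub-field instead of _source."""
    params = urlencode({"q": query, "fields": f"{field}.{field}", "pretty": "true"})
    return f"{host}/{index}/{doc_type}/_search?{params}"


# Assumed values: a local node plus the index and type names used in this thread.
body = attachment_mapping("files", "file")
url = search_url("http://localhost:9200", "mongoindex", "files", "akmurat", "file")

print(json.dumps(body, indent=2))
print(url)
```

Building the URL with `urlencode` keeps the query term safely escaped if it contains spaces or special characters.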

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On March 20, 2014, at 10:12:01, sAs59 ([hidden email]) wrote:

It's still unclear. I decoded the whole content, but I'm getting this kind of text instead.
Where should I see my actual text?
I also tried different charsets, but it is still unreadable.

<</Filter/FlateDecode/Length 1549>>
stream
[compressed binary stream data, displayed as unreadable characters; omitted]
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/Times#20New#20Roman/Encoding/WinA
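
The unreadable part is expected: decoding the Base64 value only gives back the raw PDF file, and the `/Filter/FlateDecode` entry means the stream between `stream` and `endstream` is zlib-compressed, so no charset will make it readable. The sketch below shows the idea on a synthetic object, not on the real sample.pdf. The helper `inflate_streams` is hypothetical and handles only the simple case of an inline `/Length` directly before the stream. Even after inflating, a real PDF yields drawing operators and font-specific encodings rather than clean text, which is why a proper extractor (the attachment mapper relies on Apache Tika) is needed.

```python
import re
import zlib

# Matches only the simple header form seen in the dump above.
STREAM_RE = re.compile(rb"/Filter/FlateDecode/Length (\d+)>>\r?\nstream\r?\n")


def inflate_streams(pdf_bytes):
    """Return the decompressed body of every simple FlateDecode stream."""
    bodies = []
    for match in STREAM_RE.finditer(pdf_bytes):
        start = match.end()
        length = int(match.group(1))
        # FlateDecode is zlib/deflate, so zlib can inflate the stream body.
        bodies.append(zlib.decompress(pdf_bytes[start:start + length]))
    return bodies


# Synthetic stand-in for one PDF object (made up for this example).
payload = zlib.compress(b"BT (My name is Akmurat) Tj ET")
fake_pdf = (
    b"4 0 obj\r\n<</Filter/FlateDecode/Length %d>>\r\nstream\r\n" % len(payload)
    + payload
    + b"\r\nendstream\r\nendobj\r\n"
)

print(inflate_streams(fake_pdf))
```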

