Searching pdf files by content with Mongodb-river


(sAs59) #1

Hi,
I am new to Elasticsearch and I want to search PDF files by content, but the search result does not show the content of the PDF file in readable form. It looks like this:

http://localhost:9200/mongoindex/_search?pretty=true

{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532595b8f37d5cc2d64a517d",
"_score" : 1.0,
"_source" : {"content":{"content_type":"application/pdf", "title":"D:/sample.pdf",
"content":"JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0

"filename":"D:/sample.pdf","contentType":"application/pdf","md5":"afe70f97bce7876e39aa43f71dc7266f","length":82441,"chunkSize":262144,"uploadDate":"2014-03-16T12:14:48.542Z","metadata":{}} 

} ]
}
}

Could someone please help me with this? Thank you!

Here is the link I used: http://v.bartko.info/?p=463


(David Pilato) #2

I'm not sure, but I think I already answered this question. Was that on Stack Overflow?
Could you describe what is wrong with the result here?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to sAs59 (mr.akmurat@gmail.com), 18 March 2014, 18:05:47.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/searching-pdf-files-by-content-with-Mongodb-river-tp4051989.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1395043287994-4051989.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #3

As I told you on Stack Overflow, you need to Base64-decode your content.

For example, what you sent decodes correctly to a PDF
(tested with http://www.motobit.com/util/base64-decoder-encoder.asp):

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-US) >>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 7 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S>>
endobj
4 0 obj
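
The same check can be done locally instead of with an online decoder. A minimal Python sketch, assuming only the standard library; the Base64 string is the truncated sample from the first message, so it is trimmed to a multiple of four characters before decoding, and the helper name is made up for illustration:

```python
import base64

# Truncated Base64 sample copied from the "content" field in the first message.
sample = (
    "JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMg"
    "MiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0"
)


def decode_truncated(b64_text: str) -> bytes:
    """Decode Base64 text, dropping a dangling partial group at the end."""
    usable = len(b64_text) - len(b64_text) % 4
    return base64.b64decode(b64_text[:usable])


decoded = decode_truncated(sample)
print(decoded[:8])  # b'%PDF-1.5'
```

The decoded bytes are the raw PDF file, not the text it displays.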

On 19 March 2014 at 17:21:54, sAs59 (mr.akmurat@gmail.com) wrote:

Yes, I have also posted this question on Stack Overflow.
I tried to decode my content with a Base64 decoder, but the result was not
my actual content.
The content of sample.pdf:

The world is changing, and changing priorities for the development of society. Use of information technology accelerates this process. Information treated as a commodity, and its role as a commodity increases, while the value of information depends on the timing and the cost of its treatment. The growth of complexity and amount of information makes the question of finding new approaches to access it as the use of traditional technology results in longer and the cost of developing software tools to access information. Existing systems provide search information sources of the same type, while ignoring others. For example, systems such as Yandex, Rambler, Google, Yahoo provide information search in the databases of keywords corresponding to certain HTML - pages, the rest of the same information (audio, video and other data other than HTML - pages) located on servers in the Internet remains unaddressed. This determines the relevance of the work to develop more efficient methods for constructing systems of access to distributed heterogeneous information.

It's the only file in my files collection, and I used the following query:
http://localhost:9200/mongoindex/_search?pretty=true

As a result, I got the following content:

JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0KPDwvVHlwZS9QYWdlcy9Db3VudCAxL0tpZHNbIDMgMCBSXSA+Pg0KZW5kb2JqDQozIDAgb2JqDQo8PC9UeXBlL1BhZ2UvUGFyZW50IDIgMCBSL1Jlc291cmNlczw8L0ZvbnQ8PC9GMSA1IDAgUi9GMiA3IDAgUj4+L1Byb2NTZXRbL1BERi9UZXh0L0ltYWdlQi9JbWFnZUMvSW1hZ2VJXSA+Pi9NZWRpYUJveFsgMCAwIDU5NS4zMiA4NDEuOTJdIC9Db250ZW50cyA0IDAgUi9Hcm91cDw8L1R5cGUvR3JvdXAvUy9UcmFuc3BhcmVuY3kvQ1MvRGV2aWNlUkdCPj4vVGFicy9TPj4NCmVuZG9iag0KNCAwIG9iag0KPDwvRmlsdGVyL0ZsYXRlRGVjb2RlL0xlbmd0aCAxNTQ5Pj4NCnN0cmVhbQ0KeJytWEtv20YQvhvwf9CRBOKNllwuyWuAOE3RXgr3ECQ90BIlC5Ytl5Ts9N93ZnZnd5aRIhUoAjjc9zevb2b04e766v2tnmmj5mZ2t7q+0rM5/NOzplIapuq6UJWd3T1dX81na/zz6frqa3aXm+whvymzfpbfaJO9wXiH4yFvsi1+LPEPLFbZBr/GMFzAOZN1sO8ZDq1xfgMfYfAObyyyDif4EnhhAQfoRXcSPtawwZ2kAd//QjA2EhAh2NN2hAwzY7x4tUM8Awws7ZFiLWHcv+J6T3L5O1/g4ylvs/4Zl/bxLlpd0VUjb14glJ42/ZPrMlNOvj9hdmQ4s/yv2d2v11cfwRwf7n40QlWq1pARvsY3SltF+Vdwi5P3Kdd11u2dXuHPjrfQEUMy9mCE0s3ichBsHVD6zd1i0W9Rq4hy6OjsiELjeiUUtmFx/EFU0UCaRcP1Iy8rWi+yz2her/oEsTkC2Juxd+8v+X3r3Eg+212gzLJSxnplLggB2pJA7ILnepdhdbw7caeuClWmV556XldW2SLd25EDhVCpo6eOcc6pcctuCTNSahy6/y6VRbxEal4MfZfX0RlHjsKqorgO9pUQkkiB8SuMO9pwQBu5yeCq9N6FnmrOGlBb1TZeg0tGTWHfO9IQrqjdW89xnCCnMb9PeLxeTGAUM4+UsyR0OJXcAuMFSeUf3k9eX7lvVMFekqFHM6Bjk+ngfcEpFCu14FpybnxtzSFB17/lxuNhdOHVgIzFIzU5Q373/Jl4BkrrnfKsIWzbqJIN0eU3jXC8A6uRdQFkFVDV5r8SV+210z0SGTMF1SY1BIz/hvEhceW9l3PHkNxGiWbFGWj5QzpyGwhH/xYOd+xxxHMsQheSlMwxHuQuDpFVKeKM3LTxZjyn9bpVLSudFaFNMUlepVfDKGeCzDhwvIqSBKknHHwcR6FL1bQpji3nQL63d6ow55IMAh+cPQ9+k6Q/XN4IP8DxVtqSLusH//SkaDiiE0eSo3DMVC9wJlDKK38kiV8QxHlbWa1M4ZU0Ok14n99z5HYDYrPWO4l/aSuUAKEjHQiG4EDgP05tIhFETr8gqpBdbJl9hPF3NvsofSCtq3yCILtFkuvpegHUVV+k5FdW1tJnBGu9N3YDwn+Ip/6/5GBNEUulkQ9SJAw+6LzZbSPMXk8yGk7Qzs6RctjDKnDphmrVOkmTgV79GxunRQfeiTblF11bQkm3Y8z0A1mWKkUghltI6e6kh0HU3U35nMGctxM+OAaWDqTlV7rxAjUXRpWs5i+5kZUMRgx5FAJq6+wPh/Q+Ih1opck+IcgdG2kdSgy/TPc+yKjA+ZZry8TDjqMti0LpKkVLhpnPhc/hVUf8LWYLeDNUR6nvtloEnRsnfgRj0glVrt09dz5preX2BWY4p/p5pZrSC/NIF4GxC9GCLRNaiAXAMHDWeZERtUxJ
jY9JyoFL+iH3/t9xR/UcVn+BvVim/P7bT1ynLKwydYr/5tTeUpPh5N6T95ZG1ZO9JGC3Du0KelSZmsZC3oklAkqyQypoQinV+z4uYQErSfZI+3I5U1VNrcrK4/2WdYe8jIl4J+In5dGdz/jgk9xDc8JrW08jLOXgXR0POoCd8LV0X1tE/nHB7LZeZtqqRd0nIp00rTWq0unek/faVpnJvWxaVwea7FvupQxZetFxfg29Iskcy4gQ0ENI84OMxjYpOmAYdXU+QCvboD86tJ99OY+k/gwW7kNvUM7BAZ3vhOgbxeLBu1G3ZBMPsY3uuWqjHqFofJNg0h9bgP4iJTsMssfBhdDMl5OeBiLQBYij5BMiG90qPRH5lUkC0S/4rtgIFcHTxFNvIRWgIR6FHgQP/Vzt1VwV9bQtdMbd+iTywmmuqESvQgWYLmzWr6ARoF9s/OnnSA7iUB8CZcdhOYo9sYbBCU2NuYnmpY7vwL+MiGpLFATFRVm8EC0VvoM6h85izKMH4PSeScmfWUY3ET+u0Mx97srwffSx8zwGVF1zI+Bs6lxtl/vyMWiy901ybExmudYnytbm2E9CaoLl/W0x03r622VRtmrepsikFP8C8sKIqQ0KZW5kc3RyZWFtDQplbmRvYmoNCjUgMCBvYmoNCjw8L1R5cGUvRm9udC9TdWJ0eXBlL1RydWVUeXBlL05hbWUvRjEvQmFzZUZvbnQvVGltZXMjMjBOZXcjMjBSb21hbi9FbmNvZGluZy9XaW5BbnNpRW5jb2RpbmcvRm9udERlc2NyaXB0b3IgNiAwIFIvRmlyc3RDaGFyIDMyL0xhc3RDaGFyIDEyMS9XaWR0aHMgMTAgMCBSPj4NCmVuZG9iag0KNiAwIG9iag0KPDwvVHlwZS9Gb250RGVzY3JpcHRvci9Gb250TmFtZS9UaW1lcyMyME5ldyMyMFJvbWFuL0ZsYWdzIDMyL0l0YWxpY0FuZ2xlIDAvQXNjZW50IDg5MS9EZXNjZW50IC0yMTYvQ2FwSGVpZ2h0IDY5My9BdmdXaWR0aCA0MDEvTWF4V2lkdGggMjYxNC9Gb250V2VpZ2h0IDQwMC9YSGVpZ2h0IDI1MC9MZWFkaW5nIDQyL1N0ZW1WIDQwL0ZvbnRCQm94WyAtNTY4IC0yMTYgMjA0NiA2OTNdID4+DQplbmRvYmoNCjcgMCBvYmoNCjw8L1R5cGUvRm9udC9TdWJ0eXBlL1RydWVUeXBlL05hbWUvRjIvQmFzZUZvbnQvQUJDREVFK0NhbGlicmkvRW5jb2RpbmcvV2luQW5zaUVuY29kaW5nL0ZvbnREZXNjcmlwdG9yIDggMCBSL0ZpcnN0Q2hhciAzMi9MYXN0Q2hhciAzMi9XaWR0aHMgMTEgMCBSPj4NCmVuZG9iag0KOCAwIG9iag0KPDwvVHlwZS9Gb250RGVzY3JpcHRvci9Gb250TmFtZS9BQkNERUUrQ2FsaWJyaS9GbGFncyAzMi9JdGFsaWNBbmdsZSAwL0FzY2VudCA3NTAvRGVzY2VudCAtMjUwL0NhcEhlaWdodCA3NTAvQXZnV2lkdGggNTIxL01heFdpZHRoIDE3NDMvRm9udFdlaWdodCA0MDAvWEhlaWdodCAyNTAvU3RlbVYgNTIvRm9udEJCb3hbIC01MDMgLTI1MCAxMjQwIDc1MF0gL0ZvbnRGaWxlMiAxMiAwIFI+Pg0KZW5kb2JqDQo5IDAgb2JqDQo8PC9Qcm9kdWNlcihjb252ZXJ0b25saW5lZnJlZS5jb20pL0NyZWF0b3IoY29udmVydG9ubGluZWZyZWUuY29tKS9DcmVhdGlvbkRhdGUoRDoyMDE0MDMxNjEyMTQxMykgL01vZERhdGUoRDoyMDE0MDMxNjEyMTQxMykgPj4NCmVuZG9iag0KMTAgMCBvYmoNClsgMjUwIDAgMCAwIDAgMCAwIDAgMzMzIDMzMyAwIDAgMjUwIDMzMyAy
NTAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCA2MTEgNTU2IDcyMiA3MjIgMzMzIDAgMCA2MTEgODg5IDAgMCAwIDAgNjY3IDAgNjExIDcyMiAwIDAgMCA3MjIgMCAwIDAgMCAwIDAgMCA0NDQgNTAwIDQ0NCA1MDAgNDQ0IDMzMyA1MDAgNTAwIDI3OCAwIDUwMCAyNzggNzc4IDUwMCA1MDAgNTAwIDUwMCAzMzMgMzg5IDI3OCA1MDAgNTAwIDcyMiA1MDAgNTAwXSANCmVuZG9iag0KMTEgMCBvYmoNClsgMjI2XSANCmVuZG9

This is only a small part that I copied.
Maybe the problem is in the mapping?

P.S. Sorry for my bad English.


(sAs59) #4

It's still unclear. I decoded the whole text, but instead I'm getting this
kind of output.
Where should I see my actual text?
I also tried different charsets, but it is still unreadable.

<</Filter/FlateDecode/Length 1549>>
stream
xœ­XKoÛF¾eðБ⍖.Ék€8MÑ^
÷$=Ð%
–-—”ìôßwfvgw–‘"(8Ü÷7¯ofôáîúêý­ži£æfv·º¾Ò³9üÓ³¦R¦êºP•Ý=]_Ígküóéúêkv—›ì!¿)³~–ßh“½Áx‡ã!o²-~,ñ,VÙ¿Æ0\À9“u°ï­q~aðo,²'øxaaèEw>Ö°Á¤ßÿB06!ØÓv„
3c¼xµC< ,í‘b-aÜ¿âzOrù;_àã)o³þ—öñ.Z]ÑU#o^
”ž6ý“ë2SN¾?avd8³ü¯Ùݯ×WÁî~4BUªÖ¾Æ7J[EùWp‹“÷)×uÖí^áÏŽ·ÐC2ö„ÒÍârlPúÍÝbÑoQ«ˆrèèìˆBãz%¶aqüATÑ@šEÃõ#/+Z/²Ïh^¯ú±9Ø›±wï/ù}ëÜH>Û] ̲RÆze.Ú’@ì‚çz—au¼;q§® U¦Wžz^WVÙ"ÝÛ‘…P©£§ŽqΩqËn 3Rjºÿ.•E¼Dj^ }—×ÑGŽÂª¢¸ö•’Hñ+Œ;Úp@e¹ÉàªôÞ…žjÎP[Õ6^ƒKFMaß;Ò®¨Ý[Ïqœ §1¿Ox¼^L3”³$t8•ÜãIåÞO^_¹oTÁ^’¡G3
c“éà}Á)+µàZrn|mÍ!A׿åÆãatáÕ€ŒÅ#59C~÷ü™xJëò¬!lÛ¨’
Ñå7p¼«‘udPÕæ¿WíµÓ=3Õ&5Œÿ†ñ!qå½—sǐÜF‰fÅhùC:reGÿwìqÄs,B’”Ì1ä.‘U)âŒÜ´ñf<§õºU-+¡M1I^¥WÃ(g‚Ì8p¼Š’©'|G¡KÕ´)Ž-ç@¾·wª0ç’
œ=~“¤?\Þ?ÀñVÚ’.ëaÿô¤h8¢G’£pÌT/p&PÊ+$‰_Äy[Y­Lá•4:MxŸßsävb³Ö;‰i+”¡#†à@à?Nm"DN¿
ª]l™}„ñw6û(} ­«|‚ »E’ëézÔU_¤äWVÖÒgk½7vˆ§þ¿ä`MK¥‘R$

è¼Ùm#Ì^O2NÐÎΑrØÃpé†jÕ:I“^ýee§EaÞ‰6å][BI·cÌôY–E
†[HéAÔÝMùœÁœ·>8–¤åWºñ5F•¬æ/¹‘• F yjëì‡ô>"h¥É>!ÈeiJ
¿L÷>ȨÀù–kËÄÃŽ£-‹Bé
EK†™Ï…ÏáUGü-f x3TG©ï¶Ze'~cÒ U®Ý=w>i­åöf8§úy¥šÒ
óH± Ñ‚-ZˆÀ0pÖy‘µLIIʁKú!÷þßqGõV½X¦üþÛO\§,¬2uŠÿæÔÞR“áäÞ“÷–FÕ“½$·í zT™šÆBÞ‰%J²C*hB)Õû>.a+IöHûr9SUM­ÊÊãý–u‡¼Œ‰x'â'åѝÏøà“ÜCsÂk[O#,åà]:€ðµt_[DþqÁì¶^fÚªEÝ'"4­5ªÒéÞ“÷ÚV™É½lZWašì[î¥ YzÑq~ ½"ÉˈÐCHóƒŒÆ6):uu>@+Û ?:´Ÿ}9¤þ
îCoPÎÁï„èeÅâÁ»Q·d±î¹j£¡h|“`Ò[€þ"%;
²ÇÁ…ÐÌ—“ž"Ј£ä"eÝ
=ù•IÑ/ø®ØÁÓÄSo!
!…ý\íÕ\õ´-tÆÝú$òÂi®¨D¯B˜.lÖ¯ _lüéçHâPeÇa9Š=±†Á
M¹‰æ¥ŽïÀ¿ŒˆjKÅEY¼-¾ƒ:‡ÎbÌ£aàôžIÉŸYF7?®ÐÌ}îÊð}ô±ó<T]s#àlê\m—ûò1h²÷MrlLf¹Ö'ÊÖæØOBj‚åým1ÓzúÛeQ¶jަȤÿòˆ©
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/Times#20New#20Roman/Encoding/WinA
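
The `/Filter/FlateDecode` entry in the dump above means the page's content stream is zlib-compressed, and the `xœ` at the start of the stream is the common two-byte zlib header (0x78 0x9C) shown as Windows-1252 text. That is why Base64-decoding alone yields raw PDF bytes and not readable text. A minimal Python sketch of the idea, using a made-up content stream and not the one from sample.pdf:

```python
import zlib

# Hypothetical PDF content stream (not taken from sample.pdf): draw one string.
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello from a PDF page) Tj ET"

# A FlateDecode stream stores the zlib-compressed form of the bytes above.
compressed = zlib.compress(content_stream)
print(compressed[:2])  # zlib header; displays as "xœ" in Windows-1252

# Inflating the stream restores the page's drawing operators.
restored = zlib.decompress(compressed)
print(restored.decode("ascii"))
```

Even after inflating, the text sits inside drawing operators and may use font-specific encodings, so a PDF parser is still needed to extract it; the attachment mapper plugin relies on Apache Tika for that step.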


(David Pilato) #5

I think I'm starting to understand what you are trying to get…
You don't want the original content, only the extracted content, right?

I think that if you store the content, it should work.

Something like this (in mapping):

{
"person" : {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"file" : {"index" : "no", "store" : "yes"}
}
}
}
}
}

Then, when searching, ask for the field "file.file" instead of _source (the default):
curl -XGET 'http://localhost:9200/index/person/_search?q=whatever&fields=file.file'

Should work I guess.
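
The mapping and search request above can also be assembled programmatically. A rough Python sketch, standard library only; the index name `mongoindex`, the type name `files`, and both helper names are assumptions for illustration, and nothing is sent to a server here:

```python
import json
from urllib.parse import urlencode


def attachment_mapping(type_name: str, field: str) -> dict:
    """Build a mapping whose attachment sub-field is stored but not indexed."""
    return {
        type_name: {
            "properties": {
                field: {
                    "type": "attachment",
                    "fields": {field: {"index": "no", "store": "yes"}},
                }
            }
        }
    }


def search_url(base: str, index: str, type_name: str, query: str, field: str) -> str:
    """Build a URI search that asks for the stored sub-field instead of _source."""
    params = urlencode({"q": query, "fields": f"{field}.{field}", "pretty": "true"})
    return f"{base}/{index}/{type_name}/_search?{params}"


mapping = attachment_mapping("files", "file")
print(json.dumps(mapping, indent=2))

url = search_url("http://localhost:9200", "mongoindex", "files", "akmurat", "file")
print(url)
```

The sub-field is addressed as `file.file` because the attachment type keeps the extracted text in a sub-field named after the attachment field itself.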

In reply to sAs59 (mr.akmurat@gmail.com), 20 March 2014, 10:12:01.


(sAs59) #6

Hi,
I followed your instructions and it seems to work.
In my files collection I have two files which contain the word "akmurat".
When I search using the following command:
http://localhost:9200/mongoindex/files/_search?q=akmurat&fields=file.file&pretty=true
I get:

{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.081366636,
"hits" : [ {
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89c4119bcc028e8001da",
"_score" : 0.081366636
}, {
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_score" : 0.057534903
} ]
}
}

It returns the file IDs, which is good.

Is there a way to show my files' content in a readable form?

Usually it returns:

{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" :
{"content":{"content_type":null,"title":"D:/text.txt","content":"TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}

}

I want:

{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" :
{"content":{"content_type":null,"title":"D:/text.txt","content":"My name is Akmurat Saktagan. I am 21 years old."},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}

}
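
For plain-text files, one way to get this on the client side is to Base64-decode the `content.content` field of `_source` after the search returns. A minimal Python sketch, assuming the text is UTF-8; the helper name is hypothetical and the document is the text.txt example above, abridged:

```python
import base64
import copy


def with_decoded_content(source: dict, encoding: str = "utf-8") -> dict:
    """Return a copy of a hit's _source with content.content Base64-decoded."""
    result = copy.deepcopy(source)
    raw = base64.b64decode(result["content"]["content"])
    result["content"]["content"] = raw.decode(encoding)
    return result


# _source of the text.txt document, abridged to the fields used here.
source = {
    "content": {
        "content_type": None,
        "title": "D:/text.txt",
        "content": "TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4=",
    },
    "filename": "D:/text.txt",
    "length": 47,
}

readable = with_decoded_content(source)
print(readable["content"]["content"])
# My name is Akmurat Saktagan. I am 21 years old.
```

This only helps for files that are already text. For a PDF the decoded bytes are the PDF file itself, so readable text has to come from the text extracted by the attachment field.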

Thank you!

In reply to dadoonet [via ElasticSearch Users] <ml-node+s115913n4052339h86@n3.nabble.com>, Thu, Mar 20, 2014 at 3:45 PM.


(David Pilato) #7

Could you paste your mapping?

http://localhost:9200/mongoindex/files/_mapping?pretty

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to sAs59 (mr.akmurat@gmail.com), 22 March 2014, 14:15.


(sAs59) #8

http://localhost:9200/mongoindex/files/_mapping?pretty=true

{
"mongoindex" : {
"mappings" : {
"files" : {
"properties" : {
"chunkSize" : {
"type" : "long"
},
"content" : {
"type" : "attachment",
"path" : "full",
"fields" : {
"content" : {
"type" : "string"
},
"author" : {
"type" : "string"
},
"title" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
},
"content_length" : {
"type" : "integer"
}
}
},
"contentType" : {
"type" : "string"
},
"file" : {
"type" : "attachment",
"path" : "full",
"fields" : {
"file" : {
"type" : "string",
"index" : "no",
"store" : true
},
"author" : {
"type" : "string"
},
"title" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
},
"content_length" : {
"type" : "integer"
}
}
},
"filename" : {
"type" : "string"
},
"length" : {
"type" : "long"
},
"md5" : {
"type" : "string"
},
"metadata" : {
"type" : "object"
},
"uploadDate" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
}
}
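
Reading the mapping above, `store` is enabled only on `file.file`, while the documents shown earlier in the thread carry their Base64 data under `content`. That is an observation from the pasted output, not a confirmed diagnosis. A small Python sketch that lists each attachment field and whether its main sub-field is stored; the helper name is hypothetical and the mapping is abridged:

```python
def stored_attachment_fields(properties: dict) -> dict:
    """Map each attachment field name to whether its main sub-field is stored."""
    report = {}
    for name, spec in properties.items():
        if spec.get("type") != "attachment":
            continue
        main = spec.get("fields", {}).get(name, {})
        # The thread shows both "yes" and true as values for "store".
        report[name] = main.get("store") in (True, "yes", "true")
    return report


# Abridged from the _mapping output above.
properties = {
    "content": {
        "type": "attachment",
        "path": "full",
        "fields": {"content": {"type": "string"}},
    },
    "file": {
        "type": "attachment",
        "path": "full",
        "fields": {"file": {"type": "string", "index": "no", "store": True}},
    },
    "filename": {"type": "string"},
}

print(stored_attachment_fields(properties))
# {'content': False, 'file': True}
```

If the river only ever fills `content`, a request for `fields=file.file` would have nothing stored to return, which would be consistent with the hits that came back with IDs only.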

On Sat, Mar 22, 2014 at 7:33 PM, dadoonet [via ElasticSearch Users] <
ml-node+s115913n4052548h65@n3.nabble.com> wrote:

Could you paste your mapping?

http://localhost:9200/mongoindex/fileshttp://localhost:9200/mongoindex/files/_search?q=akmurat&fields=file.file&pretty=true
/_mapping?pretty

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 22 mars 2014 à 14:15, sAs59 <[hidden email]http://user/SendEmail.jtp?type=node&node=4052548&i=0>
a écrit :

Hi,
I followed your instructions and it seems work.
In my files collection I have two files which contains word "akmurat"
And when I search using following command:

http://localhost:9200/mongoindex/files/_search?q=akmurat&fields=file.file&pretty=true
I got:

{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.081366636,
"hits" : [ {
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89c4119bcc028e8001da",
"_score" : 0.081366636
}, {
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_score" : 0.057534903
} ]
}
}

It returns the file IDs, which is good.

Is there a way to show my files' content in a readable form?

Usually it returns:

{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}

}

I want:

{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"My name is Akmurat Saktagan. I am 21 years old."},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}

}

Thank you!
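
For reference, the `content` value in `_source` is the original file encoded as Base64, which is the form the attachment type expects to receive. `_source` returns the document as it was sent, not the extracted text. A minimal Python sketch decoding the value shown above (the variable names are only illustrative):

```python
import base64

# Value of _source.content.content copied from the response above.
encoded = "TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="

# Base64 only wraps the original bytes; decoding gives the file back as-is.
raw = base64.b64decode(encoded)
text = raw.decode("utf-8")

print(text)      # My name is Akmurat Saktagan. I am 21 years old.
print(len(raw))  # 47, the same as the "length" field in _source
```

This works directly only because the file is plain text; a PDF decoded this way is still a binary PDF.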

On Thu, Mar 20, 2014 at 3:45 PM, dadoonet [via ElasticSearch Users] <[hidden email]> wrote:

I think I'm starting to understand what you are trying to get…
You don't want original content but only extracted content, right?

I think that if you store content it should work.

Something like this (in mapping):

{
"person" : {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"file" : {"index" : "no", "store" : "yes"}
}
}
}
}
}

And then, when searching, ask for the field "file.file" instead of _source (the default):
curl -XGET 'http://localhost:9200/index/person/_search?q=whatever&fields=file.file'

Should work I guess.
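
A minimal Python sketch of the same request, assuming the index and type names used elsewhere in this thread (`mongoindex`, `files`) and a node on localhost:9200. The helper names and `sample_response` are hypothetical; the response is made up to illustrate the shape expected from Elasticsearch 1.x when `fields` is used (values returned as lists under a `fields` key in each hit), so check it against your own output.

```python
from urllib.parse import urlencode

def build_search_url(base, index, doc_type, query, field):
    """Build a URI search that asks for one stored field instead of _source."""
    params = urlencode({"q": query, "fields": field, "pretty": "true"})
    return f"{base}/{index}/{doc_type}/_search?{params}"

def stored_values(response, field):
    """Collect the values of a stored field from each hit, keyed by _id."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Hits without the stored field have no "fields" key at all.
        result[hit["_id"]] = hit.get("fields", {}).get(field, [])
    return result

url = build_search_url("http://localhost:9200", "mongoindex", "files", "akmurat", "file.file")
print(url)

# Hypothetical response, shaped like an Elasticsearch 1.x reply (assumption).
sample_response = {"hits": {"hits": [
    {"_id": "a1", "fields": {"file.file": ["extracted text"]}},
    {"_id": "b2"},
]}}
print(stored_values(sample_response, "file.file"))
```

A hit that comes back with only `_id` and `_score` means nothing was stored under the requested field for that document.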

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 20 March 2014 at 10:12:01, sAs59 ([hidden email]) wrote:

It's still unclear. I've decoded the whole content, but I'm getting this kind of text instead.
Where should I see my actual text?
I also tried a different charset, but the output is still unreadable.

<</Filter/FlateDecode/Length 1549>>
stream
xœ­XKoÛF ¾ ð Б â –.Ék€8MÑ^
÷ $=Ð % –-—”ìôßwfvgw–‘" ( 8Ü÷7¯ofôáîúêý­ži£æfv·º¾Ò³9üÓ³¦R ¦êºP•
Ý=]_Ígküóéúêkv—›ì!¿)³~–ßh“½Áx‡ã!o²-~,ñ ,VÙ ¿Æ0\À9“u°ï ­q~ að o,² 'ø xa èEw

Ö°Á ¤ ßÿB06 !ØÓv„3c¼xµC< ,í‘b-aÜ¿âzOrù;àã)o³þ —öñ.Z]ÑU#o^
”ž6ý“ë2SN¾?avd8³ü¯Ùݯ×W Á î~4BUªÖ ¾Æ7J[EùWp‹“÷)×uÖí ^áÏŽ·Ð C2ö„ÒÍâr l PúÍÝbÑoQ«ˆrèèìˆBãz% ¶aqüATÑ@šEÃõ#/+Z/²Ïh^¯ú ±9 Ø›±wï/ù}ëÜH>Û] ̲RÆze. Ú’@ì‚çz—au¼;q§® U¦Wžz^WVÙ"ÝÛ‘ …P©£§ŽqΩqËn 3Rj ºÿ.•E¼Dj^}—×Ñ GŽÂª¢¸ ö• ’H ñ+Œ;Úp@¹ÉàªôÞ…žjÎ P[Õ6^ƒKFMaß;Ò ®¨Ý[Ïqœ §1¿Ox¼^L 3 ”³$t8•Ü ã Iå ÞO^¹oTÁ^’¡G3
c“éà}Á) +µàZrn|mÍ!A׿åÆãatáÕ€ŒÅ#59C~÷ü™x Jë ò¬!lÛ¨’
Ñå7 p¼ «‘u d PÕæ¿ WíµÓ= 3 Õ&5 Œÿ†ñ!qå½—sÇ ÜF‰fÅ hùC:r Gÿ wìqÄs,B ’”Ì1 ä.
‘U)âŒÜ´ñf<§õºU-+ ¡M1I^¥WÃ(g‚Ì8p¼Š’ ©' | G¡KÕ´)Ž-ç@¾·wª0ç’ œ= ~“¤?\Þ
?ÀñVÚ’.ë ÿô¤h8¢ G’£pÌT/p&PÊ+ $‰
Äy[Y­Lá•4:MxŸßsäv b³Ö;‰ i+”¡# †à@à?Nm" DN¿
ª ]l™}„ñw6û(} ­«|‚ »E’ëéz ÔU_¤äWVÖÒg k½7v  ˆ§þ¿äM K¥‘ R$>è¼Ùm#Ì^O2 NÐÎΑrØÃ*pé†jÕ:I“ ^ý §E Þ‰6å ][BI·cÌô Y–*E †[HéAÔÝMùœÁœ· >8 – ¤åWºñ 5 F•¬æ/¹‘•Fy jëì ‡ô>" h¥É>!È i J¿L÷>ȨÀù–kËÄÃŽ£-‹Bé*EK†™Ï…ÏáUGü-f x3TG©ï¶Z '~ cÒ U®Ý=w>i­åö f8§úy¥šÒ óH ± Ñ‚- Zˆ À0pÖy‘ µLI IÊ Kú!÷þßqGõ V ½X¦üþÛO\§,¬2uŠÿæÔÞR“áäÞ“÷–FÕ“½$· í
zT™šÆBÞ‰% J²C*hB)Õû>.a +IöHûr9SUM­ÊÊãý–u‡¼Œ‰x'â'åÑ Ïøà“ÜCsÂk[O#,åà] :€
ðµt
[DþqÁì¶^fÚªEÝ'" 4­5ªÒéÞ“÷ÚV™É½lZW šì[î¥YzÑq~
½"É Ëˆ ÐCHóƒŒÆ6):uu>@+Û ?:´Ÿ}9 ¤þ îCoPÎÁ ï„è ÅâÁ»Q·d ± î¹j£ ¡h|“Ò
[€þ"%;²ÇÁ…ÐÌ—“ž "Ð ˆ£ä " Ý*= ù•I Ñ/ø®Ø ÁÓÄSo! ! … ý\íÕ\ õ´-tÆÝú$òÂi®¨D¯B
˜.lÖ¯ _lüéçH âP eÇa9Š=±†Á M ¹‰æ¥ŽïÀ¿ŒˆjK ÅEY¼ - ¾ƒ:‡ÎbÌ£ àôžIÉŸYF7
?®ÐÌ}îÊð}ô±ó< T]s#àlê\m—ûò1h²÷MrlLf¹Ö'ÊÖæØOBj‚åým1ÓzúÛeQ¶jަȤ ÿ òˆ©
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/Times#20New#20Roman/Encoding/WinA
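
For context on the output above: Base64-decoding a PDF gives back the PDF file itself, not its text. The page content sits in streams that are usually compressed (`/Filter/FlateDecode` means zlib/deflate), so changing the charset will not help. The `xœ` at the start of the stream is the zlib header (bytes 0x78 0x9C) as displayed in Windows-1252. A minimal Python sketch, using the first characters of the Base64 value from the first post and a made-up content stream (the `page_ops` bytes are purely illustrative, not taken from the file):

```python
import base64
import zlib

# First 12 characters of the Base64 "content" value from the first post.
header = base64.b64decode("JVBERi0xLjUN")
print(header)  # b'%PDF-1.5\r'

# Hypothetical page content stream; real ones are longer and often use
# font-specific encodings, so even this form is not plain text.
page_ops = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# FlateDecode streams are zlib-compressed, so they look like binary noise.
compressed = zlib.compress(page_ops)
print(compressed[:2])  # zlib header bytes; the rest is deflate data

# Decompressing yields PDF drawing operators, which still need a PDF parser.
restored = zlib.decompress(compressed)
print(restored == page_ops)  # True
```

The attachment type does this extraction at index time (the plugin relies on Apache Tika), which is why the extracted text has to be requested as a stored field rather than read from `_source`.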




(David Pilato) #9

Sounds like it's not correct.

You have 2 attachment fields in your mapping ("content" and "file"), and the one you actually use ("content") does not store the extracted text.
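
To make this concrete, here is a minimal Python sketch that walks a trimmed copy of the mapping pasted in this thread and reports, for each attachment field, whether its main sub-field is stored. The helper name `attachment_store_report` is hypothetical; the field names come from the mapping itself.

```python
# Trimmed copy of the "properties" section of the mapping from this thread.
properties = {
    "content": {
        "type": "attachment",
        "path": "full",
        "fields": {"content": {"type": "string"}},
    },
    "file": {
        "type": "attachment",
        "path": "full",
        "fields": {"file": {"type": "string", "index": "no", "store": True}},
    },
    "filename": {"type": "string"},
}

def attachment_store_report(props):
    """Map each attachment field name to whether its same-named sub-field is stored."""
    report = {}
    for name, spec in props.items():
        if spec.get("type") != "attachment":
            continue  # skip plain fields such as "filename"
        sub = spec.get("fields", {}).get(name, {})
        # Mappings may spell the flag as true or "yes" (both appear in this thread).
        report[name] = sub.get("store") in (True, "yes", "true")
    return report

print(attachment_store_report(properties))
# {'content': False, 'file': True}
```

The documents in `_source` carry their data under `content`, so one reading is that `content.content` would need `"store" : true` (with the search asking for `fields=content.content`), or the river would need to write into `file`. Treat this as an interpretation of the mapping, not a verified fix.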

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(sAs59) #10

Is it about my mapping?



(David Pilato) #11

Yes.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 22 mars 2014 à 16:38, sAs59 mr.akmurat@gmail.com a écrit :

Is it about my mapping?

On Sat, Mar 22, 2014 at 9:31 PM, dadoonet [via ElasticSearch Users] <[hidden email]> wrote:
Sounds like it's not correct.

You have 2 attachments and the one you actualy use does not store file.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 22 mars 2014 à 14:36, sAs59 <[hidden email]> a écrit :

http://localhost:9200/mongoindex/files/_mapping?pretty=true
{
  "mongoindex" : {
    "mappings" : {
      "files" : {
        "properties" : {
          "chunkSize" : {
            "type" : "long"
          },
          "content" : {
            "type" : "attachment",
            "path" : "full",
            "fields" : {
              "content" : {
                "type" : "string"
              },
              "author" : {
                "type" : "string"
              },
              "title" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              }
            }
          },
          "contentType" : {
            "type" : "string"
          },
          "file" : {
            "type" : "attachment",
            "path" : "full",
            "fields" : {
              "file" : {
                "type" : "string",
                "index" : "no",
                "store" : true
              },
              "author" : {
                "type" : "string"
              },
              "title" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              }
            }
          },
          "filename" : {
            "type" : "string"
          },
          "length" : {
            "type" : "long"
          },
          "md5" : {
            "type" : "string"
          },
          "metadata" : {
            "type" : "object"
          },
          "uploadDate" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          }
        }
      }
    }
  }
}
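
The point about two attachment fields can be checked mechanically. The following is only a sketch: `attachment_storage` is a hypothetical helper, not part of Elasticsearch or the river, and the embedded mapping is a trimmed copy of the one pasted in this thread. It lists each field of type "attachment" and reports whether its same-named sub-field is stored.

```python
import json

def attachment_storage(mapping: dict, index: str, doc_type: str) -> dict:
    """Map each attachment field name to True if its same-named sub-field is stored."""
    props = mapping[index]["mappings"][doc_type]["properties"]
    report = {}
    for name, spec in props.items():
        if spec.get("type") != "attachment":
            continue
        sub_field = spec.get("fields", {}).get(name, {})
        # The thread shows both spellings: "store" : true and "store" : "yes".
        report[name] = sub_field.get("store") in (True, "yes")
    return report

# Trimmed copy of the mapping from this thread (metadata sub-fields left out).
mapping = json.loads("""
{
  "mongoindex": {
    "mappings": {
      "files": {
        "properties": {
          "content": {
            "type": "attachment",
            "path": "full",
            "fields": {"content": {"type": "string"}}
          },
          "file": {
            "type": "attachment",
            "path": "full",
            "fields": {"file": {"type": "string", "index": "no", "store": true}}
          },
          "filename": {"type": "string"}
        }
      }
    }
  }
}
""")

print(attachment_storage(mapping, "mongoindex", "files"))
# {'content': False, 'file': True}
```

Here the documents carry their data in "content", which is the field that is not stored, while the stored sub-field belongs to "file", which the documents never populate.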

On Sat, Mar 22, 2014 at 7:33 PM, dadoonet [via ElasticSearch Users] <[hidden email]> wrote:
Could you paste your mapping?

http://localhost:9200/mongoindex/files/_mapping?pretty

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 22 March 2014 at 14:15, sAs59 <[hidden email]> wrote:

Hi,
I followed your instructions and it seems to work.
In my files collection I have two files which contain the word "akmurat".
When I search using the following command:
http://localhost:9200/mongoindex/files/_search?q=akmurat&fields=file.file&pretty=true
I got:
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.081366636,
    "hits" : [ {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532d89c4119bcc028e8001da",
      "_score" : 0.081366636
    }, {
      "_index" : "mongoindex",
      "_type" : "files",
      "_id" : "532d89b94f7399ab6975977a",
      "_score" : 0.057534903
    } ]
  }
}
It returns the file IDs, which is good.
Is there a way to show my files' content in a readable form?
Usually it returns:
{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4="},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}
}
I want:
{
"_index" : "mongoindex",
"_type" : "files",
"_id" : "532d89b94f7399ab6975977a",
"_version" : 1,
"found" : true, "_source" : {"content":{"content_type":null,"title":"D:/text.txt","content":"My name is Akmurat Saktagan. I am 21 years old."},"filename":"D:/text.txt","contentType":null,"md5":"c8f86639cb4bfec23deab7beea473683","length":47,"chunkSize":262144,"uploadDate":"2014-03-22T13:01:45.258Z","metadata":{}}

}
Thank you!
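
For a plain-text file like this one, the "content" value in _source is just the Base64 encoding of the original bytes, so it can be decoded on the client side. A minimal sketch, assuming the document has the same shape as the _source shown above and that the file is UTF-8 text (`decode_text_content` is a hypothetical helper name):

```python
import base64

def decode_text_content(source: dict) -> str:
    """Decode the Base64 payload stored under source["content"]["content"]."""
    encoded = source["content"]["content"]
    # Assumption: the original file was plain UTF-8 text.
    return base64.b64decode(encoded).decode("utf-8")

# Relevant part of the _source returned for D:/text.txt in this thread.
source = {
    "content": {
        "content_type": None,
        "title": "D:/text.txt",
        "content": "TXkgbmFtZSBpcyBBa211cmF0IFNha3RhZ2FuLiBJIGFtIDIxIHllYXJzIG9sZC4=",
    },
    "filename": "D:/text.txt",
    "length": 47,
}

text = decode_text_content(source)
print(text)
# My name is Akmurat Saktagan. I am 21 years old.
```

This only helps for text files. For a PDF the decoded bytes are the raw PDF file, so the readable text has to come from the text extracted by the attachment mapper instead.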

On Thu, Mar 20, 2014 at 3:45 PM, dadoonet [via ElasticSearch Users] <[hidden email]> wrote:
I think I'm starting to understand what you are trying to get…
You don't want original content but only extracted content, right?

I think that if you store content it should work.

Something like this (in mapping):

{
  "person" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "file" : {"index" : "no", "store" : "yes"}
        }
      }
    }
  }
}

And then, when searching, ask for the field "file.file" instead of _source (the default):
curl -XGET 'http://localhost:9200/index/person/_search?q=whatever&fields=file.file'

Should work I guess.
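
The same request can be sketched in Python. This is an illustration under assumptions, not a tested client: `build_search_url` and `stored_field_values` are hypothetical helpers, the sample response is made up to match the shape of the results in this thread, and it assumes that Elasticsearch 1.x returns stored fields under a "fields" key with values wrapped in arrays.

```python
from urllib.parse import urlencode

def build_search_url(base: str, index: str, doc_type: str, query: str, fields: list) -> str:
    """Build a URI search request that asks for stored fields instead of _source."""
    params = urlencode({"q": query, "fields": ",".join(fields)})
    return f"{base}/{index}/{doc_type}/_search?{params}"

def stored_field_values(response: dict, field: str) -> dict:
    """Map each hit's _id to the values of one stored field (None if the hit has none)."""
    result = {}
    for hit in response["hits"]["hits"]:
        # A hit has no "fields" key when the requested field was not stored for it.
        result[hit["_id"]] = hit.get("fields", {}).get(field)
    return result

url = build_search_url("http://localhost:9200", "index", "person", "whatever", ["file.file"])
print(url)

# Hypothetical response: the first hit has a stored value, the second has none.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 0.08, "fields": {"file.file": ["extracted text"]}},
            {"_id": "2", "_score": 0.05},
        ]
    }
}
print(stored_field_values(sample_response, "file.file"))
```

A hit that comes back with only _index, _type, _id and _score, and no "fields" key, suggests that nothing was stored under the requested field name for that document.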

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 20 March 2014 at 10:12:01, sAs59 ([hidden email]) wrote:

It's still unclear. I've decoded the whole text, but instead of readable text I'm getting this kind of output.
Where should I see my actual text?
I also tried using a different charset, but it is still unreadable.

<</Filter/FlateDecode/Length 1549>>
stream
[binary FlateDecode stream data omitted]
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/Times#20New#20Roman/Encoding/WinA
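
The output above is what Base64-decoding a PDF gives: the raw PDF file, whose text lives in compressed (FlateDecode) streams, so no charset will make it readable. The mapper-attachments plugin uses Apache Tika to extract the text at index time. A small sketch of how a client could tell the two cases apart before decoding (`is_pdf_payload` is a hypothetical helper; the sample strings are the beginnings of the Base64 values posted in this thread):

```python
import base64

PDF_MAGIC = b"%PDF-"

def is_pdf_payload(encoded: str) -> bool:
    """Return True if a Base64 payload decodes to bytes starting with the PDF header."""
    prefix = encoded[:8]  # 8 Base64 characters decode to exactly 6 bytes
    if len(prefix) < 8:
        return False
    return base64.b64decode(prefix).startswith(PDF_MAGIC)

pdf_sample = "JVBERi0xLjUNCiW1tbW1DQox"   # start of the sample.pdf payload
text_sample = "TXkgbmFtZSBpcyBBa211cmF0"  # start of the text.txt payload

print(is_pdf_payload(pdf_sample))   # True
print(is_pdf_payload(text_sample))  # False
```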


(system) #12