Elasticsearch mongodb river with GridFS attached to DBObject problem

(Zoran Jeremic) #1


I'm using the Elasticsearch MongoDB river to index documents stored in MongoDB,
but I couldn't find documentation that clearly explains how to solve the
problem I have. Each document is a DBObject containing several custom
fields, with a GridFSInputFile attached as the "file" field.
I specified the river like this:
PUT _meta
{
  "type": "mongodb",
  "mongodb": {
    "db": "platform4",
    "collection": "documents2"
  },
  "index": {
    "name": "inextweb_documents4",
    "type": "documents"
  }
}

MongoDB and GridFS store the document and the file properly. The Elasticsearch
river maps the document, so it can be searched by the custom fields. However,
I'm not able to search the text within the files.
I tried adding "gridfs":"true" and "gridfs":"fs.files" when I specified
the river, but that combination didn't work at all. I didn't use mappings at
all, as I don't know which fields could be added at runtime.
Could you please suggest what the problem could be here?
This is the document format:

{
  "engine_id": "engineid1234",
  "external_id": "http://en.wikipedia.org/wiki/Healthcare_in_India",
  "contentType": "text/html",
  "added": 1384822341,
  "file": {
    "_id": {
      "$oid": "528ab645975c8c01cdb201fb"
    },
    "chunkSize": 262144,
    "length": 171063,
    "md5": "59cb83ecde3378749c58893567e021a3",
    "filename": "9fd7cce6-4248-41f3-98f0-dffb498a061d",
    "contentType": null,
    "uploadDate": {
      "$date": "2013-11-19T00:52:21.009Z"
    },
    "aliases": null
  },
  "fields": [
    {
      "title": {
        "boost": 2.21,
        "type": "text",
        "value": "Healthcare in India"
      }
    },
    {
      "domain": {
        "boost": 0,
        "type": "text",
        "value": "wikipedia.org"
      }
    }
  ]
}



(system) #2