Accessing non-stored fields


(Tom Verschueren) #1

Hi

I am new to Elasticsearch and am trying out the attachment plugin. I'm a
bit confused about how to handle the metadata from the attachments.

I have created a simple mapping as an example. I explicitly store the 'title'
field; the other fields are not stored by default.
PUT /test/file/_mapping
{
  "file" : {
    "properties" : {
      "content" : {
        "type" : "attachment",
        "fields" : {
          "title" : {
            "index" : "analyzed",
            "store" : "yes"
          },
          "content_type" : {
            "store" : "no"
          }
        }
      }
    }
  }
}

This is the mapping as returned by Elasticsearch:

{
  "test": {
    "mappings": {
      "file": {
        "properties": {
          "content": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "content": {
                "type": "string"
              },
              "author": {
                "type": "string"
              },
              "title": {
                "type": "string",
                "store": true
              },
              "name": {
                "type": "string"
              },
              "date": {
                "type": "date",
                "format": "dateOptionalTime"
              },
              "keywords": {
                "type": "string"
              },
              "content_type": {
                "type": "string"
              },
              "content_length": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
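The mapping above makes the storage situation explicit: only "title" carries "store": true. A minimal sketch in Python (standard library only) that reads that returned mapping and splits the attachment sub-fields into stored and not stored. The helper name split_stored is hypothetical, and it assumes the mapping shape shown above (index > mappings > type > properties > field > fields).

```python
import json

# The mapping returned by Elasticsearch, copied from the response above.
MAPPING_JSON = """
{
  "test": {
    "mappings": {
      "file": {
        "properties": {
          "content": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "content": {"type": "string"},
              "author": {"type": "string"},
              "title": {"type": "string", "store": true},
              "name": {"type": "string"},
              "date": {"type": "date", "format": "dateOptionalTime"},
              "keywords": {"type": "string"},
              "content_type": {"type": "string"},
              "content_length": {"type": "integer"}
            }
          }
        }
      }
    }
  }
}
"""


def split_stored(mapping, index, doc_type, field):
    """Return (stored, not_stored) sub-field names of an attachment field.

    Hypothetical helper: a sub-field counts as stored only if the returned
    mapping reports "store": true; otherwise the default (not stored) applies.
    """
    subfields = mapping[index]["mappings"][doc_type]["properties"][field]["fields"]
    stored = sorted(name for name, spec in subfields.items() if spec.get("store") is True)
    not_stored = sorted(name for name in subfields if name not in stored)
    return stored, not_stored


stored, not_stored = split_stored(json.loads(MAPPING_JSON), "test", "file", "content")
print("stored:", stored)          # stored: ['title']
print("not stored:", not_stored)  # includes 'content_type'
```

Only content.title can come back from stored fields; content_type is indexed (so it is searchable) but has no stored value to return.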

Example query:

GET /test/file/_search
{
  "fields": [
    "*", "content.content_type"
  ],
  "query": {
    "match": {
      "content.content_type": "xhtml test document"
    }
  }
}

Response:
{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.26574233,
    "hits": [
      {
        "_index": "test",
        "_type": "file",
        "_id": "3SmwWJe6TtiP0nheD6pFCg",
        "_score": 0.26574233,
        "fields": {
          "content.content_type": [
            "...PCEtLQogTGljZW5zZWQgdG8gdGhlI..."
          ],
          "content.title": [
            "XHTML test document"
          ]
        }
      }
    ]
  }
}

So I am able to query on the "content_type" field, but in the response I
get the base64 representation of the attachment instead of
"application/xhtml+xml".
Do I really need to store each metadata field of my attachment? I was
under the impression that Elasticsearch would extract the field from the
_source at runtime (or would this cause too much overhead?)

Thx,
Tom

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/461e4ba9-cdab-4c76-a915-c8e1f8b7ae22%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

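Yes, you'd need to store content_type to get it back. In your case the
_source field is nothing more than the base64 of the raw input document
you sent at index time. The metadata fields are extracted from that
document during indexing and are never written back into _source, so
there is nothing there to extract content_type from at query time.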
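A minimal sketch of that fix, in Python with only the standard library: it builds a PUT _mapping body that stores every attachment metadata sub-field, plus a search body that asks for those stored fields by name. Assumptions: the sub-field names and types are copied from the mapping Elasticsearch returned earlier in this thread, the "store": "yes" syntax follows the original request, and the helper name attachment_mapping is hypothetical. The sketch only prints the JSON; it does not send it. Changing "store" on an existing field is generally not allowed, so this would most likely go into a fresh index, with the documents reindexed.

```python
import json

# Metadata sub-fields and their types, as reported in the returned mapping.
# The extracted text ("content") is deliberately left out: storing it is
# optional and can be large.
METADATA_TYPES = {
    "title": "string",
    "author": "string",
    "name": "string",
    "date": "date",
    "keywords": "string",
    "content_type": "string",
    "content_length": "integer",
}


def attachment_mapping(doc_type, field, types):
    """Hypothetical helper: mapping body that stores each listed sub-field."""
    fields = {}
    for name, ftype in types.items():
        # Keep the type explicit and add "store" so the value can be returned.
        fields[name] = {"type": ftype, "store": "yes"}
    return {doc_type: {"properties": {field: {"type": "attachment", "fields": fields}}}}


mapping = attachment_mapping("file", "content", METADATA_TYPES)

# Request the stored sub-fields explicitly instead of relying on "*".
search = {
    "fields": ["content." + name for name in METADATA_TYPES],
    "query": {"match_all": {}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```

Listing the fields explicitly keeps the response limited to stored values, so content.content_type would come back as the detected MIME type rather than a value pulled from the base64 _source.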


