Yes, I suppose having metadata attached to a source document like that
would make it both harder to parse and, as you mentioned, make it
unpredictable in terms of what can be expected in the field object.
What seems natural to me is extracting the boost/analyzer metadata
into a separate object within an inserted document. For example:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "tweet" : {
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "Me gusta Elastic Search!"
    },
    "properties" : {
        "message" : { "_analyzer" : { "type" : "snowball", "language" : "spanish" } }
    }
}'
That way the "source" of the message would stay intact, and additional
properties could be set/overridden for each inserted document.
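To make the idea concrete, here is a minimal sketch of how such a request body could be split into the untouched source and the per-field overrides. This is only an illustration of the proposed format, not anything Elasticsearch does; the function name split_envelope and the "properties" key are assumptions taken from the example above.

```python
import json

def split_envelope(body, source_key="tweet", props_key="properties"):
    """Split the proposed request body into (source, per-field overrides).

    Hypothetical helper: the envelope layout is the one suggested in this
    thread, not an existing Elasticsearch format.
    """
    envelope = json.loads(body)
    source = envelope[source_key]
    overrides = envelope.get(props_key, {})
    # Reject overrides that name a field the source does not contain.
    unknown = set(overrides) - set(source)
    if unknown:
        raise ValueError(f"overrides for unknown fields: {sorted(unknown)}")
    return source, overrides

body = """{
    "tweet" : {
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "Me gusta Elastic Search!"
    },
    "properties" : {
        "message" : { "_analyzer" : { "type" : "snowball", "language" : "spanish" } }
    }
}"""

source, overrides = split_envelope(body)
```

The source dict comes back exactly as the user wrote it, and the overrides live in their own object, so nothing in the source has to be inspected for special keys.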
On Mar 6, 4:25 am, Shay Banon shay.ba...@elasticsearch.com wrote:
You can also use the analyzer field mapping, but note that this controls the index analyzer for all fields in the doc, unless some are explicitly set.
That raises an interesting question, which I have been thinking about a bit lately: how can a field have a custom boost / analyzer / other setting per document? It's a bit tricky, since I would like to maintain the "domain"-driven aspect of the JSON document, but it should still be allowed somehow. Still thinking about how best to provide that, maybe something like this:
{
"my_field" : {
"value" : "text here",
"_analyzer" : "..."
}
}
But then this loses the "pureness" aspect of the indexed doc. On the other hand, it's good to have that option.
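A rough sketch of what reading that wrapped form might involve, assuming a field counts as "wrapped" only when it carries both "value" and "_analyzer". The function name unwrap_fields and the analyzer name "spanish_snowball" are made up for illustration.

```python
def unwrap_fields(doc):
    """Separate plain field values from per-field analyzer overrides.

    Hypothetical: a field is treated as wrapped only if it is an object
    with both a "value" and an "_analyzer" key.
    """
    plain, analyzers = {}, {}
    for name, field in doc.items():
        if isinstance(field, dict) and "value" in field and "_analyzer" in field:
            plain[name] = field["value"]
            analyzers[name] = field["_analyzer"]
        else:
            # Ordinary domain value, passed through unchanged.
            plain[name] = field
    return plain, analyzers

doc = {
    "my_field": {"value": "text here", "_analyzer": "spanish_snowball"},
    "user": "kimchy",
}
plain, analyzers = unwrap_fields(doc)
```

The check on key names is where the "pureness" cost shows up: the indexed document no longer matches the domain object, and the reserved keys have to be recognized by whatever reads it.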
On Saturday, March 5, 2011 at 6:35 PM, Kosta wrote:
Just stumbled across this:
Mapper: Dynamic Template Support · Issue #397 · elastic/elasticsearch · GitHub
I suppose I could have a field representing the document's source
language, and based on that dynamic field the analyzer language would
be set, but that's a bit hackish.
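For illustration, the language-field idea could look roughly like this on the client side. The field name "lang", the language-code table, and the function analyzer_for are assumptions; the sketch only builds a snowball analyzer config and does not show how it would be wired into Elasticsearch.

```python
# Hypothetical mapping from a document's language code to a Snowball
# stemmer language name.
SNOWBALL_LANGUAGES = {"en": "English", "es": "Spanish", "de": "German"}

def analyzer_for(doc, language_field="lang", default="English"):
    """Build a snowball analyzer config from the document's language field."""
    code = doc.get(language_field)
    # Fall back to the default when the field is missing or unrecognized.
    language = SNOWBALL_LANGUAGES.get(code, default)
    return {"type": "snowball", "language": language}

tweet = {"lang": "es", "message": "Me gusta Elastic Search!"}
config = analyzer_for(tweet)
```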
On Mar 5, 2:28 pm, Kosta kosta.kra...@gmail.com wrote:
I looked at the docs for mapping types, and as far as I can see it's
possible to configure analyzers for index fields either in json config
files, or by running the PUT command with appropriate config on
Elasticsearch itself. In one of the examples I saw the following:
"my_analyzer" : {
"type" : "snowball",
"language" : "English"
}
Which basically hardcodes the language of a particular field to
English. However, in my case I know the language at insert time, so I
would like to specify the analyzer language dynamically with each
insert. Would something like this be possible? Thanks for any
suggestions in advance!