Just an FYI... Start with, for example, the following JSON document (all on
one line for the _bulk API, but pretty printed below). This follows my
basic document structure: a set of named fields, with each of those
fields taking either a single value or an array of heterogeneous values.
It's nothing more complex than a Map<String,Object> can represent, in which
each Object is either a single value (String, Boolean, and so on) or an
array. It's a subset of "throw any JSON document into ES", but still
a very useful subset that far exceeds any database engine I've ever used:
{
  "_index" : "twitter" ,
  "_type" : "tweet" ,
  "_id" : "3" ,
  "_score" : 1.0 ,
  "_source" : {
    "user" : "bbthing68" ,
    "postDate" : "2012-11-15T14:12:12" ,
    "altitude" : 45767 ,
    "dst" : true ,
    "prefix" : null ,
    "counts" : [ 1 , 2 , 3.14149 , "11.1" , "13" ] ,
    "vdst" : [ true , false , true ] ,
    "message" : [ 2 , "Just trying this out" , "With one/two multivalued fields" ]
  }
}
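To make that structure concrete, here is a minimal Java sketch that checks whether a Map<String,Object> fits it. The class and method names (FlatDocument, isValidDocument, and so on) are hypothetical. It assumes that array elements are themselves single values (no nested arrays or objects) and that JSON null counts as a single value, as "prefix" does above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker for the "single value or array of single values" structure.
public class FlatDocument {
    // Single values: String, Boolean, any Number, or JSON null.
    public static boolean isScalar(Object v) {
        return v == null || v instanceof String || v instanceof Boolean || v instanceof Number;
    }

    // A field value is one scalar, or a list whose elements are scalars of any (mixed) types.
    public static boolean isValidValue(Object v) {
        if (v instanceof List<?>) {
            for (Object item : (List<?>) v) {
                if (!isScalar(item)) return false; // assumption: no nested arrays or objects
            }
            return true;
        }
        return isScalar(v);
    }

    public static boolean isValidDocument(Map<String, Object> doc) {
        for (Object v : doc.values()) {
            if (!isValidValue(v)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("user", "bbthing68");
        source.put("altitude", 45767);
        source.put("dst", true);
        source.put("prefix", null);
        source.put("counts", Arrays.asList(1, 2, 3.14149, "11.1", "13"));
        source.put("vdst", Arrays.asList(true, false, true));
        System.out.println(isValidDocument(source)); // true

        source.put("nested", Map.of("a", 1)); // an object value falls outside the structure
        System.out.println(isValidDocument(source)); // false
    }
}
```

The check is deliberately shallow: anything a Map<String,Object> holds beyond one level of arrays is rejected, which mirrors the subset described above rather than full JSON.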
Both the SearchHit.getSourceAsString and the GetResponse.getSourceAsString
methods return the following JSON string (again, it's all on one line; it's
pretty printed here only for this post):
{
  "user" : "bbthing68" ,
  "postDate" : "2012-11-15T14:12:12" ,
  "altitude" : 45767 ,
  "dst" : true ,
  "prefix" : null ,
  "counts" : [ 1 , 2 , 3.14149 , "11.1" , "13" ] ,
  "vdst" : [ true , false , true ] ,
  "message" : [ 2 , "Just trying this out" , "With one/two multivalued fields" ]
}
I had been using the getSourceAsMap methods, which return a Map<String,Object>.
But when I use the JsonParser in stream parsing mode (as supplied directly
by ElasticSearch; no need to fetch the full Jackson jar file), I can
stream parse that source directly and much faster. My overall
response times are now much lower. It's also much easier and faster for
me to just parse the source and pull out only the subset of fields I
want, instead of trying to tell ES which subset of fields I want.
Oh, and when I store the fields from my stream parsing process, I put them
into a LinkedHashMap<String,Object>. That little bit of overhead keeps the
keys (field names) in exactly the same order as they appear in the source,
which is really awesomely cool. No more jumbled field names when
displaying results during testing!
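Here is a rough sketch of that "parse once, keep only the fields I want, preserve source order" approach. The post uses the JsonParser bundled with ElasticSearch; to keep this example free of dependencies, it swaps in a tiny hand-rolled single-pass scanner instead. SubsetScanner and extract are hypothetical names, not an ElasticSearch or Jackson API, and the scanner handles only the flat structure described at the top (one object whose values are single values or arrays of them).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-pass scanner for a flat _source object (no nested objects).
public class SubsetScanner {
    private final String s;
    private int pos = 0;

    private SubsetScanner(String s) { this.s = s; }

    // Reads the source once and keeps only the wanted top-level fields, in source order.
    public static Map<String, Object> extract(String json, Set<String> wanted) {
        SubsetScanner p = new SubsetScanner(json);
        Map<String, Object> out = new LinkedHashMap<>(); // insertion order == source order
        p.expect('{');
        if (p.peek() == '}') return out;
        while (true) {
            String key = p.readString();
            p.expect(':');
            Object value = p.readValue(); // unwanted values are scanned past but never stored
            if (wanted.contains(key)) out.put(key, value);
            char c = p.next();
            if (c == '}') return out;
            if (c != ',') throw new IllegalArgumentException("Expected ',' or '}' at " + (p.pos - 1));
        }
    }

    private void skipWhitespace() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private char peek() { skipWhitespace(); return s.charAt(pos); }

    private char next() { skipWhitespace(); return s.charAt(pos++); }

    private void expect(char expected) {
        char c = next();
        if (c != expected) throw new IllegalArgumentException("Expected '" + expected + "' at " + (pos - 1));
    }

    // A value is a string, a number, true/false/null, or an array of such values.
    private Object readValue() {
        char c = peek();
        if (c == '"') return readString();
        if (c == '[') {
            pos++;
            List<Object> list = new ArrayList<>();
            if (peek() == ']') { pos++; return list; }
            while (true) {
                list.add(readValue());
                char d = next();
                if (d == ']') return list;
                if (d != ',') throw new IllegalArgumentException("Expected ',' or ']' at " + (pos - 1));
            }
        }
        if (s.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (s.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (s.startsWith("null", pos)) { pos += 4; return null; }
        int start = pos;
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
        String num = s.substring(start, pos);
        if (num.isEmpty()) throw new IllegalArgumentException("Unsupported value at " + start);
        if (num.contains(".") || num.contains("e") || num.contains("E")) return Double.valueOf(num);
        return Long.valueOf(num);
    }

    // Reads a quoted string, decoding the standard JSON escapes.
    private String readString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            char c = s.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(pos++);
            switch (e) {
                case 'n': sb.append('\n'); break;
                case 't': sb.append('\t'); break;
                case 'r': sb.append('\r'); break;
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'u': sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16)); pos += 4; break;
                default: sb.append(e); // quote, backslash and slash map to themselves
            }
        }
    }

    public static void main(String[] args) {
        String source = "{\"user\":\"bbthing68\",\"altitude\":45767,\"dst\":true,"
                + "\"prefix\":null,\"counts\":[1,2,3.14149,\"11.1\",\"13\"]}";
        System.out.println(extract(source, Set.of("user", "prefix", "counts")));
        // prints {user=bbthing68, prefix=null, counts=[1, 2, 3.14149, 11.1, 13]}
    }
}
```

The fields come back in source order (user, prefix, counts) no matter what order the wanted set lists them in, because the LinkedHashMap records insertion order and insertion follows the scan. In real code, a proper streaming parser such as the one bundled with ElasticSearch would replace the hand-written scanner.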
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.