Store Large Object in Index without Mapping

davrob · November 21, 2011, 7:34pm

Hi,

I would like to store a large hashmap in my index without mapping it,
but still have it available in the _source.
Essentially I want to use Elasticsearch as an object store, where I
can associate a hashmap with an index id rather than make a call out
to MongoDB or the like, using the id as a reference.

-David.

kimchy · November 22, 2011, 8:34am

You mean you don't want to index that big "hashmap"? You can map the
object-level property for it with enabled set to false, which means it will
not even try to parse and map the object it represents.
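
As a rough illustration of what such a mapping could look like, here is a minimal Java sketch that only builds the mapping JSON. The type name "contact" and the field name "weekViewMap" are placeholders, and DisabledObjectMapping is a made-up helper, not part of the Elasticsearch client:

```java
// Sketch only: builds the JSON for a type mapping whose object field is disabled.
// "contact" and "weekViewMap" are placeholder names chosen for illustration.
final class DisabledObjectMapping {

    /** Returns a type mapping in which {@code field} is an object with "enabled": false. */
    static String mappingJson(String type, String field) {
        return "{\"" + type + "\":{\"properties\":{\"" + field
                + "\":{\"type\":\"object\",\"enabled\":false}}}}";
    }

    public static void main(String[] args) {
        // Prints the mapping body that would be sent with a put-mapping request.
        System.out.println(mappingJson("contact", "weekViewMap"));
    }
}
```

With a mapping like this the field's contents are not parsed or indexed, but they remain part of the stored _source.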

davrob · November 23, 2011, 11:38am

Thanks Shay, that works great.

davrob · November 24, 2011, 5:08pm

Having implemented the solution using enabled=false on a normal object
type, I find that my query time has increased massively, probably
because the queries I execute almost always access the "source"
object, which now has to deserialize additional very big objects.

From the Nested Object Documentation

"Those internal nested documents are automatically masked away when
doing operations against the index (like searching with a match_all
query), and they bubble out when using the nested query."

So my idea is to offer users the option of including this object
(using a checkbox) by pushing the large HashMap into a nested
property where, as I understand it, nested objects are only
deserialized if a query is made against them.

My idea is to define the mapping as follows:

{
    "MainType" : {
        "properties" : {
            "nestedObject" : {
                "type" : "nested",
                "properties" : {
                    "constantField" : {"type" : "string", "index" : "not_analyzed"},
                    "weekViewMap" : {"type" : "object", "enabled" : false}
                }
            }
        }
    }
}

So, I would "turn on" the weekView serialization by making a nested
TermFilter search, where nestedObject.constantField="ALL-OBJECTS-HAVE-THIS".

Questions

Will this work?

Will a query against another nested object also cause this nested
object to be included in the source, or do the different nested
objects act independently?

David

kimchy · November 24, 2011, 6:26pm

You mean the object is something new that you now add to your document? I
suggest you first understand why it's slower, for example by asking for
empty fields, or even just _source.

davrob · November 25, 2011, 11:08am

Hi Shay,

So these are the stats for this operation:

Timer timer = Timer.startTimer();
SearchResponse searchResponse = searchReq.execute().actionGet();
timer.endTimer("QUERY TIME --->>>");

Stats

All fields but 'large' field (obtained by running against an old
index without this in):

INFO: QUERY TIME --->>>366ms

One field - obtained from the index with the 'large' field in, but
just asking for one field using "searchReq.addFields("person_name");"

INFO: QUERY TIME --->>>344ms

All fields including 'large' field:

INFO: QUERY TIME --->>>2922ms

I think that proves that it is the serialization that is causing the
problem, rather than the search.
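
A single timed call can be skewed by caches or JIT warm-up, so one way to firm up numbers like these is to repeat each variant a few times and average. A minimal sketch, assuming the search call is wrapped in a Runnable; QueryTimer and averageMillis are made-up names, not part of the Elasticsearch client:

```java
// Sketch only: averages several timed runs so one slow first call does not dominate.
final class QueryTimer {

    /** Runs {@code call} {@code warmups} times untimed, then returns the mean of {@code runs} timed runs in ms. */
    static double averageMillis(Runnable call, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            call.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            call.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice this would wrap searchReq.execute().actionGet().
        Runnable fakeSearch = () -> Math.sqrt(42.0);
        System.out.printf("average: %.3f ms%n", averageMillis(fakeSearch, 2, 5));
    }
}
```

Running the same harness against each of the three variants would show whether the gap holds up across repeated calls.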

In that case, do you think the above approach would be a good way of
stopping that object being serialized by default?

Or, a good alternative would be to have a method on
SearchRequestBuilder like:

searchReq.removeFields("weekViewMap");

or

searchReq.doNotSerializeFields("weekViewMap");

For the default (fields = null), which pulls back the source.

-David

kimchy · November 27, 2011, 3:20pm

Do you rely on _source or do you specifically store fields? If you rely on
_source, then I don't really understand this big difference, since when you
ask for specific fields, it still gets loaded and parsed (with the "large"
field), and if you don't, it's just a matter of passing the _source as is
over the wire.

davrob · November 28, 2011, 10:06am

Hi Shay,

Yes, I usually rely on _source, but when I specify the field to
return using "searchReq.addFields("person_name");" the source
returned is null and I just get the individual field, as below:

for (SearchHit hit : queryResponse.getHits()) {
    Map<String, Object> source = hit.getSource();
    if (source == null) {
        // No _source came back because specific fields were requested,
        // so rebuild a map from the individual hit fields.
        source = new HashMap<String, Object>();
        for (String responseField : searchBean.getSearch().getResponseFields()) {
            SearchHitField shf = hit.field(responseField);
            if (shf != null) {
                source.put(responseField, shf.getValue());
            }
        }
    }
}

David

kimchy · November 28, 2011, 11:23am

Right, but the _source still gets loaded and extracted when you ask
for a field. And it seems strange that just transferring it would cause
that big a time difference. How big is the document that you index (with
that _map)?

davrob · November 28, 2011, 4:10pm

Hi Shay,

Well, yes, I think your instinct is right: the difference in the files
is only very small. Here's an example:

https://gist.github.com/dav-rob/1400795

Week View Map

{
    "weekViewMap": {
        "20111128-day3": {
            "numActivities": 3,
            "userTypeMap": {
                "25120904": "T",
                "25801329": "M",
                "25867415": "M"
            }
        },

The average size of a Contact document is 1327 characters, while the
average size of a contact with the added hashmap is 1699
characters.
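
As a rough check on those figures, here is a small sketch that puts the size increase next to the query times reported earlier in the thread (366 ms without the map, 2922 ms with it). GrowthCheck is a made-up helper, and the two timings came from different indices, so treat the comparison as indicative only:

```java
// Sketch only: compares document growth with query-time growth,
// using the figures quoted in this thread.
final class GrowthCheck {

    /** Relative increase from {@code before} to {@code after}, e.g. 0.5 for +50%. */
    static double relativeIncrease(double before, double after) {
        return (after - before) / before;
    }

    public static void main(String[] args) {
        double sizeGrowth = relativeIncrease(1327, 1699); // average characters per document
        double timeGrowth = relativeIncrease(366, 2922);  // query time in ms
        // prints "size: +28%, time: +698%"
        System.out.printf("size: +%.0f%%, time: +%.0f%%%n", sizeGrowth * 100, timeGrowth * 100);
    }
}
```

The documents grow by about 28% while the query time grows by roughly 700%, so document size alone does not obviously account for the slowdown.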

So, I'm not sure what is causing the issue. I do have a map within the
map, but I'd have thought the JSON serialization API would have no
real problem with that.

The workaround I have in mind at the moment is to have a "fat" type
"fatcontact" and a skinny one "contact" that are identical apart from
the embedded HashMap. If users want to search on the embedded map, I
switch types to search on the "fatcontact" type, but this is a pretty
ugly workaround, because I index everything twice.

-David.

kimchy · November 28, 2011, 4:27pm

If that's the case, then it's really strange. Can you post a recreation that
shows this (can be Java code, with a simple "main class" repro)?

davrob · November 28, 2011, 7:15pm

Hi Shay,

Do you have an example Gist of a Java-based recreation? It would be
helpful to copy.

David.

kimchy · November 28, 2011, 8:48pm

Not one offhand, but it should be simple. Create a simple main class, start
a client in it, index a sample doc(s), and then execute the search requests
that show the problem, where one takes considerably more time than the
others.
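
A minimal sketch of the timing part of such a main class, using only the JDK. The class name ReproTimer and the method medianMillis are hypothetical, and the two Runnables are empty placeholders: in a real recreation each one would wrap one of the search requests being compared (for example a call like searchReq.execute().actionGet(), as used earlier in this thread). Warm-up runs and a median are a design choice here, not something the thread prescribes; they make a single slow first request less likely to distort the comparison.

```java
import java.util.Arrays;

/** Timing helper for a "simple main class" recreation; JDK only. */
final class ReproTimer {

    /**
     * Runs the request `warmups` times untimed, then times `runs` executions
     * and returns the (upper) median wall-clock time in milliseconds.
     */
    static long medianMillis(int warmups, int runs, Runnable request) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            request.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            request.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholders (assumptions): replace each body with the real search
        // request you want to time against your own cluster.
        Runnable oneField = () -> { /* request asking for a single field */ };
        Runnable fullSource = () -> { /* request returning the whole _source */ };

        System.out.println("one field:   " + medianMillis(3, 11, oneField) + "ms");
        System.out.println("full source: " + medianMillis(3, 11, fullSource) + "ms");
    }
}
```

Running the same class once from a remote desktop and once on the same host as the server gives two sets of numbers that can be compared directly.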

davrob · November 30, 2011, 2:29pm

Hi Shay,

So I ran my recreation here: Base Recreation Class and Test Class · GitHub

And it's entirely my own stupidity - I was measuring the network
latency to New York from my desktop, as opposed to running the client
and server both in New York, or both locally.

Thanks for your patience, hopefully other Java users will find the
BaseClass in the recreation useful for their own work.

David.
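
One way to catch this kind of measurement mix-up earlier is to compare the client-side wall-clock time with the `took` value that Elasticsearch reports in its search response, which covers the time spent on the server side only. The sketch below is an illustration under stated assumptions: the class LatencyBreakdown and its methods are hypothetical, and the server-side figure is passed in as a plain number because how you read it from the Java client depends on the client version. The 2922ms figure is the one reported earlier in this thread; the 40ms server-side value is invented purely for the example.

```java
/** Splits a client-side measurement into server time and everything else. */
final class LatencyBreakdown {

    /** Milliseconds not accounted for by the server-reported "took" value. */
    static long overheadMillis(long wallClockMillis, long serverTookMillis) {
        if (wallClockMillis < 0 || serverTookMillis < 0) {
            throw new IllegalArgumentException("times must not be negative");
        }
        // Clamp at zero in case the two clocks round differently.
        return Math.max(0L, wallClockMillis - serverTookMillis);
    }

    /** Fraction (0.0 to 1.0) of the wall-clock time spent outside the server. */
    static double overheadShare(long wallClockMillis, long serverTookMillis) {
        if (wallClockMillis == 0) {
            return 0.0;
        }
        return (double) overheadMillis(wallClockMillis, serverTookMillis) / wallClockMillis;
    }

    public static void main(String[] args) {
        long wallClock = 2922; // client-side figure reported earlier in the thread
        long serverTook = 40;  // hypothetical server-side value, for illustration only

        System.out.printf("overhead: %dms (%.0f%% of wall clock)%n",
                overheadMillis(wallClock, serverTook),
                overheadShare(wallClock, serverTook) * 100);
    }
}
```

If most of the wall-clock time falls outside the server-reported figure, the network path or client-side handling is a more likely suspect than the search itself.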
