Store Large Object in Index without Mapping


(davrob) #1

Hi,

I would like to store a large hashmap in my index without mapping it,
but still have it available in the _source.
Essentially I want to use ElasticSearch as an object store, where I
can associate a hashmap with an index id rather than make a call out
to MongoDB or the like, using the id as a reference.

-David.


(Shay Banon) #2

You mean you don't want to index that big "hashmap"? You can map the
object-level property for it with enabled set to false, which means it will
not even try to parse and map the object it represents.
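
For readers following along, here is a minimal sketch of what such a mapping could look like, written as a Java string so it can be handed to whatever put-mapping call your client version offers. The type name "contact" and the class and method names are assumptions for illustration only; the field name "weekViewMap" is the one used later in this thread.

```java
// Sketch only: builds the mapping JSON as a plain string; no Elasticsearch client is used here.
final class DisabledObjectMapping {

    // typeName and fieldName are illustrative parameters, not names required by Elasticsearch.
    static String mappingFor(String typeName, String fieldName) {
        return "{\n"
             + "  \"" + typeName + "\" : {\n"
             + "    \"properties\" : {\n"
             + "      \"" + fieldName + "\" : { \"type\" : \"object\", \"enabled\" : false }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(mappingFor("contact", "weekViewMap"));
    }
}
```

With a mapping like this the map stays in _source, but its contents are not parsed into fields, so it cannot be searched on.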



(davrob) #3

Thanks Shay, that works great.



(davrob) #4

Having implemented the solution using enabled=false on a normal object
type, I find that my query time has increased massively, probably
because the queries I execute almost always access the "source"
object, which now has to deserialize additional very large objects.

From the nested object documentation:

"Those internal nested documents are automatically masked away when
doing operations against the index (like searching with a match_all
query), and they bubble out when using the nested query."

So my idea is to offer users the option of including this
object (via a checkbox) by pushing the large HashMap into a
nested property where, as I understand it, nested objects are only
deserialized if a query is made against them.

My idea is to define the mapping as follows:

{
  "MainType" : {
    "properties" : {
      "nestedObject" : {
        "type" : "nested",
        "properties" : {
          "constantField" : {"type" : "string", "index" : "not_analyzed"},
          "weekViewMap" : {"type" : "object", "enabled" : false}
        }
      }
    }
  }
}

So, I would "turn on" the weekView serialization by making a nested
TermFilter search where nestedObject.constantField="ALL-OBJECTS-HAVE-THIS".

Questions

  1. Will this work?
  2. Will a query against another nested object also cause this nested
    object to be included in the source, or do the different nested
    objects act independently?

-David



(Shay Banon) #5

You mean this object is something new that you now add to your document? I
suggest you first understand why it's slower, for example by asking for an
empty set of fields, or even just _source.



(davrob) #6

Hi Shay,

So these are the stats for this operation:

Timer timer = Timer.startTimer();
SearchResponse searchResponse = searchReq.execute().actionGet();
timer.endTimer("QUERY TIME --->>>");

Stats

  1. All fields but 'large' field (obtained by running against an old
    index without this in):

    INFO: QUERY TIME --->>>366ms

  2. One field - obtained from the index with the 'large' field in, but
    just asking for one field using "searchReq.addFields("person_name");"

    INFO: QUERY TIME --->>>344ms

  3. All fields including 'large' field:

    INFO: QUERY TIME --->>>2922ms

I think that proves that it is the serialization that is causing the
problem, rather than the search.

In that case, do you think the above approach would be a good way of
stopping that object from being serialized by default?

Or, a good alternative would be to have a method on
SearchRequestBuilder like:

searchReq.removeFields("weekViewMap");

         or

searchReq.doNotSerializeFields("weekViewMap");

This would apply to the default case (fields = null), which pulls back the source.

-David
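
A single timed call can be dominated by warm-up effects (JIT compilation, cold caches), so it may be worth repeating each measurement before drawing conclusions from numbers like these. Below is a minimal sketch of such a harness in plain Java. `QueryTimer` and `medianMillis` are hypothetical names, and the `Runnable` stands in for the `searchReq.execute().actionGet()` call in this post.

```java
import java.util.Arrays;

final class QueryTimer {

    // Runs the action `warmups` times untimed, then `runs` times timed,
    // and returns the median elapsed time in milliseconds.
    static double medianMillis(Runnable action, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            action.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        int mid = runs / 2;
        long median = (runs % 2 == 1) ? nanos[mid] : (nanos[mid - 1] + nanos[mid]) / 2;
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real search call.
        Runnable fakeQuery = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("median ms: " + medianMillis(fakeQuery, 5, 21));
    }
}
```

Taking the median rather than a single reading makes it less likely that one slow outlier explains the gap between the three cases.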



(Shay Banon) #7

Do you rely on _source, or do you specifically store fields? If you rely on
_source, then I don't really understand this big difference: when you ask for
specific fields, the _source still gets loaded and parsed (with the "large"
field), and if you don't, it's just a matter of passing the _source as-is over
the wire.



(davrob) #8

Hi Shay,

Yes, I do usually rely on _source, but when I specify the field to
return using "searchReq.addFields("person_name");" then the source
returned is null and I just get the individual field, as below:

for (SearchHit hit : queryResponse.getHits()) {
    Map<String, Object> source = hit.getSource();
    if (source == null) {
        // No _source was returned, so rebuild a map from the requested fields.
        source = new HashMap<String, Object>();
        for (String responseField : searchBean.getSearch().getResponseFields()) {
            SearchHitField shf = hit.field(responseField);
            if (shf != null) {
                source.put(responseField, shf.getValue());
            }
        }
    }
    // ... use source ...
}

-David
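
The fallback in this post can also be expressed without the Elasticsearch types, which makes it easier to test in isolation. In this sketch plain maps stand in for `hit.getSource()` and for the values returned by `hit.field(...)`; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HitRows {

    // Returns the source map if present; otherwise rebuilds a map from the
    // individually returned fields, skipping any requested field that is absent.
    static Map<String, Object> toRow(Map<String, Object> source,
                                     Map<String, Object> returnedFields,
                                     List<String> responseFields) {
        if (source != null) {
            return source;
        }
        Map<String, Object> row = new HashMap<>();
        for (String name : responseFields) {
            Object value = returnedFields.get(name);
            if (value != null) {
                row.put(name, value);
            }
        }
        return row;
    }
}
```

Keeping this logic in one small method means the rest of the application can treat both kinds of response the same way.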



(Shay Banon) #9

Right, but the _source still gets loaded and parsed when you ask for a
field. And it seems strange that just transferring it would cause that big
a time difference. How big is the document that you index (with that map)?



(davrob) #10

Hi Shay,

Well, yes, I think your instinct is right: the difference in the files
is only very small. Here's an example:

https://gist.github.com/1400795

The average size of a Contact document is 1327 characters, while the
average size of the contacts with the added hashmap is 1699
characters.

So I'm not sure what is causing the issue. I do have a map within the
map, but I'd have thought the JSON serialization API would have no
real problem with that.

The workaround I have in mind at the moment is to have a "fat" type,
"fatcontact", and a skinny one, "contact", that are identical apart from
the embedded HashMap. If users want to search on the embedded map, I
switch to searching on the "fatcontact" type. This is a pretty
ugly workaround, though, because I index everything twice.

-David.
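
If the two-type workaround is pursued, the skinny document could be derived from the fat one just before indexing rather than being built separately. A minimal sketch, assuming the document is held as a `Map` and that the embedded map lives under the key "weekViewMap" (the name used in the mapping earlier in the thread); `SkinnyDocs` and `withoutField` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SkinnyDocs {

    // Returns a shallow copy of the document without the given top-level field.
    // The original map is left untouched so it can still be indexed as the fat type.
    static Map<String, Object> withoutField(Map<String, Object> doc, String field) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.remove(field);
        return copy;
    }
}
```

This does not avoid indexing everything twice; it only keeps the two copies from drifting apart.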



(Shay Banon) #11

If that's the case, then it's really strange. Can you post a recreation that
shows this (can be Java code, with a simple "main class" repro)?



(davrob) #12

Hi Shay,

Do you have an example Gist of a Java-based recreation? It would be
helpful to copy.

-David.


(Shay Banon) #13

Not one offhand, but it should be simple. Create a simple main class, start
a client in it, index a sample doc(s), and then execute the search requests
that show the problem, where one takes considerably more time than the
others.
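
A minimal sketch of the timing part of such a main class, assuming nothing beyond the JDK. Each request variant is passed in as a `Runnable`; in a real recreation its body would be the `searchReq.execute().actionGet()` call from earlier in the thread. The class name `QueryTimer`, the labels, and the warm-up and run counts are illustrative choices, not part of any Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryTimer {

    /** Runs the task `warmups` times untimed, then `runs` times timed; returns the mean in ms. */
    public static double averageMillis(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / runs;
    }

    /** Times every labelled variant with the same settings, keeping insertion order. */
    public static Map<String, Double> compare(Map<String, Runnable> variants, int warmups, int runs) {
        Map<String, Double> results = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> entry : variants.entrySet()) {
            results.put(entry.getKey(), averageMillis(entry.getValue(), warmups, runs));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Runnable> variants = new LinkedHashMap<>();
        // Placeholders only: replace each body with the real search call under test.
        variants.put("one field", () -> sleep(5));
        variants.put("full _source", () -> sleep(20));
        compare(variants, 2, 10).forEach((label, ms) ->
                System.out.printf("%s: %.1f ms%n", label, ms));
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Averaging over several runs after a warm-up keeps one-off costs such as connection setup and JIT compilation from dominating a single measurement.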


(davrob) #14

Hi Shay,

So I ran my recreation here: https://gist.github.com/1409236

And it's entirely my own stupidity - I was measuring the network
latency to New York from my desktop, as opposed to running the client
and server both in New York, or both locally.
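
One way to catch this kind of mistake early is to compare the client's wall-clock time with the `took` value that Elasticsearch reports in the search response, which covers the time spent executing the search on the server side. The sketch below is plain Java with no Elasticsearch dependency; `LatencySplit` and its method names are hypothetical, and the figures in `main` are made up for illustration rather than taken from the runs in this thread.

```java
public class LatencySplit {

    /** Time not accounted for by the server: network transfer, serialization, client-side parsing. */
    public static long overheadMillis(long wallClockMillis, long serverTookMillis) {
        if (wallClockMillis < 0 || serverTookMillis < 0) {
            throw new IllegalArgumentException("timings must be non-negative");
        }
        // Clamp at zero in case clock granularity makes the server time look larger.
        return Math.max(0L, wallClockMillis - serverTookMillis);
    }

    /** Fraction of the wall-clock time spent outside the server, between 0 and 1. */
    public static double overheadFraction(long wallClockMillis, long serverTookMillis) {
        if (wallClockMillis == 0) {
            return 0.0;
        }
        return (double) overheadMillis(wallClockMillis, serverTookMillis) / wallClockMillis;
    }

    public static void main(String[] args) {
        long wallClockMillis = 3000; // hypothetical: measured around the search call on the client
        long serverTookMillis = 40;  // hypothetical: the "took" value from the response
        System.out.printf("overhead: %d ms (%.0f%% of total)%n",
                overheadMillis(wallClockMillis, serverTookMillis),
                100 * overheadFraction(wallClockMillis, serverTookMillis));
    }
}
```

If most of the wall-clock time falls outside the server-reported figure, the network path or the response size is the likelier culprit than the query itself.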

Thanks for your patience. Hopefully other Java users will find the
BaseClass in the recreation useful for their own work.

-David.


(system) #15