Doc(), source() and "stored" property


(Vadim Voituk) #1

Hello,

I have an index of middle-sized documents (about 100 KB of index size per document).
Each of these documents has a number of "filterable" values; let's call them the
"rules list".
Almost every query filters on this "rules list".
The filtering is implemented with a native script.
The fields of the "rules list" are also marked as "stored".

So, in this particular case it's better to have these rules in the document itself
rather than in "source", and to avoid loading "source" into memory during
filtering at all.

Since I've marked the "rules" fields as stored, I expected these values
to be available in the native script like this:

doc().field("rules")

But they're not. The "rules" are available only via

source().get("rules")

I guess in this case the source has to be loaded and parsed during the filtering
phase, which is not efficient at all.

Should it work as I expected, or am I making a wrong assumption?


(Shay Banon) #2

When you call doc(), it goes to the field cache: the indexed terms for the
field are loaded into memory and served from there. When you use source(),
it loads the source and parses it; when you use fields(), it loads a stored
field. Loading from the index (either specific stored fields or the source)
is slow, so it's recommended that you use the doc part.
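To make the difference concrete, here is a toy model in plain Java (not Elasticsearch or Lucene code; the class and method names are made up) of what "loading the indexed terms into memory" means: the term -> documents postings of one field are un-inverted once into a document -> terms table, after which reading a document's values is a plain in-memory lookup. source() and fields(), by contrast, go back to the index for every document.

```java
import java.util.*;

/**
 * Toy model (not Elasticsearch code) of a field cache: the indexed terms of
 * one field (term -> doc ids) are "un-inverted" once into a per-document
 * table (doc id -> terms) kept in memory for cheap per-document reads.
 */
public class FieldCacheSketch {

    /** Un-inverts postings (term -> doc ids) into doc id -> terms. */
    static List<List<String>> uninvert(Map<String, int[]> postings, int maxDoc) {
        List<List<String>> perDoc = new ArrayList<>(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            perDoc.add(new ArrayList<>());
        }
        // Visit terms in sorted order, the way a Lucene terms enumeration does.
        for (String term : new TreeMap<>(postings).keySet()) {
            for (int doc : postings.get(term)) {
                perDoc.get(doc).add(term);
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("red", new int[] {0, 2});
        postings.put("blue", new int[] {0, 1});
        List<List<String>> cache = uninvert(postings, 3);
        // After the one-time load, each lookup is just an index into memory.
        System.out.println(cache.get(0)); // [blue, red]
        System.out.println(cache.get(1)); // [blue]
        System.out.println(cache.get(2)); // [red]
    }
}
```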

(In reply to Vadim Voituk, Tue, May 29, 2012 at 6:14 PM)


(Vadim Voituk) #3

Hello, Shay

Thanks for your answer; that's exactly what I want to do.
But the problem is that the value I need is not present in doc().

I'm getting this exception when trying to access the "stored" field via
doc().field("variants") or doc().get("variants")
("variants" is the name of the "rules list" from my initial question):

Caused by: org.elasticsearch.ElasticSearchIllegalArgumentException: No field found for [variants]
    at org.elasticsearch.search.lookup.DocLookup.get(DocLookup.java:110)
    at org.elasticsearch.search.lookup.DocLookup.field(DocLookup.java:87)
    at com.voituk.VariantsFilterScriptFactory$VariantsFilterScript.run(VariantsFilterScriptFactory.java:57)

And here is my _mapping for the "variants" field (it's a collection):

{
    "properties": {
        "variants": {
            "properties": {
                "id": {"type": "integer"},
                "filter1": {"type": "integer", "store": "yes", "index": "not_analyzed"},
                "filter2": {"type": "integer", "store": "yes", "index": "not_analyzed"},
                "filter3": {"type": "string", "store": "yes", "index": "not_analyzed"},
                "filter4": {"type": "integer", "store": "yes", "index": "not_analyzed"}
            }
        }
    }
}

Why isn't it available via doc()? What am I doing wrong here?

--
Vadim

(In reply to kimchy, Wednesday, May 30, 2012 10:51:48 AM UTC+2)


(Shay Banon) #4

variants is an object. Through doc() (from the field cache) or fields() (from
specific stored fields) you can only access its specific leaf fields,
something like variants.id.
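A toy sketch (plain Java, hypothetical class name, not Elasticsearch code) of which names are addressable: walking a mapping shaped like the one above and collecting the dotted paths of its leaf fields yields names such as variants.id and variants.filter1, while variants itself, being an object, never becomes a field of its own. It uses Map.of, so it assumes Java 9 or later.

```java
import java.util.*;

/**
 * Toy illustration (not Elasticsearch code): walks an object mapping and
 * collects the dotted names of its leaf fields. Objects such as "variants"
 * only contribute a prefix; they are not fields themselves.
 */
public class LeafFieldNames {

    @SuppressWarnings("unchecked")
    static void collect(String prefix, Map<String, Object> properties, List<String> out) {
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Map<String, Object> def = (Map<String, Object>) e.getValue();
            Object nested = def.get("properties");
            if (nested instanceof Map) {
                // An object mapping: recurse into its sub-properties.
                collect(path, (Map<String, Object>) nested, out);
            } else {
                // A leaf: this dotted path is an actual indexed field name.
                out.add(path);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> variantProps = new LinkedHashMap<>();
        variantProps.put("id", Map.of("type", "integer"));
        variantProps.put("filter1", Map.of("type", "integer", "store", "yes"));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("variants", Map.of("properties", variantProps));
        List<String> names = new ArrayList<>();
        collect("", root, names);
        System.out.println(names); // [variants.id, variants.filter1]
    }
}
```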

(In reply to Vadim Voituk, Wed, May 30, 2012 at 11:09 AM)


(Vadim Voituk) #5

(In reply to kimchy, Wednesday, May 30, 2012 10:51:48 AM UTC+2)

Thanks, Shay, that's much clearer to me now.

To be precise, "variants" is not an object but a list/array of
objects, something like:

{
    // ... other fields ...
    "variants": [
        {
            "title": "Variant #1",
            "filter1": "... some value of filter #1",
            "filter2": "... some value of filter #2",
            "filter3": "... some value of filter #3"
        },
        {
            "title": "Variant #2",
            "filter1": "... another value of filter #1",
            "filter2": "... another value of filter #2",
            "filter3": "... another value of filter #3"
        }
        // ... more variants here ...
    ],
    // ... other fields ...
}

And as I understand from digging into the ES core, if I make a "request" like

doc().fields("variants.filter1")

I'll get the list of all unique values of the "filter1" field across the whole
"variants" array.
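A toy model (plain Java, hypothetical names, not the actual indexing code) of the behaviour described above: each leaf path of the array of objects ends up as one multi-valued field, so per document you get the distinct values of variants.filter1, and nothing records which filter1 value belonged with which filter2 value.

```java
import java.util.*;

/**
 * Toy model (not Elasticsearch code) of how an array of objects is seen by
 * doc(): each leaf path becomes one multi-valued field holding the distinct
 * values across all objects, and the per-object grouping is lost.
 */
public class FlattenVariants {

    static Map<String, SortedSet<String>> flatten(String prefix, List<Map<String, String>> objects) {
        Map<String, SortedSet<String>> fields = new TreeMap<>();
        for (Map<String, String> obj : objects) {
            for (Map.Entry<String, String> e : obj.entrySet()) {
                // A sorted set keeps only distinct values, in term order.
                fields.computeIfAbsent(prefix + "." + e.getKey(), k -> new TreeSet<>())
                      .add(e.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> variants = List.of(
            Map.of("filter1", "a", "filter2", "x"),
            Map.of("filter1", "b", "filter2", "x"));
        Map<String, SortedSet<String>> f = flatten("variants", variants);
        System.out.println(f.get("variants.filter1")); // [a, b]
        System.out.println(f.get("variants.filter2")); // [x]
    }
}
```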

So the question is: is it possible to make this list of objects
accessible without loading and parsing the entire document source?

Looking forward to any feedback or thoughts on this.

Thanks.


(Vadim Voituk) #6

Sorry for bumping this, but I'm afraid my last question got
lost in the long description :)

Here it is:
Is it possible to store the list of a document's sub-objects together
with the index (or load it into memory) so it can be used inside a native
filter without loading and parsing the entire source object?

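Here is how it looks now in the native script's Java code:

@Override
public Object run() {
    // source() loads and parses the whole _source (about 100 KB) for every document
    final List<Map<String, Object>> vars = (List<Map<String, Object>>) source().get("variants");
    boolean matched = false;
    for (Map<String, Object> variant : vars) {
        // Navigate through and process filters one by one, setting matched
    }
    return matched;
}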

The problem here is the source().get("variants") call: it's very slow
because the "source" itself is about 100 KB.
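The loop body left as a comment in the run() snippet might, for example, look like the following plain-Java sketch (the class, method name, and the map of required filter values are hypothetical). It also shows why the grouping matters: a document should match only if a single variant satisfies every condition, which the flattened per-field values from doc() cannot express.

```java
import java.util.*;

/**
 * Sketch of a per-variant check: true only if ONE variant carries every
 * required filter value. Values are compared with equals(), so the required
 * values must have the same Java type as the parsed source values.
 */
public class VariantMatcher {

    static boolean anyVariantMatches(List<? extends Map<String, ?>> variants,
                                     Map<String, ?> required) {
        for (Map<String, ?> variant : variants) {
            boolean all = true;
            for (Map.Entry<String, ?> r : required.entrySet()) {
                if (!r.getValue().equals(variant.get(r.getKey()))) {
                    all = false; // this variant misses one condition
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Map<String, String>> variants = List.of(
            Map.of("filter1", "a", "filter2", "x"),
            Map.of("filter1", "b", "filter2", "y"));
        // "a" and "y" both occur in the document, but never in the same variant.
        System.out.println(anyVariantMatches(variants, Map.of("filter1", "a", "filter2", "y"))); // false
        System.out.println(anyVariantMatches(variants, Map.of("filter1", "b", "filter2", "y"))); // true
    }
}
```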

Any suggestions on how to put "variants" into doc() or into memory?

(Following up on my message of Monday, June 4, 2012 10:20:33 AM UTC+2)


(oreno) #7

Hi Vadim,
Any chance you found a solution for this issue, i.e. improving source() fetch performance?
I also need to retrieve the objects via the source() method, because that way the objects come back in
their original structure and I can iterate over them and tell each object's fields apart, instead of getting all their fields combined at once.
The problem is that it performs badly, as explained above.

Any news?

Thanks in advance,

Oren


(Alex Roytman) #8

Hi Vadim,

I am not an Elasticsearch expert and don't have a solution for you, but maybe a workaround.
What if you store your rules as a JSON-encoded string and make a not-analyzed stored field out of it? You will need to parse it inside your script call, but you will not need to load your source.
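A dependency-free sketch of this workaround, with one deliberate swap: instead of JSON, which would need a parser such as Jackson inside the script, it uses a hypothetical delimited format (variants separated by |, entries by ;, key and value by =) and assumes those characters never occur in keys or values. In the script, the packed string would be read from the stored field rather than from source(); that lookup is not shown.

```java
import java.util.*;

/**
 * Packs all variants into ONE string (to be kept in a single stored,
 * not-analyzed field) and unpacks it again with the grouping intact.
 * ASSUMPTION: keys and values never contain '|', ';' or '='.
 */
public class PackedVariants {

    static String encode(List<Map<String, String>> variants) {
        StringJoiner all = new StringJoiner("|");
        for (Map<String, String> v : variants) {
            StringJoiner one = new StringJoiner(";");
            for (Map.Entry<String, String> e : v.entrySet()) {
                one.add(e.getKey() + "=" + e.getValue());
            }
            all.add(one.toString());
        }
        return all.toString();
    }

    static List<Map<String, String>> decode(String packed) {
        List<Map<String, String>> variants = new ArrayList<>();
        if (packed.isEmpty()) {
            return variants;
        }
        for (String one : packed.split("\\|")) {
            Map<String, String> v = new LinkedHashMap<>();
            for (String entry : one.split(";")) {
                if (entry.isEmpty()) {
                    continue; // tolerate a variant with no entries
                }
                int eq = entry.indexOf('=');
                v.put(entry.substring(0, eq), entry.substring(eq + 1));
            }
            variants.add(v);
        }
        return variants;
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("filter1", "a");
        v1.put("filter2", "x");
        String packed = encode(List.of(v1));
        System.out.println(packed);         // filter1=a;filter2=x
        System.out.println(decode(packed)); // [{filter1=a, filter2=x}]
    }
}
```

Decoding a short string per document is far cheaper than parsing a 100 KB source, while each variant still comes back as its own map.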

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(oreno) #9

Hi Alex,
I'm willing to try just about anything that might improve source() fetching performance at this point. Do you have an idea of how this can be configured?

Thanks,

Oren

(In reply to AlexR, Jul 6, 2013, at 2:59 PM)


(paolociccarese) #10

Hi Vadim,
I am creating a native script and dealing with the same scenario (a list of objects), and while I can access the properties fine with source(), I can't get it to work with doc().

I was wondering whether you found a way to make doc() work.
Anything you can share?

Thank you,
Paolo

