Parent/child query performance


(Mustafa Sener) #1

Hi,

We are testing parent/child queries and filters for one of our requirements. We
have 100K parent objects and about 20M child objects that belong to these
parents. A thread continuously adds more documents in bulk. In this setup, my
parent/child query requests sent through TransportClient fail with a timeout
exception. I think this is because the parent and child objects live in the
same index and the child type is updated continuously, which hurts search
performance on the parent objects. Is this assessment correct? Do you have any
other suggestions for improving query performance?

I think it would be much better to have an IN query instead of the
parent/child query. That way we could keep the child index separate from the
parent index and speed up searches on parent objects. Currently we define a
hasChild query or filter as follows using the Java API:

def filter = FilterBuilders.hasChildFilter("childobjecttype", QueryBuilders.....)
def query = QueryBuilders.hasChildQuery("childobjecttype", QueryBuilders.....)

We could change this by simply adding an index parameter, as follows:

def filter = FilterBuilders.hasChildFilter("childindex", "childobjecttype", QueryBuilders.....)
def query = QueryBuilders.hasChildQuery("childindex", "childobjecttype", QueryBuilders.....)

Alternatively, the index could be optional; if it is not specified, the index
of the parent type would be used:

def filter = FilterBuilders.hasChildFilter("childobjecttype", QueryBuilders.....)
filter.setIndex("childindex");
def query = QueryBuilders.hasChildQuery("childobjecttype", QueryBuilders.....)
query.setIndex("childindex");

Or, since one of the benefits of the parent/child mechanism is that parent and
child data are placed on the same shard, hasChild could be left as it is and
this new kind of filter/query could get its own name: inFilter/inQuery. Such a
filter can join two types the way the parent/child mechanism does. It assumes
that the id of the related type is stored in one of the properties of the
other type. We could define the filter as:

def filter = FilterBuilders.inFilter("type1index", "type1", "relatedObjectIdPropetyInChild", QueryBuilders.....)
def query = QueryBuilders.inQuery("type1index", "type1", "relatedObjectIdPropetyInChild", QueryBuilders.....)

Is this meaningful?

Mustafa Sener
www.ifountain.com


(Mustafa Sener) #2

I want to add something about the parent/child mechanism. When I run the child
query on its own (the same query that I pass to hasChildFilter or
hasChildQuery), its performance does not degrade: I get a search response in
5 ms. However, when I wrap the same query in hasChildQuery or hasChildFilter,
performance drops and I start to get timeouts.
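
As a rough illustration of the two requests being compared, the sketch below evaluates a child query on its own and then wrapped in a hasChild-style step over in-memory lists. It is not Elasticsearch code: the Child class, the field names and the sample data are hypothetical. It only shows what each request returns (matching children versus the parents of matching children); it does not model Elasticsearch internals and says nothing about where the time goes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class HasChildSketch {

    // Hypothetical stand-in for a child document: its id, its parent's id, one field.
    static final class Child {
        final String id;
        final String parentId;
        final String status;

        Child(String id, String parentId, String status) {
            this.id = id;
            this.parentId = parentId;
            this.status = status;
        }
    }

    // The child query run alone: returns the matching child documents.
    static List<Child> childQuery(List<Child> children, Predicate<Child> query) {
        List<Child> hits = new ArrayList<>();
        for (Child c : children) {
            if (query.test(c)) {
                hits.add(c);
            }
        }
        return hits;
    }

    // The same query wrapped hasChild-style: returns the ids of parents
    // that have at least one matching child.
    static Set<String> hasChild(Set<String> parentIds, List<Child> children,
                                Predicate<Child> query) {
        Set<String> hits = new TreeSet<>();
        for (Child c : childQuery(children, query)) {
            if (parentIds.contains(c.parentId)) {
                hits.add(c.parentId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> parents = new TreeSet<>(Arrays.asList("p1", "p2", "p3"));
        List<Child> children = Arrays.asList(
                new Child("c1", "p1", "open"),
                new Child("c2", "p1", "closed"),
                new Child("c3", "p2", "closed"),
                new Child("c4", "p3", "open"));
        Predicate<Child> open = c -> c.status.equals("open");

        System.out.println(childQuery(children, open).size()); // 2 children match
        System.out.println(hasChild(parents, children, open)); // [p1, p3]
    }
}
```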

On Fri, Apr 29, 2011 at 10:49 AM, Mustafa Sener mustafa.sener@gmail.com wrote:



(Mustafa Sener) #3

One last thing: I ran my tests on both v0.15.2 and v0.16.0, and they behave
similarly.

On Fri, Apr 29, 2011 at 3:03 PM, Mustafa Sener mustafa.sener@gmail.com wrote:


(Shay Banon) #4

Heya,

I am not sure I understand the query that you execute or what you mean by "in". And, most importantly, I don't understand this:

I want to add something related with parent/child mechanism. When I run child query passed to hasChildFilter or hasChildQuery methods alone, its performance does not decrease. I can get a response for a search in 5 msecs. However, when I use same query in hasChildQuery or hasChildFilter, its performance decreases and I start to get timeouts.

What is the difference between the two? Maybe a gist with some sample curls can help to "visualize" the queries.

On Friday, April 29, 2011 at 3:07 PM, Mustafa Sener wrote:


(Mustafa Sener) #5

Hi,
I created a gist to show the performance problem with parent/child queries.

To clarify the IN query mechanism: it works like an IN query in a database.
The query given to inQuery as the last parameter searches documents in the
specified index and type (the first and second parameters of inQuery) and
returns the values of the property named by the third parameter. Finally, the
search request returns the documents whose ids match these returned values.
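
The steps described above can be sketched in memory as follows. This is only an illustration of the proposed semantics: inQuery is a proposal in this thread, not an existing Elasticsearch API, and the class, the doc helper, the field names (id, parentRef, status) and the sample data are all made up for the example. The index and type parameters are reduced to a plain list of documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class InQuerySketch {

    // Hypothetical helper: builds a document from key/value pairs.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // relatedDocs stands in for the index and type (parameters 1 and 2),
    // property is parameter 3, innerQuery is parameter 4.
    static List<Map<String, String>> inQuery(List<Map<String, String>> targetDocs,
                                             List<Map<String, String>> relatedDocs,
                                             String property,
                                             Predicate<Map<String, String>> innerQuery) {
        // Step 1: run the inner query and collect the values of the property.
        Set<String> ids = new HashSet<>();
        for (Map<String, String> d : relatedDocs) {
            String value = d.get(property);
            if (value != null && innerQuery.test(d)) {
                ids.add(value);
            }
        }
        // Step 2: return the documents whose id is among the collected values.
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : targetDocs) {
            if (ids.contains(d.get("id"))) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> parents = Arrays.asList(doc("id", "p1"), doc("id", "p2"));
        List<Map<String, String>> children = Arrays.asList(
                doc("id", "c1", "parentRef", "p1", "status", "open"),
                doc("id", "c2", "parentRef", "p2", "status", "closed"));

        List<Map<String, String>> hits =
                inQuery(parents, children, "parentRef", d -> "open".equals(d.get("status")));
        for (Map<String, String> hit : hits) {
            System.out.println(hit.get("id")); // p1
        }
    }
}
```

Unlike hasChild, nothing here requires the two document sets to share an index or a shard, which is the separation the proposal is after; the trade-off is that the collected id set has to be carried from the first step to the second.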

I also created a gist to show the usage of inQuery.

On Sat, Apr 30, 2011 at 12:54 AM, Shay Banon shay.banon@elasticsearch.com wrote:

