Very slow has_child query for large index

I have 40 million child documents with 20 million parents on 3 shards, all
hosted on one machine.
My specs are Elasticsearch 0.9, 8 cores, 8 GB of dedicated memory.

I am performing a has_child query, and it doesn't seem to return even after
15 minutes.
Here is my query:

curl -XPOST "http://xxxx/people/household/_search?search_type=count" -d'
{
  "query": {
    "has_child": {
      "type": "person",
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "bool": {
              "must": [
                { "term": { "exact_age": "65" } },
                { "term": { "gender": "m" } }
              ]
            }
          }
        }
      }
    }
  }
}'

Is there something I am doing wrong? Is there a way to check what is taking
so long?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

UPDATE: my query did return, but it took 30 minutes.

On Thursday, June 20, 2013 3:05:01 PM UTC+3, David MZ wrote:

I am performing a has_child query, and it doesn't seem to return even
after 15 minutes

Why not

curl -XPOST "http://xxxx/people/household/_search?search_type=count" -d'
{
  "query": {
    "has_child": {
      "type": "person",
      "query": {
        "bool": {
          "must": [
            { "term": { "exact_age": "65" } },
            { "term": { "gender": "m" } }
          ]
        }
      }
    }
  }
}'

Why 3 shards and not 1 on a single node?

Note that has_child loads all _ids onto the heap, so adjust the heap
size accordingly. You did not tell us about your heap settings.

Jörg

On 20.06.13 14:53, David MZ wrote:

UPDATE: my query returned taking 30 minutes


I have 8 GB for my heap (50% of the total RAM). In bigdesk I see the memory
climb slowly to 7 GB before the answer is given, but the query still took
30 minutes, so it seems that memory is not my bottleneck.

From what I have read, I was under the impression that the filtered query is
the fastest kind of query. I only need a count, and search_type=count is
supposed to be better than _count.

On Thu, Jun 20, 2013 at 4:02 PM, Jörg Prante joergprante@gmail.com wrote:

Note that has_child loads all _id's on the heap, so adjust the heap size.
You did not tell us about your heap settings.


has_child is always internally a filtered query.

Read the docs: http://www.elasticsearch.org/guide/reference/query-dsl/has-child-query/

"The has_child query works the same as the has_child filter, by
automatically wrapping the filter with a constant_score (when using the
default score type). "

Of course filters can be fast, but the price is high. They are fast if
you have enough memory and if all your doc terms and doc ids can be
loaded into memory. If not, they do not warn you; they just become
very slow, because you are stressing the JVM and the OS, and you have to
trace the numbers in the monitoring tools to find out whether your heap is
really the problem or whether the OS has a problem.

So at first, just run an unfiltered query to see if your query works.
Later you can experiment with filters.

I can't tell if 8g is enough. Many people do not use large heaps, they
just add more nodes. 3 shards on 1 node are competing for the 8g. That
means you have about 2.66g per shard for many millions of children. Did
you calculate the size per shard? So I assume that with 1 shard per node
you may do a little better for has_child queries.
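
The per-shard figure above can be reproduced with a few lines of arithmetic. This is only a rough sketch under stated assumptions: the document counts are the ones from the original post, the documents are assumed to be spread evenly over the shards, and the heap is treated as if it were split evenly between shards (in reality it is shared and also serves other caches, so the result is an optimistic upper bound).

```python
# Rough heap budget per shard, using the numbers quoted in the thread.
# Assumption: documents are distributed evenly across the shards.
HEAP_GB = 8
SHARDS = 3
CHILD_DOCS = 40_000_000
PARENT_DOCS = 20_000_000

heap_per_shard_gb = HEAP_GB / SHARDS
docs_per_shard = (CHILD_DOCS + PARENT_DOCS) // SHARDS

# Upper bound on heap bytes available per document if the whole
# per-shard share were spent on parent/child id structures.
bytes_per_doc = heap_per_shard_gb * 1024**3 / docs_per_shard

print(f"heap per shard: {heap_per_shard_gb:.2f} GB")
print(f"docs per shard: {docs_per_shard:,}")
print(f"heap per doc:   {bytes_per_doc:.0f} bytes")
```

Whether that budget is sufficient depends on how large the ids are and what else lives on the heap, which is why watching the actual heap numbers in a monitoring tool remains necessary.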

Jörg

On 20.06.13 15:13, David MZ wrote:

I was under the impression from reading that filtered query is the
fastest query, I only need counting, also search_type=count is suppose
to be better the _count


"So at first, just run un-filtered query to see if your query works. Later
you can experiment with filters."

This query took 10 seconds:

curl -XPOST "xxx/_search" -d'
{
  "query": {
    "has_child": {
      "type": "person",
      "query": {
        "match_all": {}
      }
    }
  }
}'

I will try to index with one shard. My performance goals are quite ambitious:
I need sub-second queries for "bool" filters (range, term and terms), and I
just need the count of the parents, nothing more.

Any tips?


I have run into the same problem (not 30 minute queries, but slow). The
slowness is due to two things. The first is loading the ids into memory,
which accounts for the bulk of it. There is no avoiding this, so use warmers
to make sure these ids are already loaded before running your queries. The
second problem I found is that the has_child query loops over every single
parent id, no matter whether 1 or 1M parents have been identified. I have
started working on a patch for this that you can try if you like:

https://github.com/mattweber/elasticsearch/tree/haschildopt
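
As an illustration of the warmer suggestion, here is a minimal sketch that builds the request for registering a warmer containing a has_child query. It assumes the index warmup endpoint of the form /{index}/{type}/_warmer/{name}; the host localhost:9200 and the warmer name has_child_warmer are placeholders, and the helper build_warmer_request is made up for this example. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_warmer_request(base_url, index, doc_type, warmer_name, search_body):
    """Build (but do not send) a PUT request that registers an index warmer."""
    url = f"{base_url}/{index}/{doc_type}/_warmer/{warmer_name}"
    data = json.dumps(search_body).encode("utf-8")
    return urllib.request.Request(url, data=data, method="PUT")


# A warmer body is an ordinary search request. This one reuses the
# has_child query from the thread so that the ids it needs get loaded.
warmer_body = {
    "query": {
        "has_child": {
            "type": "person",
            "query": {"match_all": {}},
        }
    }
}

# Placeholder host and warmer name; index and type are taken from the thread.
request = build_warmer_request(
    "http://localhost:9200", "people", "household", "has_child_warmer", warmer_body
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen against a running node would register the warmer; the same body can equally be sent with curl -XPUT.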

Thanks,
Matt Weber


BTW, keep an eye on this issue:

Thanks,
Matt Weber


I have 25 million parents and 45 million children. Even when I ran a second
query with a slight change in the filters, it took 30 minutes, so there is
something wrong here: the ids were supposed to be loaded already, but the
subsequent query also took a long time.


I do not need to get the parent documents, only count them. Is there
anything I can do to make it faster?


There is no avoiding the loading, but the queries should be faster after
the fact. Have you tried using a filter? It is generally faster than a
query.

{
  "query": {
    "constant_score": {
      "filter": {
        "has_child": {
          "type": "person",
          "filter": {
            "bool": {
              "must": [
                { "term": { "exact_age": "65" } },
                { "term": { "gender": "m" } }
              ]
            }
          }
        }
      }
    }
  }
}

Thanks,
Matt Weber

On Thu, Jun 20, 2013 at 7:04 AM, David MZ david.mazvovsky@gmail.com wrote:

I do not need to get the parent documents only count them, is there
anything I can do to make it faster


I tried both, there is no speed difference. I want to solve this issue, how
can I use your patch, will it help me?

On Thursday, June 20, 2013 5:22:09 PM UTC+3, Matt Weber wrote:

There is no avoiding the loading, but the queries should be faster after
the fact. Have you tried using a filter? It is generally faster than a
query.

{
"query": {
"constant_score": {
"filter": {
"has_child": {
"filter": {
"bool": {
"must": [
{"term": {"exact_age": "65"}},
{"term": {"gender": "m"}}
]
}
}
}
}
}
}
}

Thanks,
Matt Weber

On Thu, Jun 20, 2013 at 7:04 AM, David MZ <david.m...@gmail.com<javascript:>

wrote:

I do not need to get the parent documents only count them, is there
anything I can do to make it faster

On Thursday, June 20, 2013 5:01:37 PM UTC+3, David MZ wrote:

I have 25 million parents and 45mil children, even after I run a second
query with slight change in filters it took 30 minutes, so there is
something wrong here, as the ids was suppose to be loaded

but the sequential query also took a long time

On Thu, Jun 20, 2013 at 4:57 PM, Matt Weber <matt....@gmail.com<javascript:>

wrote:

I have ran into the same problem (not 30 minute queries, but slow).
They are due to two things. The first is loading the ids into memory
which is the bulk of the slowness. There is no avoiding this, use warmers
to make sure these ids are already loaded before running your queries. The
second problem I found if that the has_child query loops over every single
parent id no matter if 1 or 1M parents have been identified. I have
started working on a patch for this that you can try if you like:

https://github.com/mattweber/**elasticsearch/tree/haschildopthttps://github.com/mattweber/elasticsearch/tree/haschildopt

Thanks,
Matt Weber

On Thu, Jun 20, 2013 at 6:43 AM, Jörg Prante <joerg...@gmail.com<javascript:>

wrote:

has_child is always internally a filtered query.

Read the docs Elasticsearch Platform — Find real-time answers at scale | Elastic**
uide/reference/query-dsl/has-child-query/http://www.elasticsearch.org/guide/reference/query-dsl/has-child-query/

"The has_child query works the same as the has_child filter, by
automatically wrapping the filter with a constant_score (when using the
default score type). "

Of course filters can be fast, but the price is high. They are fast if
you have enough memory and if all your doc terms and doc ids can be loaded
into memory. If not, they do not warn you, they are just becoming very
slow, because you just stress the JVM and the OS, and you have to trace the
numbers in the monitoring tools to find out if your heap is really the
problem, or if OS has a problem.

So at first, just run un-filtered query to see if your query works.
Later you can experiment with filters.

I can't tell if 8g is enough. Many people do not use large heaps, they
just add more nodes. 3 shards on 1 node are competing for the 8g. It means,
you have 2.66g for many millions of children. Did you calculate the size
per shard? So I assume with 1 shard per node you may be a little better for
has_child queries.

Jörg

Am 20.06.13 15:13, schrieb David MZ:

I was under the impression from reading that a filtered query is the
fastest query. I only need counting; also, search_type=count is supposed to
be better than _count.


If you are using ES version 0.90.0, then I recommend upgrading to ES
version 0.90.1; memory usage has been improved. See this issue:
Parent-Child: Improve memory usage id cache (Issue #3028, elastic/elasticsearch on GitHub)

15 to 30 minutes is a very long time, are you sure you configured a
has_child query in the warmer for the index you're using?
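
For reference, registering a warmer that runs the has_child query from the original post might look roughly like the sketch below. It only builds the request and prints the equivalent curl command; nothing is sent. The node address and the warmer name `has_child_warmer` are assumptions, and the `people` index name is taken from the URL in the original post.

```python
import json

# The has_child query from the original post, reused as the warmer body.
warmer_body = {
    "query": {
        "has_child": {
            "type": "person",
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": {"bool": {"must": [
                        {"term": {"exact_age": "65"}},
                        {"term": {"gender": "m"}},
                    ]}},
                }
            },
        }
    }
}

# Assumptions: the node address and the warmer name are made up for this sketch.
host = "http://localhost:9200"
index = "people"
warmer_name = "has_child_warmer"
url = f"{host}/{index}/_warmer/{warmer_name}"

# Print the equivalent curl command instead of sending anything.
print(f"curl -XPUT '{url}' -d '{json.dumps(warmer_body)}'")
```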

Matt's improvement to the has_child query improves the query time in
certain cases. For example when a few child documents match, then the
execution of the has_child can be short circuited.

Depending on the number of parent documents, the parent/child join can take
a big chunk of the query time. The parent/child feature scales, so by
having more primary shards and adding more nodes, the query time should
become acceptable.

On 20 June 2013 16:35, David MZ david.mazvovsky@gmail.com wrote:

I tried both, there is no speed difference. I want to solve this issue,
how can I use your patch, will it help me?

On Thursday, June 20, 2013 5:22:09 PM UTC+3, Matt Weber wrote:

There is no avoiding the loading, but the queries should be faster after
the fact. Have you tried using a filter? It is generally faster than a
query.

{
  "query": {
    "constant_score": {
      "filter": {
        "has_child": {
          "type": "person",
          "filter": {
            "bool": {
              "must": [
                {"term": {"exact_age": "65"}},
                {"term": {"gender": "m"}}
              ]
            }
          }
        }
      }
    }
  }
}

Thanks,
Matt Weber

On Thu, Jun 20, 2013 at 7:04 AM, David MZ david.m...@gmail.com wrote:

I do not need to get the parent documents, only count them. Is there
anything I can do to make it faster?

On Thursday, June 20, 2013 5:01:37 PM UTC+3, David MZ wrote:

I have 25 million parents and 45 million children. Even when I ran a
second query with a slight change in filters, it took 30 minutes, so
something is wrong here, as the ids were supposed to be loaded already.

The subsequent query also took a long time.


--
Kind regards,

Martijn van Groningen


I have migrated to using nested documents; it solved all my problems,
including performance.

Thanks

On Tue, Jun 25, 2013 at 11:36 AM, Martijn v Groningen <martijn.v.groningen@gmail.com> wrote:

If you are using ES version 0.90.0, then I recommend upgrading to ES
version 0.90.1; memory usage has been improved. See this issue:
Parent-Child: Improve memory usage id cache (Issue #3028, elastic/elasticsearch on GitHub)

15 to 30 minutes is a very long time, are you sure you configured a
has_child query in the warmer for the index you're using?

Matt's improvement to the has_child query improves the query time in
certain cases. For example when a few child documents match, then the
execution of the has_child can be short circuited.

Depending on the number of parent documents, the parent/child join can take
a big chunk of the query time. The parent/child feature scales, so by
having more primary shards and adding more nodes, the query time should
become acceptable.


Yes, the nested query is much faster than the has_child query, but less
flexible than parent/child in general. If that is okay with you then nested
is a good choice.
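
For readers comparing the two approaches, here is a rough sketch of what the nested variant of the same count could look like. The thread does not show David's actual mapping, so the nested field name `persons` and the layout below are assumptions; only the `household` type and the `exact_age` and `gender` fields come from the original post.

```python
import json

# Hypothetical mapping: each household document embeds its persons as nested objects.
mapping = {
    "household": {
        "properties": {
            "persons": {"type": "nested"}
        }
    }
}

# Nested query counterpart of the has_child query from the original post.
# Fields inside the nested object are addressed by their full path.
nested_query = {
    "query": {
        "nested": {
            "path": "persons",
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": {"bool": {"must": [
                        {"term": {"persons.exact_age": "65"}},
                        {"term": {"persons.gender": "m"}},
                    ]}},
                }
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(nested_query, indent=2))
```

The trade-off mentioned above shows up in the mapping: persons live inside the household document, so changing one person means reindexing the whole household.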

On 25 June 2013 11:05, David MZ david.mazvovsky@gmail.com wrote:

I have migrated to using nested documents; it solved all my problems,
including performance.

Thanks


--
Kind regards,

Martijn van Groningen
