Elasticsearch query performance

I'm using Elasticsearch to index two types of objects.

Data details -

Contract object: ~60 properties (object size: 120 bytes)
Risk item object: ~125 properties (object size: 250 bytes)

Contract is the parent of risk item (_parent).

I'm storing 240 million such objects in a single index (210 million risk
items, 30 million contracts).
Index size: 322 GB

Cluster details -

11 m2.4xlarge EC2 boxes (68 GB memory, 1.6 TB storage, 8 cores each); 1 box
is a load balancer node with node.data = false
50 shards
1 replica

===
elasticsearch.yml -

node.data: true
http.enabled: false
index.number_of_shards: 50
index.number_of_replicas: 1
index.translog.flush_threshold_ops: 10000
index.merge.policy.use_compound_files: false
indices.memory.index_buffer_size: 30%
index.refresh_interval: 30s
index.store.type: mmapfs
path.data: /data-xvdf,/data-xvdg

===

I'm starting the Elasticsearch nodes with the following command -
/home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g

My problem is that the following query on the risk item type takes
about 10-15 seconds to return data.

I'm running this with a load of 50 concurrent users while a bulk index load
of about 5000 risk items runs in parallel.

Query -

http://:9200/contractindex/riskitem/_search

{
  "query": {
    "has_parent": {
      "parent_type": "contract",
      "query": {
        "range": {
          "ContractDate": {
            "gte": "2010-01-01"
          }
        }
      }
    }
  },
  "filter": {
    "and": [{
      "query": {
        "bool": {
          "must": [{
            "query_string": {
              "fields": ["RiskItemProperty1"],
              "query": "abc"
            }
          },
          {
            "query_string": {
              "fields": ["RiskItemProperty2"],
              "query": "xyz"
            }
          }]
        }
      }
    }]
  }
}

Can somebody please help me improve this query's performance?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Can you profile the query without the indexing process happening in
parallel? The index_buffer_size setting seems high compared to the default,
and your bulk load should only be just over 1 MB.
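
As a rough check of that estimate, here is a small Python sketch of the arithmetic. It uses only figures quoted in the original post (about 5000 risk items per bulk load, 250 bytes per risk item, a 30 GB heap, and a 30% index buffer). The function names are made up for illustration, and the payload figure ignores bulk API framing overhead.

```python
# Back-of-the-envelope sizes, using only figures quoted in the original post.
RISK_ITEM_BYTES = 250          # stated size of one risk item
BULK_ITEMS = 5000              # stated size of one bulk index load
HEAP_GB = 30                   # from -Xms30g -Xmx30g
INDEX_BUFFER_FRACTION = 0.30   # indices.memory.index_buffer_size: 30%


def bulk_payload_mb(items=BULK_ITEMS, item_bytes=RISK_ITEM_BYTES):
    """Raw document bytes in one bulk load, in MB (10**6 bytes)."""
    return items * item_bytes / 1e6


def index_buffer_gb(heap_gb=HEAP_GB, fraction=INDEX_BUFFER_FRACTION):
    """Heap set aside for the indexing buffer on each node, in GB."""
    return heap_gb * fraction


if __name__ == "__main__":
    print("bulk payload: %.2f MB" % bulk_payload_mb())
    print("index buffer: %.1f GB per node" % index_buffer_gb())
```

The gap between roughly a megabyte of incoming documents and several gigabytes of indexing buffer is what makes the 30% setting look oversized.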

The has_parent query could easily be turned into a filter so that you can
take advantage of filter caching. Is scoring important for that query? I
am assuming it is not, since it is a range query.

Cheers,

Ivan

(In reply to VB's post of Thu, Aug 15, 2013 at 7:04 PM.)

Ivan,

Thanks for the reply.

We are new to Elasticsearch. Yes, we did run the search queries without
indexing, and they still take around 10 seconds.

We can reduce the buffer size or remove that setting from the yml. Can we
remove or change it after the indexes are created, or do we need to create
the indexes again? Does it need a server restart, or can we call the update
settings API?

We would highly appreciate a filter version of the query. Scoring is not
important.

Regards,
VB

(In reply to Ivan Brusic's post of Friday, 16 August 2013 10:28:09 UTC-7.)

We have removed the buffer size setting and restarted the cluster nodes.

The query is still taking around 10 seconds, and CPU on all servers is
maxing out.

We also looked for documentation on changing has_parent/has_child queries
to normal filter queries, but could not find anything. Any input would be
useful.

Regards,
VB

(In reply to VB's post of Friday, 16 August 2013 13:50:51 UTC-7.)

Hi VB,

I do not know your use case, but have you considered denormalizing your
data? In other words, storing the parent object as part of its children's
JSON. Your query and facet performance will be much better, but the price to
pay is having to update every child record if the parent changes. Plus, if
the majority of your searches need to return the parent, you would need
some way of picking out a single parent record from potentially multiple
hits on this parent/child. Still, it may be worth it. Your parent record is
pretty small, so the key is just how often it changes. Maybe bring only the
parent fields you really need for searching into the child records?
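
To make the idea concrete, here is a minimal Python sketch of copying selected parent fields into each child before indexing. It is only an illustration: the "Contract_" prefix, the choice of ContractDate as the only copied field, and both function names are assumptions, not anything from the actual mapping.

```python
# Hypothetical choice: copy only the parent fields that searches need.
PARENT_FIELDS_TO_COPY = ("ContractDate",)


def denormalize(child, parent, fields=PARENT_FIELDS_TO_COPY):
    """Return a copy of the child document with selected parent fields embedded.

    Copied fields get a "Contract_" prefix (an invented convention) so they
    cannot collide with the child's own properties.
    """
    merged = dict(child)
    for name in fields:
        if name in parent:
            merged["Contract_" + name] = parent[name]
    return merged


def children_to_reindex(parent, children, fields=PARENT_FIELDS_TO_COPY):
    """After a parent change, return rebuilt versions of the stale children only."""
    rebuilt = [denormalize(child, parent, fields) for child in children]
    return [new for old, new in zip(children, rebuilt) if old != new]
```

With this layout a date restriction becomes a plain range filter on Contract_ContractDate in the risk item itself, and a change to a parent field that is not copied triggers no child updates at all.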
Alex


Alex, we cannot go with denormalizing the data. As you mentioned, it would
require updating every parent document whenever any attribute of a child
document changes. Is there anything else you can propose?

In general, our queries against a single table (type) are also slow.

This query takes around 8 seconds.

{
  "query": {
    "constant_score": {
      "filter": {
        "and": [{
          "term": {
            "CommonCharacteristic_BuildingScheme": "BuildingScheme1"
          }
        },
        {
          "term": {
            "Address_Admin2Name": "Admin2Name1"
          }
        }]
      }
    }
  }
}

This query takes around 6.5 seconds for the top 10 records (but it has a
sort on top of it).

{
  "query": {
    "constant_score": {
      "filter": {
        "and": [{
          "term": {
            "Insurer": "Insurer1"
          }
        },
        {
          "term": {
            "Status": "Status1"
          }
        }]
      }
    }
  }
}

Note that all our queries use random values, drawn from a small random set
of data.

(In reply to AlexR's post of Saturday, 17 August 2013 13:59:56 UTC-7.)

You should use a bool filter with must clauses. Read this:
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
            {"term": {"Address_Admin2Name": "Admin2Name1"}}
          ]
        }
      }
    }
  }
}
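
If many existing queries use the "and" filter, the same rewrite can be applied mechanically. The following is a small Python sketch of that transformation on query dicts; the function name is made up, and it only handles the plain list form of "and". As far as I understand the bitsets article, and/or/not remain the better choice for filters that do not produce bitsets (geo, script), so a blanket rewrite like this is a simplification that fits term filters such as the ones above.

```python
def and_to_bool(filter_clause):
    """Rewrite {"and": [...]} as {"bool": {"must": [...]}}, recursively.

    Any other filter is returned unchanged. The input is not modified.
    """
    clauses = filter_clause.get("and")
    if len(filter_clause) == 1 and isinstance(clauses, list):
        return {"bool": {"must": [and_to_bool(clause) for clause in clauses]}}
    return filter_clause


if __name__ == "__main__":
    old = {"and": [{"term": {"Insurer": "Insurer1"}},
                   {"term": {"Status": "Status1"}}]}
    print(and_to_bool(old))
```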

Thanks,
Matt Weber

(In reply to VB's post of Mon, Aug 19, 2013 at 2:23 PM.)

Thanks Matt.

Can we use this in our parent/child queries? And how do we write
parent/child queries without using has_parent/has_child?

Also, is there a rule of thumb for when to use bool and when not to?

Regards,
VB

(In reply to Matt Weber's post of Monday, 19 August 2013 14:37:38 UTC-7.)

I meant the opposite: denormalize the parent into the child. You will not
need to update the parent when a child changes, only all children when the
parent changes, which hopefully will be less frequent. And perhaps you only
need some parent fields in the child, which would make relevant parent
changes even less frequent.

I wonder whether the bool filter will make a dramatic difference. Please
let us know.

Alex


With your ES cluster node config, you tell ES that it should fill 30g of
heap for filter/cache. Do you use warming?

Another observation is that your index is 322g across 11 nodes, which makes
~30g per node, and you have assigned 64g - 30g = 34g to the file system and
other uses, so your whole 322g of files will fit into the file system cache.

My opinion is that 10s is blazingly fast to fill ~30g from the file system,
prepare your filter query in the heap, which may use up to another 30g,
execute the query, and deliver the results.
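
The per-node budget can be re-derived from the figures in the original post with a few lines of Python. This is only a sketch of the arithmetic: it uses the 68 GB of memory per box stated in the first message (rather than 64), shows both 11 nodes and the 10 that actually hold data, and leaves open whether the 322 GB already includes the replica, since the thread does not say.

```python
INDEX_GB = 322   # stated index size
NODES = 11       # stated cluster size
DATA_NODES = 10  # one node is a load balancer with node.data = false
RAM_GB = 68      # stated memory per box
HEAP_GB = 30     # from -Xms30g -Xmx30g


def index_gb_per_node(index_gb=INDEX_GB, nodes=NODES):
    """Index data each node holds if shards are spread evenly."""
    return index_gb / nodes


def fs_cache_gb_per_node(ram_gb=RAM_GB, heap_gb=HEAP_GB):
    """Memory left outside the JVM heap for the file system cache and the OS."""
    return ram_gb - heap_gb


if __name__ == "__main__":
    for nodes in (NODES, DATA_NODES):
        share = index_gb_per_node(nodes=nodes)
        fits = share < fs_cache_gb_per_node()
        print("%d nodes: %.1f GB of index per node, fits in cache: %s"
              % (nodes, share, fits))
```

Either way the per-node share stays below the memory left outside the heap, so the conclusion about the file system cache holds under these assumptions.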

Jörg


Yes, you can and should use bool query/filter with your parent/child
queries. Read the article that I linked to know when and when not to use
them. Looking at your original query, I would probably go with something
like this:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {"match": {"RiskItemProperty1": "abc"}},
            {"match": {"RiskItemProperty2": "xyz"}}
          ]
        }
      },
      "filter": {
        "has_parent": {
          "parent_type": "contract",
          "filter": {
            "range": {
              "ContractDate": {
                "gte": "2010-01-01"
              }
            }
          }
        }
      }
    }
  }
}
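
Since the real queries use varying values, the same request body can be generated from parameters. Below is a minimal Python sketch that builds exactly the structure shown above; the function name and its arguments are made up for illustration, and the result is only a dict that still has to be serialized and sent by whatever client is in use.

```python
import json


def risk_items_for_contracts_since(matches, since):
    """Build a filtered query: scored matches on risk items, unscored parent filter.

    matches: dict of risk item field name -> text to match
    since:   lower bound for the parent's ContractDate, e.g. "2010-01-01"
    """
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [{"match": {field: text}}
                                 for field, text in matches.items()]
                    }
                },
                "filter": {
                    "has_parent": {
                        "parent_type": "contract",
                        "filter": {"range": {"ContractDate": {"gte": since}}},
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = risk_items_for_contracts_since(
        {"RiskItemProperty1": "abc", "RiskItemProperty2": "xyz"}, "2010-01-01")
    print(json.dumps(body, indent=2))
```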

Remember that your first couple of has_parent or has_child filters and
queries are going to be slower because the id cache is being loaded into
memory.
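
One way to move that first-query cost away from users might be an index warmer that runs a parent/child filter. The Python sketch below only builds the request and sends nothing. It assumes the 0.90 warmer endpoint PUT /{index}/_warmer/{name} with a search body, the warmer name is invented, and whether a warmer like this fully loads the id cache on this cluster is something to verify rather than take for granted.

```python
import json


def id_cache_warmer_request(index="contractindex", name="parent_child_warmup"):
    """Return (method, path, body) for registering a warmer; nothing is sent.

    The warmer body is an ordinary search request that touches the
    parent/child relation through a has_parent filter.
    """
    body = {
        "query": {
            "constant_score": {
                "filter": {
                    "has_parent": {
                        "parent_type": "contract",
                        "filter": {"match_all": {}},
                    }
                }
            }
        }
    }
    return "PUT", "/%s/_warmer/%s" % (index, name), json.dumps(body)


if __name__ == "__main__":
    method, path, body = id_cache_warmer_request()
    print(method, path)
    print(body)
```

With http.enabled: false on the data nodes, a request like this would have to go through the node that does expose HTTP.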

Thanks,
Matt Weber

(In reply to VB's post of Mon, Aug 19, 2013 at 2:46 PM.)

Thanks, Matt. I will run through this and post my observations.

(In reply to Matt Weber's post of Monday, 19 August 2013 15:00:42 UTC-7.)

Hi all,

We made the changes suggested by Matt to use bitsets.

We ran 50 concurrent users (read only) for an hour. All our queries are
performing 4 to 5 times faster, except the parent/child query (the query in
question), which has gone down from 7 seconds to 3 seconds.

Matt, thank you so much for helping us. Is there anything else we can do
for the parent/child query, or in general?

I have one more query with has_child in it. Do you think we can further
improve this one?

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [{
            "match": {
              "LineOfBusiness": "LOBValue1"
            }
          }]
        }
      },
      "filter": {
        "has_child": {
          "type": "riskitem",
          "filter": {
            "bool": {
              "must": [{
                "term": {
                  "Address_Admin1Name": "Admin1Name1"
                }
              }]
            }
          }
        }
      }
    }
  }
}
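
One small structural observation: both bool clauses in this query wrap a single must clause, so each wrapper can be dropped. The Python sketch below shows that simplification on the query dict. The function name is made up, and this is a readability cleanup under the assumption that a one-clause bool behaves like the clause itself; it is not claimed to change the timing.

```python
def unwrap_single_must(clause):
    """Replace {"bool": {"must": [x]}} with x, recursively through dicts and lists."""
    if isinstance(clause, list):
        return [unwrap_single_must(item) for item in clause]
    if not isinstance(clause, dict):
        return clause
    bool_body = clause.get("bool")
    if (len(clause) == 1
            and isinstance(bool_body, dict)
            and list(bool_body) == ["must"]
            and isinstance(bool_body["must"], list)
            and len(bool_body["must"]) == 1):
        return unwrap_single_must(bool_body["must"][0])
    return {key: unwrap_single_must(value) for key, value in clause.items()}
```

Applied to the query above, the inner query becomes a plain match on LineOfBusiness and the has_child filter holds a plain term filter on Address_Admin1Name.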

Regards,
VB.

On Monday, 19 August 2013 15:04:58 UTC-7, VB wrote:

Thanks Matt. I will run thorough this and post my observation.

On Monday, 19 August 2013 15:00:42 UTC-7, Matt Weber wrote:

Yes, you can and should use bool query/filter with your parent/child
queries. Read the article that I linked to know when and when not to use
them. Looking at your original query, I would probably go with something
like this:

{
"query": {
"filtered" : {
"query": {
"bool": {
"must": [
{"match": {"RiskItemProperty1": "abc"}},
{"match": {"RiskItemProperty2": "xyz"}}
]
}
},
"filter": {
"has_parent": {
"parent_type": "contract",
"filter": {
"range": {
"ContractDate": {
"gte": "2010-01-01"
}
}
}
}
}
}
}
}

Remember that your first couple has_parent or has_child filters and
queries are going to be slower due to id cache being loaded into memory.

Thanks,
Matt Weber

On Mon, Aug 19, 2013 at 2:46 PM, VB vishal....@gmail.com wrote:

Thanks Matt.

Can we use this on our parent child queries? and how to write parent
child queries without using has_parent/has_child?

And is there a thumb rule about about when to use bool and when not use
it?

Regards,
VB

On Monday, 19 August 2013 14:37:38 UTC-7, Matt Weber wrote:

You should use a Bool Filter with must clauses, read this:
http://www.elasticsearch.org/blog/all-about-elasticsearch-
filter-bitsets/http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{"term": {"CommonCharacteristic_**BuildingScheme":
"BuildingScheme1"}},
{"term": {"Address_Admin2Name": "Admin2Name1"}}
]
}
}
}
}
}

Thanks,
Matt Weber

On Mon, Aug 19, 2013 at 2:23 PM, VB vishal....@gmail.com wrote:

Alex we cannot go with denormalizing data, as you mentioned it would
need to update each parent document for any change any attribute of the
child document. Is there anything else you can propose.

In general also our queries from one table are also slower

This query takes around 8 seconds.

{
  "query": {
    "constant_score": {
      "filter": {
        "and": [{
          "term": {
            "CommonCharacteristic_BuildingScheme": "BuildingScheme1"
          }
        },
        {
          "term": {
            "Address_Admin2Name": "Admin2Name1"
          }
        }]
      }
    }
  }
}

This query takes around 6.5 seconds for the top 10 records (but it has a
sort on top of it).

{
"query": {
"constant_score": {
"filter": {
"and": [{
"term": {
"Insurer": "Insurer1"
}
},
{
"term": {
"Status": "Status1"
}
}]
}
}
}
}

But all our queries are run with random values taken from a small random set of data.

On Saturday, 17 August 2013 13:59:56 UTC-7, AlexR wrote:

Hi VB,

I do not know your use case, but have you considered denormalizing
your data? In other words, storing the parent object as part of its children's
JSON. Your query and facet performance will be much better, but the price to
pay is having to update every child record if the parent changes. Plus, if the
majority of your searches need to return the parent, you would need some
way of picking a single distinct parent record out of potentially multiple hits on
this parent/child. Still, it may be worth it. Your parent record is pretty
small, so the key is just how often it changes. Maybe bring only some of the
parent fields you really need for searching into the child records?
Alex
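
A rough sketch of the partial denormalization described above, assuming documents are plain Python dicts at indexing time. The helper name denormalize_risk_item and the "Contract_" prefix are hypothetical; ContractDate and RiskItemProperty1 are field names taken from the original question.

```python
def denormalize_risk_item(risk_item, contract, parent_fields, prefix="Contract_"):
    """Return a copy of risk_item with selected contract fields copied in.

    Only the fields listed in parent_fields are copied, so the child document
    grows by just what the searches need. Fields missing from the contract
    are skipped.
    """
    doc = dict(risk_item)  # shallow copy; the original risk item is not modified
    for field in parent_fields:
        if field in contract:
            doc[prefix + field] = contract[field]
    return doc


if __name__ == "__main__":
    contract = {"ContractDate": "2010-05-01", "Insurer": "Insurer1"}
    risk_item = {"RiskItemProperty1": "abc", "RiskItemProperty2": "xyz"}
    print(denormalize_risk_item(risk_item, contract, ["ContractDate"]))
```

With the date copied onto each risk item, the has_parent clause could in principle be replaced by a plain range filter on the child, at the cost of reindexing every child whenever the copied parent field changes.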


And one more of this type which needs improvement.

{
"query": {
"bool": {
"must": [{
"range": {
"InceptionDate": {
"gt": "2009-01-01"
}
}
},
{
"range": {
"ExpirationDate": {
"lt": "2013-01-01"
}
}
},
{
"has_child": {
"type": "riskitem",
"query": {
"filtered": {
"filter": {
"or": [{
"bool": {
"must": [{
"term": {
"Address_Admin2Name": "tureni"
}
},
{
"term": {
"Address_Admin2Name_US": "burlington"
}
},
{
"term": {
"CommonCharacteristic_BuildingClass": "62"
}
}]
}
},
{
"bool": {
"must": [{
"term": {
"CommonCharacteristic_ConstructionName": "heavy"
}
},
{
"term": {
"CommonCharacteristic_BuildingScheme": "rms"
}
},
{
"terms": {
"CommonCharacteristic_ValuationType": ["reported",
"reported"]
}
}]
}
}]
}
}
}
}
}]
}
}
}
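
One possible restructuring of the query above, following the bool-filter advice given earlier in the thread: the two date ranges move into filter context and the "or" filter becomes a bool filter with should clauses. This is only a sketch and assumes relevance scores are not needed for this search, since constant_score discards them; the function name build_contract_search is hypothetical, and the has_child "filter" form is the one already used elsewhere in this thread.

```python
import json


def build_contract_search(inception_after, expiration_before, child_groups):
    """Build a contract search body using only bool filters.

    child_groups is a list of lists of filters. A contract matches if it has
    a riskitem child satisfying every filter in at least one group.
    """
    # Each group is an AND of its filters; the groups are OR-ed via "should".
    should = [{"bool": {"must": list(group)}} for group in child_groups]
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"InceptionDate": {"gt": inception_after}}},
                            {"range": {"ExpirationDate": {"lt": expiration_before}}},
                            {
                                "has_child": {
                                    "type": "riskitem",
                                    "filter": {"bool": {"should": should}},
                                }
                            },
                        ]
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    groups = [
        [
            {"term": {"Address_Admin2Name": "tureni"}},
            {"term": {"Address_Admin2Name_US": "burlington"}},
            {"term": {"CommonCharacteristic_BuildingClass": "62"}},
        ],
        [
            {"term": {"CommonCharacteristic_ConstructionName": "heavy"}},
            {"term": {"CommonCharacteristic_BuildingScheme": "rms"}},
            {"terms": {"CommonCharacteristic_ValuationType": ["reported"]}},
        ],
    ]
    print(json.dumps(build_contract_search("2009-01-01", "2013-01-01", groups), indent=2))
```

Whether this is actually faster would need to be measured on the real cluster, under the same concurrent load as the original test.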

Can anyone please comment/help?
