Performance issues with has_parent filters

Hi Elasticsearch developers,

I am new to ES. We have had some performance issues with our Elasticsearch
system, and we would like to get some ideas/thoughts about this issue from
you.

Here is our use case: we have three types of documents in one index:
“campaign_group”, “campaign”, and “ad”. “campaign_group” is the parent of
“campaign”, and “campaign” is the parent of “ad”. Each document type has
about 10 simple properties, such as string, long, and short. All three kinds
of documents have a property “user” (long) and a property
“run_status” (short). Documents are routed by a hash of “user”, so documents
with the same “user” are mapped to the same shard.
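To illustrate the routing scheme described above, here is a minimal Python sketch. It is not Elasticsearch's actual implementation: Elasticsearch uses its own internal hash function, and `zlib.crc32` is only a stand-in here. The function name `shard_for` and the sample documents are hypothetical; only the shard count (200) and the user ID come from this post.

```python
import zlib

NUM_SHARDS = 200  # shard count mentioned in this post


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number.

    Assumption: crc32 is a stand-in hash for illustration only;
    Elasticsearch uses its own hash function internally.
    """
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_shards


if __name__ == "__main__":
    user = 1436594776581528
    # Hypothetical documents of all three types belonging to one user.
    docs = [("campaign_group", "g1"), ("campaign", "c1"), ("ad", "a1")]
    for doc_type, doc_id in docs:
        # Routing by "user" (not by document ID) keeps parents and
        # children of the same user together on one shard.
        print(doc_type, doc_id, "-> shard", shard_for(user))
```

Because the shard depends only on the routing value, a search routed by the same “user” only needs to visit that one shard.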

We have about 1.4 billion documents in total. We have 200 shards, 3 master
nodes, and 21 data nodes, and each shard has two replicas. The total data
size is 1.5TB. We are running Elasticsearch 1.2.1.

Queries are made against a specific shard by routing. The following query (1)
checks the run_status of “ad” documents (run_status is a short type), and it
takes about 100 milliseconds. Query (2) checks both the run_status of the
“ad” and the run_status of its parent, and it takes about 2000 milliseconds.
It looks like there are some performance issues with the has_parent filter.

Do you have any thoughts about this problem? Is it expected (because
ES cannot support has_parent well)? Could something else be causing this
problem? Or should we upgrade our Elasticsearch version?

Please let me know if you need any other information about our use case.

Any thoughts/ideas will be highly appreciated.

======================== Query(1) ========================

{
  "filter": {
    "and": [
      {
        "term": {
          "user": 1436594776581528
        }
      },
      {
        "terms": {
          "run_status": [1]
        }
      }
    ]
  },
  "sort": {
    "_uid": "desc"
  },
  "size": 1000000,
  "from": 0
}

======================== Query(2) ========================

{
  "filter": {
    "and": [
      {
        "term": {
          "user": 1436594776581528
        }
      },
      {
        "terms": {
          "run_status": [1]
        }
      },
      {
        "has_parent": {
          "parent_type": "campaign",
          "filter": {
            "terms": {
              "run_status": [1]
            }
          }
        }
      }
    ]
  },
  "sort": {
    "_uid": "desc"
  },
  "size": 1000000,
  "from": 0
}
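For anyone comparing the two queries, the only difference is the extra has_parent clause. The Python sketch below builds Query(2) from Query(1) to make that explicit. The helper name `with_parent_status` is hypothetical, and the code only constructs request bodies; it does not talk to a cluster.

```python
import copy
import json

# Query(1) from this post, as a Python dict.
QUERY_1 = {
    "filter": {
        "and": [
            {"term": {"user": 1436594776581528}},
            {"terms": {"run_status": [1]}},
        ]
    },
    "sort": {"_uid": "desc"},
    "size": 1000000,
    "from": 0,
}


def with_parent_status(query, parent_type, run_statuses):
    """Return a copy of `query` with a has_parent filter appended."""
    result = copy.deepcopy(query)  # leave the original query untouched
    result["filter"]["and"].append(
        {
            "has_parent": {
                "parent_type": parent_type,
                "filter": {"terms": {"run_status": list(run_statuses)}},
            }
        }
    )
    return result


if __name__ == "__main__":
    query_2 = with_parent_status(QUERY_1, "campaign", [1])
    print(json.dumps(query_2, indent=2))
```

Since everything else is identical, the extra time in Query(2) can be attributed to evaluating the has_parent clause.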

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/220b1d9a-da80-416c-8b8d-d7cc3efc8b5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

What happens if you change size to 10?

David

Hi David

Both queries return only 3 records. If I change the size to 10, the time
that the two queries take does not change. The first query still takes
about 100 milliseconds, and the second one still takes about 2000
milliseconds. Thanks a lot!

Xiaolin.

There is in fact a performance difference between has_parent and other
filters, as well as a difference in memory/cache use - especially in
earlier versions of ES. This is due to the way in which ES has to query the
parent/child relationship.

I do believe that there are some significant performance improvements to
parent/child documents in 1.3.0+ - check the release notes. Also, I believe
there might have been some tuning and monitoring additions in the newer
versions that might help you. (I'm a user of our cluster, not so much an
administrator, so I'm not so sure on the latter...)

--
Les Barstow, Senior Software Engineer
Return Path, Inc.

We had a poor experience with has_parent queries and we ended up implementing
it ourselves using 2 steps:

  1. Filter the parent documents to get a list of IDs.
  2. Filter the child documents and look only for IDs in the list of IDs
    from step 1.
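The two steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the exact implementation described here: it assumes each child (“ad”) document carries its parent's ID in a regular field, hypothetically named `campaign_id`, and the function names are made up for the example. The code only builds request bodies and reads IDs out of a search response; sending the requests is left to whatever client you use.

```python
import json


def build_parent_query(user, run_statuses, size=1000):
    """Step 1: request body that filters parent ("campaign") documents."""
    return {
        "filter": {
            "and": [
                {"term": {"user": user}},
                {"terms": {"run_status": list(run_statuses)}},
            ]
        },
        "_source": False,  # only the hit IDs are needed
        "size": size,
    }


def extract_ids(search_response):
    """Collect document IDs from a search response's hits."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def build_child_query(user, run_statuses, parent_ids,
                      parent_id_field="campaign_id"):
    """Step 2: request body that filters child ("ad") documents.

    Assumption: `parent_id_field` is a hypothetical field on each child
    document that stores the ID of its parent.
    """
    return {
        "filter": {
            "and": [
                {"term": {"user": user}},
                {"terms": {"run_status": list(run_statuses)}},
                {"terms": {parent_id_field: list(parent_ids)}},
            ]
        },
        "sort": {"_uid": "desc"},
    }


if __name__ == "__main__":
    user = 1436594776581528
    print(json.dumps(build_parent_query(user, [1]), indent=2))
    # Made-up response standing in for the result of step 1.
    fake_response = {"hits": {"hits": [{"_id": "c1"}, {"_id": "c2"}]}}
    parent_ids = extract_ids(fake_response)
    print(json.dumps(build_child_query(user, [1], parent_ids), indent=2))
```

The trade-off is an extra round trip and a terms filter whose size grows with the number of matching parents, in exchange for avoiding the parent/child join at query time.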

Hi Ron.

Thanks a lot for the information. We are considering this plan for our
case.

Xiaolin.

Hi Les

The release notes for 1.3.0 do not mention any performance improvements to
parent/child queries, and I did not find any in the 1.4.0 release notes
either. How did you find out that there are significant performance
improvements to parent/child queries? What kind of improvements were made,
and how significant are they?

Thanks a lot for the help.

Xiaolin.
