Refresh + child documents

Hi,

In our unit tests for our app, we're seeing a couple of intermittent search failures when doing the following:

  1. Create / Update a bunch of resources
  2. Trigger an index refresh (with consistency == 'all')
  3. Search
    -> Failure because of missing expected results

The search in step 3 is a query that searches through child documents.
Is it possible that when the refresh from step 2 returns, the child documents haven't been
fully re-indexed yet?

We're only seeing the failure intermittently on a low-spec box under heavy load.

Kind regards,

Simon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

can you share the test maybe?

--Alex

Hi Alexander,

The test can be found at [1], but it's not that straightforward to set up as
the app requires quite a few dependencies.

First some background info:

Our app has lots of content that can be shared with lots of users. Content
can be marked as private to a set of users.
We've modelled that in ES by having top-level content documents which have
a child document per user that has access to the content item.

When we do a general search we add the user id of the current user and run
a has_child filter as well.

I'll try my best to explain what the test is doing and what happens in the
background.

  1. The first couple of lines create a couple of users
  2. A piece of content gets created by user A
  3. That content item gets shared with user B
  4. The content item gets put in the index
  5. All the users who have access to that piece of content are added as
    child documents
  6. Refresh the search index (consistency = all)
  7. Search for the content item
  8. Assert we get the content item

Occasionally (25% of the time maybe) the assertion in step 8 fails.

Is it possible that when our application gets a response from the refresh
request (6), ES hasn't actually fully re-indexed everything?

FWIW, we've set the number of shards to 1 and the number of replicas to 0,
as per the elasticsearch.yml recommendation for dev environments, and that
seems to help somewhat (presumably because there are fewer shards and
replicas to process, thus causing less IO).

The full query can be found at [2]

Kind regards,

Simon

[1] https://github.com/oaeproject/Hilary/blob/master/node_modules/oae-content/tests/test-library-search.js#L247
[2]
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "fields": [
            "q_high^2.0",
            "q_low^0.75"
          ],
          "query": "*"
        }
      },
      "filter": {
        "and": [
          {
            "term": {
              "_type": "resource"
            }
          },
          {
            "term": {
              "resourceType": "content"
            }
          },
          {
            "has_child": {
              "type": "resource_members",
              "query": {
                "terms": {
                  "direct_members": [
                    "u:camtest:gJ-kkTf-2W"
                  ]
                }
              }
            }
          }
        ]
      }
    }
  },
  "from": 1,
  "size": 25,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "sort": "asc"
    }
  ],
  "min_score": 0.2
}
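
For illustration, the per-user part of the query in [2] could be assembled roughly as follows. This is only a sketch: the function names `buildContentFilter` and `buildContentQuery` are hypothetical and not taken from the application, while the field and type names are copied from [2]. Paging, sorting and `min_score` are left out.

```javascript
// Hypothetical helper: builds the filter part of the query shown in [2]
// for a given user id. Field and type names are copied from that query.
function buildContentFilter(userId) {
  return {
    and: [
      { term: { _type: 'resource' } },
      { term: { resourceType: 'content' } },
      {
        // Only match parents that have a child listing this user as a member.
        has_child: {
          type: 'resource_members',
          query: { terms: { direct_members: [userId] } }
        }
      }
    ]
  };
}

// Hypothetical helper: wraps the filter in the filtered query from [2].
function buildContentQuery(userId, queryString) {
  return {
    query: {
      filtered: {
        query: {
          query_string: {
            fields: ['q_high^2.0', 'q_low^0.75'],
            query: queryString
          }
        },
        filter: buildContentFilter(userId)
      }
    }
  };
}

console.log(JSON.stringify(buildContentQuery('u:camtest:gJ-kkTf-2W', '*'), null, 2));
```

Because the `has_child` filter depends on the child documents, the parent only matches once both the parent and its `resource_members` children are visible to search.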

Hi,

Where do you execute the explicit refresh operation? (I can't see it in the
script you have shared)

Martijn

Hi,

That happens in SearchTestsUtil.searchAll [1].
We wait till:

  • all items from the "index doc" queue have been picked off
  • all items from the "delete doc" queue have been picked off
  • the index has been refreshed (with searchRefreshed).
  • Get all the data
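
The ordering described in the list above can be sketched as follows. This is not the actual `SearchTestsUtil.searchAll` code; the four functions passed in are hypothetical stand-ins for "wait for the index queue", "wait for the delete queue", "refresh" and "search".

```javascript
// Sketch only: the four dependencies are hypothetical stand-ins injected by
// the caller; they are not the real oae-search or Elasticsearch APIs.
async function searchAfterIndexing({ waitForIndexQueue, waitForDeleteQueue, refreshIndex, search }) {
  // Both queues must be drained before refreshing; a document that is
  // indexed after the refresh would not be visible to the search below.
  await Promise.all([waitForIndexQueue(), waitForDeleteQueue()]);

  // Make everything indexed so far visible to search.
  const refreshResponse = await refreshIndex();

  // Only now run the query.
  const results = await search();
  return { refreshResponse, results };
}
```

Returning the refresh response alongside the results makes it possible to inspect how many shards were actually refreshed when an assertion fails.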

Kind regards,

Simon

[1] https://github.com/oaeproject/Hilary/blob/master/node_modules/oae-search/lib/test/util.js#L30

So this executes the ES refresh?

ElasticSearch.refresh(function(err) {
      ...
});

I don't know this library, so I can't really tell whether the actual refresh
is performed. The refresh response should indicate on how many shards it
succeeded. Can you verify that this is equal to the number of shards of the
index that you're refreshing (primary and replica shards)?
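
A minimal sketch of that check, assuming the refresh response body carries a `_shards` object with `total`, `successful` and `failed` counters. The function name `refreshedAllShards` is hypothetical.

```javascript
// Returns true only if every shard copy counted in the refresh response was
// refreshed. `response` is assumed to contain a `_shards` object with
// `total`, `successful` and `failed` counters.
function refreshedAllShards(response) {
  const shards = response && response._shards;
  if (!shards) {
    return false;
  }
  return shards.failed === 0 && shards.successful === shards.total;
}

console.log(refreshedAllShards({ ok: true, _shards: { total: 1, successful: 1, failed: 0 } }));
```

A test helper could call this inside the refresh callback and fail early, before the search runs, when the counters do not match.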

--
Kind regards,

Martijn van Groningen

That is correct.

I can confirm the refresh is performed. I get the following response:
{
  ok: true,
  _shards: {
    total: 2,
    successful: 1,
    failed: 0
  }
}

That's on a single node instance with
index.number_of_shards: 1
index.number_of_replicas: 0

I don't fully understand why the response has a total of 2 when there is only 1 shard though.

Kind regards,

Simon
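
One possible reading of the response above, stated as an assumption rather than something confirmed in this thread: `_shards.total` counts every shard copy of the index (primaries and replicas), including replicas that cannot be assigned because there is no second node to host them. The arithmetic is sketched below; `expectedShardCopies` is a hypothetical helper name.

```javascript
// Number of shard copies an index has: each primary plus its replicas.
function expectedShardCopies(numberOfShards, numberOfReplicas) {
  return numberOfShards * (1 + numberOfReplicas);
}

console.log(expectedShardCopies(1, 0)); // 1
console.log(expectedShardCopies(1, 1)); // 2
```

Under that assumption, a total of 2 with 1 successful would match one primary plus one unassigned replica, which would mean the index was created with `number_of_replicas: 1` rather than 0. Checking the settings of the live index would confirm or rule this out.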
