Shard failures on many terms, but not on every term

Hello,

I get a shard failure error when I search for some terms, but I don't know
why, because searching for other terms works.
I found this in the log file:
SearchPhaseExecutionException[Failed to execute phase [query], total failure
; shardFailures {[9JV9YILBSlek2mjI55A6ig][website][2]:
QueryPhaseExecutionException[[website][2]

So I know that [9JV9YILBSlek2mjI55A6ig] is my cluster name and [website] is
the name of my index, but I don't know what the [2] is. This number
changes; sometimes it is [4]. Because I don't know what it is, I don't
know where to look for a fix.
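
A note on reading these entries: the _status output further down in this post lists 9JV9YILBSlek2mjI55A6ig under "node" and numbers the shards 0 to 4, which suggests each entry follows the pattern [node][index][shard]. The sketch below pulls those three parts out of a log message. It is only an illustration under that assumption; the function name and the regular expression are hypothetical and not part of Elasticsearch.

```python
import re

# Matches entries such as "{[9JV9YILBSlek2mjI55A6ig][website][2]:".
# Reading the three brackets as node, index and shard is an assumption
# based on the "node" and "shard" fields of the _status output.
FAILURE_RE = re.compile(r"\{\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]:")


def parse_shard_failures(message):
    """Return a list of (node, index, shard) tuples found in an error message."""
    return [(node, index, int(shard))
            for node, index, shard in FAILURE_RE.findall(message)]


log_line = (
    "SearchPhaseExecutionException[Failed to execute phase [query], total failure; "
    "shardFailures {[9JV9YILBSlek2mjI55A6ig][website][2]: "
    "QueryPhaseExecutionException[[website][2]"
)
print(parse_shard_failures(log_line))
```

Listing the failing shard numbers this way makes it easier to see whether the same shards fail for every problematic term.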

I ran a _status request and got this; maybe it will be helpful (sorry, it
is pretty big):

"shards" : {
  "0" : [ {
    "routing" : {
      "state" : "STARTED",
      "primary" : true,
      "node" : "9JV9YILBSlek2mjI55A6ig",
      "relocating_node" : null,
      "shard" : 0,
      "index" : "website"
    },
    "state" : "STARTED",
    "index" : {
      "size" : "192.2mb",
      "size_in_bytes" : 201611783
    },
    "translog" : {
      "id" : 1370618985492,
      "operations" : 2
    },
    "docs" : {
      "num_docs" : 55003,
      "max_doc" : 55046,
      "deleted_docs" : 43
    },
    "merges" : {
      "current" : 0,
      "current_docs" : 0,
      "current_size" : "0b",
      "current_size_in_bytes" : 0,
      "total" : 0,
      "total_time" : "0s",
      "total_time_in_millis" : 0,
      "total_docs" : 0,
      "total_size" : "0b",
      "total_size_in_bytes" : 0
    },
    "refresh" : {
      "total" : 18433,
      "total_time" : "1.7m",
      "total_time_in_millis" : 106656
    },
    "flush" : {
      "total" : 141,
      "total_time" : "12.7s",
      "total_time_in_millis" : 12788
    }
  } ],
  "1" : [ {
    "routing" : {
      "state" : "STARTED",
      "primary" : true,
      "node" : "9JV9YILBSlek2mjI55A6ig",
      "relocating_node" : null,
      "shard" : 1,
      "index" : "website"
    },
    "state" : "STARTED",
    "index" : {
      "size" : "210.8mb",
      "size_in_bytes" : 221089213
    },
    "translog" : {
      "id" : 1370618985965,
      "operations" : 0
    },
    "docs" : {
      "num_docs" : 55064,
      "max_doc" : 55101,
      "deleted_docs" : 37
    },
    "merges" : {
      "current" : 0,
      "current_docs" : 0,
      "current_size" : "0b",
      "current_size_in_bytes" : 0,
      "total" : 0,
      "total_time" : "0s",
      "total_time_in_millis" : 0,
      "total_docs" : 0,
      "total_size" : "0b",
      "total_size_in_bytes" : 0
    },
    "refresh" : {
      "total" : 29480,
      "total_time" : "3m",
      "total_time_in_millis" : 181118
    },
    "flush" : {
      "total" : 141,
      "total_time" : "18.9s",
      "total_time_in_millis" : 18929
    }
  } ],
  "2" : [ {
    "routing" : {
      "state" : "STARTED",
      "primary" : true,
      "node" : "9JV9YILBSlek2mjI55A6ig",
      "relocating_node" : null,
      "shard" : 2,
      "index" : "website"
    },
    "state" : "STARTED",
    "index" : {
      "size" : "175.9mb",
      "size_in_bytes" : 184467723
    },
    "translog" : {
      "id" : 1370618985663,
      "operations" : 0
    },
    "docs" : {
      "num_docs" : 54940,
      "max_doc" : 54983,
      "deleted_docs" : 43
    },
    "merges" : {
      "current" : 0,
      "current_docs" : 0,
      "current_size" : "0b",
      "current_size_in_bytes" : 0,
      "total" : 0,
      "total_time" : "0s",
      "total_time_in_millis" : 0,
      "total_docs" : 0,
      "total_size" : "0b",
      "total_size_in_bytes" : 0
    },
    "refresh" : {
      "total" : 18258,
      "total_time" : "1.7m",
      "total_time_in_millis" : 105009
    },
    "flush" : {
      "total" : 141,
      "total_time" : "10.4s",
      "total_time_in_millis" : 10453
    }
  } ],
  "3" : [ {
    "routing" : {
      "state" : "STARTED",
      "primary" : true,
      "node" : "9JV9YILBSlek2mjI55A6ig",
      "relocating_node" : null,
      "shard" : 3,
      "index" : "website"
    },
    "state" : "STARTED",
    "index" : {
      "size" : "178.2mb",
      "size_in_bytes" : 186951530
    },
    "translog" : {
      "id" : 1370618985706,
      "operations" : 2
    },
    "docs" : {
      "num_docs" : 55039,
      "max_doc" : 55072,
      "deleted_docs" : 33
    },
    "merges" : {
      "current" : 0,
      "current_docs" : 0,
      "current_size" : "0b",
      "current_size_in_bytes" : 0,
      "total" : 1,
      "total_time" : "247ms",
      "total_time_in_millis" : 247,
      "total_docs" : 3914,
      "total_size" : "3.9mb",
      "total_size_in_bytes" : 4185347
    },
    "refresh" : {
      "total" : 18369,
      "total_time" : "1.7m",
      "total_time_in_millis" : 105709
    },
    "flush" : {
      "total" : 141,
      "total_time" : "12.5s",
      "total_time_in_millis" : 12570
    }
  } ],
  "4" : [ {
    "routing" : {
      "state" : "STARTED",
      "primary" : true,
      "node" : "9JV9YILBSlek2mjI55A6ig",
      "relocating_node" : null,
      "shard" : 4,
      "index" : "website"
    },
    "state" : "STARTED",
    "index" : {
      "size" : "193.9mb",
      "size_in_bytes" : 203330352
    },
    "translog" : {
      "id" : 1370618985861,
      "operations" : 5
    },
    "docs" : {
      "num_docs" : 55010,
      "max_doc" : 55042,
      "deleted_docs" : 32
    },
    "merges" : {
      "current" : 0,
      "current_docs" : 0,
      "current_size" : "0b",
      "current_size_in_bytes" : 0,
      "total" : 0,
      "total_time" : "0s",
      "total_time_in_millis" : 0,
      "total_docs" : 0,
      "total_size" : "0b",
      "total_size_in_bytes" : 0
    },
    "refresh" : {
      "total" : 18409,
      "total_time" : "1.7m",
      "total_time_in_millis" : 106237
    },
    "flush" : {
      "total" : 141,
      "total_time" : "13.1s",
      "total_time_in_millis" : 13145
    }
  } ]
}
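
Only a few fields of this output matter for a quick health check. The sketch below condenses it to one row per shard copy, assuming the "shards" object has already been parsed into a Python dict (for example with json.loads). The function name is hypothetical, and the sample contains only shards 2 and 4 with values copied from the output above.

```python
def summarize_shards(shards):
    """Return one (shard_id, state, primary, num_docs, deleted_docs) row per shard copy."""
    rows = []
    # Shard ids are string keys in the JSON, so sort them numerically.
    for shard_id, copies in sorted(shards.items(), key=lambda kv: int(kv[0])):
        for copy in copies:
            rows.append((
                int(shard_id),
                copy["routing"]["state"],
                copy["routing"]["primary"],
                copy["docs"]["num_docs"],
                copy["docs"]["deleted_docs"],
            ))
    return rows


# Trimmed sample: only the fields read above, for shards 2 and 4.
sample = {
    "2": [{"routing": {"state": "STARTED", "primary": True},
           "docs": {"num_docs": 54940, "deleted_docs": 43}}],
    "4": [{"routing": {"state": "STARTED", "primary": True},
           "docs": {"num_docs": 55010, "deleted_docs": 32}}],
}
for row in summarize_shards(sample):
    print(row)
```

In the full output every shard is a STARTED primary with a similar document count, so the status itself gives no obvious hint about why the failing shards differ.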

Does anyone know where this could come from?
Thanks!

-Damien

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

Can you reproduce this with specific terms, or does it happen randomly? Is
there anything more in the log file?
Can you create a gist that reproduces it, so we can try it ourselves?

--Alex

Hey Alex,

Thanks for your reply.
And yes, it always happens with the same terms.
Here is what I have in the log file, for example, with the term 'music':

request.INFO: Matched route "_website_search" (parameters: "term": "null",
"_controller":
"Ology\SocialBundle\Controller\Website\SearchController::getSearchPage",
"_route": "_website_search")
[2013-06-11 10:56:15] request.CRITICAL: Elastica_Exception_Response:
SearchPhaseExecutionException[Failed to execute phase [query], total
failure; shardFailures {[9JV9YILBSlek2mjI55A6ig][website][0]:
QueryPhaseExecutionException[[website][0]: query[filtered(custom score
(name:"music" name:"musicology" name:"music ology",function=script[myCustomScript]

[Failed to execute main query]]; nested:
}{[9JV9YILBSlek2mjI55A6ig][website][3]:
QueryPhaseExecutionException[[website][3]: query[filtered(custom score
(name:"music" name:"musicology" name:"music ology",function=script[myCustomScript]

[Failed to execute main query]]; nested:
}{[9JV9YILBSlek2mjI55A6ig][website][4]:
QueryPhaseExecutionException[[website][4]: query[filtered(custom score
(name:"music" name:"musicology" name:"music ology",function=script[myCustomScript]

Sorry, but I don't know how to create a gist that reproduces it, and I
don't think it is possible, because it would need the same index data as mine.
Thanks for your time.

-Damien

So after a lot of investigation, I found that my custom script is the
problem, but I don't know why. It works for a lot of other terms!
Here is my custom script, in case it helps:

timeModifier = (param14 - doc['creationDate'].value) / param15;
score = ((doc['nbEditorMembers'].value + doc['nbRegularMembers'].value) * param1 +
         (doc['nbEditorPosts'].value + doc['nbRegularPosts'].value) * param2 +
         doc['nbPostComments'].value * param3 +
         doc['nbPostsViews'].value * param4 +
         doc['nbPostsOlogized'].value * param5);
if (timeModifier >= 365) {
    score * param13 * _score;
}
else if (timeModifier >= 279) {
    score * param12 * _score;
}
else if (timeModifier >= 186) {
    score * param11 * _score;
}
else if (timeModifier >= 93) {
    score * param10 * _score;
}
else if (timeModifier >= 31) {
    score * param9 * _score;
}
else if (timeModifier >= 7) {
    score * param8 * _score;
}
else if (timeModifier >= 1) {
    score * param7 * _score;
}
else {
    score * param6 * _score;
}

The only values that differ between terms are the doc['xxx'].value
fields.
Could the error come from one of these?
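
One way to check whether particular field values are responsible is to replay the formula outside Elasticsearch with the values of a document that fails. The sketch below is a plain Python port of the script above, not something Elasticsearch provides. The function name, the layout of the params dict and all sample numbers are made up for illustration, and Python's arithmetic (true division in particular) may differ from what the script engine does.

```python
def custom_score(doc, params, query_score):
    """Python port of the script above; params maps 1..15 to param1..param15."""
    time_modifier = (params[14] - doc["creationDate"]) / params[15]
    score = ((doc["nbEditorMembers"] + doc["nbRegularMembers"]) * params[1]
             + (doc["nbEditorPosts"] + doc["nbRegularPosts"]) * params[2]
             + doc["nbPostComments"] * params[3]
             + doc["nbPostsViews"] * params[4]
             + doc["nbPostsOlogized"] * params[5])
    # Each age threshold is paired with the index of the parameter used as weight.
    tiers = [(365, 13), (279, 12), (186, 11), (93, 10), (31, 9), (7, 8), (1, 7)]
    for threshold, index in tiers:
        if time_modifier >= threshold:
            return score * params[index] * query_score
    return score * params[6] * query_score


# Made-up values: every parameter is 1.0 except the ones set explicitly.
params = {i: 1.0 for i in range(1, 16)}
params[14] = 1000.0   # stands in for "now"
params[15] = 10.0     # stands in for the time unit
params[8] = 0.5       # weight for the ">= 7" tier

doc = {
    "creationDate": 900.0,
    "nbEditorMembers": 1, "nbRegularMembers": 2,
    "nbEditorPosts": 3, "nbRegularPosts": 4,
    "nbPostComments": 5, "nbPostsViews": 6, "nbPostsOlogized": 7,
}
print(custom_score(doc, params, query_score=2.0))
```

If the port raises an error or returns something unexpected for the values of a failing document, the data is a likely suspect; if not, the problem is more likely in how the script itself is parsed or executed.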

-Damien

OK, I finally found the bug. Apparently, Elasticsearch scripts are very
sensitive to spaces and indentation, and for some reason I don't
understand, my indentation broke the search for some terms. I changed the
indentation of the script and now it works.
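
If whitespace in the script source really is the trigger, one defensive option is to collapse the script onto a single line before sending it. The sketch below assumes the script text is held in a Python string and that every statement ends with a semicolon or a brace; the helper name is hypothetical. Nothing in this thread confirms that this would have avoided the failure.

```python
import re


def collapse_script(source):
    """Collapse every run of spaces, tabs and newlines into a single space."""
    return re.sub(r"\s+", " ", source).strip()


script = """
timeModifier = (param14 - doc['creationDate'].value) / param15;
if (timeModifier >= 365) {
    score * param13 * _score;
}
"""
print(collapse_script(script))
```

This is only safe for scripts without line comments and without string literals that contain meaningful runs of whitespace, since both would be altered by the collapse.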

-Damien
