Fragment_size not used for simple queries

Neamar_Tucote · February 25, 2014, 9:39am

Hello,

Using the highlight API for a simple query like this:

curl localhost:9200/company_52fb7b90c8318c4dc800006b/_search -d '{
  "fields": [],
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": "i do not"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "metadatas.*": {
        "number_of_fragments": 1,
        "fragment_size": 20
      }
    }
  }
}'

This should return snippets no longer than about 20 characters. Most
of the time this works; however, I have one document, analyzed with the
same mappings, that yields a really long snippet. In fact, it is not
truncated at all and contains the whole text of the field.

Here is a sample working as expected:

{"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[",
and do not
hesitate"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":["
take his child.\nI
do"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":["
resident of DC, I am"]}}]}}

And here is the unruly one:

{"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[",
and do not
hesitate"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":["
take his child.\nI
do"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":["
resident of DC, I
am"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,"highlight":{"metadatas.text":[".\nI
do not enlighten those who are not eager to learn, nor
arouse\nthose who are not anxious to give an explanation themselves. If
I\nhave presented one corner of the square and they cannot
come\nback to me with the other three, I should not go over the
points\nagain.\n― Confucius\nBesides explaining JavaScript, this book tries
to be an introduction to the basic\nprinciples of programming. Programming,
it turns out, is hard. The\nfundamental rules are, most of the time, simple
and clear. But programs,\nwhile built on top of these basic rules, tend to
become complex enough to\nintroduce their own rules, their own complexity.
Because of this, programming\nis rarely simple or predictable. As Donald
Knuth, who is something of a\nfounding father of the field, says, it is an
art.\nTo get something out of this book, more than just passive reading is
required.\nTry to stay sharp, make an effort to solve the exercises, and
only continue on\nwhen you are reasonably sure you understand the material
that came before.\nThe computer programmer is a creator of universes for
which he\nalone is responsible. Universes of virtually unlimited complexity
can\nbe created in the form of computer programs.\n― Joseph Weizenbaum,
Computer Power and Human Reason\nA program is many things. It is a piece of
text typed by a programmer, it is\nthe directing force that makes the
computer do what it does, it is data in the\ncomputer's memory,
yet it controls the actions performed on this same\nmemory. Analogies that
try to compare programs to objects we are familiar\nwith tend to fall
short, but a superficially fitting one is that of a machine. The\ngears of
a mechanical watch fit together ingeniously, and if the watchmaker\nwas any
good, it will accurately show the time for many years. The elements\nof a
program fit together in a similar way, and if the programmer knows what\nhe
is doing, the program will run without crashing.\nA computer is a machine
built to act as a host for these immaterial machines.\nComputers themselves
can only do stupidly straightforward things. The reason\nthey are
so useful is that they do these things at an incredibly high
speed. A\nprogram can, by ingeniously combining many of these simple
actions, do very\ncomplicated things.\nTo some of us, writing
computer programs is a fascinating game. A program\nis a building of
thought. It is costless to build, weightless, growing easily under\nour
typing hands. If we get carried away, its size and complexity will grow
out\nof control, confusing even the one who created it. This is the main
problem of\nprogramming. It is why so much of today's software tends to
crash, fail,\nscrew up.\nWhen a program works, it is beautiful. The art of
programming is the skill of\ncontrolling complexity. The great program is
subdued, made simple in its\ncomplexity.\nToday, many programmers believe
that this complexity is best managed by\nusing only a small set of
well-understood techniques in their programs. They\nhave composed strict
rules about the form programs should have, and the\nmore zealous among them
will denounce those who break these rules as bad\nprogrammers.\nWhat
hostility to the richness of programming! To try to reduce it to\nsomething
straightforward and predictable, to place a taboo on all the weird\nand
beautiful programs. The landscape of programming techniques is\nenormous,
fascinating in its diversity, still largely unexplored. It is
certainly\nlittered with traps and snares, luring the inexperienced
programmer into all\nkinds of horrible mistakes, but that only means you
should proceed with\ncaution, keep your wits about you. As you learn, there
will always be new\nchallenges, new territory to explore. The programmer
who refuses to keep\nexploring will surely stagnate, forget his joy, lose
the will to program (and\nbecome a manager).\nAs far as I am
concerned, the definite criterion for a program is whether it is\ncorrect.
Efficiency, clarity, and size are also important, but how to balance\nthese
against each other is always a matter of judgement, a judgement that\neach
programmer must make for himself. Rules of thumb are useful, but
one\nshould never be afraid to break them.\nIn the beginning, at the birth
of computing, there were no programming\nlanguages. Programs looked
something like this:\n00110001 00000000 00000000\n00110001 00000001
00000001\n00110011 00000001 00000010\n01010001 00001011 00000010\n00100010
00000010 00001000\n01000011 00000001 00000000\n01000001 00000001
00000001\n00010000 00000010 00000000\n01100010 00000000 00000000\nThat is a
program to add the numbers from one to ten together, and print out\nthe
result (1 + 2 + ... + 10 = 55). It could run on a very simple kind
of\ncomputer. To program early computers, it was necessary to set large
arrays\nof switches in the right position, or punch holes in strips of
cardboard and\nfeed them to the computer. You can imagine how this was a
tedious,\nerror-prone procedure. Even the writing of simple programs
required much\ncleverness and discipline, complex ones were nearly
inconceivable.\nOf course, manually entering these arcane patterns of bits
(which is what the\n1s and 0s above are generally called) did give the
programmer a profound\nsense of being a mighty wizard. And that has to be
worth something, in terms\nof job satisfaction.\nEach line of the program
contains a single instruction. It could be written in\nEnglish like
this:\nStore the number 0 in memory location 01.\nStore the number 1 in
memory location 12.\nStore the value of memory location 1 in memory
location 23.\nSubtract the number 11 from the value in memory location
24.\nIf the value in memory location 2 is the number 0, continue
with\ninstruction 9\n5.\nAdd the value of memory location 1 to memory
location 06.\nAdd the number 1 to the value of memory location
17.\nContinue with instruction 38.\nOutput the value of memory location
09.\nWhile that is more readable than the binary soup, it is still rather
unpleasant.\nIt might help to use names instead of numbers for the
instructions and\nmemory locations:\nSet 'total' to 0\nSet 'count' to
1\n[loop]\nSet 'compare' to 'count'\nSubtract 11 from 'compare'\nIf
'compare' is zero, continue at [end]\nAdd 'count' to 'total'\nAdd 1 to
'count'\nContinue at [loop]\n[end]\nOutput 'total'\nAt this point it is not
too hard to see how the program works. Can you? The\nfirst two lines give
two memory locations their starting values: total will be\nused to build up
the result of the program, and count keeps track of the\nnumber that we are
currently looking at. The lines using compare are probably\nthe weirdest
ones. What the program wants to do is see if count is equal
to\n11, in order to decide whether it can stop yet. Because the machine is
so\nprimitive, it can only test whether a number is zero, and make a
decision\n(jump) based on that. So it uses the memory location labelled
compare to\ncompute the value of count - 11, and makes a decision based on
that value.\nThe next two lines add the value of count to the result, and
increment count\nby one every time the program has decided that it is not
11 yet.\nHere is the same program in JavaScript:\nvar total = 0, count =
1;\nwhile (count <= 10) {\ntotal += count;\ncount +=
1;\n}\nprint(total);\nThis gives us a few more improvements. Most
importantly, there is no need\nto specify the way we want the program to
jump back and forth anymore.\nThe magic word while takes care of that. It
continues executing the lines\nbelow it as long as the condition it was
given holds: count <= 10, which means\n'count is less than or equal to
10'. Apparently, there is no need anymore to\ncreate a temporary value and
compare that to zero. This was a stupid little\ndetail, and the power of
programming languages is that they take care of\nstupid little details for
us.\nFinally, here is what the program could look like if we happened to
have the\nconvenient operations range and sum available, which respectively
create a\ncollection of numbers within a range and compute the sum of a
collection of\nnumbers:\nprint(sum(range(1, 10)));\nThe moral of this
story, then, is that the same program can be expressed in\nlong and short,
unreadable and readable ways. The first version of the\nprogram was
extremely obscure, while this last one is almost English: print\nthe sum of
the range of numbers from 1 to 10. (We will see in later chapters\nhow to
build things like sum and range.)\nA good programming language helps the
programmer by providing a more\nabstract way to express himself. It hides
uninteresting details, provides\nconvenient building blocks (such as the
while construct), and, most of the\ntime, allows the programmer to add
building blocks himself (such as the sum\nand range
operations).\nJavaScript is the language that is, at the moment, mostly
being used to do all\nki......[truncated]

Am I doing anything wrong? Over the course of 3 months, the problem was
only reported twice (on two distinct documents); all other documents
behaved correctly.
Interestingly, changing the query to something more complex returns a
valid, correctly truncated snippet.
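
For anyone trying to track down which documents are affected, here is a minimal sketch that scans a search response for highlight fragments well above the requested fragment_size. The function name `oversized_fragments` and the `tolerance` parameter are hypothetical, not part of Elasticsearch. It assumes the response has already been parsed into a dict, that the highlight tags (if present) are the default `<em>` / `</em>`, and that some slack is needed because even the working sample above returns a 21-character fragment for a fragment_size of 20.

```python
import re

# Assumption: default highlight tags; adjust if pre_tags/post_tags are customised.
TAG_RE = re.compile(r"</?em>")


def oversized_fragments(response, fragment_size, tolerance=2.0):
    """Yield (doc_id, field, length) for fragments longer than fragment_size * tolerance.

    `tolerance` is a hypothetical slack factor: fragments are not cut at exactly
    fragment_size characters, so only clear outliers are reported.
    """
    limit = fragment_size * tolerance
    for hit in response.get("hits", {}).get("hits", []):
        for field, fragments in hit.get("highlight", {}).items():
            for fragment in fragments:
                # Measure the text without the highlight tags.
                length = len(TAG_RE.sub("", fragment))
                if length > limit:
                    yield hit["_id"], field, length
```

Running this over the "unruly" response with fragment_size=20 should single out document 5309c57a9ba7daaa265ffdc7 and leave the three well-behaved hits alone.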

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b342a193-8f98-4202-a9c1-84ec100e94ae%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

javanna · February 25, 2014, 1:44pm

It would be useful if you can post a complete recreation, mappings
included. Which highlighter are you using?
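
One way to take the highlighter choice out of the equation is to name it explicitly in the request. The sketch below rebuilds the search body from the original post with a per-field `type` option. The helper name `build_highlight_request` is hypothetical, and the assumption here is that the running Elasticsearch version accepts `"type": "plain"` in the highlight field options; please check the highlighting documentation for your version before relying on it.

```python
import json


def build_highlight_request(query_text, highlighter_type="plain", fragment_size=20):
    """Return the search body from the original post with an explicit highlighter type.

    Assumption: the highlight field options accept a "type" key naming the highlighter.
    """
    return {
        "fields": [],
        "query": {"filtered": {"query": {"match": {"_all": query_text}}}},
        "highlight": {
            "fields": {
                "metadatas.*": {
                    "type": highlighter_type,
                    "number_of_fragments": 1,
                    "fragment_size": fragment_size,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted after `curl ... -d`.
    print(json.dumps(build_highlight_request("i do not"), indent=2))
```

If the oversized snippet disappears or changes once the type is pinned, that would be a useful detail to include in the recreation.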


Matthieu_Neamar · February 25, 2014, 2:02pm

Sadly, I can't put together a full recreation for now.

To answer your question: I use the default highlighter, with the
following (simplified) mapping:

{
  "properties": {
    "id": {
      "type": "string",
      "index": "not_analyzed"
    },
    "metadatas": {
      "type": "object",
      "dynamic": true,
      "properties": {
        "title": {
          "type": "string",
          "boost": 4
        },
        "text": {
          "type": "string",
          "boost": 1.5
        }
      }
    }
  }
}

given holds: count <= 10, which means\n'count is less than or equal to
10'. Apparently, there is no need anymore to\ncreate a temporary value and
compare that to zero. This was a stupid little\ndetail, and the power of
programming languages is that they take care of\nstupid little details for
us.\nFinally, here is what the program could look like if we happened to
have the\nconvenient operations range and sum available, which respectively
create a\ncollection of numbers within a range and compute the sum of a
collection of\nnumbers:\nprint(sum(range(1, 10)));\nThe moral of this
story, then, is that the same program can be expressed in\nlong and short,
unreadable and readable ways. The first version of the\nprogram was
extremely obscure, while this last one is almost English: print\nthe sum of
the range of numbers from 1 to 10. (We will see in later chapters\nhow to
build things like sum and range.)\nA good programming language helps the
programmer by providing a more\nabstract way to express himself. It hides
uninteresting details, provides\nconvenient building blocks (such as the
while construct), and, most of the\ntime, allows the programmer to add
building blocks himself (such as the sum\nand range
operations).\nJavaScript is the language that is, at the moment, mostly
being used to do all\nki......[truncated]

Am I doing anything wrong? Over the course of three months, the problem was
only reported twice (on two distinct documents); all other documents
behaved correctly.
Interestingly, changing the query to something more complex returns a
valid, correctly truncated snippet.
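
To find which documents are affected, a small client-side check can flag highlight fragments that are far longer than the requested fragment_size. This is only a sketch: the function name oversized_fragments and the slack factor are hypothetical and not part of Elasticsearch. It assumes the response shape shown above and the default <em> highlight tags.

```python
import re

# Assumption: the default Elasticsearch highlight tags (<em> / </em>) are in use.
TAG_RE = re.compile(r"</?em>")


def oversized_fragments(response, fragment_size=20, slack=2.0):
    """Yield (doc_id, field, length) for fragments much longer than requested.

    `slack` is a hypothetical tolerance factor: fragments may run somewhat
    over fragment_size, so only fragments longer than fragment_size * slack
    are reported.
    """
    limit = fragment_size * slack
    for hit in response.get("hits", {}).get("hits", []):
        for field, fragments in hit.get("highlight", {}).items():
            for fragment in fragments:
                # Measure the text only, without the highlight markup.
                length = len(TAG_RE.sub("", fragment))
                if length > limit:
                    yield hit["_id"], field, length


if __name__ == "__main__":
    # Made-up response for illustration; the ids and text are not real data.
    sample = {
        "hits": {
            "hits": [
                {"_id": "ok", "highlight": {"metadatas.text": [", and <em>do</em> <em>not</em> hesitate"]}},
                {"_id": "unruly", "highlight": {"metadatas.text": ["x" * 500]}},
            ]
        }
    }
    for doc_id, field, length in oversized_fragments(sample):
        print(doc_id, field, length)
```

Running this over the search responses should list only the documents whose snippets were not truncated, which may help narrow down what those documents have in common.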

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/1_sMKN1M3jE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/e8e0dc53-b821-41e7-805f-e7dd29fefa2a%40googlegroups.com
.

For more options, visit https://groups.google.com/groups/opt_out.

