Hello,
Using the highlight API for a simple query like this:
curl localhost:9200/company_52fb7b90c8318c4dc800006b/_search -d'{
  "fields": [],
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": "i do not"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "metadatas.*": {
        "number_of_fragments": 1,
        "fragment_size": 20
      }
    }
  }
}'
This should return snippets no longer than 20 characters. Most of the time
this works; however, I have one document, analyzed with the same mappings,
that yields a really long snippet: it is not truncated at all and contains
the entire text.
Here is a sample working as expected:
{"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[",
and do not
hesitate"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":["
take his child.\nI
do"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":["
resident of DC, I am"]}}]}}
And here is the unruly one:
{"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[",
and do not
hesitate"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":["
take his child.\nI
do"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":["
resident of DC, I
am"]}},{"_index":"company_52fb7b90c8318c4dc800006b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,"highlight":{"metadatas.text":[".\nI
do not enlighten those who are not eager to learn, nor
arouse\nthose who are not anxious to give an explanation themselves. If
I\nhave presented one corner of the square and they cannot
come\nback to me with the other three, I should not go over the
points\nagain.\n― Confucius\nBesides explaining JavaScript, this book tries
to be an introduction to the basic\nprinciples of programming. Programming,
it turns out, is hard. The\nfundamental rules are, most of the time, simple
and clear. But programs,\nwhile built on top of these basic rules, tend to
become complex enough to\nintroduce their own rules, their own complexity.
Because of this, programming\nis rarely simple or predictable. As Donald
Knuth, who is something of a\nfounding father of the field, says, it is an
art.\nTo get something out of this book, more than just passive reading is
required.\nTry to stay sharp, make an effort to solve the exercises, and
only continue on\nwhen you are reasonably sure you understand the material
that came before.\nThe computer programmer is a creator of universes for
which he\nalone is responsible. Universes of virtually unlimited complexity
can\nbe created in the form of computer programs.\n― Joseph Weizenbaum,
Computer Power and Human Reason\nA program is many things. It is a piece of
text typed by a programmer, it is\nthe directing force that makes the
computer do what it does, it is data in the\ncomputer's memory,
yet it controls the actions performed on this same\nmemory. Analogies that
try to compare programs to objects we are familiar\nwith tend to fall
short, but a superficially fitting one is that of a machine. The\ngears of
a mechanical watch fit together ingeniously, and if the watchmaker\nwas any
good, it will accurately show the time for many years. The elements\nof a
program fit together in a similar way, and if the programmer knows what\nhe
is doing, the program will run without crashing.\nA computer is a machine
built to act as a host for these immaterial machines.\nComputers themselves
can only do stupidly straightforward things. The reason\nthey are
so useful is that they do these things at an incredibly high
speed. A\nprogram can, by ingeniously combining many of these simple
actions, do very\ncomplicated things.\nTo some of us, writing
computer programs is a fascinating game. A program\nis a building of
thought. It is costless to build, weightless, growing easily under\nour
typing hands. If we get carried away, its size and complexity will grow
out\nof control, confusing even the one who created it. This is the main
problem of\nprogramming. It is why so much of today's software tends to
crash, fail,\nscrew up.\nWhen a program works, it is beautiful. The art of
programming is the skill of\ncontrolling complexity. The great program is
subdued, made simple in its\ncomplexity.\nToday, many programmers believe
that this complexity is best managed by\nusing only a small set of
well-understood techniques in their programs. They\nhave composed strict
rules about the form programs should have, and the\nmore zealous among them
will denounce those who break these rules as bad\nprogrammers.\nWhat
hostility to the richness of programming! To try to reduce it to\nsomething
straightforward and predictable, to place a taboo on all the weird\nand
beautiful programs. The landscape of programming techniques is\nenormous,
fascinating in its diversity, still largely unexplored. It is
certainly\nlittered with traps and snares, luring the inexperienced
programmer into all\nkinds of horrible mistakes, but that only means you
should proceed with\ncaution, keep your wits about you. As you learn, there
will always be new\nchallenges, new territory to explore. The programmer
who refuses to keep\nexploring will surely stagnate, forget his joy, lose
the will to program (and\nbecome a manager).\nAs far as I am
concerned, the definite criterion for a program is whether it is\ncorrect.
Efficiency, clarity, and size are also important, but how to balance\nthese
against each other is always a matter of judgement, a judgement that\neach
programmer must make for himself. Rules of thumb are useful, but
one\nshould never be afraid to break them.\nIn the beginning, at the birth
of computing, there were no programming\nlanguages. Programs looked
something like this:\n00110001 00000000 00000000\n00110001 00000001
00000001\n00110011 00000001 00000010\n01010001 00001011 00000010\n00100010
00000010 00001000\n01000011 00000001 00000000\n01000001 00000001
00000001\n00010000 00000010 00000000\n01100010 00000000 00000000\nThat is a
program to add the numbers from one to ten together, and print out\nthe
result (1 + 2 + ... + 10 = 55). It could run on a very simple kind
of\ncomputer. To program early computers, it was necessary to set large
arrays\nof switches in the right position, or punch holes in strips of
cardboard and\nfeed them to the computer. You can imagine how this was a
tedious,\nerror-prone procedure. Even the writing of simple programs
required much\ncleverness and discipline, complex ones were nearly
inconceivable.\nOf course, manually entering these arcane patterns of bits
(which is what the\n1s and 0s above are generally called) did give the
programmer a profound\nsense of being a mighty wizard. And that has to be
worth something, in terms\nof job satisfaction.\nEach line of the program
contains a single instruction. It could be written in\nEnglish like
this:\nStore the number 0 in memory location 01.\nStore the number 1 in
memory location 12.\nStore the value of memory location 1 in memory
location 23.\nSubtract the number 11 from the value in memory location
24.\nIf the value in memory location 2 is the number 0, continue
with\ninstruction 9\n5.\nAdd the value of memory location 1 to memory
location 06.\nAdd the number 1 to the value of memory location
17.\nContinue with instruction 38.\nOutput the value of memory location
09.\nWhile that is more readable than the binary soup, it is still rather
unpleasant.\nIt might help to use names instead of numbers for the
instructions and\nmemory locations:\nSet 'total' to 0\nSet 'count' to
1\n[loop]\nSet 'compare' to 'count'\nSubtract 11 from 'compare'\nIf
'compare' is zero, continue at [end]\nAdd 'count' to 'total'\nAdd 1 to
'count'\nContinue at [loop]\n[end]\nOutput 'total'\nAt this point it is not
too hard to see how the program works. Can you? The\nfirst two lines give
two memory locations their starting values: total will be\nused to build up
the result of the program, and count keeps track of the\nnumber that we are
currently looking at. The lines using compare are probably\nthe weirdest
ones. What the program wants to do is see if count is equal
to\n11, in order to decide whether it can stop yet. Because the machine is
so\nprimitive, it can only test whether a number is zero, and make a
decision\n(jump) based on that. So it uses the memory location labelled
compare to\ncompute the value of count - 11, and makes a decision based on
that value.\nThe next two lines add the value of count to the result, and
increment count\nby one every time the program has decided that it is not
11 yet.\nHere is the same program in JavaScript:\nvar total = 0, count =
1;\nwhile (count <= 10) {\ntotal += count;\ncount +=
1;\n}\nprint(total);\nThis gives us a few more improvements. Most
importantly, there is no need\nto specify the way we want the program to
jump back and forth anymore.\nThe magic word while takes care of that. It
continues executing the lines\nbelow it as long as the condition it was
given holds: count <= 10, which means\n'count is less than or equal to
10'. Apparently, there is no need anymore to\ncreate a temporary value and
compare that to zero. This was a stupid little\ndetail, and the power of
programming languages is that they take care of\nstupid little details for
us.\nFinally, here is what the program could look like if we happened to
have the\nconvenient operations range and sum available, which respectively
create a\ncollection of numbers within a range and compute the sum of a
collection of\nnumbers:\nprint(sum(range(1, 10)));\nThe moral of this
story, then, is that the same program can be expressed in\nlong and short,
unreadable and readable ways. The first version of the\nprogram was
extremely obscure, while this last one is almost English: print\nthe sum of
the range of numbers from 1 to 10. (We will see in later chapters\nhow to
build things like sum and range.)\nA good programming language helps the
programmer by providing a more\nabstract way to express himself. It hides
uninteresting details, provides\nconvenient building blocks (such as the
while construct), and, most of the\ntime, allows the programmer to add
building blocks himself (such as the sum\nand range
operations).\nJavaScript is the language that is, at the moment, mostly
being used to do all\nki......[truncated]
Am I doing anything wrong? Over the course of 3 months, the problem was
reported only twice (on two distinct documents); all other documents
behaved correctly.
Interestingly, changing the query to something more complex returns a
valid, correctly truncated snippet.
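In case it helps anyone reproduce this, here is a rough Python sketch for spotting affected documents in a search response. It only assumes the response shape shown above (hits.hits[].highlight); the function name, the "slack" factor, and the sample data are made up for illustration and are not part of Elasticsearch. The slack is there because a fragment can legitimately run a bit over the requested size (word boundaries, highlight tags), so the factor of 2 is an arbitrary guess.

```python
def oversized_fragments(response, fragment_size, slack=2.0):
    """Return (doc_id, field, length) for every highlight fragment
    longer than fragment_size * slack."""
    limit = fragment_size * slack
    found = []
    for hit in response.get("hits", {}).get("hits", []):
        # "highlight" maps each field name to a list of fragment strings.
        for field, fragments in hit.get("highlight", {}).items():
            for fragment in fragments:
                if len(fragment) > limit:
                    found.append((hit["_id"], field, len(fragment)))
    return found


# Trimmed, made-up response in the same shape as the ones above.
sample = {
    "hits": {"hits": [
        {"_id": "doc-ok",
         "highlight": {"metadatas.text": [", and do not hesitate"]}},
        {"_id": "doc-long",
         "highlight": {"metadatas.text": ["x" * 500]}},
    ]}
}

# Only the second document is flagged with the default slack.
print(oversized_fragments(sample, fragment_size=20))
```

Running this over the parsed JSON of each response should list the ids of the documents whose snippet was not truncated, which makes it easier to pull those documents and compare them with the well-behaved ones.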