On Sun, Apr 15, 2012 at 21:16, Clinton Gormley firstname.lastname@example.org wrote:
I've added a glossary of common terms that might be useful to new users.
I don't think my definitions are particularly good, so improvements and
additions are welcome.
I was going to send you a patch describing how "token" and
"tokenization" relate to ElasticSearch, which is something I've been
asked about in the past.
When you analyze something, you take full text and turn it into
terms. Are "analysis" and "term" synonymous with "tokenization" and
"token" as they're commonly used in other parsing libraries, or is
there some distinction?
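To make the question concrete, here's a toy sketch of what I picture an analyzer doing with "Foo Bar". It is NOT ElasticSearch's actual analysis chain: the regex split, the lowercasing, and the `Token`/`analyze` names are my own hypothetical stand-ins. Each output records the normalized text plus its position and start/end character offsets, roughly the kind of per-token metadata Lucene tracks.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token record (not a Lucene/ElasticSearch class).
    term: str          # normalized text that would end up in the index
    position: int      # ordinal position of the token in the stream
    start_offset: int  # character offset where the token starts in the input
    end_offset: int    # character offset just past the token's last character


def analyze(text: str) -> list:
    """Toy analyzer (assumption): split on runs of word characters, then lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append(Token(term=match.group().lower(),
                            position=position,
                            start_offset=match.start(),
                            end_offset=match.end()))
    return tokens


if __name__ == "__main__":
    for tok in analyze("Foo Bar"):
        print(tok)
```

Run on "Foo Bar", this yields "foo" at offsets 0-3 and "bar" at offsets 4-7, so the bare strings [ "foo", "bar" ] are only part of what comes out.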
I vaguely recall that a "term" might be a token with additional
metadata, e.g. "Foo Bar" is not only tokenized into [ "foo", "bar" ]
but into terms that carry metadata about start/end positions, is that