Glossary of terms added to docs


(Clinton Gormley) #1

Hi all

I've added a glossary of common terms that might be useful to new
ElasticSearch users:

http://www.elasticsearch.org/guide/appendix/glossary.html

I don't think my definitions are particularly good, so improvements and
additions are welcome.

thanks

Clint



(Ævar Arnfjörð Bjarmason) #2

On Sun, Apr 15, 2012 at 21:16, Clinton Gormley clint@traveljury.com wrote:

I've added a glossary of common terms that might be useful to new
ElasticSearch users:

http://www.elasticsearch.org/guide/appendix/glossary.html

I don't think my definitions are particularly good, so improvements and
additions are welcome.

I was going to send you a patch describing how "token" and
"tokenization" relate to ElasticSearch, which is a question I've been
asked about in the past.

When you analyze something you're taking full text and turning it into
terms. Are "analysis" and "term" synonymous with "tokenization" and
"token" as they're commonly used in some other parsing libraries, or is
there some distinction?

I vaguely recall that a "term" might be a token with additional
metadata, i.e. "Foo Bar" is not only tokenized into [ "foo", "bar" ]
but into terms that carry metadata about start/end positions. Is that
accurate?
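
To make the question concrete, here is a toy sketch in Python of what I mean by tokens that carry position metadata. This is not ElasticSearch or Lucene code: the `Token` class, its field names, and the `analyze` function are all made up for illustration, and the "analysis" is just a regex split plus lowercasing.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; field names are illustrative only."""
    text: str      # normalized (lowercased) form of the token
    position: int  # ordinal position in the token stream
    start: int     # start character offset in the original text
    end: int       # end character offset (exclusive)


def analyze(text: str) -> List[Token]:
    """Split on runs of word characters, lowercase each piece, keep offsets."""
    return [
        Token(m.group().lower(), i, m.start(), m.end())
        for i, m in enumerate(re.finditer(r"\w+", text))
    ]


tokens = analyze("Foo Bar")
terms = [t.text for t in tokens]

print(terms)  # ['foo', 'bar']
for t in tokens:
    print(t)
# Token(text='foo', position=0, start=0, end=3)
# Token(text='bar', position=1, start=4, end=7)
```

In this sketch the plain list [ "foo", "bar" ] is what I'd call the terms, and the `Token` records are the same strings plus start/end offsets. Whether ElasticSearch draws the line in the same place is exactly what I'm asking.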


(Clinton Gormley) #3

Hiya Avar

I was going to send you a patch describing how "token" and
"tokenization" relate to ElasticSearch, which is a question I've been
asked about in the past.

When you analyze something you're taking full text and turning it into
terms. Are "analysis" and "term" synonymous with "tokenization" and
"token" as they're commonly used in some other parsing libraries, or is
there some distinction?

I think it is the same thing - and yes, we should probably mention
'tokenization' as well.

I vaguely recall that a "term" might be a token with additional
metadata, i.e. "Foo Bar" is not only tokenized into [ "foo", "bar" ]
but into terms that carry metadata about start/end positions. Is that
accurate?

In the glossary, I'm aiming for "broadly true", rather than absolute
truth. The idea is that it gives a simple definition, enough to allow a
newbie to orient themselves, without overloading them with all the
specifics.

ta

clint


(Iftekharul Haque) #4

Apologies in advance if this has been covered and I missed it:
is there a corresponding GitHub repository we can send pull requests
to if we have patches?

- Ifty.



(Ævar Arnfjörð Bjarmason) #5

On Mon, Apr 16, 2012 at 22:06, Iftekharul Haque
iftekharul.haque@gmail.com wrote:

Apologies in advance if this has been covered and I missed it:
is there a corresponding GitHub repository we can send pull requests
to if we have patches?

https://github.com/elasticsearch/elasticsearch.github.com

