Accessing tf-idf


(Ben McCann) #1

Can you access the tf-idf to use outside of ElasticSearch? Also, is the
tf-idf calculated on a per-field basis or a per-document basis?

Thanks,
Ben

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

The various elements of scoring are exposed in the explanation (if
enabled). Not an ideal format to process programmatically, but the results
are there.

TF-IDF is calculated per-field, with the score of the document being a
combination of the various TF-IDF of the fields involved.
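
Since the explanation comes back as a nested tree, one way to pull the numbers out is to walk it recursively. Below is a minimal Python sketch, assuming each node carries `value`, `description`, and `details` keys and that the TF and IDF nodes have descriptions starting with `tf(` and `idf(` (the wording depends on the similarity in use, so treat the prefixes as an assumption). The `sample` tree is hand-written for illustration and is not real output.

```python
def collect_tf_idf(node, found=None):
    """Recursively gather (description, value) pairs for tf/idf nodes."""
    if found is None:
        found = []
    description = node.get("description", "")
    # Assumed prefixes; adjust them to whatever your explanations contain.
    if description.startswith(("tf(", "idf(")):
        found.append((description, node["value"]))
    for child in node.get("details", []):
        collect_tf_idf(child, found)
    return found


# Hand-written, simplified explanation tree (illustrative values only).
sample = {
    "value": 0.5,
    "description": "weight(body:apple), product of:",
    "details": [
        {"value": 1.0, "description": "tf(freq=1.0)", "details": []},
        {"value": 2.0, "description": "idf(docFreq=3, maxDocs=20)", "details": []},
        {"value": 0.25, "description": "fieldNorm(doc=0)", "details": []},
    ],
}

for description, value in collect_tf_idf(sample):
    print(description, value)
```

Because the values sit under a per-field weight node, the same walk can be run once per field to keep the per-field numbers apart.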

--
Ivan



(Ben McCann-2) #3

Hi Ivan,

Thanks for the tip! I'm not familiar with the explanation. Is that the
same as the Explain API
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-explain.html)
for computing a score explanation for a query and a specific document? I'd
really like to get a list of all the terms that appear in a field in my
index, not only for a particular query.

Thanks,
Ben


--
about.me/benmccann



(Ivan Brusic) #4

Not quite the explain API, but the score explanation for any query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html

When enabled, it is similar to the Explain API, but for each document
returned by the query instead of just one.
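
For reference, here is a minimal Python sketch of such a request body. Setting `explain` to true in the search body is what the linked page describes; the field name `body`, the query text, and the `build_explained_search` helper are placeholders for illustration, and actually sending the request is left out.

```python
import json


def build_explained_search(field, text, size=10):
    """Build a search body that asks for a score explanation per hit."""
    return {
        "explain": True,
        "size": size,
        "query": {"match": {field: text}},
    }


body = build_explained_search("body", "apple")
# POST this JSON to /<your-index>/_search; each hit should then carry an
# "_explanation" tree alongside its "_score".
print(json.dumps(body, indent=2))
```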

TF-IDF only matters in the context of a query. If you want all the terms,
you can use a term facet with a large size, or use Jorg's plugin:
https://github.com/jprante/elasticsearch-index-termlist

Cheers,

Ivan



(Ben McCann-2) #5

Thanks Ivan! Jorg's plugin is exactly what I was looking for. I just sent
him a pull request to update to 0.90.5 and he released a new version with
it, so I'll give it a go and see how it works. Also, good point that tf-idf
might not be exactly the right term for what I was looking for. I mainly
care about the IDF portion.

Thanks again!

-Ben

P.S. You've helped me a couple of times, so I just wanted to say thanks! And
also I'm sure you get just as much recruiter spam as I do, so I won't bug
you, but if you ever want to explore the possibility of working on
elasticsearch with a bunch of ex-Googlers then I'd love to share with you
what we're up to in case it's interesting to you.



(Ivan Brusic) #6

Thank Jorg for the plugin. The code is standard in Lucene (accessing the
TermEnums), just adapted for elasticsearch.

--
Ivan



(Jörg Prante) #7

I love helping people get in touch with Elasticsearch. Don't hesitate
to ask questions - there are no silly questions, only silly answers.

Thanks and kudos to Shay, who is so dedicated and passionate, for deciding
to release such valuable, distributed, scalable code to the public!

Jörg


(Ramdev Wudali) #8

A variant on this particular request:

I would like to get the tf-idf for an indexed field (the field is the body
of a news document). I would like to find discriminating terms in the
document set (the document set is the result of executing a filter on the
search index).
The discriminating terms are to help with improving the query, as the number
of documents returned is too large and relevant documents are getting lost
in the search result (of executing the filter).

Is it possible to run the tf-idf calculations that Elastic does while
indexing the document? (That is, is there an API to access the TF-IDF
calculations?)

Thanks

Ramdev



(Ivan Brusic) #9

Can you provide a small example of what you are trying to achieve? Are the
discriminating terms known beforehand, or do they depend on the document?
Have you looked into the new text scoring features that have been released
since the original post? They are worth looking into:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

You can probably calculate the TF values during indexing, but not the IDF
since that value is based on all of the documents in a shard.
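
To make the difference concrete, here is a small Python sketch. It assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)); the function names are made up for the example, and the real values in a cluster are computed per shard by whichever similarity is configured.

```python
import math


def classic_tf(freq):
    """Term frequency factor: depends only on the document itself."""
    return math.sqrt(freq)


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency: depends on the whole document set."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# tf can be computed as soon as a single document has been analyzed.
print(classic_tf(4))  # 2.0

# idf for the same term shifts as the document set changes.
for num_docs, doc_freq in [(10, 2), (1000, 2), (1000, 500)]:
    print(num_docs, doc_freq, round(classic_idf(doc_freq, num_docs), 3))
```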

Cheers,

Ivan



(Ramdev Wudali) #10

Ivan:
I filter the index for documents containing AAPL (the ticker symbol) as
part of a field that is filterable.
I get back 1000 documents in no particular order, as the request was just a
filter. To this filter, I would like to add "discriminating/significant"
text that would be found in the 1000 documents, so that the documents
returned are, in a sense, only those that are significant.

I do not want the terms to be significant against the whole index, but only
against the documents that are returned for the query. Hence I would like
to run some extra analysis against this filter request result to identify
these "discriminating/significant" terms.

I was wondering if I can access the Elastic API/underlying implementation
to do the calculations.

Ramdev



(Ivan Brusic) #11

Would it be possible to create some sort of numerical value from the
discriminating/significant text at index time in order to sort the
documents by?

You can index the documents with term vectors, which will allow you to
access the term frequency values:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html

Not sure if those values can be used in a script or even to sort by. Using
scripts, you can get access to the fields. It would be time-consuming, but
you can iterate through each term of a field and use the text scoring
features to get the appropriate values.
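
As a rough illustration of that last idea, here is a Python sketch that ranks terms by an IDF computed only over the filtered subset. It assumes you have already fetched a term-to-frequency map for each document in the subset (for example from term vectors); the `docs` data and the `subset_idf` helper are made up for the example, and the formula is the classic Lucene IDF rather than anything Elasticsearch exposes for subsets.

```python
import math
from collections import Counter


def subset_idf(term_freqs_per_doc):
    """Return {term: idf}, counting document frequency over the subset only."""
    num_docs = len(term_freqs_per_doc)
    doc_freq = Counter()
    for term_freqs in term_freqs_per_doc:
        # Each document contributes at most 1 to a term's document frequency.
        doc_freq.update(term_freqs.keys())
    return {
        term: 1.0 + math.log(num_docs / (df + 1))
        for term, df in doc_freq.items()
    }


# Hypothetical per-document term frequencies for the filtered documents.
docs = [
    {"aapl": 3, "earnings": 2, "iphone": 1},
    {"aapl": 1, "lawsuit": 4},
    {"aapl": 2, "earnings": 1},
]

idf = subset_idf(docs)
# Terms found in every filtered document (here "aapl") sink to the bottom.
for term in sorted(idf, key=idf.get, reverse=True):
    print(term, round(idf[term], 3))
```

With 1000 documents this is a client-side loop over 1000 small maps, so it is slow but workable for occasional analysis.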

Cheers,

Ivan



(Vladi Feigin) #12

Hello,

I have a requirement to retrieve the term frequency (TF) from all recently
indexed documents (last 24 hours).
So in a query I have to supply the time range, and I expect to get the TFs of
all terms in the given time range.
Is it possible to do this in ES? If yes, please refer me to the documentation.
Thank you in advance,
Vlad


