Sourceless Highlighting

Hello,

I'm trying to make search highlighting work without storing sources in
Elasticsearch.

In order to do this, I intend to get search hit positions from Lucene's
TermFreqVector class. I gather from an older thread that I need to write a
plugin for Elasticsearch.

I have two questions:

  1. Is it feasible? Is there a way to extend that part of Elasticsearch?
  2. If so, where should I hook my plugin?

Cheers,
Greg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Before you start and create a plugin / fork ES, you can just store a
specific field instead of the whole source and
highlight on that instead. Is that sufficient in your case?

Martijn

--
Kind regards,

Martijn van Groningen
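
As a rough illustration of the suggestion above, the sketch below builds a mapping that disables _source and stores only one small field, then highlights on that field. The type name "doc" and the field names "title" and "body" are hypothetical, and the "string" / "store": "yes" syntax is the one used by Elasticsearch releases of early 2013; later releases spell these settings differently, so treat this as an assumption to check against your version.

```python
import json

# Hypothetical type and field names; syntax as used by early-2013 releases.
mapping = {
    "doc": {
        "_source": {"enabled": False},  # do not keep the full JSON document
        "properties": {
            # Short field: stored, so the highlighter can read its text back.
            "title": {"type": "string", "store": "yes"},
            # Large field: indexed for search only, never stored.
            "body": {"type": "string", "store": "no"},
        },
    }
}

# A search request that asks for highlights on the stored field only.
search_body = {
    "query": {"match": {"title": "lucene"}},
    "highlight": {"fields": {"title": {}}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

This only helps when the fields you want highlighted are small enough to store; a large "body" field indexed without storage still cannot be highlighted this way.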

Thanks for your reply.

Most of my data is searchable full text, and I can't afford to store it in ES.

Greg

It might be a useful feature to return a stream of offset tuples rather
than highlighted strings. Even if we expose TermVectors, they might be too
expensive to transfer, while a "snippet descriptor" might be what some
people need.

simon
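
To make the idea of offset tuples concrete, here is a minimal client-side sketch. It assumes the server returned a list of (start, end) character offsets for a field whose text the client already holds. No such response format existed at the time of this thread, so the tuple shape and the `highlight` helper are purely hypothetical.

```python
from typing import Iterable, Tuple


def highlight(text: str, offsets: Iterable[Tuple[int, int]],
              pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each [start, end) character span of `text` in pre/post tags."""
    out = []
    cursor = 0
    for start, end in sorted(offsets):
        if start < cursor:
            # Overlapping span: skip the part that was already emitted.
            start = cursor
        if end <= start:
            continue
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Lucene stores term vectors per field"
print(highlight(text, [(14, 18), (0, 6)]))
```

The client needs only the tuples and its own copy of the text, which is why this is cheaper to transfer than full term vectors.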

+100. To me, getting data about highlighting is more useful than
highlighted text.
When searching _all (or any other non-stored multi_field), I would like to
get, say, a list of matched tokens (ideally including any "fuzzy" tokens),
and I will do client-side highlighting of my text based on that.
When we search against a set of stored/source fields, the name of the
field(s) with an array of offset tuples (or matched tokens) would be great!
We often end up highlighting data from our business objects
(JPA/JDO/Hibernate) rather than data stored in Elasticsearch, so for this
case the most useful result would be a highlight consisting of a list of
matched fields, each with an array of matched tokens from the source (case
does not matter).

AlexR
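
A minimal sketch of the client-side highlighting described above, assuming a hypothetical response that maps each matched field name to a list of matched tokens. The `matched` and `record` dictionaries and the `highlight_tokens` helper are illustrative names only; the record stands in for a business object loaded from elsewhere.

```python
import re
from typing import Dict, List


def highlight_tokens(text: str, tokens: List[str],
                     pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap whole-word, case-insensitive occurrences of `tokens` in tags."""
    if not tokens:
        return text
    # Try longer tokens first so the most specific alternative is preferred.
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(t) for t in ordered),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


# Hypothetical "matched fields with matched tokens" payload from the server.
matched: Dict[str, List[str]] = {
    "title": ["elasticsearch"],
    "notes": ["highlighting", "highlight"],
}
# Text that lives in the application, not in the search index.
record = {
    "title": "Elasticsearch in production",
    "notes": "Highlight rules for highlighting",
}

highlighted = {f: highlight_tokens(record[f], toks) for f, toks in matched.items()}
print(highlighted)
```

One caveat: tokens in the index are the output of analysis (lowercasing, stemming and so on), so they may not appear verbatim in the original text. Offset tuples avoid that problem, at the cost of requiring the client's text to be identical to what was indexed.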

Not the cleanest solution, but you can extract some of this information
from the explain object.

--
Ivan
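
One way this could look, as a sketch only: walk the explanation tree returned with a hit and pull "field:term" pairs out of the description strings. The `sample` tree below is made up for illustration, and the "weight(field:term in N)" description pattern is an assumption about how Lucene words its explanations. Those strings are not a stable format and can differ between versions and query types.

```python
import re
from typing import Any, Dict, Set, Tuple

# Assumed shape of a term-level explanation line; not a guaranteed format.
WEIGHT = re.compile(r"weight\((?P<field>[^:()]+):(?P<term>.+?) in \d+\)")


def matched_terms(explanation: Dict[str, Any]) -> Set[Tuple[str, str]]:
    """Collect (field, term) pairs from a nested explanation tree."""
    found = set()
    stack = [explanation]
    while stack:
        node = stack.pop()
        match = WEIGHT.search(node.get("description", ""))
        if match:
            found.add((match.group("field"), match.group("term")))
        stack.extend(node.get("details") or [])
    return found


# Illustrative explanation tree, not captured from a real cluster.
sample = {
    "value": 0.5,
    "description": "sum of:",
    "details": [
        {"value": 0.3,
         "description": "weight(title:lucene in 0) [PerFieldSimilarity], result of:",
         "details": []},
        {"value": 0.2,
         "description": "weight(body:highlight in 0) [PerFieldSimilarity], result of:",
         "details": [
             {"value": 0.2,
              "description": "fieldWeight in 0, product of:",
              "details": []},
         ]},
    ],
}

print(sorted(matched_terms(sample)))
```

This yields which terms matched in which fields, but no positions or offsets, so it covers the "list of matched tokens" case only.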

+1, exactly the solution I might need. An array of matched fields with
positions instead of highlighted text would be great!

Gildas

I do not know if it is possible, but it would be fantastic if we could
store the start/end of individual fields in _all (or similar composite
fields) and translate highlight positions in _all into the source fields
that were matched and the positions within those fields in the source. That
would take care of virtually all highlighting needs!
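
The translation described above can be sketched with a toy model: concatenate the field values into one composite string, remember where each field starts, and map a composite offset back to a field and a local offset. This is not how _all works internally. The `build_composite` and `locate` helpers and the single-space separator are assumptions made purely to show the bookkeeping.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def build_composite(fields: Dict[str, str],
                    sep: str = " ") -> Tuple[str, List[str], List[int]]:
    """Join field values and record the start offset of each field."""
    names, starts, parts = [], [], []
    pos = 0
    for name, value in fields.items():
        names.append(name)
        starts.append(pos)
        parts.append(value)
        pos += len(value) + len(sep)
    return sep.join(parts), names, starts


def locate(start: int, end: int, fields: Dict[str, str],
           names: List[str], starts: List[int]) -> Tuple[str, int, int]:
    """Translate a composite [start, end) span to (field, start, end)."""
    i = bisect_right(starts, start) - 1  # last field starting at or before `start`
    name = names[i]
    local_start = start - starts[i]
    local_end = min(end - starts[i], len(fields[name]))  # clamp to the field
    return name, local_start, local_end


fields = {
    "title": "Sourceless highlighting",
    "body": "Term vectors keep offsets",
}
composite, names, starts = build_composite(fields)

hit_start = composite.index("vectors")
hit_end = hit_start + len("vectors")
field, local_start, local_end = locate(hit_start, hit_end, fields, names, starts)
print(field, local_start, local_end, fields[field][local_start:local_end])
```

The lookup is a binary search over the recorded start offsets, so the extra data to keep per document is one integer per source field.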
