I'm trying to make search highlighting work without storing sources in
Elasticsearch.
In order to do this, I intend to get search hit positions from Lucene's
TermFreqVector class. I gather from an older thread that I need to write a
plugin for Elasticsearch.
I have two questions:
Is it feasible? Is there a way to extend that part of Elasticsearch?
If so, where should I hook my plugin?
Before you start creating a plugin or forking ES: you can store just a
specific field instead of the whole source and
highlight on that. Is that sufficient in your case?
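For reference, a minimal sketch of what that suggestion could look like, written as Python dicts that mirror the JSON request bodies. The type name `doc`, the field name `body`, and the query text are made up for illustration, and the `string` / `store` / `term_vector` syntax assumes the mapping format of Elasticsearch releases from around the time of this thread, so check it against your version.

```python
import json

# Hypothetical mapping: _source is disabled, only "body" is stored.
# Storing term vectors with positions and offsets is optional; it is
# included here on the assumption that the vector-based highlighter is wanted.
mapping = {
    "doc": {
        "_source": {"enabled": False},
        "properties": {
            "body": {
                "type": "string",
                "store": "yes",
                "term_vector": "with_positions_offsets",
            }
        },
    }
}

# Hypothetical search body that asks for highlights on the stored field only.
search_request = {
    "query": {"match": {"body": "lucene"}},
    "highlight": {"fields": {"body": {}}},
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(search_request, indent=2))
```

The stored field still costs disk space, so this only helps when the text to highlight is a small part of each document.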
Most of my data is searchable fulltext. I can't afford to store it in ES.
Greg
On Monday, February 18, 2013 4:45:12 PM UTC+1, Martijn v Groningen wrote:
Before you start creating a plugin or forking ES: you can store just a
specific field instead of the whole source and
highlight on that. Is that sufficient in your case?
Martijn
On 18 February 2013 14:10, gelmajjouti <gelma...@gmail.com> wrote:
Hello,
I'm trying to make search highlighting work without storing sources in
Elasticsearch.
In order to do this, I intend to get search hit positions from Lucene's
TermFreqVector class. I gather from an older thread that I need to write a
plugin for Elasticsearch.
I have two questions:
Is it feasible? Is there a way to extend that part of
Elasticsearch?
If so, where should I hook my plugin?
Cheers,
Greg
It might be a useful feature to return a stream of offset tuples
rather than highlighted strings. Even if we expose TermVectors, they might
be too expensive to transfer, while a "snippet descriptor" might be what some
people need.
simon
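To make the idea concrete, here is a minimal client-side sketch of what consuming such offset tuples could look like. Nothing here is an existing Elasticsearch API: the `(start, end)` character-offset format, the `highlight` function name, and the `<em>` tags are all assumptions for illustration.

```python
from typing import List, Tuple


def highlight(text: str, offsets: List[Tuple[int, int]],
              pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each (start, end) character span of `text` in pre/post tags.

    Assumes end-exclusive character offsets, as a hypothetical
    "snippet descriptor" might return them.
    """
    out = []
    cursor = 0
    for start, end in sorted(offsets):
        if start < cursor:
            # Skip spans that overlap one already emitted.
            continue
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    print(highlight("the quick brown fox", [(4, 9), (16, 19)]))
```

The client keeps the full text wherever it already lives, and only the small list of offsets has to travel over the wire.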
On Monday, February 18, 2013 4:55:39 PM UTC+1, gelmajjouti wrote:
Thanks for your reply.
Most of my data is searchable fulltext. I can't afford to store it in ES.
Greg
+100. To me, getting data about highlighting is more useful than highlighted
text.
When searching _all (or any other non-stored multi_field), I would
like to get, say, a list of matched tokens (ideally including any "fuzzy"
tokens), and I will do client-side highlighting of my text based on that.
For cases where we search against a set of stored/source fields, the name of
the field(s) with an array of offset tuples (or matched tokens) would be great!
We often end up highlighting data from our business objects
(JPA/JDO/Hibernate) rather than data stored in Elasticsearch, so for this case
the most useful would be a highlight consisting of a list of matched fields,
each with an array of matched tokens from the source (case does not matter).
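A rough sketch of the client-side half of this, assuming the server returned a plain list of matched tokens. The `token_offsets` name is hypothetical, and whole-word, case-insensitive regex matching is only an approximation: it does not reproduce whatever analyzer (stemming, fuzzy expansion) produced the tokens on the server.

```python
import re
from typing import Iterable, List, Tuple


def token_offsets(text: str, tokens: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of whole-word, case-insensitive matches."""
    words = [re.escape(t) for t in tokens if t]
    if not words:
        return []
    # Longest first, so a longer token wins over one that is its prefix.
    words.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


if __name__ == "__main__":
    text = "Elasticsearch highlights Search hits"
    for start, end in token_offsets(text, ["search", "hits"]):
        print(start, end, text[start:end])
```

Because the text comes from the business object rather than from the index, the spans are computed against exactly the string the application is about to display.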
On Monday, February 18, 2013 1:51:11 PM UTC-5, simonw wrote:
It might be a useful feature to return a stream of offset tuples
rather than highlighted strings. Even if we expose TermVectors, they might
be too expensive to transfer, while a "snippet descriptor" might be what some
people need.
simon
+1, exactly the kind of solution I might need. An array of matched fields with
positions instead of highlighted text would be great!
On Monday, February 18, 2013 5:43:35 PM UTC-5, AlexR wrote:
+100. To me, getting data about highlighting is more useful than highlighted
text.
When searching _all (or any other non-stored multi_field), I would
like to get, say, a list of matched tokens (ideally including any "fuzzy"
tokens), and I will do client-side highlighting of my text based on that.
For cases where we search against a set of stored/source fields, the name of
the field(s) with an array of offset tuples (or matched tokens) would be great!
We often end up highlighting data from our business objects
(JPA/JDO/Hibernate) rather than data stored in Elasticsearch, so for this case
the most useful would be a highlight consisting of a list of matched fields,
each with an array of matched tokens from the source (case does not matter).
I do not know if it is possible, but it would be fantastic if we could
store the start/end of individual fields in _all (or similar
composite fields) and translate highlight positions in _all into the source
fields that matched and the positions within those fields. That would
take care of virtually all highlighting needs!
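As an illustration of the bookkeeping this would need, here is a small sketch that records where each field starts inside a concatenated text and maps a span in that text back to a field-local span. This is not how _all is actually built: the single-space separator, the function names, and the sample fields are all assumptions, and the sketch expects spans that begin inside a field rather than in a separator.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def build_composite(fields: Dict[str, str],
                    sep: str = " ") -> Tuple[str, List[str], List[int]]:
    """Concatenate field values and record the start offset of each field."""
    names: List[str] = []
    starts: List[int] = []
    pos = 0
    for name, value in fields.items():
        names.append(name)
        starts.append(pos)
        pos += len(value) + len(sep)
    return sep.join(fields.values()), names, starts


def locate(span: Tuple[int, int], names: List[str], starts: List[int],
           fields: Dict[str, str]) -> Tuple[str, int, int]:
    """Translate a span in the composite text to (field, start, end)."""
    start, end = span
    # Index of the last field whose start offset is <= the span start.
    i = bisect_right(starts, start) - 1
    name = names[i]
    local_start = start - starts[i]
    # Clamp so that a span never runs past the end of its field.
    local_end = min(end - starts[i], len(fields[name]))
    return name, local_start, local_end


if __name__ == "__main__":
    fields = {"title": "Lucene in Action", "body": "term vectors store offsets"}
    composite, names, starts = build_composite(fields)
    print(locate((22, 29), names, starts, fields))
```

Only one integer per field has to be kept next to the composite text, which is cheap compared with storing the text itself.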
On Mon, Feb 18, 2013 at 9:04 PM, Gildas Houmard ghoumard@gmail.com wrote:
+1, exactly the kind of solution I might need. An array of matched fields with
positions instead of highlighted text would be great!