Highlight whole sentence


(Guram Kajaia) #1

Hello guys.

Please run this curl recreation.
https://gist.github.com/3442578

As you can see, I'm searching for the word 'elasticsearch' in this text: ElasticSearch can
be used to search all 1kind of documents. It provides a scalable search
solution, has near real-time search and support for multitenancy.[5]
ElasticSearch is distributed, which means that indices can be divided into
shards and each shard can have zero or more replicas. Each node hosts one
or more shards, and acts as a coordinator to delegate operations to the
correct shard(s). Rebalancing and routing are done automatically.

and the highlighted text returned was:
"ElasticSearch can be used to search all 1kind of documents. It
provides a scalable search solution",
", has near real-time search and support for multitenancy.[5]
ElasticSearch is distributed, which"

Is it possible to return, as the highlight, the whole text between dots
(periods) that contains the match? I want to get this highlight:
"ElasticSearch can be used to search all 1kind of documents.",
"ElasticSearch is distributed, which means that indices can be
divided into shards and each shard can have zero or more replicas."

Thanks.
GuriK.

--


(David Pilato) #2

Hi GuriK,

Did you look at: http://www.elasticsearch.org/guide/reference/api/search/highlighting.html ?
I think the last part addresses your needs:
Boundary Characters

When highlighting a field that is mapped with term vectors, boundary_chars can be configured to define what constitutes a boundary for highlighting. It's a single string with each boundary character defined in it. It defaults to .,!? \t\n.

The boundary_max_size allows you to control how far to look for boundary characters, and defaults to 20.

I have never used it myself, but I hope this helps.
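
As a rough illustration of how these options are passed, here is a minimal Python sketch that builds a search request body. The field name "message" comes from later in this thread; the "match" query, the default values and the helper name are assumptions for illustration only. Note also that the text quoted above calls the scan-distance option boundary_max_size, while later Elasticsearch documentation names it boundary_max_scan, so check the name against your version. This only shows where the settings go; it does not guarantee whole-sentence fragments, since fragment_size still applies.

```python
import json

# Assumption: later Elasticsearch documentation names this option
# "boundary_max_scan"; the text quoted in this thread says "boundary_max_size".
BOUNDARY_SCAN_OPTION = "boundary_max_scan"


def build_highlight_request(term, field="message", boundary_chars=".!?",
                            max_scan=200, fragment_size=300):
    """Build a search body asking for highlights bounded by sentence punctuation.

    The field must be mapped with term vectors for boundary_chars to apply.
    All default values here are illustrative, not recommendations.
    """
    return {
        # Assumption: a simple match query; the original gist may differ.
        "query": {"match": {field: term}},
        "highlight": {
            "fields": {
                field: {
                    "fragment_size": fragment_size,
                    "boundary_chars": boundary_chars,
                    BOUNDARY_SCAN_OPTION: max_scan,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(build_highlight_request("elasticsearch"), indent=2))
```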

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Guram Kajaia) #3

Hi David.

I mapped my field like this: "message" : { "type" : "string", "term_vector"
: "with_positions_offsets" }, but the results are still not what I wanted.
All I want is to get the text between dots, no matter how long that text
is.
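
For reference, the mapping described above can be written out in full roughly as follows. This is only a sketch: the field definition is the one given in this post, while the type name "doc" and the helper name are hypothetical placeholders.

```python
import json


def build_mapping(doc_type="doc", field="message"):
    """Return a mapping body that stores term vectors with positions and offsets.

    doc_type is a hypothetical placeholder; the field definition itself
    is taken from the post above.
    """
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```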

GuriK

--


(David Pilato) #4

Did you set boundary_max_size to 0 when highlighting?

David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Guram Kajaia) #5

Yes.
The result is the same...

--


(Nick Dunn) #6

I don't understand the "between dots" part; I can't see the pattern between the actual output and the desired output. Are you saying you always want the highlighted term (wrapped with EM) to be the first characters of the highlight excerpt, and not midway within the string?

--


(David Pilato) #7

I think he wants to highlight a full sentence and not 5 words before and after the highlighted term.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Guram Kajaia) #8

You're right, David. I don't want to specify how many words to highlight
before and/or after the matched text. I just want the full sentence that
includes the matched text.

--


(Nick Dunn) #9

In my opinion, this sounds like a task for your own application logic to parse and extract.
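
A minimal sketch of what such application-side logic could look like in Python, assuming the full field text is available to the client (for example from _source). The function name and the naive sentence-splitting rule are illustrative assumptions, not anything Elasticsearch provides; the rule splits after ".", "!" or "?" followed by whitespace and also drops a footnote marker such as "[5]" that directly follows the punctuation.

```python
import re

# Naive sentence boundary: terminal punctuation, an optional footnote
# marker like "[5]", then whitespace. Abbreviations such as "e.g." will
# be split incorrectly; a real sentence tokenizer would do better.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])(?:\[\d+\])?\s+")


def sentences_containing(text, term, pre_tag="<em>", post_tag="</em>"):
    """Return every sentence of text that contains term, with matches wrapped in tags."""
    term_re = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for sentence in _SENTENCE_BOUNDARY.split(text):
        if term_re.search(sentence):
            # Wrap each case-insensitive match, keeping its original casing.
            hits.append(term_re.sub(lambda m: pre_tag + m.group(0) + post_tag, sentence))
    return hits


if __name__ == "__main__":
    sample = (
        "ElasticSearch can be used to search all 1kind of documents. "
        "It provides a scalable search solution, has near real-time search "
        "and support for multitenancy.[5] ElasticSearch is distributed, which "
        "means that indices can be divided into shards and each shard can "
        "have zero or more replicas. Rebalancing and routing are done automatically."
    )
    for hit in sentences_containing(sample, "elasticsearch"):
        print(hit)
```

Run on the sample text from the first post, this returns the two sentences the original poster asked for, each with the matched word wrapped in EM tags.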

--


(Guram Kajaia) #10

Yes, maybe...
Is there any way to write a plugin for Elasticsearch that will do what I
want? In this case, highlighting the whole sentence...

--


(phill) #11

This is really a matter of plugging in a different Fragmenter
(see http://lucene.apache.org/core/3_6_0/api/all/index.html),
which I do NOT believe is an extension point in ES.

-Paul

--