Attachment highlighting while excluding from source?


(Shane Witbeck) #1

I've resorted to excluding attachment content to prevent the content of the
indexed files from being included as part of search results.

Is it still possible to do highlighting for the file contents? My initial
testing tells me no but I'm wondering if I'm missing something.


(Shay Banon) #2

You mean you excluded it so it won't be returned when you get back _source?
You have two options:

  1. Use partial fields in the search request to exclude the attachment, and
    still store the full source.
    http://www.elasticsearch.org/guide/reference/api/search/fields.html (see
    under partial). Note, this will still load the _source and parse it, and
    then exclude the attachment part.
  2. Exclude the attachment from _source in the mapping, but mark it as
    stored (store set to yes) in the mapping. Then, it will be stored "on its
    own", separate from _source.
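
The two options above can be sketched as plain request bodies. This is a minimal illustration only, assuming the 2012-era mapping and search syntax referenced in this thread; the field name `file`, the type name `doc`, the partial name `trimmed`, and the helper method names are hypothetical and not taken from the original posts.

```ruby
require 'json'

# Hypothetical attachment field name, used only for illustration.
FIELD = 'file'

# Option 1: keep the full _source and trim it per request with a
# partial field that excludes the attachment.
def partial_search_body(query_text, field = FIELD)
  {
    'query' => { 'query_string' => { 'query' => query_text } },
    'partial_fields' => { 'trimmed' => { 'exclude' => field } }
  }
end

# Option 2: exclude the attachment from _source in the mapping and
# store the extracted content on its own (store set to "yes").
def stored_attachment_mapping(type_name, field = FIELD)
  {
    type_name => {
      '_source' => { 'excludes' => [field] },
      'properties' => {
        field => {
          'type' => 'attachment',
          'fields' => {
            field => { 'store' => 'yes', 'term_vector' => 'with_positions_offsets' }
          }
        }
      }
    }
  }
end

# Search body for option 2 that asks for highlights on the stored field.
def highlight_search_body(query_text, field = FIELD)
  {
    'query' => { 'query_string' => { 'query' => query_text } },
    'highlight' => { 'fields' => { field => {} } }
  }
end

puts JSON.pretty_generate(stored_attachment_mapping('doc'))
puts JSON.pretty_generate(highlight_search_body('rspec'))
puts JSON.pretty_generate(partial_search_body('rspec'))
```

The printed JSON would be sent as the mapping and search bodies respectively. Option 1 still pays the cost of loading and parsing the full _source on every hit, so option 2 is likely the better fit when attachments are large.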


(Shane Witbeck) #3

Thanks Shay. I went with #2 and it works great.



(tuan) #4

#2 does not work for me. I could not figure out what I'm doing wrong here. Please help!
Here's my mapping:

    mapping :_source => { :excludes => ['attachment_original'] } do
      indexes :id, :type => 'integer'
      indexes :folder_id, :type => 'integer'
      indexes :attachment_file_name
      indexes :attachment_updated_at, :type => 'date'
      indexes :attachment_original, :type => 'attachment',
        :fields => {
          :title => { :store => "yes" },
          :attachment_original => { :term_vector => "with_positions_offsets", :store => "yes" }
        }
    end

    def to_indexed_json
      to_json(:methods => [:attachment_original])
    end

The curl command that is generated:

curl -X GET "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{"query":{"query_string":{"query":"rspec","default_operator":"AND"}},"filter":{"term":{"folder_id":1}},"highlight":{"fields":{"attachment_original":{}}}}'

The search works fine, but the attachment file is still included in the search results, as seen here:

"_source" : {"user_file":{"id":5,"folder_id":1,"updated_at":"2012-08-16T11:32:41Z","attachment_file_size":179895,"attachment_updated_at":"2012-08-16T11:32:41Z","attachment_file_name":"hw4.pdf","attachment_content_type":"application/pdf","created_at":"2012-08-16T11:32:41Z","attachment_original":"JVBERi0xL .....

--
Tuan

