Searching attachment content with ingest attachment plugin


(Ray Yip) #1

Hi, I tried to follow the example. I can successfully index attachments and retrieve their details, but I don't know how to search on the content. For example, I would like to return the "filename" and the word position in the document when the content contains the keyword "some". Please find my scripts below.

curl -XPUT 'http://192.168.196.248:9200/_ingest/pipeline/attachment?pretty' -H 'Content-Type: application/json' -d'
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}
'
curl -XPUT 'http://192.168.196.248:9200/my_index/my_type/my_id?pipeline=attachment&pretty' -H 'Content-Type: application/json' -d'
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}
'
curl -XGET 'http://192.168.196.248:9200/my_index/my_type/my_id?pretty'

Thanks,
Ray


(David Pilato) #2

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

> I would like to return "filename"

The problem here is that you are indexing multiple files in one document instead of a single file.
The easiest way is to index the files one at a time, like:

PUT attachments/doc/1?pipeline=attachment
{
  "filename" : "ipsum.txt",
  "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
}
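
As a rough sketch of how the search side could look once each file is its own document: the request bodies are built here as plain Python dicts so they can be sent with any HTTP client. It assumes a single-file pipeline named `attachment` that reads from `data` and leaves `target_field` at its default, so the extracted text ends up in `attachment.content`. The names `SINGLE_FILE_PIPELINE` and `build_content_search` are made up for this illustration.

```python
import json

# Assumption: a pipeline without "foreach", reading the base64 payload from
# "data". With no target_field set, the attachment processor writes the
# extracted text to "attachment.content".
SINGLE_FILE_PIPELINE = {
    "description": "Extract attachment information",
    "processors": [{"attachment": {"field": "data"}}],
}


def build_content_search(keyword):
    """Body for GET attachments/_search.

    Matches on the extracted text and returns only the filename of each hit.
    """
    return {
        "_source": ["filename"],
        "query": {"match": {"attachment.content": keyword}},
    }


if __name__ == "__main__":
    # Print the bodies so they can be pasted into curl or the Kibana console.
    print(json.dumps(SINGLE_FILE_PIPELINE, indent=2))
    print(json.dumps(build_content_search("some"), indent=2))
```

Because every hit is then exactly one file, the `filename` in each hit's `_source` belongs to the content that matched.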

If you really want to have multiple files in one document and be able to associate each filename with its content, then you need to use nested fields. See https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
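
A minimal sketch of the nested variant, again as Python dicts rather than a tested setup. It assumes the `attachments` array from the original post is mapped as `nested` before any document is indexed, and that the `foreach` pipeline writes each file's text to `attachments.attachment.content`. `NESTED_PROPERTIES` and `build_nested_search` are hypothetical names.

```python
import json

# Properties section of the index mapping. On 6.x this goes under the type
# name (for example "my_type"); later versions drop the type level.
NESTED_PROPERTIES = {
    "attachments": {"type": "nested"},
}


def build_nested_search(keyword):
    """Body for GET my_index/_search using a nested query.

    The empty "inner_hits" asks Elasticsearch to also return the nested
    objects that matched, so the filename next to the matching content
    can be read from them.
    """
    return {
        "query": {
            "nested": {
                "path": "attachments",
                "query": {
                    "match": {"attachments.attachment.content": keyword}
                },
                "inner_hits": {},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(NESTED_PROPERTIES, indent=2))
    print(json.dumps(build_nested_search("some"), indent=2))
```

Without the `nested` mapping the array elements are flattened into one document, so a match on the content cannot be tied back to a specific filename.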

> I would like to return word position

The only position you can get is the position in the extracted text, not in the source document. The latter information is not available.
For the "position" within the extracted text, highlighting might help: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/search-request-highlighting.html
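
To illustrate, here is a small sketch with two parts. `build_highlight_search` builds a search body that asks for highlighted fragments of `attachment.content` (assuming the one-file-per-document layout). Highlighting returns marked-up fragments, not numeric offsets, so `keyword_offsets` shows one possible way to compute character positions client-side from the extracted text. Both function names are hypothetical, and the sample text is simply the base64 payload of `ipsum.txt` decoded locally; the text actually stored in `attachment.content` may differ in whitespace.

```python
import base64
import re


def build_highlight_search(keyword):
    """Body for GET attachments/_search with highlighting on the extracted text."""
    return {
        "query": {"match": {"attachment.content": keyword}},
        "highlight": {"fields": {"attachment.content": {}}},
    }


def keyword_offsets(text, keyword):
    """Return the character offsets of whole-word, case-insensitive matches."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]


# Decode the sample payload from the thread: "this is\njust some text\n".
extracted = base64.b64decode("dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=").decode("utf-8")

if __name__ == "__main__":
    print(keyword_offsets(extracted, "some"))  # [13]
```

In practice the offsets should be computed against the `attachment.content` value returned by Elasticsearch, since that is the text the position refers to.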


(Ray Yip) #3

Hi,

That's great, thanks a lot for your valuable input. The problem is resolved.

Thanks,
Ray

