Searching attachment content with the ingest attachment plugin (ES 5.2)


(Divya Bhardwaj) #1

Hi All,

I am facing an issue while searching attachment data. I have installed the ingest attachment plugin and created a foreach processor, because the attachments are stored as an array. The mapping is:

"attachment": {
  "properties": {
    "attachment_data": {
      "type": "text",
      "store": true,
      "term_vector": "with_positions_offsets"
    },
    "attachment_id": {
      "type": "text",
      "store": true,
      "term_vector": "with_positions_offsets"
    }
  }
}

One of the records has this data:

"attachment": {
  "content_type": "xyz",
  "language": "it",
  "content": "xyz",
  "attachment_data": "base64 encoded",
  "attachment_id": "xxxxx"
}

But when I search for "xyz", no records are returned. The search is:

"query": {
  "bool": {
    "must": [{
      "query_string": {
        "fields": ["_all"],
        "query": "xyz"
      }
    }],

I have tried with:

"query": {
  "bool": {
    "must": [{
      "query_string": {
        "fields": ["attachment.attachment_data"],
        "query": "xyz"
      }
    }],

or even:

"query": {
  "bool": {
    "must": [{
      "query_string": {
        "fields": ["attachment.content"],
        "query": "xyz"
      }
    }],

But every time I get 0 results.

Any help is appreciated.

Best,
Divya


(David Pilato) #2

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

If you provide a full recreation script it can be easier to help.


(Divya Bhardwaj) #3

</>"attachment": {

       "content_type": "xyz",
       "language": "it",
       "content": "xyz",
       "attachment_data":"base64 encoded",
       "attachment_id":"xxxxx"

}</>

But when I search for "xyz", no records are returned. The search is:

</>"query": {
"bool" : {
"must": [{ "query_string" : {
"fields": ["_all"],
"query": "xyz"
}}],</>

I have tried with:

</>"query": {
"bool" : {
"must": [{ "query_string" : {
"fields": ["attachment.attachment_data"],
"query": "xyz"
}}],</>

or even:
</>"query": {
"bool" : {
"must": [{ "query_string" : {
"fields": ["attachment.content"],
"query": "xyz"
}}],</>

Appreciate your help


(David Pilato) #4

Did you read my answer?


(Divya Bhardwaj) #5

The mapping is:

               "attachment": {
                  "properties": {
                     "attachment_data": {
                        "type": "text",
                        "store": true,
                        "term_vector": "with_positions_offsets",
                        "fielddata": true
                     },
                     "attachment_id": {
                        "type": "text",
                        "store": true,
                        "term_vector": "with_positions_offsets"
                     },

The ingest attachment plugin is used, and the pipeline and its processor were created as:

"processors": [
         {
            "foreach": {
               "field": "attachment",
               "processor": {
                  "attachment": {
                     "target_field": "_ingest._value.attachment",
                     "field": "_ingest._value.attachment_data"
                  }
               }
            }
 

I am not able to search the content. It is visible in the results when /_search is used, but I cannot find it with a match query on the attachment.attachment.data field. My search is:

  "query": {
    "match": {
      "attachment.attachment.content":"word"
    }
  }
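
To make the field path in the query above concrete, here is a small Python simulation of what a foreach processor with target_field `_ingest._value.attachment` does to the shape of a document. It is a sketch, not Elasticsearch code: the helper names (`simulate_foreach_attachment`, `leaf_paths`) and the sample document are made up for illustration, and plain base64 decoding stands in for the real Tika extraction.

```python
import base64
import copy


def simulate_foreach_attachment(doc, array_field, data_field):
    """Rough stand-in for a foreach processor wrapping an attachment processor.

    For every element of doc[array_field], decode element[data_field] from
    base64 and store the text under element["attachment"]["content"], which is
    where target_field "_ingest._value.attachment" points. The real processor
    uses Tika and also adds content_type, language and so on.
    """
    result = copy.deepcopy(doc)
    for element in result[array_field]:
        raw = base64.b64decode(element[data_field])
        element["attachment"] = {"content": raw.decode("utf-8").strip()}
    return result


def leaf_paths(value, prefix=""):
    """Yield dotted field paths the way they would be written in a query."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            yield from leaf_paths(child, prefix)
    else:
        yield prefix


# Hypothetical source document shaped like the pipeline above expects.
source = {
    "attachment": [
        {
            "attachment_id": "xxxxx",
            "attachment_data": base64.b64encode(b"some word here").decode("ascii"),
        }
    ]
}

indexed = simulate_foreach_attachment(source, "attachment", "attachment_data")
paths = sorted(set(leaf_paths(indexed)))
for path in paths:
    print(path)
```

Under these assumptions the extracted text ends up at `attachment.attachment.content`, next to `attachment.attachment_data` and `attachment.attachment_id`. Whether that field is actually searchable still depends on the real mapping, which this simulation does not model.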

Apologies for the inconvenience with the formatting.

Best,
Divya


(David Pilato) #6

If you provide a full recreation script it can be easier to help.

How can I replay what you are doing without a script?

As explained in About the Elasticsearch category, provide something like:

DELETE index
PUT index/type/1
{
  "foo": "bar"
}
GET index/type/_search
{
  "query": {
    "match": {
      "foo": "bar"
    }
  }
}

Please try with the minimal settings/mappings/content...
If this forum rejects your post because of the number of characters, you can post your full script on gist.github.com and paste the link here.
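
For the attachment use case discussed in this thread, a recreation script could be assembled roughly as follows. This is only a sketch in Python that prints Console-style requests: the index name `test`, the type `doc`, the pipeline id `attachment`, and the sample document are all assumptions, not taken from the original posts.

```python
import base64
import json

# Assumed names, chosen only for this illustration.
INDEX, DOC_TYPE, PIPELINE = "test", "doc", "attachment"

# Hypothetical attachment payload, base64-encoded as the processor expects.
sample_data = base64.b64encode(b"this is\njust some text\n").decode("ascii")

steps = [
    # Start from a clean index.
    ("DELETE", INDEX, None),
    # Pipeline: run the attachment processor on every array element.
    ("PUT", f"_ingest/pipeline/{PIPELINE}", {
        "processors": [{
            "foreach": {
                "field": "attachments",
                "processor": {
                    "attachment": {
                        "target_field": "_ingest._value.attachment",
                        "field": "_ingest._value.data",
                    }
                },
            }
        }]
    }),
    # Index one document through the pipeline and make it visible to search.
    ("PUT", f"{INDEX}/{DOC_TYPE}/1?pipeline={PIPELINE}&refresh=true", {
        "attachments": [{"filename": "ipsum.txt", "data": sample_data}]
    }),
    # The query whose result is in question.
    ("GET", f"{INDEX}/{DOC_TYPE}/_search", {
        "query": {"match": {"attachments.attachment.content": "just"}}
    }),
]


def to_console(steps):
    """Render (method, path, body) triples as a Console-style script."""
    blocks = []
    for method, path, body in steps:
        lines = [f"{method} {path}"]
        if body is not None:
            lines.append(json.dumps(body, indent=2))
        blocks.append("\n".join(lines))
    return "\n".join(blocks)


script = to_console(steps)
print(script)
```

The printed text only reproduces the setup so that someone else can replay it. It makes no claim about what the final search returns.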


(Divya Bhardwaj) #7

Thanks, David, for your continued attention to my problem.
My issue turns out to be related to:

https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-attachment-with-arrays.html

I am unable to search for anything inside the decoded content of the attachment.

```
"attachments" : [
  {
    "filename" : "ipsum.txt",
    "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
    "attachment" : {
      "content_type" : "text/plain; charset=ISO-8859-1",
      "language" : "en",
      "content" : "this is\njust some text",
      "content_length" : 24
    }
  }
```

I cannot find the keyword "just" with a match query,

although the filename in the same document is searchable with "attachments.filename".
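
As a sanity check outside Elasticsearch, the `data` value from the example can be decoded directly to confirm which words the attachment contains. A minimal Python sketch; the lowercase-and-split step is only a rough stand-in for text analysis, not the analyzer Elasticsearch actually applies.

```python
import base64

# The "data" value from the example document.
data = "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="

# Decode with the charset reported in content_type.
text = base64.b64decode(data).decode("iso-8859-1")
print(repr(text))   # 'this is\njust some text\n'

# Rough approximation of tokenization: lowercase, split on whitespace.
tokens = text.lower().split()
print(tokens)       # ['this', 'is', 'just', 'some', 'text']
```

So the word "just" is present in the source data; the open question is how the extracted `content` field is indexed and queried.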

Please let me know if this helps in understanding the use-case.

Best,
Divya


(David Pilato) #8

Why did you open a new discussion? Can you remove it?

Can you please provide a full script I can use to reproduce your problem locally?

