Not able to search through attachment contents


(David Pilato) #16

You can read the doc: https://github.com/elastic/elasticsearch-mapper-attachments#using-mapper-attachments

I don't know C#, so I can't tell you how to translate that into C#. It might not be hard, though.


(Mark Walkom) #17

@Suyog_Kale FYI, in all of your pictures we can see your Found cluster ID, which means someone could potentially get access to your data.

I'd strongly suggest that you remove/edit the pictures.


(Suyog Kale) #18

Thank you David,

Now I am able to configure the mapping and index PDF contents.

Now the problem is that when I execute a search, it returns records but does not highlight the actual file contents; it displays the file's binary data instead.

Any suggestion?
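A hedged sketch of a search body that requests highlights on the extracted-text sub-field (assumed here to be `file.content`, matching the attachment field name used later in this thread) rather than on the attachment field itself, whose `_source` value is only the Base64 payload. It assumes the `content` sub-field is mapped with `store: true` (and, ideally, `term_vector: with_positions_offsets`) so the highlighter has real text to work from; check the mapper-attachments README for your plugin version. The helper name `highlight_search_body` is hypothetical.

```ruby
require 'json'

# Hypothetical helper: match and highlight the text the mapper-attachments
# plugin extracted into "file.content", not the raw Base64 in "file".
# Assumes "file.content" is stored (store: true) in the mapping.
def highlight_search_body(text)
  {
    query: { match: { 'file.content' => text } },
    highlight: { fields: { 'file.content' => {} } }
  }
end

puts JSON.pretty_generate(highlight_search_body('abc'))
```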


(Suyog Kale) #19

I also observed that even when there is no match in the contents, the search returns all records.


(David Pilato) #20

The Head plugin is buggy. Use POST instead of GET.
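A minimal sketch of sending the same search as a POST with a JSON body, using only Ruby's standard `net/http`. A `_search` request that arrives without a body behaves like a match_all, which would explain getting every record back if a client silently drops the body of a GET. The URL `http://localhost:9200`, the index `mydocs`, and the helper name `build_search_request` are assumptions for illustration; the request is only built here, and the commented lines show how it could be sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build a POST /<index>/_search request carrying the
# query as a JSON body, so the query cannot be dropped like a GET body.
def build_search_request(base_url, index, body)
  uri = URI.join(base_url, "/#{index}/_search")
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  req.body = JSON.generate(body)
  [uri, req]
end

uri, req = build_search_request('http://localhost:9200', 'mydocs',
                                { query: { match: { 'file.content' => 'abc' } } })
# To run it against a cluster (assumed to be local and unsecured):
# res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
# puts res.body
```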


(Aj) #21

Hi, I have the same issue: I cannot search the attached document's contents using the NEST client.
My mapping is:

 {
 "mydocs": {
  "mappings": {
     "indexdocument": {
        "properties": {
           "docLocation": {
              "type": "string",
              "index": "not_analyzed",
              "store": true
           },
           "documentType": {
              "type": "string",
              "store": true
           },
           "file": {
              "type": "attachment",
              "fields": {
                 "content": {
                    "type": "string",
                    "analyzer": "full"
                 },
                 "author": {
                    "type": "string"
                 },
                 "title": {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "full"
                 },
                 "name": {
                    "type": "string"
                 },
                 "date": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis"
                 },
                 "keywords": {
                    "type": "string"
                 },
                 "content_type": {
                    "type": "string"
                 },
                 "content_length": {
                    "type": "integer"
                 },
                 "language": {
                    "type": "string"
                 }
              }
           },
           "filePermissionInfo": {
              "properties": {
                 "accessControlType": {
                    "type": "string",
                    "store": true
                 },
                 "accountValue": {
                    "type": "string",
                    "store": true
                 },
                 "fileSystemRights": {
                    "type": "string",
                    "store": true
                 },
                 "isInherited": {
                    "type": "string",
                    "store": true
                 }
              }
           },
           "id": {
              "type": "double",
              "store": true
           },
           "lastModifiedDate": {
              "type": "date",
              "store": true,
              "format": "strict_date_optional_time||epoch_millis"
           },
           "otherDetails": {
              "type": "string"
           },
           "title": {
              "type": "string",
              "store": true,
              "term_vector": "with_positions_offsets"
           }
        }
     }
  }
 }
}

My POST query works fine:

POST /mydocs/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "filePermissionInfo.accountValue": "S-1-5-18" } },
        { "match": { "otherDetails": "xyz" } },
        { "match": { "file.content": "abc" } }
      ]
    }
  }
}

But when I convert it to C#, it does not work. If I remove the File.Content field from the match query, it returns a result set, so I think the problem is with the attachment field, which is Base64 encoded.

var queryResult = client.Search<IndexDocument>(s => s
    .Index("mydocs")
    .Query(q => q
        .Bool(b => b
            .Must(m =>
                m.Match(mt1 => mt1.Field(f1 => f1.DocumentType).Query(queryTerm)) &&
                m.Match(mt2 => mt2.Field(f2 => f2.FilePermissionInfo.First().AccountValue).Query(accountName)) &&
                m.Match(mt3 => mt3.Field(f3 => f3.OtherDetails).Query(other))
            )
        )
    )
);
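One way to narrow this kind of problem down is to compare the JSON a client library actually sends with the REST body that is known to work. The hedged Ruby sketch below rebuilds the working bool/must body from field/text pairs and lists the fields its clauses reference, so the same listing can be run on a body captured from another client's request log. Both helper names (`bool_must_match`, `must_fields`) are hypothetical, and capturing the other client's request is left to that client's own tooling.

```ruby
require 'json'

# Hypothetical helper: build a bool/must body of match clauses from a
# field => text hash, mirroring the working REST query above.
def bool_must_match(clauses)
  { query: { bool: { must: clauses.map { |field, text| { match: { field => text } } } } } }
end

# Lists the fields referenced by the match clauses of a parsed JSON body,
# so bodies produced by different clients can be compared field by field.
def must_fields(parsed_body)
  parsed_body.dig('query', 'bool', 'must').to_a.flat_map { |c| c.fetch('match', {}).keys }
end

working = JSON.parse(JSON.generate(bool_must_match({
  'filePermissionInfo.accountValue' => 'S-1-5-18',
  'otherDetails' => 'xyz',
  'file.content' => 'abc'
})))
p must_fields(working)
```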

Can you please help?


(Aj) #22

@dadoonet Can you please look into my issue?


(David Pilato) #23

No. I don't know C#.


(Rakesh Kumar Saharan) #24

Hi @dadoonet

I am trying to implement attachment search with the mapper attachments plugin using Ruby, and I am using Paperclip to upload files to Amazon S3.

My problem is similar: I am not able to search the document contents, but I can search on other fields such as the title.

Mapping:

{
  "notes" : {
    "mappings" : {
      "note" : {
        "properties" : {
          "attachment" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "attachment_content_type" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "attachment_data" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "attachment_file_name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "attachment_file_size" : {
            "type" : "long"
          },
          "attachment_updated_at" : {
            "type" : "date"
          },
          "company_id" : {
            "type" : "long"
          },
          "created_at" : {
            "type" : "date"
          },
          "id" : {
            "type" : "long"
          },
          "member_id" : {
            "type" : "long"
          },
          "note" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "noteable_id" : {
            "type" : "long"
          },
          "noteable_type" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "updated_at" : {
            "type" : "date"
          },
          "visibility_type" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

And here is the mapping code:

mapping _source: { excludes: ["attachment"] } do
  self.tire.indexes :id, type: "integer"
  self.tire.indexes :note

  self.tire.indexes :attachment, :type => 'attachment',
    :fields => {
      :name    => { :store => 'yes' },
      :content => { :store => 'yes' },
      :title   => { :store => 'yes' },
      :file    => { :term_vector => 'with_positions_offsets', :store => 'yes' },
      :date    => { :store => 'yes' }
    }
end

def attachment_data
  Base64.encode64(open(PATH_TO_ATTACHMENT) { |file| file.read })
end
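A small hedged variant of the encoding step. `Base64.encode64` follows RFC 2045 and inserts a newline after every 60 encoded characters (plus a trailing one); it is an assumption here, not something established in this thread, that those line breaks could matter to the plugin, and `Base64.strict_encode64` simply avoids them. `File.binread` also makes the binary read explicit. The helper name `attachment_base64` is hypothetical, and the temporary file stands in for `PATH_TO_ATTACHMENT`.

```ruby
require 'base64'
require 'tempfile'

# Hypothetical helper: read the file as raw bytes and Base64-encode it
# without the line breaks that Base64.encode64 inserts every 60 characters.
def attachment_base64(path)
  Base64.strict_encode64(File.binread(path))
end

# Quick check against a throwaway file (stand-in for PATH_TO_ATTACHMENT).
Tempfile.create(['sample', '.pdf']) do |f|
  f.binmode
  f.write('%PDF-1.4 ' + 'x' * 100)
  f.flush
  encoded = attachment_base64(f.path)
  puts encoded.include?("\n")                                  # prints false
  puts Base64.strict_decode64(encoded) == File.binread(f.path) # prints true
end
```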


(David Pilato) #25

But the mapping you showed doesn't contain a field of type "attachment".

You need to fix the mapping.


(Rakesh Kumar Saharan) #26

@dadoonet
Here is the mapping:

mapping _source: { excludes: ["attachment"] } do
  self.tire.indexes :id, type: "integer"
  self.tire.indexes :note
  #self.tire.indexes :attachment, type: "attachment_data"
  self.tire.indexes :attachment, :type => 'attachment',
    :fields => {
      :name    => { :store => 'yes' },
      :content => { :store => 'yes' },
      :title   => { :store => 'yes' },
      :file    => { :term_vector => 'with_positions_offsets', :store => 'yes' },
      :date    => { :store => 'yes' }
    }
end


(David Pilato) #27

What does GET yourindex/_mapping return?


(Rakesh Kumar Saharan) #28
9200/notes/_mapping?pretty=true'
{
  "notes" : {
"mappings" : {
  "note" : {
    "properties" : {
      "attachment" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "attachment_content_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "attachment_data" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "attachment_file_name" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "attachment_file_size" : {
        "type" : "long"
      },
      "attachment_updated_at" : {
        "type" : "date"
      },
      "company_id" : {
        "type" : "long"
      },
      "created_at" : {
        "type" : "date"
      },
      "id" : {
        "type" : "long"
      },
      "member_id" : {
        "type" : "long"
      },
      "note" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "noteable_id" : {
        "type" : "long"
      },
      "noteable_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "updated_at" : {
        "type" : "date"
      },
      "visibility_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}
  }
}

(David Pilato) #29

Please format your code using </> icon as explained in this guide. It will make your post more readable.

Can you see any "type": "attachment" in your mapping? I don't see any.
That's your problem, as I said before.
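Checking for `"type": "attachment"` by eye is error-prone in a long mapping. A hedged sketch of a helper that walks the `properties` (and multi-field `fields`) of a parsed `GET <index>/_mapping` response and returns the dotted paths of fields with a given type. The helper names `fields_of_type` and `attachment_fields` are hypothetical; the sample input is a trimmed copy of the `notes` mapping posted above, and the second example is the README's `trying-out-mapper-attachments` mapping.

```ruby
require 'json'

# Hypothetical helper: recursively collect dotted paths of fields whose
# "type" equals the requested type, following "properties" and "fields".
def fields_of_type(properties, type, prefix = nil)
  properties.flat_map do |name, spec|
    path = prefix ? "#{prefix}.#{name}" : name
    hits = spec['type'] == type ? [path] : []
    hits + fields_of_type(spec.fetch('properties', {}), type, path) +
      fields_of_type(spec.fetch('fields', {}), type, path)
  end
end

# Looks inside <index>.mappings.<doc_type>.properties of a _mapping response.
def attachment_fields(mapping_response, index, doc_type)
  props = mapping_response.dig(index, 'mappings', doc_type, 'properties') || {}
  fields_of_type(props, 'attachment')
end

sample = JSON.parse('{"notes":{"mappings":{"note":{"properties":{' \
  '"attachment":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},' \
  '"id":{"type":"long"}}}}}}')
p attachment_fields(sample, 'notes', 'note') # prints []
```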


(Rakesh Kumar Saharan) #30

@dadoonet Right. The mapper attachments plugin is installed, but the attachment field comes out as text, not attachment. I'm not sure why, and that's where I am stuck.


(David Pilato) #31

Because you did not specify it.

I mean that you created a mapping at some point, but without the needed type.

Delete the index, put a correct mapping, and restart your application.
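A hedged sketch of those two REST calls (delete the index, then create it again with an explicit attachment mapping) using Ruby's standard `net/http`, independent of Tire. Assumptions, all labeled in the code: a local unsecured cluster at `http://localhost:9200`, and that the Base64 payload is sent in a field named `attachment_data` (the name of the Base64-returning method posted above, and the field that shows up as `text` in the mapping); use whichever field your model actually sends. The requests are only built; the commented block shows how they could be sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

ES_URL = 'http://localhost:9200' # assumption: local, unsecured cluster

# Assumption: the Base64 payload is indexed in "attachment_data".
NOTES_MAPPING = {
  mappings: {
    note: {
      properties: {
        attachment_data: { type: 'attachment' }
      }
    }
  }
}.freeze

# Hypothetical helper: DELETE /<index>, then PUT /<index> with the mapping.
def recreate_index_requests(index, body)
  delete = Net::HTTP::Delete.new(URI.join(ES_URL, "/#{index}"))
  create = Net::HTTP::Put.new(URI.join(ES_URL, "/#{index}"), 'Content-Type' => 'application/json')
  create.body = JSON.generate(body)
  [delete, create]
end

delete_req, create_req = recreate_index_requests('notes', NOTES_MAPPING)
# Send them in order, then restart the application so it reindexes:
# Net::HTTP.start('localhost', 9200) do |http|
#   [delete_req, create_req].each { |r| puts http.request(r).body }
# end
```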


(Rakesh Kumar Saharan) #32
@dadoonet I did specify it. Please see below:

mapping _source: { excludes: ["attachment"] } do
  self.tire.indexes :id, type: "integer"
  self.tire.indexes :note
  self.tire.indexes :attachment, :type => 'attachment',
    :fields => {
      :name    => { :store => 'yes' },
      :content => { :store => 'yes' },
      :title   => { :store => 'yes' },
      :file    => { :term_vector => 'with_positions_offsets', :store => 'yes' },
      :date    => { :store => 'yes' }
    }
end

I am following this post: http://rny.io/rails/elasticsearch/2013/08/05/full-text-search-for-attachments-with-rails-and-elasticsearch.html


(David Pilato) #33

It's not in the resulting mapping, so you are probably doing something wrong.
I can't tell what, as I don't know Rails/Ruby.

But if you follow the documentation and replay this pure REST script, it should work.

As you can see, the mapping is defined:

PUT /trying-out-mapper-attachments
{
  "mappings": {
    "person": {
      "properties": {
        "cv": { "type": "attachment" }
}}}}
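The same README flow continues by indexing a document whose `cv` field holds Base64-encoded text and then searching the text the plugin extracted. A hedged Ruby sketch of those two requests with the standard library follows; querying the `cv.content` sub-field assumes a plugin version that exposes extracted text there, and the helper names `index_cv_request` and `search_cv_request` are hypothetical. The requests are built but not sent.

```ruby
require 'base64'
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: PUT one "person" document whose "cv" field carries
# the Base64-encoded file contents.
def index_cv_request(base_url, index, id, raw_text)
  req = Net::HTTP::Put.new(URI.join(base_url, "/#{index}/person/#{id}"),
                           'Content-Type' => 'application/json')
  req.body = JSON.generate({ cv: Base64.strict_encode64(raw_text) })
  req
end

# Hypothetical helper: search the extracted text (assumed at "cv.content").
def search_cv_request(base_url, index, text)
  req = Net::HTTP::Post.new(URI.join(base_url, "/#{index}/_search"),
                            'Content-Type' => 'application/json')
  req.body = JSON.generate({ query: { match: { 'cv.content' => text } } })
  req
end

idx = index_cv_request('http://localhost:9200', 'trying-out-mapper-attachments', 1, 'Lorem ipsum dolor sit amet')
srch = search_cv_request('http://localhost:9200', 'trying-out-mapper-attachments', 'ipsum')
# Net::HTTP.start('localhost', 9200) { |http| [idx, srch].each { |r| puts http.request(r).body } }
```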

(Rakesh Kumar Saharan) #34
@dadoonet This is what I am getting in the JSON mapping. I have installed the attachment plugin, but it still shows attachment_data as type text, and I'm not sure why.

 {
	"notes": {
		"aliases": {},
		"mappings": {
			"note": {
				"properties": {
					"attachment_content_type": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"attachment_data": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"attachment_file_name": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"attachment_file_size": {
						"type": "long"
					},
					"attachment_updated_at": {
						"type": "date"
					},
					"company_id": {
						"type": "long"
					},
					"created_at": {
						"type": "date"
					},
					"id": {
						"type": "long"
					},
					"member_id": {
						"type": "long"
					},
					"note": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"noteable_id": {
						"type": "long"
					},
					"noteable_type": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"updated_at": {
						"type": "date"
					},
					"visibility_type": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					}
				}
			}
		},
		"settings": {
			"index": {
				"creation_date": "1483437269500",
				"number_of_shards": "5",
				"number_of_replicas": "1",
				"uuid": "mXHekbRiRB2z9lmUYDDbJA",
				"version": {
					"created": "5000199"
				},
				"provided_name": "notes"
			}
		}
	}
}

(system) closed #35