Not able to search through attachment contents

I am new to Elasticsearch and am evaluating PDF file indexing.

I have used the NEST .NET client to index PDF files, following the steps described in this Stack Overflow post: http://stackoverflow.com/questions/25917386/client-net-nest-with-attachment-highlight-feature

I encode the PDF contents with the Convert.ToBase64String method, and the document is indexed.
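A minimal sketch of the document body this produces, assuming the attachment field is named `file` and the title field `title` (names borrowed from the mapping posted later in this thread; the helper name is hypothetical). It uses `System.Text.Json.Nodes` (.NET 6 or later) rather than NEST, so it only illustrates the JSON shape, not the original poster's code:

```csharp
using System;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical helper: builds the JSON body for indexing one PDF.
// Assumes the mapping declares "file" as type "attachment".
static string BuildAttachmentDoc(string title, byte[] pdfBytes)
{
    var doc = new JsonObject
    {
        ["title"] = title,
        // The attachment field receives the raw file bytes as a base64 string.
        // Text is only extracted if "file" is mapped as type "attachment".
        ["file"] = Convert.ToBase64String(pdfBytes)
    };
    return doc.ToJsonString();
}

// In real code the bytes would come from File.ReadAllBytes(path).
byte[] sample = Encoding.ASCII.GetBytes("%PDF-1.4");
Console.WriteLine(BuildAttachmentDoc("Sample", sample));
```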

I can search plain text in the Title field, but searching for text from the PDF contents returns zero hits.

Can someone please help with this?

Did you install mapper attachments plugin? What can you see in logs?

Yes, I have installed the mapper attachments plugin and restarted the cluster; it is displayed in the cluster's plugin list. Please see the screenshot:

Below is a screenshot of the uploaded sample index:

Here is the sample C# code I used:

Anything in elasticsearch logs?

Where can I find the log files? Is there an option in ES-Head?

I don't see any errors in the ES logs:

Can you show the mapping for your type and what a typical JSON document looks like?

Please see the C# mapping code below; I am uploading the PDF as a file:

Can you please run the following queries on your cluster?

  • GET /data/doc/1
  • GET /data/doc/_mapping

Yes, here are the results:

As you can see, your mapping is incorrect.
There is no attachment type in it.

So the file content is not analyzed with the mapper attachments plugin.
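To make this check repeatable, a minimal sketch that parses the response of `GET /data/doc/_mapping` and reports whether a field is declared as `attachment`. The response layout (index, then `mappings`, then type, then `properties`) is assumed from the mapping JSON posted later in this thread; the helper name and the sample mapping are hypothetical (.NET 6 or later):

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: true if GET /{index}/{type}/_mapping reports
// the field with "type": "attachment".
static bool IsAttachmentField(string mappingJson, string index, string type, string field)
{
    JsonNode? root = JsonNode.Parse(mappingJson);
    // Missing properties yield null, so a wrong path simply returns false.
    JsonNode? fieldNode = root?[index]?["mappings"]?[type]?["properties"]?[field];
    return fieldNode?["type"]?.GetValue<string>() == "attachment";
}

// Hypothetical sample of a mapping without the attachment type:
// "file" ended up as a plain string field.
string broken = @"{""data"":{""mappings"":{""doc"":{""properties"":{""file"":{""type"":""string""}}}}}}";
Console.WriteLine(IsAttachmentField(broken, "data", "doc", "file")); // False
```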

Remove your index, create it again, PUT the mapping, check that it has been applied, then index your docs.
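The rebuild steps above map onto plain HTTP requests. A minimal sketch that only builds the requests in order (sending them is left to `HttpClient.SendAsync`), assuming the index `data` and type `doc` from the queries above and a node at `http://localhost:9200/`; the helper name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;

// Hypothetical helper: the request sequence for rebuilding an index
// with an explicit mapping, in the order the steps must run.
static List<HttpRequestMessage> RebuildRequests(Uri node, string index, string type,
                                                string mappingJson, IList<string> docsJson)
{
    var json = "application/json";
    var requests = new List<HttpRequestMessage>
    {
        // 1. Remove the index together with its wrong mapping.
        new HttpRequestMessage(HttpMethod.Delete, new Uri(node, index)),
        // 2. Create it again, empty.
        new HttpRequestMessage(HttpMethod.Put, new Uri(node, index)),
        // 3. PUT the mapping before any document is indexed.
        new HttpRequestMessage(HttpMethod.Put, new Uri(node, $"{index}/{type}/_mapping"))
        {
            Content = new StringContent(mappingJson, Encoding.UTF8, json)
        },
        // 4. Check that it has been applied (look for "type": "attachment").
        new HttpRequestMessage(HttpMethod.Get, new Uri(node, $"{index}/{type}/_mapping")),
    };
    // 5. Only then index the documents (ids 1, 2, ... are illustrative).
    for (int i = 0; i < docsJson.Count; i++)
    {
        requests.Add(new HttpRequestMessage(HttpMethod.Put, new Uri(node, $"{index}/{type}/{i + 1}"))
        {
            Content = new StringContent(docsJson[i], Encoding.UTF8, json)
        });
    }
    return requests;
}

var reqs = RebuildRequests(new Uri("http://localhost:9200/"), "data", "doc",
    @"{""doc"":{""properties"":{""file"":{""type"":""attachment""}}}}",
    new[] { @"{""file"":""AQID""}" });
foreach (var r in reqs) Console.WriteLine($"{r.Method} {r.RequestUri}");
```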

How should I define the field type as attachment? Is there any reference link for C#?

You can read the doc: https://github.com/elastic/elasticsearch-mapper-attachments#using-mapper-attachments

I don't know about C# so I can't tell how to translate that in that language. Might not be hard though.
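For C# readers, a minimal sketch of a mapping body that declares one field with `"type": "attachment"`, following the pattern in that README. It is built with `System.Text.Json.Nodes` (.NET 6 or later) rather than NEST, since the NEST mapping API is not covered in this thread. The type `doc`, field `file`, and helper name are assumptions; the result would be sent with `PUT /data/doc/_mapping`:

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: body for PUT /{index}/{type}/_mapping declaring
// one attachment field. Sub-fields (content, title, ...) are optional.
static string BuildAttachmentMapping(string type, string field)
{
    var body = new JsonObject
    {
        [type] = new JsonObject
        {
            ["properties"] = new JsonObject
            {
                [field] = new JsonObject { ["type"] = "attachment" }
            }
        }
    };
    return body.ToJsonString();
}

Console.WriteLine(BuildAttachmentMapping("doc", "file"));
// {"doc":{"properties":{"file":{"type":"attachment"}}}}
```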

@Suyog_Kale FYI in all of your pictures we can see your Found cluster ID, which means someone can potentially get access to your data.

I'd strongly suggest that you remove/edit the pictures.

Thank you David,

I am now able to configure the mapping and index the PDF contents.

The problem now is that when I execute a search, it returns records, but the actual file contents are not highlighted; it displays the file's binary data instead:

Any suggestion?
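The "binary data" is most likely the base64 `file` value from `_source`. One approach, along the lines of the highlighting section of the mapper attachments README, is to match and highlight the extracted `file.content` sub-field, mapped with `"store": true` and `"term_vector": "with_positions_offsets"`, and to return only small fields from `_source`. A minimal sketch of such a search body (.NET 6 or later), assuming the field `file` and a `title` field to return; the helper name is hypothetical:

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: search body that matches and highlights the text
// extracted by the attachment plugin instead of the base64 source.
// Assumes file.content is mapped with "store": true and
// "term_vector": "with_positions_offsets".
static string BuildHighlightSearch(string text)
{
    var body = new JsonObject
    {
        ["query"] = new JsonObject
        {
            ["match"] = new JsonObject { ["file.content"] = text }
        },
        ["highlight"] = new JsonObject
        {
            ["fields"] = new JsonObject { ["file.content"] = new JsonObject() }
        },
        // Return only "title" from _source, so the base64 "file"
        // value is not shown in the results.
        ["_source"] = new JsonArray("title")
    };
    return body.ToJsonString();
}

Console.WriteLine(BuildHighlightSearch("abc"));
```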

I also observed that even when there is no match in the contents, the search returns all records:

The Head plugin is buggy. Use POST instead of GET.
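This fits the symptom above: browsers do not send a request body with a GET XMLHttpRequest, and a `_search` request without a body behaves like `match_all`, so every record comes back. A minimal sketch of sending the query with POST from C#, assuming the index `data` and a node at `http://localhost:9200/`; the helper name is hypothetical:

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper: a search request that carries its query body.
// POST is used because a GET body may be dropped by some clients,
// which silently turns the search into match_all.
static HttpRequestMessage SearchRequest(Uri node, string index, string bodyJson) =>
    new HttpRequestMessage(HttpMethod.Post, new Uri(node, $"{index}/_search"))
    {
        Content = new StringContent(bodyJson, Encoding.UTF8, "application/json")
    };

var req = SearchRequest(new Uri("http://localhost:9200/"), "data",
    @"{""query"":{""match"":{""file.content"":""abc""}}}");
Console.WriteLine($"{req.Method} {req.RequestUri}");
// To send it: using var http = new HttpClient(); var resp = await http.SendAsync(req);
```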

Hi, I have the same issue: I cannot search the attached document's contents using the NEST client.
My mapping is:

 {
 "mydocs": {
  "mappings": {
     "indexdocument": {
        "properties": {
           "docLocation": {
              "type": "string",
              "index": "not_analyzed",
              "store": true
           },
           "documentType": {
              "type": "string",
              "store": true
           },
           "file": {
              "type": "attachment",
              "fields": {
                 "content": {
                    "type": "string",
                    "analyzer": "full"
                 },
                 "author": {
                    "type": "string"
                 },
                 "title": {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "full"
                 },
                 "name": {
                    "type": "string"
                 },
                 "date": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis"
                 },
                 "keywords": {
                    "type": "string"
                 },
                 "content_type": {
                    "type": "string"
                 },
                 "content_length": {
                    "type": "integer"
                 },
                 "language": {
                    "type": "string"
                 }
              }
           },
           "filePermissionInfo": {
              "properties": {
                 "accessControlType": {
                    "type": "string",
                    "store": true
                 },
                 "accountValue": {
                    "type": "string",
                    "store": true
                 },
                 "fileSystemRights": {
                    "type": "string",
                    "store": true
                 },
                 "isInherited": {
                    "type": "string",
                    "store": true
                 }
              }
           },
           "id": {
              "type": "double",
              "store": true
           },
           "lastModifiedDate": {
              "type": "date",
              "store": true,
              "format": "strict_date_optional_time||epoch_millis"
           },
           "otherDetails": {
              "type": "string"
           },
           "title": {
              "type": "string",
              "store": true,
              "term_vector": "with_positions_offsets"
           }
        }
     }
  }
 }
}

My POST query works fine:

POST /mydocs/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "filePermissionInfo.accountValue": "S-1-5-18" } },
        { "match": { "otherDetails": "xyz" } },
        { "match": { "file.content": "abc" } }
      ]
    }
  }
}

But when I convert it to C#, it does not work. If I remove the File.Content field from the match query, it returns a result set, so I think the problem is with the attachment field, which is base64 encoded:

var queryResult = client.Search<IndexDocument>(s => s
    .Index("mydocs")
    .Query(q => q
        .Bool(b => b
            .Must(
                m => m.Match(mt1 => mt1.Field(f1 => f1.DocumentType).Query(queryTerm)),
                m => m.Match(mt2 => mt2.Field(f2 => f2.FilePermissionInfo.First().AccountValue).Query(accountName)),
                m => m.Match(mt3 => mt3.Field(f3 => f3.OtherDetails).Query(other))
            )
        )
    )
);
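Since the query works from Sense, one way to narrow this down is to send exactly that body from C# and compare it with the JSON NEST generates. A minimal sketch (.NET 6 or later) that builds the same `bool`/`must` body with the sub-field named explicitly as `file.content`; the helper and parameter names are hypothetical. In NEST itself a field can likewise be passed as a string, e.g. `.Field("file.content")`, which takes field-name inference from the `IndexDocument` class out of the picture:

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: the same body as the working POST /mydocs/_search,
// with the attachment sub-field named explicitly as "file.content".
static string BuildMustQuery(string account, string other, string content)
{
    // One {"match": {field: value}} clause.
    static JsonObject Match(string field, string value) =>
        new JsonObject { ["match"] = new JsonObject { [field] = value } };

    var body = new JsonObject
    {
        ["query"] = new JsonObject
        {
            ["bool"] = new JsonObject
            {
                ["must"] = new JsonArray(
                    Match("filePermissionInfo.accountValue", account),
                    Match("otherDetails", other),
                    Match("file.content", content))
            }
        }
    };
    return body.ToJsonString();
}

Console.WriteLine(BuildMustQuery("S-1-5-18", "xyz", "abc"));
```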

Can you please help?

@dadoonet Can you please look into my issue?