Attachment Mapper and Searching


(Eric Daniels) #1

I'm fairly new to Elasticsearch and am having trouble getting attachments to work. The system I'm working with does nearly everything through the Java API, and while I've been able to figure out most things, getting attachments to work has been eluding me. I don't know if the problem is in how I've set up the mapping or in how I'm trying to search.

Basically, attachments seem to be indexed properly and I'm not getting errors in the log. I can tell when the plugin isn't working or loaded, because then I get a ton of errors. If I search for the Base64-encoded string itself I get back the attachment, so I know the attachment itself is getting to the system. But if I try to search on the contents of the attachment (a basic PDF) I get no results.
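For reference, the attachment type expects the field value to be the file's bytes encoded as Base64. The following is only a minimal sketch of producing such a document source with the Java standard library; the class and the `toJsonSource` helper are hypothetical names for illustration and are not taken from the system described here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AttachmentSource {
    // Hypothetical helper: wraps raw file bytes as the JSON source of one document.
    // The standard Base64 alphabet contains no quotes or backslashes, so the
    // encoded value needs no further JSON escaping. The field name is assumed
    // to be a plain identifier.
    static String toJsonSource(String fieldName, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"" + fieldName + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        // Stand-in bytes; in practice these would be the bytes of the PDF.
        byte[] bytes = "hello attachment".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJsonSource("InfoEnvModel_hasTaskInitiationAttachment", bytes));
    }
}
```

If a search for the encoded string itself matches, that may be a hint that the raw value is being indexed as plain text somewhere, which could be worth checking separately from the extracted content.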

Mapping ends up looking like this:

   "InfoEnvModel_Problem": {
    "_all": {
      "index_analyzer": "nGram_analyzer",
      "search_analyzer": "whitespace_analyzer"
    },
    "properties": {
      "InfoEnvModel_hasCollection": {
        "type": "string",
        "index": "not_analyzed",
        "ignore_above": 5000
      },
      "InfoEnvModel_hasKleStatus": {
        "type": "nested",
        "properties": {
          "reason": {
            "type": "string",
            "index_analyzer": "nGram_analyzer",
            "search_analyzer": "whitespace_analyzer",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "ignore_above": 5000
              }
            }
          },
          "status": {
            "type": "string",
            "index_analyzer": "nGram_analyzer",
            "search_analyzer": "whitespace_analyzer",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "ignore_above": 5000
              }
            }
          }
        }
      },
     "InfoEnvModel_hasManeuverStatus": {

      ..................

      },
      "InfoEnvModel_hasTaskInitiationAttachment": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "InfoEnvModel_hasTaskInitiationAttachment": {
            "type": "string"
          },
          "author": {
            "type": "string"
          },
          "title": {
            "type": "string"
          },
          "name": {
            "type": "string"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "keywords": {
            "type": "string"
          },
          "content_type": {
            "type": "string"
          },
          "content_length": {
            "type": "integer"
          },
          "language": {
            "type": "string"
          }
        }
      },

      .............

Which seems to be correct. I've played around with overriding the various default meta fields and with adding a 'file' or 'content' field, which doesn't break anything but hasn't solved the problem. I'm just curious if anyone could give me some direction on how to figure out what I'm doing wrong.

Thanks for any help,

Eric


(Eric Daniels) #2

So no one has any advice on how to use the attachment mapper or what my problem might be related to? Is there somewhere else that would be better to ask for help? The GitHub site for it didn't seem to have any kind of forum.


(David Pilato) #3

Well, sometimes it's very hard to have an opinion without a full script that reproduces the issue.

Note: this is the right place to ask questions.


(Suyog Kale) #4

I am facing a similar problem.


(David Pilato) #5

Which is?

Any script to reproduce the issue?


(Eric Daniels) #6

The problem is that we do all of the Elasticsearch work in Java code through the Java API. I don't have a script I can post that replicates this, because we don't do anything through scripts. I'd have to post most of our code base for it to be replicated.

I can show you the mappings or the results of searches, because I can hit the Elasticsearch endpoint directly. I can try to answer questions or try things and let you know the results, or describe our system in as much detail as I can, whatever would help. This is why I was hoping someone could point me in the right direction to look for the problem, because I realize it would be difficult for anyone else to actually see the system itself.

Where I am right now is that the mapping seems correct, in that all the meta fields are there; the plugin (loaded through Maven) seems to be working, because I'm not seeing any errors in the log about not having anything to handle attachments (which I saw when it was not properly set up in Maven); and I can do a search (against _all) for the exact Base64-encoded string and find a match. But searching for the actual contents returns nothing.

Maybe it is working fine and I'm doing the query wrong. I thought that the indexed attachment contents ended up in _all as well, so just hitting the query endpoint should work, but maybe I'm missing something here.
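One thing that may be worth trying is a query aimed at the attachment field itself rather than at _all. The sketch below only builds the JSON body of a `match` query as a string; the class and method names are hypothetical, and whether the extracted text is actually addressable under this exact field name with `"path": "full"` is an assumption that would need to be checked against the plugin version in use.

```java
public class AttachmentQuery {
    // Minimal JSON string escaping for the illustrative field name and query text.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a search body that matches `text` against a single named field.
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\"" + escape(field) + "\":\"" + escape(text) + "\"}}}";
    }

    public static void main(String[] args) {
        // "some words from the pdf" is a placeholder for text known to be in the file.
        System.out.println(matchQuery("InfoEnvModel_hasTaskInitiationAttachment",
                "some words from the pdf"));
    }
}
```

The printed body can be sent to the index's `_search` endpoint directly. If the field-level query finds the document while the _all query does not, that would narrow the problem down to how _all is populated or analyzed rather than to the extraction.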


(David Pilato) #7

From what I saw, it looks correct. But obviously it does not give the expected results, so something is incorrect.

I can't tell why without being able to see the code or reproduce.

My experience with Elasticsearch is that each time I started to write a script to reproduce an issue, or a sample Java project, I ended up solving the issue by myself.

So, please try to reproduce what you see and share with us here. I'm afraid I can't help otherwise.
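As a starting point, a reproduction can be as small as three HTTP requests: create an index with an attachment field, index one Base64-encoded file, and search for a word known to be in it. The sketch below uses only the Java standard library. Everything specific in it is an assumption made for illustration: the index name `repro`, the type `doc`, the field `file`, the node at `http://localhost:9200`, and the 1.x-style mapping layout. It is not the setup from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class AttachmentRepro {
    // Returns {method, path, body} for each step. All names are hypothetical.
    // searchText is assumed to be plain words with no quotes or backslashes.
    static String[][] steps(byte[] fileBytes, String searchText) {
        String b64 = Base64.getEncoder().encodeToString(fileBytes);
        return new String[][] {
            {"PUT", "/repro",
             "{\"mappings\":{\"doc\":{\"properties\":{\"file\":{\"type\":\"attachment\"}}}}}"},
            {"PUT", "/repro/doc/1?refresh=true", "{\"file\":\"" + b64 + "\"}"},
            {"POST", "/repro/_search",
             "{\"query\":{\"match\":{\"file\":\"" + searchText + "\"}}}"}
        };
    }

    // Sends one request and returns the response body as text.
    static String send(String baseUrl, String method, String path, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + path).openConnection();
        conn.setRequestMethod(method);
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        in.close();
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: AttachmentRepro <file> <search text>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        for (String[] step : steps(bytes, args[1])) {
            System.out.println(step[0] + " " + step[1]);
            System.out.println(send("http://localhost:9200", step[0], step[1], step[2]));
        }
    }
}
```

If this stripped-down case returns a hit and the real system does not, the difference between the two mappings or indexing paths is the place to look; if it also returns nothing, the three requests and their responses are something that can be posted here.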

