Problems searching document contents with the mapper-attachments plugin

Hello,

I have a problem searching document contents. I've been using Elasticsearch 1.7.3, but now I want to start using the newest version (2.2.1). I installed the new version of ES and then the mapper-attachments plugin. There were no errors, the service started, and everything seems OK.

I made a small mapping to test it out.

PUT test_index
{
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 0
    },
    "mappings": {
        "testfile": {
            "dynamic": "strict",
            "_source": {
                "enabled": true
            },
            "properties": {
                "fileId": {
                    "type": "integer",
                    "store": true
                },
                "contents": {
                    "type": "attachment"
                }
            }
        }
    }
}

After indexing a document with the contents "hello world" and running a match_all query, the result looks like this:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "testfile",
            "_id": "AVO3TKC2-BSsruK6CY8T",
            "_score": 1,
            "_source": {
               "fileId": 101,
               "contents": "aGVsbG8gd29ybGQ="
            }
         }
      ]
   }
}

If I search for the word "hello" with a phrase query, it doesn't work: no results are returned. The phrase query looks like this:

POST test_index/_search
{
  "from": 0,
  "size": 1000,
  "fields": [
    "fileId"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": {
              "query": "hello",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}
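For anyone scripting this check, here is a minimal Python sketch (standard library only; the helper name `phrase_query_body` is hypothetical) that builds the same request body as a dict. It can also emit the `match_phrase` shorthand, which in the 1.x/2.x query DSL should be equivalent to `match` with `"type": "phrase"`, so both spellings can be tried against the index.

```python
import json


def phrase_query_body(field, text, use_match_phrase=False):
    """Build the search body from the post (hypothetical helper).

    With use_match_phrase=True the clause uses the match_phrase
    shorthand instead of match with "type": "phrase".
    """
    if use_match_phrase:
        clause = {"match_phrase": {field: text}}
    else:
        clause = {"match": {field: {"query": text, "type": "phrase"}}}
    return {
        "from": 0,
        "size": 1000,
        "fields": ["fileId"],  # return only the stored fileId field
        "query": {"bool": {"must": [clause]}},
    }


if __name__ == "__main__":
    # Either printed body can be sent as POST test_index/_search
    print(json.dumps(phrase_query_body("contents", "hello"), indent=2))
    print(json.dumps(phrase_query_body("contents", "hello", use_match_phrase=True), indent=2))
```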

The same setup worked fine with previous versions of ES and mapper-attachments, and there are no errors in the log. Can someone help me figure this out?

Thanks in advance!

As you can see from the result of your match_all query, the document clearly wasn't posted with "contents": "hello world".

How are you indexing your sample document?

Well, it's base64-encoded, as the mapper-attachments plugin requires.
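To make the encoding step concrete, here is a minimal Python sketch (standard library only) of how a test document like the one in the first post could be prepared. The helper name `attachment_doc` is hypothetical; the field names `fileId` and `contents` come from the mapping in the first post. Encoding `hello world` gives `aGVsbG8gd29ybGQ=`, the same value that appears in `_source` in the match_all result.

```python
import base64
import json


def attachment_doc(file_id, raw_bytes):
    """Build a test_index/testfile document (hypothetical helper).

    mapper-attachments expects the attachment field to hold the file
    bytes as a base64 string; the plugin extracts the text at index time.
    """
    return {
        "fileId": file_id,
        "contents": base64.b64encode(raw_bytes).decode("ascii"),
    }


if __name__ == "__main__":
    doc = attachment_doc(101, b"hello world")
    print(json.dumps(doc))  # body for POST test_index/testfile
    # Decoding the stored _source value recovers the original text
    print(base64.b64decode(doc["contents"]).decode("utf-8"))
```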

Ah, sorry. I somehow missed that crucial bit, which took some pretty solid selective reading, I grant you.

Just to check a few basic things:

Are you installing the plugin with:

bin/plugin install mapper-attachments

on every node, and restarting the nodes?

Meanwhile, I'm trying to reproduce your results.

I'm trying it out on my local computer, and I have only one node. And yes, I'm installing the plugin the way you mentioned. I'm using 64-bit Windows 10 with Java 1.8.0u74 installed.

I have configured the attachment plugin to index all characters by setting index.mapping.attachment.indexed_chars: -1 in elasticsearch.yml, but I doubt that's causing the problem.

Is there a way to enable mapper-attachments logging in logging.yml configuration file?

I was able to reproduce what you are observing, but I'm not going to have time today to go back and see whether the query works on 1.7.3.

I also confirmed that this query does return your document:

POST /test_index/_search
{
  "query": {
    "query_string": {
      "query": "hello"
    }
  }
}
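To check both requests against a local node in one go, here is a minimal Python sketch using only the standard library. The helper names (`search_request`, `hit_count`) and the `http://localhost:9200` address are assumptions for a single local node like the one described in this thread. The two bodies are the query_string request that returned the document and the phrase query from the original post. One difference between them worth keeping in mind: a query_string with no `default_field` searches the `_all` field by default in 2.x, while the match query targets `contents` directly.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single-node cluster


def search_request(index, body):
    """Build a POST /<index>/_search request (hypothetical helper)."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


QUERIES = {
    # query_string without default_field: searches _all by default in 2.x
    "query_string": {"query": {"query_string": {"query": "hello"}}},
    # phrase query against the attachment field, as in the original post
    "match_phrase": {
        "query": {"match": {"contents": {"query": "hello", "type": "phrase"}}}
    },
}


def hit_count(response_json):
    """Return hits.total from a 2.x search response."""
    return response_json["hits"]["total"]


if __name__ == "__main__":
    for name, body in QUERIES.items():
        try:
            req = search_request("test_index", body)
            with urllib.request.urlopen(req, timeout=5) as resp:
                print(name, hit_count(json.load(resp)))
        except OSError as exc:  # URLError subclasses OSError
            print(name, "request failed:", exc)
```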

It looks like this may be a behavior change between the versions. It could be worth filing a GitHub issue.

OK, I filed a GitHub issue: https://github.com/elastic/elasticsearch/issues/17359. Thanks for your help!