Search not working with mapper-attachment plugin


(Sahas Sahas) #1

I am using ES 1.7 and have installed mapper-attachment plugin version 2.7.0.

I am following the steps at the URL below to understand the basic functionality.
http://www.elasticsearch.cn/tutorials/2011/07/18/attachment-type-in-action.html

I am executing the commands in the sequence below:

curl -X DELETE "localhost:9200/test"
curl -X PUT "localhost:9200/test/attachment/\_mapping" -d '{
  "attachment" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }
        }
      }
    }
  }
}'
Then I index the data using the script below:
#!/bin/sh
coded=`cat fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "localhost:9200/test/attachment/" -d @json.file

Then I search using the request below:

curl "localhost:9200/_search?pretty=true" -d '{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "amplifier"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}'

This search always returns 0 hits, even though the word I am searching for is present in my file.
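
A note on the indexing script above: `perl -ne` runs `encode_base64` once per input line, so the output is a series of separately padded Base64 chunks rather than one encoding of the whole file. A minimal sketch of a single-pass encoding, assuming a `base64` command-line tool that reads standard input is installed; `encode_file` is a hypothetical helper name, not part of the tutorial:

```shell
#!/bin/sh
# Hypothetical helper: print the Base64 encoding of a whole file on one line.
# Assumes a `base64` tool that reads standard input when given no file argument.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Example use (fn6742.pdf is the file name from the post above):
# printf '{"file":"%s"}' "$(encode_file fn6742.pdf)" > json.file
```

Whether the attachment mapper accepts the chunked form is not established in this thread; the single-pass form simply removes one variable when debugging.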


(David Pilato) #2

Did you really execute this? curl -X PUT "localhost:9200/test/attachment/\_mapping"

It should be curl -X PUT "localhost:9200/test/attachment/_mapping" right?


(Sahas Sahas) #3

Hello David,

Thank you for replying.

I am getting the error below when I execute curl -X PUT "localhost:9200/test/attachment/_mapping" on OS X 10.10:

error":"MapperParsingException[No handler for type [attachment] declared on field [file]]","status":400

This error only goes away when I add the escape character.


(David Pilato) #4

It means that the plugin is not loaded.
Maybe you did not restart the node?
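
One way to check this is to ask the cluster which plugins each node reports. A minimal sketch, assuming the cat plugins endpoint (`_cat/plugins`) is available in the installed version and that a node listens on `localhost:9200`; `plugins_url` and `list_plugins` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: build the cat-plugins URL for a given host:port.
plugins_url() {
  printf 'http://%s/_cat/plugins?v' "$1"
}

# Hypothetical helper: print the plugins reported by the nodes of the cluster.
# Defaults to localhost:9200 when no host is given.
list_plugins() {
  curl -s "$(plugins_url "${1:-localhost:9200}")"
}

# Example use:
# list_plugins localhost:9200
```

If a node is missing from the listing for `mapper-attachments`, it has presumably not loaded the plugin.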


(Zipang) #5

In fact, I followed the exact same tutorial with Elasticsearch 1.5.2 and the attachment plugin 2.5.0, and I got the same result: just an empty result set.
(As you all know, the script is downloadable as a gist, so it is really easy to reproduce the behavior: https://gist.github.com/lukas-vlcek/1075067)

I wondered whether the API had changed since the tutorial was written, because I followed the documentation and never had a query return a single result on an attachment.

Should we upgrade to a more recent version of the plugin and ES?

Here is the ES startup log:

[2015-08-25 18:13:41,878][INFO ][node                     ] [Ka-Zar] version[1.5.2], pid[4509], build[62ff986/2015-04-27T09:21:06Z]
[2015-08-25 18:13:41,878][INFO ][node                     ] [Ka-Zar] initializing ...
[2015-08-25 18:13:41,934][INFO ][plugins                  ] [Ka-Zar] loaded [mapper-attachments], sites []
[2015-08-25 18:13:47,029][INFO ][node                     ] [Ka-Zar] initialized
[2015-08-25 18:13:47,030][INFO ][node                     ] [Ka-Zar] starting ...
[2015-08-25 18:13:47,260][INFO ][transport                ] [Ka-Zar] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.2:9300]}
[2015-08-25 18:13:47,273][INFO ][discovery                ] [Ka-Zar] elasticsearch/EfPnAnHaRAua0MINnAp9jw
[2015-08-25 18:13:51,046][INFO ][cluster.service          ] [Ka-Zar] new_master [Ka-Zar][EfPnAnHaRAua0MINnAp9jw][EIDOLON][inet[/192.168.1.2:9300]], reason: zen-disco-join (elected_as_master)
[2015-08-25 18:13:51,098][INFO ][http                     ] [Ka-Zar] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.2:9200]}
[2015-08-25 18:13:51,098][INFO ][node                     ] [Ka-Zar] started
[2015-08-25 18:13:52,172][INFO ][gateway                  ] [Ka-Zar] recovered [4] indices into cluster_state
[2015-08-25 18:50:06,126][INFO ][cluster.metadata         ] [Ka-Zar] [test] deleting index
[2015-08-25 18:50:06,279][INFO ][cluster.metadata         ] [Ka-Zar] [test] creating index, cause [api], templates [], shards [1]/[0], mappings []
[2015-08-25 18:50:11,432][DEBUG][action.admin.cluster.health] [Ka-Zar] observer: timeout notification from cluster service. timeout setting [5s], time since start [5s]
[2015-08-25 18:50:11,461][INFO ][cluster.metadata         ] [Ka-Zar] [test] create_mapping [attachment]

And here is the query result with the script from the gist:

{
  "took" : 107,
  "timed_out" : false,
  "_shards" : {
    "total" : 16,
    "successful" : 16,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

(David Pilato) #6

Did you check the document content?

Did you see this comment?

Might help.


(Sahas Sahas) #7

I tried with a new PDF document link as well, but no luck.

Also, when I search for all documents, the document I indexed for the PDF attachment is returned, but in the result the file content is shown only as Base64. Is that correct?


(David Pilato) #8

is it correct ?

Yes.

Could you send a full recreation script?


(Sahas Sahas) #9

I am executing the scripts below in the given sequence:

Step 1: curl -C - -O http://www.intersil.com/content/dam/Intersil/documents/isl9/isl99201.pdf

Step 2: curl -X DELETE "localhost:9200/test"

Step 3:

curl -X PUT "localhost:9200/test/person/\_mapping" -d '{
    "person" : {
        "properties" : {
            "my_attachment" : { "type" : "attachment" }
        }
    
}}'

It returns the output below:

{"_index":"test","_type":"person","_id":"\\_mapping","_version":1,"created":true}

Step 4:

#!/bin/sh
coded=`cat isl99201.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "localhost:9200/test/attachment/" -d @json.file

It returns the result below:

{"_index":"test","_type":"attachment","_id":"AU9oxmWvFGc3lmHpmRvo","_version":1,"created":true}

Step 5:

curl "localhost:9200/_search?pretty=true"

It returns the result below:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "person",
      "_id" : "\\_mapping",
      "_score" : 1.0,
      "_source":{
    "person" : {
        "properties" : {
            "my_attachment" : { "type" : "attachment" }
        }
}}
    }, {
      "_index" : "test",
      "_type" : "attachment",
      "_id" : "AU9oxmWvFGc3lmHpmRvo",
      "_score" : 1.0,
      "_source":{"file":"PEhFQUQ+PFRJVExFPkF1dGhlbnRpY2F0aW9uIFJlcXVpcmVkPC9USVRMRT48L0hFQUQ+Cg==PEJPRFkgQkdDT0xPUj0id2hpdGUiIEZHQ09MT1I9ImJsYWNrIj48SDE+QXV0aGVudGljYXRpb24gUmVxdWlyZWQ8L0gxPjxIUj4KPEZPTlQgRkFDRT0iSGVsdmV0aWNhLEFyaWFsIj48Qj4KPC9CPjwvRk9OVD4KPEhSPgo=PCEtLSBkZWZhdWx0ICJBdXRoZW50aWNhdGlvbiBSZXF1aXJlZCIgcmVzcG9uc2UgKDMwNykgLS0+Cg==PC9CT0RZPgo="}
    } ]
  }
}

Step 6:

curl "localhost:9200/_search?pretty=true" -d '{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "amplifier"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}'

This returns the result below, where the total hit count is 0.

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
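
The `_source.file` value returned in Step 5 can be decoded to see what was actually indexed. A minimal sketch, assuming a `base64` tool that accepts `--decode`; `decode_b64` is a hypothetical helper name, and the sample string is the first chunk (up to the first `==`) of the value shown above:

```shell
#!/bin/sh
# Hypothetical helper: decode a Base64 string passed as the first argument.
# Assumes the installed `base64` tool accepts the --decode option.
decode_b64() {
  printf '%s' "$1" | base64 --decode
}

# First chunk of the stored "file" value from Step 5.
decode_b64 'PEhFQUQ+PFRJVExFPkF1dGhlbnRpY2F0aW9uIFJlcXVpcmVkPC9USVRMRT48L0hFQUQ+Cg=='
```

For this chunk the decoded text is the header of an HTML `Authentication Required` page rather than PDF data, which suggests the download in Step 1 did not return the actual PDF.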

(David Pilato) #10

I already told you. This is wrong:

curl -X PUT "localhost:9200/test/person/\_mapping" -d '{
    "person" : {
        "properties" : {
            "my_attachment" : { "type" : "attachment" }
        }
    
}}'

It MUST be:

curl -X PUT "localhost:9200/test/person/_mapping" -d '{
    "person" : {
        "properties" : {
            "my_attachment" : { "type" : "attachment" }
        }
    
}}'

(Sahas Sahas) #11

Hi David,
Thank you for the reply!
I am getting a RemoteTransportException when running the command as you suggested. Please let me know how to resolve this exception.

COMMAND-

curl -X PUT "localhost:9200/test/person/_mapping" -d '{
     "person" : {
        "properties" : {
            "my_attachment" : { "type" : "attachment" }
        }
   
}}'

RESULT-

{"error":"RemoteTransportException[[Lancer][inet[/10.102.81.33:9300]][indices:admin/mapping/put]]; nested: IndexMissingException[[test] missing]; ","status":404}

(David Pilato) #12

Create the index first.

PUT test


(Sahas Sahas) #13

Hi David,

I created the index using the command below:

curl -X PUT "localhost:9200/test" -d '{
  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'

Now, when I try to create the mapping, I get the exception below:

{"error":"RemoteTransportException[[Lancer][inet[/10.102.81.33:9300]][indices:admin/mapping/put]]; nested: MapperParsingException[No handler for type [attachment] declared on field [file]]; ","status":400}

Below are the startup logs of my ES node:

 [2015-08-26 13:52:49,616][INFO ][node                     ] [gen Harada] version[1.7.0], pid[1575], build[929b973/2015-07-16T14:31:07Z]
    [2015-08-26 13:52:49,616][INFO ][node                     ] [gen Harada] initializing ...
    [2015-08-26 13:52:49,778][INFO ][plugins                  ] [gen Harada] loaded [mapper-attachments], sites []
    [2015-08-26 13:52:49,823][INFO ][env                      ] [gen Harada] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [203.6gb], net total_space [232.6gb], types [hfs]
    [2015-08-26 13:52:52,096][INFO ][node                     ] [gen Harada] initialized
    [2015-08-26 13:52:52,097][INFO ][node                     ] [gen Harada] starting ...
    [2015-08-26 13:52:52,163][INFO ][transport                ] [gen Harada] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.102.81.29:9300]}
    [2015-08-26 13:52:52,180][INFO ][discovery                ] [gen Harada] elasticsearch/j4_Nyb4DTa6EaplMf6J3pg
    [2015-08-26 13:52:55,231][INFO ][cluster.service          ] [gen Harada] detected_master [Lancer][w6fHaUjER0qcmfuiHQsA-w][DIN52003769][inet[/10.102.81.33:9300]], added {[Lancer][w6fHaUjER0qcmfuiHQsA-w][DIN52003769][inet[/10.102.81.33:9300]],}, reason: zen-disco-receive(from master [[Lancer][w6fHaUjER0qcmfuiHQsA-w][DIN52003769][inet[/10.102.81.33:9300]]])
    [2015-08-26 13:52:55,251][INFO ][http                     ] [gen Harada] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.102.81.29:9200]}
    [2015-08-26 13:52:55,252][INFO ][node                     ] [gen Harada] started

While creating the index, I get the exception below in the ES logs. However, the index is still created:

[2015-08-26 14:05:47,865][WARN ][indices.cluster          ] [gen Harada] [[test][0]] marking and sending shard failed due to [failed to create index]
org.elasticsearch.indices.IndexCreationException: [test] failed to create index
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:338)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:313)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:182)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:480)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) A binding to org.elasticsearch.index.mapper.attachment.RegisterAttachmentType was already configured at _unknown_.
  at _unknown_

1 error
	at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
	at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:151)
	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:102)
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:336)
	... 8 more

curl -X GET "localhost:9200/_cat/indices" returns the output below:

green open test 1 0 0 0 115b 115b


(David Pilato) #14

You have more than one node running.

One of the nodes does not have the plugin.


(Sahas Sahas) #15

Yes, it's only deployed on one node.

I installed plugin using the below command

bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.7.0

This command does not ask for any node information.
Please let me know how I can deploy it on both nodes.


(David Pilato) #16

Run the same command on the other node.


(Sahas Sahas) #17

Hi David, thank you for the prompt response.
How can I run Elasticsearch on a single node? Every time, it starts a few nodes with default names.

Or is there a command to start all the nodes together?


(David Pilato) #18

I have no idea of what you are trying to do to be honest!

Based on the info you sent, I can just tell you that you have two nodes running on your local network using the same cluster.name.

  • 10.102.81.33
  • 10.102.81.29

Maybe the other node belongs to a coworker?

A good practice is to change the cluster name in config/elasticsearch.yml so you will work alone and you won't have to install anything on other nodes.
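
A minimal sketch of that change; the name `my-dev-cluster` is a placeholder, and any value not used by other nodes on the network would do:

```yaml
# config/elasticsearch.yml
# Nodes only join a cluster whose cluster.name matches their own, so a unique
# value keeps this node from joining a coworker's cluster.
cluster.name: my-dev-cluster
```

The node has to be restarted for the new name to take effect.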


(Zipang) #19

You're right. Thanks, David!
The first URL had a redirect that the script did not follow, so the downloaded PDF file was empty.
This was not obvious from the previous comments and the script output.
I forked the gist, and the new version gives the expected result.
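
A minimal sketch of a download step that follows redirects and then checks that the result looks like a PDF, assuming `curl` is available; `fetch_pdf` and `looks_like_pdf` are hypothetical helper names, not taken from the gist:

```shell
#!/bin/sh
# Download a URL ($1) to a file ($2).
# -L follows redirects; -f makes curl fail on HTTP errors instead of saving the error page.
fetch_pdf() {
  curl -fsSL -o "$2" "$1"
}

# Succeed only if the file starts with the PDF magic bytes "%PDF".
looks_like_pdf() {
  [ "$(head -c 4 "$1")" = "%PDF" ]
}

# Example use (URL and file name are placeholders):
# fetch_pdf "http://example.com/doc.pdf" doc.pdf && looks_like_pdf doc.pdf || echo "not a PDF"
```

Checking the magic bytes before indexing catches both an empty file and an HTML error page saved under a `.pdf` name.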


(Sahas Sahas) #20

Thanks, David, for your valuable input!

After changing the cluster name, it worked for me as well.