Attachments Plugin Not Parsing Files

Jim_Cumming · November 7, 2014, 1:44pm

Hi. I'm quite new to Elasticsearch. So far it's all been going great, but
I've run into a wall, and after a few days of no progress I thought it was
time to ask for help.

I'm trying to create a replacement search solution for a CMS; one of
the requirements is that it needs to index binary files. The
mapper-attachments plugin appears to be just the thing, but I'm struggling
to get it to work.

I've tried this with Elasticsearch 1.3.x and Mapper Attachments 2.3.2, and
with Elasticsearch 1.4.x and Mapper Attachments 2.4.2, running under Windows.
I have no errors in the log and the plugin appears to be loading correctly,
so I assume I'm doing something wrong with my requests.

I've simplified my requests down to the most basic level I can, and the
issue still occurs. Testing has been done with the Postman extension in
Chrome, but I've converted my requests to curl commands to help anyone who
might want to try this on Linux. The Base64 content is a .txt file with some
English text from the BBC News site.
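
For anyone reproducing this, the attachment field takes the file's bytes as a single Base64 string. A minimal sketch of producing and checking such a string from the shell, assuming a `base64` utility (GNU coreutils or the BSD/macOS one) is on the path; the `Hello` content is a hypothetical stand-in for the real file:

```shell
# Hypothetical content; for a real file, use: base64 < yourfile.txt
# Line wrapping is stripped so the value fits inside a JSON string.
ENCODED=$(printf 'Hello' | base64 | tr -d '\n')
echo "$ENCODED"

# Round-trip check: decoding should give back the original text.
DECODED=$(printf '%s' "$ENCODED" | base64 --decode)
echo "$DECODED"
```

The value of `ENCODED` is what goes into the `my_attachment` field of the document body.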

Create test index
curl -XPUT 'http://localhost:9200/test/'

Response
{
"acknowledged": true
}

Create mapping for person

curl -XPUT 'http://localhost:9200/test/_mapping/person' -d '{
"person" : {
"properties" : {
"my_attachment" : { "type" : "attachment" }
}
}
}'

Response
{
"acknowledged": true
}

Get mapping for person
curl -XGET 'http://localhost:9200/test/_mapping/person'

Response
{
"test": {
"mappings": {
"person": {
"properties": {
"my_attachment": {
"type": "attachment",
"path": "full",
"fields": {
"my_attachment": {
"type": "string"
},
"author": {
"type": "string"
},
"title": {
"type": "string"
},
"name": {
"type": "string"
},
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"keywords": {
"type": "string"
},
"content_type": {
"type": "string"
},
"content_length": {
"type": "integer"
},
"language": {
"type": "string"
}
}
}
}
}
}
}
}

This looks good: I have metadata fields for the file in the mapping.

Create person id 1

curl -XPUT 'http://localhost:9200/test/person/1' -d '{
"my_attachment" :
"Rm9ybWVyIGNlbGVicml0eSBwdWJsaWNpc3QgTWF4IENsaWZmb3JkIGhhcyBoYWQgYW4gYXBwZWFsIGFnYWluc3QgaGlzIGVpZ2h0LXllYXIgc2VudGVuY2UgZm9yIHNleCBvZmZlbmNlcyByZWplY3RlZCBieSB0aGUgQ291cnQgb2YgQXBwZWFsLg0KDQpUaGUgY291cnQgcnVsZWQgdGhlIHNlbnRlbmNlIGhhbmRlZCB0byBDbGlmZm9yZCBlYXJsaWVyIHRoaXMgeWVhciB3YXMganVzdGlmaWVkIGFuZCBjb3JyZWN0Lg0KDQpDbGlmZm9yZCB3YXMgY29udmljdGVkIGluIEFwcmlsIG9mIGVpZ2h0IGhpc3RvcmljYWwgaW5kZWNlbnQgYXNzYXVsdHMgb24gd29tZW4gYW5kIG9uIGdpcmxzIGFzIHlvdW5nIGFzIDE1Lg0KDQpIaXMgbGF3eWVyIGhhZCBhcmd1ZWQgdGhlIHNlbnRlbmNlIHdhcyAidW5mYWlyIiBhbmQgY2xhaW1lZCBDbGlmZm9yZCB3YXMgbm90IGEgdGhyZWF0IHRvIHdvbWVuLg=="
}'

Response
{
"_index": "test",
"_type": "person",
"_id": "1",
"_version": 1,
"created": true
}

Looks good, let's get that record back

Get person id 1
curl -XGET 'http://localhost:9200/test/person/1'

{
"_index": "test",
"_type": "person",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"my_attachment":
"Rm9ybWVyIGNlbGVicml0eSBwdWJsaWNpc3QgTWF4IENsaWZmb3JkIGhhcyBoYWQgYW4gYXBwZWFsIGFnYWluc3QgaGlzIGVpZ2h0LXllYXIgc2VudGVuY2UgZm9yIHNleCBvZmZlbmNlcyByZWplY3RlZCBieSB0aGUgQ291cnQgb2YgQXBwZWFsLg0KDQpUaGUgY291cnQgcnVsZWQgdGhlIHNlbnRlbmNlIGhhbmRlZCB0byBDbGlmZm9yZCBlYXJsaWVyIHRoaXMgeWVhciB3YXMganVzdGlmaWVkIGFuZCBjb3JyZWN0Lg0KDQpDbGlmZm9yZCB3YXMgY29udmljdGVkIGluIEFwcmlsIG9mIGVpZ2h0IGhpc3RvcmljYWwgaW5kZWNlbnQgYXNzYXVsdHMgb24gd29tZW4gYW5kIG9uIGdpcmxzIGFzIHlvdW5nIGFzIDE1Lg0KDQpIaXMgbGF3eWVyIGhhZCBhcmd1ZWQgdGhlIHNlbnRlbmNlIHdhcyAidW5mYWlyIiBhbmQgY2xhaW1lZCBDbGlmZm9yZCB3YXMgbm90IGEgdGhyZWF0IHRvIHdvbWVuLg=="
}
}

The attachment has been added as a string, and there are no additional
metadata fields.

Here's my system info, obtained via:

curl -XGET 'http://localhost:9200/_nodes'

{
"cluster_name": "elasticsearch",
"nodes": {
"QWhhRNIOTUWX_1OxGSJOvA": {
"name": "Franz Kafka",
"transport_address": "inet[/192.168.76.148:9300]",
"host": "WIN-23CNBGGKSSE",
"ip": "192.168.76.148",
"version": "1.4.0",
"build": "bc94bd8",
"http_address": "inet[/192.168.76.148:9200]",
"settings": {
"node": {
"name": "Franz Kafka"
},
"client": {
"type": "node"
},
"http": {
"cors": {
"enabled": "true",
"allow-origin":
"/https?:\/\/local.kibana(:[0-9]+)?/"
}
},
"name": "Franz Kafka",
"path": {
"data": "c:\apps\elasticsearch\data",
"work": "c:\apps\elasticsearch",
"home": "c:\apps\elasticsearch",
"conf": "c:\apps\elasticsearch\config",
"logs": "c:/apps/elasticsearch/logs"
},
"cluster": {
"name": "elasticsearch"
},
"config":
"c:\apps\elasticsearch\config\elasticsearch.yml",
"plugin": {
"mandatory": "mapper-attachments"
}
},
"os": {
"refresh_interval_in_millis": 1000,
"available_processors": 4,
"cpu": {
"vendor": "Intel",
"model": "Xeon",
"mhz": 2666,
"total_cores": 4,
"total_sockets": 1,
"cores_per_socket": 4,
"cache_size_in_bytes": -1
},
"mem": {
"total_in_bytes": 8589402112
},
"swap": {
"total_in_bytes": 17176915968
}
},
"process": {
"refresh_interval_in_millis": 1000,
"id": 6048,
"max_file_descriptors": -1,
"mlockall": false
},
"jvm": {
"pid": 6048,
"version": "1.7.0_71",
"vm_name": "Java HotSpot(TM) 64-Bit Server VM",
"vm_version": "24.71-b01",
"vm_vendor": "Oracle Corporation",
"start_time_in_millis": 1415361000462,
"mem": {
"heap_init_in_bytes": 268435456,
"heap_max_in_bytes": 1038876672,
"non_heap_init_in_bytes": 24313856,
"non_heap_max_in_bytes": 136314880,
"direct_max_in_bytes": 1038876672
},
"gc_collectors": [
"ParNew",
"ConcurrentMarkSweep"
],
"memory_pools": [
"Code Cache",
"Par Eden Space",
"Par Survivor Space",
"CMS Old Gen",
"CMS Perm Gen"
]
},
"thread_pool": {
"generic": {
"type": "cached",
"keep_alive": "30s",
"queue_size": -1
},
"index": {
"type": "fixed",
"min": 4,
"max": 4,
"queue_size": "200"
},
"bench": {
"type": "scaling",
"min": 1,
"max": 2,
"keep_alive": "5m",
"queue_size": -1
},
"get": {
"type": "fixed",
"min": 4,
"max": 4,
"queue_size": "1k"
},
"snapshot": {
"type": "scaling",
"min": 1,
"max": 2,
"keep_alive": "5m",
"queue_size": -1
},
"merge": {
"type": "scaling",
"min": 1,
"max": 2,
"keep_alive": "5m",
"queue_size": -1
},
"suggest": {
"type": "fixed",
"min": 4,
"max": 4,
"queue_size": "1k"
},
"bulk": {
"type": "fixed",
"min": 4,
"max": 4,
"queue_size": "50"
},
"optimize": {
"type": "fixed",
"min": 1,
"max": 1,
"queue_size": -1
},
"warmer": {
"type": "scaling",
"min": 1,
"max": 2,
"keep_alive": "5m",
"queue_size": -1
},
"flush": {
"type": "scaling",
"min": 1,
"max": 2,
"keep_alive": "5m",
"queue_size": -1
},
"search": {
"type": "fixed",
"min": 12,
"max": 12,
"queue_size": "1k"
},
"listener": {
"type": "fixed",
"min": 2,
"max": 2,
"queue_size": -1
},
"percolate": {
"type": "fixed",
"min": 4,
"max": 4,
"queue_size": "1k"
},
"management": {
"type": "scaling",
"min": 1,
"max": 5,
"keep_alive": "5m",
"queue_size": -1
},
"refresh": {
"type": "scaling",
"min": 1,
"max": 2,
"keep_alive": "5m",
"queue_size": -1
}
},
"network": {
"refresh_interval_in_millis": 5000,
"primary_interface": {
"address": "192.168.76.148",
"name": "eth6",
"mac_address": "00:0C:29:80:70:CA"
}
},
"transport": {
"bound_address": "inet[/0:0:0:0:0:0:0:0:9300]",
"publish_address": "inet[/192.168.76.148:9300]"
},
"http": {
"bound_address": "inet[/0:0:0:0:0:0:0:0:9200]",
"publish_address": "inet[/192.168.76.148:9200]",
"max_content_length_in_bytes": 104857600
},
"plugins": [
{
"name": "mapper-attachments",
"version": "2.4.1",
"description": "Adds the attachment type allowing to parse difference attachment formats",
"jvm": true,
"site": false
},
{
"name": "kopf",
"version": "1.3.7",
"description": "kopf - simple web administration tool for ElasticSearch",
"url": "/_plugin/kopf/",
"jvm": false,
"site": true
}
]
}
}
}

And my elasticsearch log from startup.

[2014-11-07 13:23:59,256][INFO ][node ] [Franz Kafka] version[1.4.0], pid[6928], build[bc94bd8/2014-11-05T14:26:12Z]
[2014-11-07 13:23:59,256][INFO ][node ] [Franz Kafka] initializing ...
[2014-11-07 13:23:59,319][INFO ][plugins ] [Franz Kafka] loaded [mapper-attachments], sites [kopf]
[2014-11-07 13:24:03,503][INFO ][node ] [Franz Kafka] initialized
[2014-11-07 13:24:03,503][INFO ][node ] [Franz Kafka] starting ...
[2014-11-07 13:24:03,643][INFO ][transport ] [Franz Kafka] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.76.148:9300]}
[2014-11-07 13:24:03,784][INFO ][discovery ] [Franz Kafka] elasticsearch/A4ONWcVyRIiJxaVw0Mm0uA
[2014-11-07 13:24:07,566][INFO ][cluster.service ] [Franz Kafka] new_master [Franz Kafka][A4ONWcVyRIiJxaVw0Mm0uA][WIN-23CNBGGKSSE][inet[/192.168.76.148:9300]], reason: zen-disco-join (elected_as_master)
[2014-11-07 13:24:07,705][INFO ][http ] [Franz Kafka] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.76.148:9200]}
[2014-11-07 13:24:07,705][INFO ][node ] [Franz Kafka] started
[2014-11-07 13:24:08,416][INFO ][gateway ] [Franz Kafka] recovered [1] indices into cluster_state

I've also set mapper-attachments as a mandatory plugin in the config, so
it must be loading, since the node starts up OK.

I'd really appreciate some help on this. I'm sure I'm making some newbie
mistake with the mapping or something, but the documentation isn't helping
me here.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/95bcd0b7-844a-40b5-93cf-dce2ea4bc284%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · November 7, 2014, 1:54pm

The attachment plugin indexes the binary content but does not actually modify the source document (the _source field).

But if you _search for the content, it should work.
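
As a rough sketch of such a search, reusing the index, type, and field names from the original post: this assumes the plugin version in use indexes the extracted text under the attachment field's own name (`my_attachment`), as the `fields` section of the mapping output suggests. `ES_URL` and the search word `publicist` (a word from the decoded sample text) are illustrative choices, not part of the original reply.

```shell
ES_URL='http://localhost:9200'   # assumed local node, as in the original post

# Match query against the extracted text rather than the Base64 held in _source.
QUERY='{ "query": { "match": { "my_attachment": "publicist" } } }'

curl -s --max-time 5 -XPOST "$ES_URL/test/person/_search?pretty" -d "$QUERY" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

If the file was parsed, document 1 should appear in the hits even though its `_source` still holds only the Base64 string.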

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Jim_Cumming · November 7, 2014, 2:00pm

Thanks David. Is there any easy way to view the parsed data other than
searching?

dadoonet · November 7, 2014, 2:03pm

Change the mapping to store the actual content, then use the fields option in your search to get the content field back.
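
A rough sketch of what this suggestion could look like, under several assumptions: the sub-field names follow the mapping output from the original post, the `store` setting is applied on a fresh index because store settings generally cannot be changed on an already-mapped field, and `test2` and `ES_URL` are hypothetical names chosen for illustration. Check the plugin's README for the exact syntax of your version.

```shell
ES_URL='http://localhost:9200'   # assumed local node
INDEX='test2'                    # hypothetical fresh index

# Mapping that asks Elasticsearch to store the extracted text and content type.
MAPPING='{
  "person": {
    "properties": {
      "my_attachment": {
        "type": "attachment",
        "fields": {
          "my_attachment": { "type": "string", "store": true },
          "content_type":  { "type": "string", "store": true }
        }
      }
    }
  }
}'

# Search body that asks for the stored fields instead of _source.
SEARCH='{
  "fields": ["my_attachment", "my_attachment.content_type"],
  "query": { "match_all": {} }
}'

# Create the index and apply the mapping.
curl -s --max-time 5 -XPUT "$ES_URL/$INDEX/" \
  && curl -s --max-time 5 -XPUT "$ES_URL/$INDEX/_mapping/person" -d "$MAPPING" \
  || echo "setup failed: is Elasticsearch running at $ES_URL?" >&2

# Run this after indexing the document into $INDEX.
curl -s --max-time 5 -XPOST "$ES_URL/$INDEX/person/_search?pretty" -d "$SEARCH" \
  || echo "search failed: is Elasticsearch running at $ES_URL?" >&2
```

Each hit should then carry a `fields` section with the parsed text, which is an easy way to eyeball what the plugin extracted.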

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Jim_Cumming · November 7, 2014, 3:05pm

Thanks David, I found the documentation on the plugin's GitHub page.

All working now. Can't believe I've wasted so much time on this when it was
already working. D'oh.

On Friday, 7 November 2014 14:03:54 UTC, David Pilato wrote:

Changing the mapping to store the actual content, then use fields option
in search to get content field back.

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 7 nov. 2014 à 15:00, Jim Cumming <jimcu...@gmail.com <javascript:>> a
écrit :

Thanks David, is there any easy way to view the parsed data other than
searching?

On Friday, 7 November 2014 13:55:41 UTC, David Pilato wrote:

Attachment plugin index binary content but does not actually modify the
source document (_source field).

But if you _search for content it should work.

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


