**Elasticsearch version** 6.6.0
**Plugins installed**: [ingest-attachment]
…
**JVM version** (`java -version`):
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)
**OS version** (`uname -a` if on a Unix-like system):
opensuse 42.3
Linux elastic 4.4.76-1-default #1 SMP Fri Jul 14 08:48:13 UTC 2017 (9a2885c) x86_64 x86_64 x86_64 GNU/Linux
**Description of the problem including expected versus actual behavior**:
I'm upgrading an existing app from Elasticsearch v2.1.2 (with the mapper-attachments plugin) to v6.6.0 (with the ingest-attachment plugin). **String searches in the 2.1.2 version return hits from inside archived or compressed attachment files (e.g. .tar, .tar.gz),** as well as from the usual .pdf, .doc, .xls, etc.
With the new ingest-attachment plugin, compressed files do not appear to be processed: `content_type` is correctly detected as "application/gzip", `content_length` is zero, and no other fields are present in the attachment structure returned by Elasticsearch. For uncompressed files, Elasticsearch also returns the date, author, language and content fields.
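To make "not processed" concrete, here is a minimal Python sketch of a check that could be run on each returned `attachment` object. The field list is the one documented for the ingest-attachment processor; the helper names are hypothetical, and the sample dictionary is reconstructed from the description above rather than copied from a real response.

```python
# Fields the ingest-attachment processor can emit, per the plugin docs.
ATTACHMENT_FIELDS = (
    "content", "title", "author", "keywords",
    "date", "content_type", "content_length", "language",
)


def missing_fields(attachment):
    """Return the documented attachment fields that are absent."""
    return [name for name in ATTACHMENT_FIELDS if name not in attachment]


def looks_unprocessed(attachment):
    """True when no text was extracted: no content and a zero or absent length."""
    return "content" not in attachment and not attachment.get("content_length")


# Illustrative values based on the description, not captured output.
gzip_attachment = {"content_type": "application/gzip", "content_length": 0}

print(looks_unprocessed(gzip_attachment))
print(missing_fields(gzip_attachment))
```

For the .tar.gz case described here, `looks_unprocessed` is true and `content` is among the missing fields, which is the behavior I am reporting.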
I saw no compression-related options in the docs at https://www.elastic.co/guide/en/elasticsearch/plugins/current/using-ingest-attachment.html
I do not know how to get the plugin versions; they're not in the log. Elasticsearch is running in a Docker container built from this Dockerfile:
```
FROM docker.elastic.co/elasticsearch/elasticsearch:6.6.0
RUN bin/elasticsearch-plugin install --batch ingest-attachment
```
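For anyone reproducing this: the cat plugins API (`GET _cat/plugins`) should report plugin versions. The following is a minimal Python sketch, not run against this container. The base URL `http://localhost:9200` and both function names are assumptions for illustration.

```python
import json
import urllib.request


def fetch_plugin_rows(base_url="http://localhost:9200"):
    """Fetch _cat/plugins as JSON. base_url is an assumption about the setup."""
    url = base_url + "/_cat/plugins?format=json&h=name,component,version"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def plugin_versions(rows):
    """Map each plugin (the 'component' column) to its reported version."""
    return {row["component"]: row["version"] for row in rows}
```

Calling `plugin_versions(fetch_plugin_rows())` against the running node should then give a dictionary keyed by plugin name.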
**Steps to reproduce**:
Index template:
```
{
  "index_patterns": "ars*",
  "mappings": {
    "ar": {
      "properties": {
        "actionee": {
          "properties": {
            "fullname": {
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            },
            "name": {
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            }
          }
        },
        "attachments": {
          "properties": {
            "description": {
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            },
            "filename": {
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            },
            "filesize": {
              "type": "long"
            },
            "id": {
              "index": false,
              "type": "long"
            },
            "updated_at": {
              "type": "date"
            }
          }
        }
      }
    }
  },
  "order": 0,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    },
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "refresh_interval": "1s"
  }
}
```
Ingest-pipeline template (`pipeline.json`):
```
{
  "description": "Extract attachment information from arrays",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data",
            "indexed_chars": -1
          }
        }
      }
    }
  ]
}
```
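To exercise the pipeline without the rest of the app, a small .tar.gz can be built in memory and wrapped in a request body for the simulate API (`POST _ingest/pipeline/_simulate`). This is a minimal sketch under stated assumptions: the file names, the search string and the helper names are made up for illustration, and the document only carries the `attachments` fields that the pipeline and mapping above refer to.

```python
import base64
import io
import json
import tarfile

# Same pipeline definition as in pipeline.json above.
PIPELINE = {
    "description": "Extract attachment information from arrays",
    "processors": [{
        "foreach": {
            "field": "attachments",
            "processor": {
                "attachment": {
                    "target_field": "_ingest._value.attachment",
                    "field": "_ingest._value.data",
                    "indexed_chars": -1,
                }
            },
        }
    }],
}


def make_tar_gz(member_name, text):
    """Return the bytes of a gzip-compressed tar holding one text file."""
    data = text.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def simulate_body(pipeline, archive_bytes, filename):
    """Build a request body for POST _ingest/pipeline/_simulate."""
    source = {
        "attachments": [{
            "filename": filename,
            "filesize": len(archive_bytes),
            # The attachment processor expects base64-encoded binary data.
            "data": base64.b64encode(archive_bytes).decode("ascii"),
        }]
    }
    doc = {"_index": "ars", "_type": "ar", "_id": "1", "_source": source}
    return {"pipeline": pipeline, "docs": [doc]}


# Hypothetical sample: one text file with a searchable word inside the archive.
archive = make_tar_gz("note.txt", "needle in a haystack")
body = simulate_body(PIPELINE, archive, "sample.tar.gz")
payload = json.dumps(body)
```

Sending `payload` to the simulate endpoint should show whether the resulting `attachment` object contains a `content` field with the text from inside the archive, or only `content_type` and a zero `content_length` as described above.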
**Provide logs (if relevant)**:
I don't see anything relevant, but here is the log for completeness:
```
Mar 28, 2019 5:25:21 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
[2019-03-28T17:25:21,630][INFO ][o.e.c.m.MetaDataMappingService] [oBnurzh] [ars/sEx9Jo9VQPGK-XOFs6r8Fg] update_mapping [ar]
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
OpenJDK 64-Bit Server VM warning: UseAVX=2 is not supported on this CPU, setting it to UseAVX=1
[2019-03-28T17:32:28,969][INFO ][o.e.e.NodeEnvironment ] [864g5mA] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/vdb1)]], net usable_space [46.7gb], net total_space [59.9gb], types [xfs]
[2019-03-28T17:32:28,973][INFO ][o.e.e.NodeEnvironment ] [864g5mA] heap size [3.9gb], compressed ordinary object pointers [true]
[2019-03-28T17:32:28,977][INFO ][o.e.n.Node ] [864g5mA] node name derived from node ID [864g5mAqS_WIS-BMu8DH-Q]; set [node.name] to override
[2019-03-28T17:32:28,977][INFO ][o.e.n.Node ] [864g5mA] version[6.6.0], pid[1], build[default/tar/a9861f4/2019-01-24T11:27:09.439740Z], OS[Linux/4.4.76-1-default/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/11.0.1/11.0.1+13]
[2019-03-28T17:32:28,978][INFO ][o.e.n.Node ] [864g5mA] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-12949752479696623950, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.cgroups.hierarchy.override=/, -Xms4g, -Xmx4g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2019-03-28T17:32:31,824][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [aggs-matrix-stats]
[2019-03-28T17:32:31,824][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [analysis-common]
[2019-03-28T17:32:31,825][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [ingest-common]
[2019-03-28T17:32:31,825][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [lang-expression]
[2019-03-28T17:32:31,825][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [lang-mustache]
[2019-03-28T17:32:31,826][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [lang-painless]
[2019-03-28T17:32:31,826][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [mapper-extras]
[2019-03-28T17:32:31,826][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [parent-join]
[2019-03-28T17:32:31,827][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [percolator]
[2019-03-28T17:32:31,827][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [rank-eval]
[2019-03-28T17:32:31,827][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [reindex]
[2019-03-28T17:32:31,827][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [repository-url]
[2019-03-28T17:32:31,828][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [transport-netty4]
[2019-03-28T17:32:31,828][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [tribe]
[2019-03-28T17:32:31,828][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-ccr]
[2019-03-28T17:32:31,829][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-core]
[2019-03-28T17:32:31,829][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-deprecation]
[2019-03-28T17:32:31,829][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-graph]
[2019-03-28T17:32:31,830][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-ilm]
[2019-03-28T17:32:31,830][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-logstash]
[2019-03-28T17:32:31,830][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-ml]
[2019-03-28T17:32:31,831][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-monitoring]
[2019-03-28T17:32:31,831][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-rollup]
[2019-03-28T17:32:31,831][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-security]
[2019-03-28T17:32:31,831][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-sql]
[2019-03-28T17:32:31,832][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-upgrade]
[2019-03-28T17:32:31,832][INFO ][o.e.p.PluginsService ] [864g5mA] loaded module [x-pack-watcher]
[2019-03-28T17:32:31,833][INFO ][o.e.p.PluginsService ] [864g5mA] loaded plugin [ingest-attachment]
[2019-03-28T17:32:31,833][INFO ][o.e.p.PluginsService ] [864g5mA] loaded plugin [ingest-geoip]
[2019-03-28T17:32:31,834][INFO ][o.e.p.PluginsService ] [864g5mA] loaded plugin [ingest-user-agent]
[2019-03-28T17:32:38,546][INFO ][o.e.x.s.a.s.FileRolesStore] [864g5mA] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2019-03-28T17:32:39,429][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [864g5mA] [controller/89] [Main.cc@109] controller (64 bit): Version 6.6.0 (Build bbb4919f4d17a5) Copyright (c) 2019 Elasticsearch BV
[2019-03-28T17:32:40,702][INFO ][o.e.d.DiscoveryModule ] [864g5mA] using discovery type [zen] and host providers [settings]
[2019-03-28T17:32:42,024][INFO ][o.e.n.Node ] [864g5mA] initialized
[2019-03-28T17:32:42,025][INFO ][o.e.n.Node ] [864g5mA] starting ...
[2019-03-28T17:32:42,256][INFO ][o.e.t.TransportService ] [864g5mA] publish_address {172.17.0.2:9300}, bound_addresses {0.0.0.0:9300}
[2019-03-28T17:32:42,278][INFO ][o.e.b.BootstrapChecks ] [864g5mA] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-03-28T17:32:45,360][INFO ][o.e.c.s.MasterService ] [864g5mA] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {864g5mA}{864g5mAqS_WIS-BMu8DH-Q}{cLx7yl7cQFSK4voEG18jqg}{172.17.0.2}{172.17.0.2:9300}{ml.machine_memory=12598550528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[2019-03-28T17:32:45,368][INFO ][o.e.c.s.ClusterApplierService] [864g5mA] new_master {864g5mA}{864g5mAqS_WIS-BMu8DH-Q}{cLx7yl7cQFSK4voEG18jqg}{172.17.0.2}{172.17.0.2:9300}{ml.machine_memory=12598550528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {864g5mA}{864g5mAqS_WIS-BMu8DH-Q}{cLx7yl7cQFSK4voEG18jqg}{172.17.0.2}{172.17.0.2:9300}{ml.machine_memory=12598550528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2019-03-28T17:32:45,451][INFO ][o.e.h.n.Netty4HttpServerTransport] [864g5mA] publish_address {172.17.0.2:9200}, bound_addresses {0.0.0.0:9200}
[2019-03-28T17:32:45,452][INFO ][o.e.n.Node ] [864g5mA] started
[2019-03-28T17:32:45,472][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [864g5mA] Failed to clear cache for realms [[]]
[2019-03-28T17:32:45,543][INFO ][o.e.g.GatewayService ] [864g5mA] recovered [0] indices into cluster_state
[2019-03-28T17:32:45,828][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.watch-history-9] for index patterns [.watcher-history-9*]
[2019-03-28T17:32:45,879][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.triggered_watches] for index patterns [.triggered_watches*]
[2019-03-28T17:32:45,922][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.watches] for index patterns [.watches*]
[2019-03-28T17:32:45,966][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
[2019-03-28T17:32:46,035][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
[2019-03-28T17:32:46,077][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
[2019-03-28T17:32:46,131][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
[2019-03-28T17:32:46,184][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
[2019-03-28T17:32:46,347][INFO ][o.e.l.LicenseService ] [864g5mA] license [58339812-447d-4735-b976-96d42134833b] mode [basic] - valid
[2019-03-28T17:34:58,789][WARN ][o.e.d.c.m.MetaDataCreateIndexService] [864g5mA] the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
[2019-03-28T17:34:58,806][INFO ][o.e.c.m.MetaDataCreateIndexService] [864g5mA] [ars] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2019-03-28T17:34:59,406][INFO ][o.e.c.m.MetaDataMappingService] [864g5mA] [ars/ZLpXMDNaRFaSczn8QX_58A] create_mapping [ar]
[2019-03-28T17:34:59,637][INFO ][o.e.c.m.MetaDataDeleteIndexService] [864g5mA] [ars/ZLpXMDNaRFaSczn8QX_58A] deleting index
[2019-03-28T17:34:59,798][INFO ][o.e.c.m.MetaDataIndexTemplateService] [864g5mA] adding template [template-ars] for index patterns [ars*]
[2019-03-28T17:35:00,255][INFO ][o.e.c.m.MetaDataCreateIndexService] [864g5mA] [ars] creating index, cause [auto(bulk api)], templates [template-ars], shards [1]/[0], mappings [ar]
[2019-03-28T17:35:00,322][INFO ][o.e.c.r.a.AllocationService] [864g5mA] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[ars][0]] ...]).
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Mar 28, 2019 5:35:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Mar 28, 2019 5:35:01 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
[2019-03-28T17:35:01,599][INFO ][o.e.c.m.MetaDataMappingService] [864g5mA] [ars/3_efZOn0RwGA7_NkEEplQQ] update_mapping [ar]
```