Elasticsearch-Hadoop connector: the es.resource configuration parameter

(William Vidal) #1


My question is about the index/type value of the es.resource parameter.

For example, I'm trying to read everything from Elasticsearch, but the data is spread over more than one index. How can I do this?

I tried conf(es.resource, "* / * "); but it only reads the first index.

Then I tried "logstash-* / *" (the indices I'm interested in are named like "logstash-date"), and again it only reads data from the first index.

The output:

15/09/05 15:52:11 WARN rest.RestRepository: Read resource [logstash-* / *] includes multiple indices or/and aliases; to avoid duplicate results (caused by shard overlapping), parallelism is reduced from 10 to 5
15/09/05 15:52:11 INFO util.Version: Elasticsearch Hadoop v2.1.0 [76e51188cf]
15/09/05 15:52:11 INFO mr.EsInputFormat: Reading from [logstash-* / *]
15/09/05 15:52:11 INFO mr.EsInputFormat: Discovered mapping {logstash-2015.09.05=[mappings=[default=[@version=STRING, geoip=[location=GEO_POINT]], auth-log=[@timestamp=DATE, @version=STRING, geoip=[location=GEO_POINT], host=STRING, message=STRING, path=STRING, tags=STRING, type=STRING]]]} for [logstash-* / *]
15/09/05 15:52:11 INFO mr.EsInputFormat: Created [5] shard-splits

P.S.: In my actual configuration the characters "* / *" are written together, without spaces.

Thanks for your attention,
William Vidal

(Costin Leau) #2

How many indices do you have? Can you post the equivalent behaviour using curl?
ES-Hadoop doesn't parse the index pattern itself but rather delegates to Elasticsearch, so if only one index is returned, then most likely only one index matches.
If you can double-check that multiple indices are available and yet ES-Hadoop doesn't use them, please raise an issue on GitHub.
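
For anyone who wants to run that check, here is a minimal sketch (in Python, not part of ES-Hadoop) that turns an es.resource value into curl commands. The helper names split_resource and check_urls, and the host http://localhost:9200, are assumptions made up for this illustration; adjust them to your cluster. It only builds the URLs for the _cat/indices and _count endpoints and does not contact Elasticsearch itself.

```python
from urllib.parse import quote


def split_resource(resource):
    """Split an es.resource value such as 'logstash-*/*' into (index, type).

    Surrounding spaces are dropped, so 'logstash-* / *' gives the same result.
    """
    index, _, doc_type = resource.partition("/")
    return index.strip(), doc_type.strip()


def check_urls(resource, host="http://localhost:9200"):
    """Build URLs for checking what an index pattern resolves to.

    'host' is an assumed default; point it at your own cluster.
    """
    index, _ = split_resource(resource)
    # Keep '*' and ',' literal so wildcards and index lists reach Elasticsearch.
    idx = quote(index, safe="*,")
    return {
        # Lists every index matching the pattern, one per row.
        "indices": f"{host}/_cat/indices/{idx}?v",
        # Counts the documents across all matching indices.
        "count": f"{host}/{idx}/_count",
    }


if __name__ == "__main__":
    for name, url in check_urls("logstash-*/*").items():
        # Single quotes stop the shell from expanding '*' and '?'.
        print(f"# {name}")
        print(f"curl -XGET '{url}'")
```

If the _cat/indices command lists only one logstash index, then the pattern really matches a single index and the connector output above is expected.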


(William Vidal) #3

Thanks for the answer Costin!
Now it's fine. Probably it was my mistake.
