When we run:
{
  "from" : 0,
  "size" : 2147483647,
  "query" : {
    "bool" : {
      "should" : {
        "match" : {
          "_all" : {
            "query" : "News",
            "type" : "boolean",
            "analyzer" : "english"
          }
        }
      }
    }
  },
  "post_filter" : {
    "and" : {
      "filters" : [ {
        "terms" : {
          "streamId" : [ 2, 4, 5, 16, 19, 25, 26, 57 ],
          "execution" : "bool"
        }
      }, {
        "term" : {
          "_type" : "Document"
        }
      } ]
    }
  },
  "highlight" : {
    "pre_tags" : [ "<es_fts>" ],
    "post_tags" : [ "</es_fts>" ],
    "fragment_size" : 0,
    "number_of_fragments" : 0,
    "fields" : {
      "Document.Body" : { },
      "Document.OriginalUrl" : { },
      "Document.Title" : { },
      "Document.Url" : { },
      ...........
      "TwitterUser.Location" : { },
      "TwitterUser.ScreenName" : { },
      "TwitterUser.UserId" : { },
      "YouTubeVideo.Description" : { },
      "YouTubeVideo.Url" : { },
      "YouTubeVideo.Username" : { },
      "YouTubeVideo.VideoId" : { }
    }
  }
}
on EsRDD (invoked roughly as in the sketch after the stack trace below), we get:
[WARN ] [2015-08-04 17:42:12.377] o.e.h.r.RestRepository : Read resource [fts*/Document] includes multiple indices or/and aliases; to avoid duplicate results (caused by shard overlapping), parallelism is reduced from 160 to 5
[INFO ] [2015-08-04 17:42:12.379] o.e.h.u.Version : Elasticsearch Hadoop v2.1.0.BUILD-SNAPSHOT [51847921a7]
[INFO ] [2015-08-04 17:42:12.379] o.e.s.r.ScalaEsRDD : Reading from [fts*/Document]
[INFO ] [2015-08-04 17:42:12.450] o.e.s.r.ScalaEsRDD : Discovered mapping {fts-swedish-20150728164001=[mappings=[Document=[Document.Body=STRING, Document.OriginalUrl=STRING, Document.Title=STRING, Document.Url=STRING, DocumentMetadata.FBAdmins=STRING, DocumentMetadata.FBAppID=STRING, DocumentMetadata.MetaAbstract=STRING, DocumentMetadata.MetaAppName=STRING, DocumentMetadata.MetaAuthor=STRING, ...YouTubeVideo.Url=STRING, YouTubeVideo.Username=STRING, YouTubeVideo.VideoId=STRING, _analyzer=STRING, language=STRING, streamId=LONG]]]} for [fts*/Document]
[WARN ] [2015-08-04 17:42:23.225] o.a.s.s.TaskSetManager : Lost task 0.0 in stage 2785.0 (TID 3409, om-inv): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [POST] on [_search/scroll?scroll=5m] failed; server[null] returned [404|Not Found:]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:335)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:312)
at org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:355)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:401)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.WritablePartitionedIterator$$anon$3.writeNext(WritablePartitionedPairCollection.scala:105)
at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:375)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:208)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
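For reference, this is roughly how we read via EsRDD. It is only a minimal sketch: the resource fts*/Document and the 5m scroll keepalive come from the logs above, while the node address, app name, and variable names are placeholders, not our real configuration.

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._               // brings esRDD(...) onto SparkContext

// Driver setup (values here are illustrative, not our actual cluster address).
val conf = new SparkConf()
  .setAppName("fts-search")
  .set("es.nodes", "localhost:9200")
  .set("es.scroll.keepalive", "5m")            // matches the scroll=5m seen in the failing request
val sc = new SparkContext(conf)

// query holds the JSON body pasted at the top of this post.
val query = """{ "from" : 0, "size" : 2147483647, "query" : { ... } }"""

val rdd = sc.esRDD("fts*/Document", query)     // resource as reported by ScalaEsRDD in the logs
println(rdd.count())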
What does this error mean, and how can we address it?
Thanks.