Problems with plugins and index templates

Hi,

we've been trying to switch some of our index creation from programmatic to
index-template-based. This seems to work fine with one exception: plugins
do not seem to work (we're using phonetic in this case, ES version is
0.20.2):

org.elasticsearch.index.mapper.MapperParsingException: Analyzer [name_metaphone] not found for field [metaphone]
    at org.elasticsearch.index.mapper.core.TypeParsers.parseField(TypeParsers.java:86)
    at org.elasticsearch.index.mapper.core.StringFieldMapper$TypeParser.parse(StringFieldMapper.java:136)
    at org.elasticsearch.index.mapper.multifield.MultiFieldMapper$TypeParser.parse(MultiFieldMapper.java:132)
    at org.elasticsearch.index.mapper.object.RootObjectMapper.findTemplateBuilder(RootObjectMapper.java:218)
    at org.elasticsearch.index.mapper.object.RootObjectMapper.findTemplateBuilder(RootObjectMapper.java:204)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:691)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:575)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:451)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:486)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:430)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:318)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:431)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.BulkShardRequest.beforeLocalFork(BulkShardRequest.java:67)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.retry(TransportShardReplicationOperationAction.java:484)
    ...

Adding the settings and mapping JSON explicitly on index creation (we're
using the Java API for that) has been working fine for quite some time. Now
when we create a unified configuration in an index template JSON, it works
OK when you create the index. But ES reproducibly fails once the first
indexing operation hits a field that uses the plugin-provided analyzer. The
index template consists of the same settings and mappings JSONs that we
used before. The phonetic plugin is installed correctly and logged on ES
startup.

Any ideas, suggestions?

Klaus

--

Addendum: We just tried to switch back to 0.19.11 - same issue there.

Klaus

--

Hi Klaus,

your problem is not related to plugins or index templates but to
bulk indexing, or to your data.

The cause is an NPE while traversing the bulk item requests at the shard
level. There are two possible reasons: either you managed to add a null item
to the bulk request (which seems very unlikely because the bulk API does not
endorse it), or you are running multiple threads on bulk indexing and they
interfere. Because the ActionRequest list in BulkRequest is not threadsafe,
you need to encapsulate the bulk indexing in a threadsafe manner yourself
when a single client is reused by many threads.

Jörg

--

On Sunday, 13 January 2013 10:28:56 UTC+1, Jörg Prante wrote:

The cause is an NPE while traversing the bulk item requests at the shard
level. There are two possible reasons: either you managed to add a null item
to the bulk request (which seems very unlikely because the bulk API does not
endorse it)

Nope. No null items. We've been indexing the same data, using the same
indexing code, for more than half a year now. Many times, many items. No
issues whatsoever. The problem only occurs once we switch the way we
create the indexes. The code to add items to an index is unchanged.

or you are running multiple threads on bulk indexing and they interfere.
Because the ActionRequest list in BulkRequest is not threadsafe, you need
to encapsulate the bulk indexing in a threadsafe manner yourself when a
single client is reused by many threads.

This is interesting. Yes, we're doing bulk indexing with several threads
(dozens) on the same client object, using a per-thread pattern like this:

void indexStuff(...) {
    final BulkRequestBuilder bulkRequest = client.prepareBulk();
    bulkRequest.add(....);
    final BulkResponse bulkResponse = bulkRequest.execute().actionGet();
}

Again, so far this has worked perfectly.

Now the idea that this wouldn't be thread-safe is a bit unsettling, also
because it's not really obvious from any documentation, and it makes me
wonder what else isn't thread-safe on the Java API? "Encapsulating in a
thread-safe manner" would effectively kill indexing performance for us, so
our only alternative would be one Client object per thread. Is this the
recommended way of doing things?

Or am I misinterpreting you and your point is that a BulkRequest mustn't be
shared by multiple threads? We're not doing that at all, as shown in the
simplified code snippet above.

The whole thing simply doesn't quite look to me like a threading related
issue - it happens reproducibly and immediately when we use the templates,
and never happens without them.

Thanks
Klaus

--

I've gisted a complete curl recreation at https://gist.github.com/4547457 -
any hints appreciated.

Klaus

--

"index": {...} is missing between "settings" and "analysis". Your template
should look like this:

{
  "asset_template": {
    "template": "*",
    "settings": {
      "index": {
        "analysis": {

On Wednesday, January 16, 2013 9:48:58 AM UTC-5, Klaus Brunner wrote:

I've gisted a complete curl recreation at https://gist.github.com/4547457 -
any hints appreciated.

Klaus

--

On Wednesday, 16 January 2013 20:32:31 UTC+1, Igor Motov wrote:

"index": {...} is missing between "settings" and "analysis". Your template
should look like this:

Yay! Thanks Igor, that was indeed it.

I was a little surprised though that this kind of problem isn't caught
earlier or at least with a more helpful error message. But I understand the
settings JSON is basically free-form.
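
For anyone hitting the same thing: because the settings JSON is free-form, a small client-side check before registering a template can catch this particular mistake early. The following is only a sketch in plain Java. The class TemplateSettingsCheck and its methods are hypothetical names, not part of the Elasticsearch API, and the rule it encodes is simply the fix reported in this thread for the versions discussed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch: inspects the "settings"
// object of an index template that has already been parsed into nested Maps.
class TemplateSettingsCheck {

    /** True when "analysis" sits directly under settings (the broken layout). */
    static boolean hasMisplacedAnalysis(Map<String, Object> settings) {
        return settings.containsKey("analysis");
    }

    /** True when "analysis" is nested under "index" (the layout from the fix). */
    static boolean hasIndexAnalysis(Map<String, Object> settings) {
        Object index = settings.get("index");
        return index instanceof Map && ((Map<?, ?>) index).containsKey("analysis");
    }

    public static void main(String[] args) {
        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", new LinkedHashMap<String, Object>());

        // Layout that failed in this thread: settings -> analysis
        Map<String, Object> broken = new LinkedHashMap<>();
        broken.put("analysis", analysis);

        // Layout that worked: settings -> index -> analysis
        Map<String, Object> index = new LinkedHashMap<>();
        index.put("analysis", analysis);
        Map<String, Object> fixed = new LinkedHashMap<>();
        fixed.put("index", index);

        System.out.println("broken is misplaced: " + hasMisplacedAnalysis(broken));
        System.out.println("fixed is nested:     " + hasIndexAnalysis(fixed));
    }
}
```

How the template JSON gets parsed into a Map is left out on purpose; any JSON library would do.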

Klaus

--

Note, the bulk items are collected in a simple array list in the
BulkRequest, this is not threadsafe. So, each thread should handle its own
BulkRequest (and I understand you do so).
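
To illustrate the one-BulkRequest-per-thread rule without pulling in the Elasticsearch client, here is a minimal sketch in which a plain ArrayList stands in for the item list inside a BulkRequest. PerThreadBatches and indexInParallel are made-up names for this example, and the "send" step is only simulated by returning the batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in sketch: the ArrayList plays the role of the (non-threadsafe)
// item list of a BulkRequest. No Elasticsearch classes are used.
class PerThreadBatches {

    /** Every task fills and "sends" its own list; returns the total item count. */
    static int indexInParallel(int threads, final int docsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                results.add(pool.submit(() -> {
                    // Created inside the task, so no other thread ever sees it.
                    List<String> batch = new ArrayList<>();
                    for (int i = 0; i < docsPerThread; i++) {
                        batch.add("doc-" + id + "-" + i);
                    }
                    // A real client would execute the bulk request here.
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the list is thread-confined here; the pool (like the client in the snippet earlier in the thread) is shared by all tasks.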

I use a modified BulkRequestBuilder with Queues.newConcurrentLinkedQueue()
from Guava instead of an array list which can be shared by many threads
(more than 3000 docs per second throughput with 12 threads on a
TransportClient singleton, each doc ~1-10k). The difference is, only a
single flush() call is needed to clear the queue which makes parallel bulk
indexing more manageable in case of abort or client wait and shutdown.
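
A rough idea of what such a shared queue could look like is sketched below. This is not the actual modified BulkRequestBuilder, which is not shown in this thread. SharedBulkQueue is a hypothetical name, and the sketch uses java.util.concurrent.ConcurrentLinkedQueue directly, which is the class that Guava's Queues.newConcurrentLinkedQueue() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a bulk builder that many threads can share.
class SharedBulkQueue<T> {
    private final Queue<T> items = new ConcurrentLinkedQueue<>();

    /** Safe to call from many threads at once. */
    void add(T item) {
        items.add(item);
    }

    /** Drains whatever is queued at the moment into a single batch. */
    List<T> flush() {
        List<T> batch = new ArrayList<>();
        T item;
        // poll() returns null once the queue is empty.
        while ((item = items.poll()) != null) {
            batch.add(item);
        }
        return batch;
    }
}
```

Items added while flush() is running end up either in the current batch or in the next one, so a final flush() after all producers have stopped picks up everything that is left.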

Jörg

--

BulkRequest shared by many threads so you can fill it even faster?

Nice!

Jörg? Did you send a pull request about it?

Regards

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 18 Jan 2013, at 01:01, Jörg Prante joergprante@gmail.com wrote:

Note, the bulk items are collected in a simple array list in the BulkRequest, this is not threadsafe. So, each thread should handle its own BulkRequest (and I understand you do so).

I use a modified BulkRequestBuilder with Queues.newConcurrentLinkedQueue() from Guava instead of an array list which can be shared by many threads (more than 3000 docs per second throughput with 12 threads on a TransportClient singleton, each doc ~1-10k). The difference is, only a single flush() call is needed to clear the queue which makes parallel bulk indexing more manageable in case of abort or client wait and shutdown.

Jörg

--

I doubt it's faster - ConcurrentLinkedQueue is a beast. I also use size()
to check the bulk size limit, which is a big no-no because size() on a
ConcurrentLinkedQueue is not a constant-time operation. As a result it comes
with a penalty. I presume that the gain from parallel document construction
with XContentBuilder outweighs the extra concurrent bulk list management.
But I feel Shay would not favor bulk indexing with ConcurrentLinkedQueue.

Anyway, I found my code no longer works with 0.20, so I need to update it.
The famous LMAX Disruptor looks promising to experiment with, but it is
difficult to code against and a memory hog. Maybe a simple blocking FIFO
batch queue will do, with reentrant locking instead of synchronization.
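
As a sketch of that last idea only (nothing here is existing code from this thread or from Elasticsearch), a blocking FIFO batch queue built on ReentrantLock and two conditions might look like this. BlockingBatchQueue, batchSize and capacity are names and parameters assumed for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded FIFO that hands out items in fixed-size batches.
class BlockingBatchQueue<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition batchReady = lock.newCondition();
    private final Deque<T> fifo = new ArrayDeque<>();
    private final int batchSize;
    private final int capacity;

    BlockingBatchQueue(int batchSize, int capacity) {
        if (batchSize < 1 || capacity < batchSize) {
            throw new IllegalArgumentException("need 1 <= batchSize <= capacity");
        }
        this.batchSize = batchSize;
        this.capacity = capacity;
    }

    /** Blocks while the queue is full. */
    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (fifo.size() >= capacity) {
                notFull.await();
            }
            fifo.addLast(item);
            if (fifo.size() >= batchSize) {
                batchReady.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until a full batch is available, then returns it in FIFO order. */
    List<T> takeBatch() throws InterruptedException {
        lock.lock();
        try {
            while (fifo.size() < batchSize) {
                batchReady.await();
            }
            List<T> batch = new ArrayList<>(batchSize);
            for (int i = 0; i < batchSize; i++) {
                batch.add(fifo.pollFirst());
            }
            notFull.signalAll();
            return batch;
        } finally {
            lock.unlock();
        }
    }

    /** Non-blocking: returns whatever is left, e.g. a partial batch at shutdown. */
    List<T> drain() {
        lock.lock();
        try {
            List<T> rest = new ArrayList<>(fifo);
            fifo.clear();
            notFull.signalAll();
            return rest;
        } finally {
            lock.unlock();
        }
    }
}
```

Producers call put(), one or more sender threads loop on takeBatch(), and drain() collects the remainder on abort or shutdown. Whether this beats a ConcurrentLinkedQueue in practice would have to be measured.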

Cheers,

Jörg

--