Wrong analyzer used when indexing dynamic property

Hi,

I'm seeing some unusual behaviour when indexing documents in Elasticsearch
and am hoping someone here can help me solve the problem.

I have a unit test that's failing intermittently. The flow of the test is
as follows:

  1. Initialise in-memory Elasticsearch cluster (one local node, no
    replicas)
  2. Create new index
  3. Create new type mapping
  4. Index some documents
  5. Refresh index and wait for all documents to be processed
  6. Query Elasticsearch for documents

The type mapping I'm using includes the following dynamic template
definition:

{
    "participants": {
        "path_match": "participants.*",
        "mapping": {
            "type": "string",
            "store": "yes",
            "index": "analyzed",
            "analyzer": "whitespace"
        }
    }
}
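
To make the intent of path_match concrete, here is a rough Python sketch of which field paths I expect the template to catch. It uses fnmatchcase from the standard library as a stand-in for Elasticsearch's wildcard matching, which is an assumption on my part (fnmatch also gives '?' and '[' a special meaning), and template_applies is a hypothetical helper, not anything from the Elasticsearch API.

```python
from fnmatch import fnmatchcase

# Pattern copied from the dynamic template above.
PATH_MATCH = "participants.*"


def template_applies(field_path: str, pattern: str = PATH_MATCH) -> bool:
    """Approximate the path_match check with a simple '*' wildcard match.

    Assumption: '*' matches any run of characters, including dots.
    """
    return fnmatchcase(field_path, pattern)


if __name__ == "__main__":
    for path in ("participants.new", "participants.removed",
                 "participants", "subject"):
        print(path, template_applies(path))
```

Under that assumption, only the two dotted paths match; a bare "participants" field would not be covered by the template.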

This template is intended to produce fields of the form:

participants.new = [ 'user1@some.domain.com', 'user2@some.domain.com' ]
participants.removed = [ 'user3@some.domain.com' ]

The problem I have is that occasionally (perhaps once in every ten runs)
the test fails because step 6 does not return all the expected
documents. When I check the indexed terms for the missing documents, I see
that the values of the 'participants' fields have been split into separate
tokens on the '@' character. This suggests that the default analyzer is
being used for indexing instead of the whitespace one.
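
To illustrate the difference I'm describing, here is a small Python sketch that contrasts the two tokenisations. whitespace_tokens splits on whitespace only, which is what I expect from the whitespace analyzer. approx_default_tokens is only a crude approximation that additionally splits on '@'; it is not the real default analyzer, which applies further rules (lowercasing, for instance). Both function names are made up for this example.

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    """Split on runs of whitespace only, keeping each address whole."""
    return text.split()


def approx_default_tokens(text: str) -> list[str]:
    """Crude stand-in for the behaviour I observed: also split on '@'.

    Assumption: this models only the '@' split, nothing else the
    default analyzer may do.
    """
    return re.findall(r"[^\s@]+", text)


if __name__ == "__main__":
    value = "user1@some.domain.com user2@some.domain.com"
    print(whitespace_tokens(value))
    print(approx_default_tokens(value))
```

A term query for 'user1@some.domain.com' can only match a document tokenised the first way, which would explain why the affected documents go missing from the results.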

So far I haven't been able to detect any pattern to the failures. The
unexpected tokenisation only affects a portion of the indexed documents and
can occur at any point in the indexing process (i.e. it isn't always the
first or last document that has problems).
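
For reference, this is roughly how I work out which documents are affected once I have the indexed terms for each one. It is a simplified Python sketch: affected_documents is a hypothetical helper, and the two dictionaries are illustrative inputs keyed by document id, not real output from my test.

```python
def affected_documents(expected: dict[str, list[str]],
                       indexed: dict[str, list[str]]) -> list[str]:
    """Return ids of documents missing at least one whole address as a term.

    expected: doc id -> addresses that were submitted for indexing
    indexed:  doc id -> terms actually found in the index for that doc
    """
    bad = []
    for doc_id, addresses in expected.items():
        terms = set(indexed.get(doc_id, ()))
        if not set(addresses) <= terms:
            bad.append(doc_id)
    return bad


if __name__ == "__main__":
    # Illustrative data only.
    expected = {
        "doc1": ["user1@some.domain.com"],
        "doc2": ["user3@some.domain.com"],
    }
    indexed = {
        "doc1": ["user1@some.domain.com"],
        "doc2": ["user3", "some.domain.com"],
    }
    print(affected_documents(expected, indexed))
```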

Let me know if I can provide any additional information to help diagnose
this issue. Any help you can provide will be much appreciated as I'm not
sure what to try next.

Cheers,
Paul
