String is analyzed non-deterministically

UPDATE: I initially thought this was an issue with custom routing since it cropped up after experimenting with _routing, but it seems to be an issue with analyzers.

One property with mapping "type": "string" and value "ReadingRainbow2.0" is causing inconsistent query results.
It seems to be analyzed inconsistently. When I query using match, I get a different number of hits each time I run the query.

If I use a term filter for both the original string and what I expect it to be analyzed as, I get consistent results (the same hit count each time I run the query), so it seems that some documents are being analyzed one way and some the other:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "should": [
                        {
                            "term": { "edApp.name": "ReadingRainbow2.0" }
                        },
                        {
                            "term": { "edApp.name": "readingrainbow2.0" }
                        }
                    ]
                }
            }
        }
    }
}

Any ideas?

Using ES 1.5.2 through AWS Elasticsearch Service
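For reference, in ES 1.x a `string` field with default mapping settings is indexed with the `standard` analyzer (assuming no custom default analyzer is configured on the index), which tokenizes the value and lowercases it. The Python sketch below is only a rough approximation of that behavior for simple ASCII input; `approx_standard_analyze` and `term_filter_matches` are hypothetical helpers, and running the index's `_analyze` API against the field remains the authoritative check. It illustrates why a `term` filter, which is not analyzed, can only match the lowercased form of the value.

```python
import re
from typing import List

# Hypothetical approximation of the ES 1.x "standard" analyzer for simple ASCII text:
# keep runs of word characters, allowing a dot between word characters (as in "2.0"),
# then lowercase each token. The real standard tokenizer follows Unicode (UAX #29)
# word segmentation, so edge cases can differ from this sketch.
TOKEN_PATTERN = re.compile(r"\w+(?:\.\w+)*")


def approx_standard_analyze(text: str) -> List[str]:
    """Return the tokens the standard analyzer is expected to produce for simple ASCII text."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def term_filter_matches(indexed_tokens: List[str], term: str) -> bool:
    """A term filter is not analyzed: it matches only if that exact term was indexed."""
    return term in indexed_tokens


if __name__ == "__main__":
    tokens = approx_standard_analyze("ReadingRainbow2.0")
    print(tokens)
    print(term_filter_matches(tokens, "ReadingRainbow2.0"))  # original mixed casing
    print(term_filter_matches(tokens, "readingrainbow2.0"))  # lowercased form
```

If the real analyzer behaves like this, every document indexed with this value should contain the same single term, so a `match` query on it would be expected to return a stable hit count.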

Hi,

Can you specify what you are doing with a little more detail? For example:

  • The title suggests the property you are searching on is "analyzed". If so, what analyzer settings do you use for it in your mapping?
  • Is this setting the same for all indices you are searching over through the read alias?
  • How do you use the routing value while indexing?
  • When you say you just started this, are you still searching, through the alias, over indices where you didn't use routing?
  • Can you post examples of the query (match or term) that you are getting different results for? Possibly with some sample documents you expect would match but don't?

Here is the mapping. It is a string with default settings.

"name": {
    "type": "string"
}

Right now there is only one index behind my read alias. I reset all my indexes before indexing with the routing value.

The match on edApp.name is the clause that seems to break the query. When it is excluded, the query always returns results; when it is included, the query sometimes returns nothing and sometimes returns results.

Here is the entire query:

{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "edApp.name": "ReadingRainbow2.0"
                            }
                        },
                        {
                            "match": {
                                "object.isPartOf.name": "Reporting_Auto_Test_1454365827"
                            }
                        }
                    ]
                }
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "@context": [
                                    "http://purl.imsglobal.org/ctx/caliper/v1/Context"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "@type": [
                                    "http://purl.imsglobal.org/caliper/v1/AssessmentItemEvent"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "action": [
                                    "http://purl.imsglobal.org/vocab/caliper/v1/action#Completed"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "actor.@context": [
                                    "http://purl.imsglobal.org/ctx/caliper/v1/Context"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }
}
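Comparing the query with and without the edApp.name clause can be scripted so both variants are generated from one place. A minimal Python sketch that rebuilds the query above as a dict; `build_query` and its `include_edapp_name` flag are hypothetical names, and the resulting body still has to be sent to the index with whatever client is in use.

```python
import json
from typing import Any, Dict

CALIPER_CONTEXT = "http://purl.imsglobal.org/ctx/caliper/v1/Context"


def build_query(include_edapp_name: bool = True) -> Dict[str, Any]:
    """Rebuild the ES 1.x "filtered" query above, optionally dropping the edApp.name match."""
    must_queries = [
        {"match": {"object.isPartOf.name": "Reporting_Auto_Test_1454365827"}},
    ]
    if include_edapp_name:
        # The clause suspected of causing the inconsistent results.
        must_queries.insert(0, {"match": {"edApp.name": "ReadingRainbow2.0"}})

    # Non-scoring filters copied from the original query.
    must_filters = [
        {"terms": {"@context": [CALIPER_CONTEXT]}},
        {"terms": {"@type": ["http://purl.imsglobal.org/caliper/v1/AssessmentItemEvent"]}},
        {"terms": {"action": ["http://purl.imsglobal.org/vocab/caliper/v1/action#Completed"]}},
        {"terms": {"actor.@context": [CALIPER_CONTEXT]}},
    ]

    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": must_queries}},
                "filter": {"bool": {"must": must_filters}},
            }
        }
    }


if __name__ == "__main__":
    # Print both variants so they can be sent to the index and their hit counts compared.
    print(json.dumps(build_query(include_edapp_name=True), indent=2))
    print(json.dumps(build_query(include_edapp_name=False), indent=2))
```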

Example of the offending property:

{
  ...
  "edApp": {
    "name": "ReadingRainbow2.0"
  }
}

I see the same problem when querying the index directly, bypassing my aliases. I think this must be some weird mapping or routing issue.

Results when the document is found:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 992,
    "max_score": 3.6582928,
    "hits": [
...

When it is not found:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 28,
    "max_score": 7.1917276,
    "hits": [
...

Nothing obvious comes to mind, but I'm still in the dark about how you do the custom routing, both at index time and at search time. It would also help to isolate the problem, for example by querying only on the field you think is causing it and checking whether the results still differ between runs.
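A Python sketch of that isolation test: send a query on just the suspect field several times and check whether `hits.total` changes between runs. `collect_hit_totals`, `is_deterministic`, and the `search` callable are hypothetical; the stub returns canned totals borrowed from the two responses above so the sketch runs without a cluster. With a real client, `search` would be a function that performs the search and returns the parsed response.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: takes a query body, returns the parsed search response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def collect_hit_totals(search: SearchFn, body: Dict[str, Any], runs: int = 10) -> List[int]:
    """Run the same query body several times and record hits.total from each response."""
    # In ES 1.x responses, hits.total is a plain integer (e.g. 992 or 28 above).
    return [search(body)["hits"]["total"] for _ in range(runs)]


def is_deterministic(totals: List[int]) -> bool:
    """True when every run returned the same hit count."""
    return len(set(totals)) <= 1


if __name__ == "__main__":
    # Isolated query: only the field suspected of causing the problem.
    isolated = {"query": {"match": {"edApp.name": "ReadingRainbow2.0"}}}

    # Stand-in for a real client call. With elasticsearch-py this could be something
    # like: lambda body: es.search(index="my-read-alias", body=body)
    # ("my-read-alias" is a hypothetical alias name).
    canned = iter([992, 28, 992])
    fake_search: SearchFn = lambda body: {"hits": {"total": next(canned)}}

    totals = collect_hit_totals(fake_search, isolated, runs=3)
    print(totals, is_deterministic(totals))
```

If the isolated query alone still produces varying totals, the edApp.name field is confirmed as the source; if it is stable, the interaction with the other clauses is worth examining next.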

Hi @mashbourne

Great that you found out it wasn't the routing. However, changing all your previous comments on this thread is confusing and makes it hard for other readers to follow how you came to this conclusion. This forum is not only meant to be a help desk but also a source of information for other users, so if you do update your comments, please make only small updates or correct typos.
I restored your deleted comments to their previous versions; let me know if this bothers you, in which case I would suggest deleting this thread altogether.
Also, please open a new thread for the suspected analysis problem. This is a separate issue from what you originally posted.

Thanks.

Just delete the thread then. I removed the prior comments since they turned out to be inaccurate and had no value as a source of information.