Elasticsearch returns unmatched results

I am new to Elasticsearch and need help with ES returning unmatched results for an analyzed field. I have a custom analyzer as follows:

"testing_analyzer": { "type": "custom", "char_filter": "html_strip" , "tokenizer": "standard", "filter": [ "lowercase", "asciifolding" , "snowball" , "stop" ] },

I have set this analyzer for a field, for both indexing and searching, as follows:

"name": { "type": "string", "analyzer": "testing_analyzer", "search_analyzer": "testing_search_analyzer" }

But when I search for the name "università di bologna", the first record returned is a match, while some of the other records returned are not (see the second record in the result below):

Record1: [ "Università di Bologna", "University of Bologna", "CNR", "Università di Pisa", "University of Pisa", "Mineraria e Delle Tecnologie Ambientali" ]

Record2:

[ "University of Salerno", "Università di Salerno" ]

Any help?

The low-level details of why a document did (or didn't) match can be revealed using this API: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-explain.html
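
A minimal sketch of what such a request could look like for a single document, assuming the `/{index}/{type}/{id}/_explain` endpoint form described on the linked 2.3 page. The index name `my_index`, the type name `my_type`, and the plain `match` query are placeholders, since the thread does not show the real index or query. The snippet only builds the request; it does not contact a cluster.

```python
import json

def build_explain_request(index, doc_type, doc_id, field, text):
    """Return (path, body) for an explain call on one document."""
    path = "/{}/{}/{}/_explain".format(index, doc_type, doc_id)
    # Assumption: the search uses a plain match query on the given field.
    body = {"query": {"match": {field: text}}}
    return path, json.dumps(body, ensure_ascii=False)

# "my_index" and "my_type" are hypothetical names; "4" is the _id from the thread.
path, body = build_explain_request(
    "my_index", "my_type", "4", "name", "università di bologna"
)
print("GET " + path)
print(body)
```

Sending that path and body with any HTTP client should return the explanation for that one document only, which keeps the output small enough to post.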

Here is the explain result:
"hits" : {
"total" : 10,
"max_score" : 0.97005683,
"hits" : [ {
"fields" : {
"name" : [ "Istituto Nazionale di Fisica Nucleare", "INFN", "", "University of Michigan", "Forschungszentrum Karlsruhe", "Kernforschungszentrum Karlsruhe", "University of Naples Federico II", "CERN", "Università di Bologna", "University of Bologna", "Università di Milano", "University of Milan", "INFN Laboratori Nazionali di Frascati"]
},
"_explanation" : {
"value" : 0.97005683,
"description" : "sum of:",
"details" : [ {
"value" : 0.97005683,
"description" : "sum of:",
"details" : [ {
"value" : 0.31842932,
"description" : "weight(name:universita in 12) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.31842932,
"description" : "score(doc=12,freq=7.0), product of:",
.....
}, {
"value" : 0.4814199,
"description" : "weight(name:di in 12) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.4814199,
"description" : "score(doc=12,freq=16.0), product of:",
"details" : [ {
"value" : 0.57735026,
......
}, {
"value" : 0.17020763,
"description" : "weight(name:bologna in 12) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.17020763,
"description" : "score(doc=12,freq=2.0), product of:",
"details" : [ {
"value" : 0.57735026,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 3.3353748,
........
}, {
"_id" : "4",
"_score" : 0.6874734,
"fields" : {
"name" : [ "University of Salerno", "Università di Salerno" ]
},
"_explanation" : {
"value" : 0.6874734,
"description" : "sum of:",
"details" : [ {
"value" : 0.6874734,
"description" : "product of:",
"details" : [ {
"value" : 1.0312101,
"description" : "sum of:",
"details" : [ {
"value" : 0.51560503,
"description" : "weight(name:universita in 9) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.51560503,
"description" : "score(doc=9,freq=1.0), product of:",
"details" : [ {
"value" : 0.526913,
.......
}, {
"value" : 0.51560503,
"description" : "weight(name:di in 9) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.51560503,
"description" : "score(doc=9,freq=1.0), product of:",
"details" : [ {
"value" : 0.526913,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 2.609438,
......
}
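
The numbers in this output can be checked by hand. The sketch below uses only values copied from the explain result above. One thing is assumed rather than shown: the elided factor in the second hit's "product of:" node is taken to be a coordination factor of 2/3 (two of the three query terms matched), which is what the arithmetic suggests.

```python
import math

# First hit: three term clauses (universita, di, bologna) under "sum of:".
first_hit_terms = [0.31842932, 0.4814199, 0.17020763]
first_hit_total = 0.97005683

# Hit with _id 4: only two term clauses (universita, di), no bologna clause.
id4_terms = [0.51560503, 0.51560503]
id4_sum = 1.0312101
id4_total = 0.6874734

# Assumption: the elided factor is coord = matched terms / query terms.
coord = len(id4_terms) / 3

first_ok = math.isclose(sum(first_hit_terms), first_hit_total, abs_tol=1e-6)
id4_sum_ok = math.isclose(sum(id4_terms), id4_sum, abs_tol=1e-6)
id4_total_ok = math.isclose(id4_sum * coord, id4_total, abs_tol=1e-6)

print(first_ok, id4_sum_ok, id4_total_ok)
```

If the assumption holds, the document with `_id` 4 was returned because it matched "universita" and "di", and its score was scaled down because "bologna" was missing.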

I'm probably not understanding the question exactly, but the record you said was missing, I see it in the explain output:

"_id" : "4",
"_score" : 0.6874734,
"fields" : {
"name" : [ "University of Salerno", "Università di Salerno" ]

Well, by default I am getting 10 matched records. The first 3 hits contain exactly the term I am looking for, which is "università di bologna". The next hits contain only part of the search term, and some records that contain the exact search term have a lower score than partial matches. What I expect from this query is that it returns only the records that match everything in the query.
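
The behaviour described here is consistent with a `match` query running with its default `or` operator, where a single matching term is enough for a document to be returned. The thread does not show the query, so that is an assumption. The sketch below is plain Python, not Elasticsearch code: `analyze` is a rough stand-in for the custom analyzer (lowercase, accent stripping, whitespace split; no stemming, stop words, or HTML stripping), and `matches` is a hypothetical helper that contrasts "any term" with "all terms" matching on the two records from the question.

```python
import unicodedata

def analyze(text):
    """Rough stand-in for the analyzer: lowercase, strip accents, split on whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.split()

def field_terms(values):
    """Pool the terms of every value in a multi-valued field into one set."""
    return {term for value in values for term in analyze(value)}

def matches(values, query, operator="or"):
    """True if any ("or") or all ("and") query terms occur in the field."""
    terms = field_terms(values)
    hits = [term in terms for term in analyze(query)]
    return all(hits) if operator == "and" else any(hits)

record1 = ["Università di Bologna", "University of Bologna", "CNR",
           "Università di Pisa", "University of Pisa",
           "Mineraria e Delle Tecnologie Ambientali"]
record2 = ["University of Salerno", "Università di Salerno"]
query = "università di bologna"

for label, record in [("record1", record1), ("record2", record2)]:
    print(label, matches(record, query, "or"), matches(record, query, "and"))
```

In the query DSL, the equivalent switch is the `operator` option of the `match` query, for example `{"match": {"name": {"query": "università di bologna", "operator": "and"}}}`. Because the terms of all values in a multi-valued field are pooled, a record could still satisfy `and` with terms taken from different names; a `match_phrase` query may be closer to the stated expectation, though that is worth testing against the real data.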

"hits" : {
"total" : 10,
"max_score" : 0.87562394,

"_score" : 0.87562394,
"fields" : {"name": [ "Chulalongkorn University", "Istituto Nazionale di Fisica Nucleare", "INFN", "CERN", "Università di Bologna", "University of Bologna", "University of Singapore, National", "National University of", "Univ. Nationale de Singapour", "NUS", "National University of Singapore", "The National University of Singapore"]
},

"_score" : 0.86642766,
"fields" : {
"name" : [ "Università di Bologna", "University of Bologna", "CNR", "Università di Pisa", "University of Pisa", "Mineraria e Delle Tecnologie Ambientali"]
},
"_score" : 0.8352318,
"fields" : {
"name" : [ "Istituto Nazionale di Fisica Nucleare", "INFN", "", "University of Michigan", "Forschungszentrum Karlsruhe", "Kernforschungszentrum Karlsruhe", "University of Naples Federico II", "CERN", "Università di Bologna", "University of Bologna", "Università di Milano", "University of Milan", "INFN Laboratori Nazionali di Frascati", "Istituto Nazionale di Fisica Nucleare", "INFN"]
},
"_score" : 0.8115465,
"fields" : {
"name" : [ "University of Palermo", "Università di Palermo" ]
},
"_score" : 0.7826258,
"fields" : {
"name" : [ "University of Siena", "Università di Siena", "Purdue Univ", "Purdue University", "Lawrence Berkeley Natl. Laboratory", "Lawrence Berkeley National Laboratory", "Lawrence Berkeley Laboratory", "University of Michigan", "Istituto Nazionale di Fisica Nucleare", "INFN", "Università di Bologna", "University of Bologna", "INFN Laboratori Nazionali di Frascati"]
},
"_score" : 0.6874734,
"fields" : {
"name" : [ "University of Salerno", "Università di Salerno" ]
},
"_score" : 0.6874734,
"fields" : {
"name" : [ "University of Palermo", "Università di Palermo" ]
},
"_score" : 0.6129301,
"fields" : {
"name" : [ "Universidad Complutense de Madrid", "Universidad Complutense", "Istituto Nazionale di Fisica Nucleare", "INFN", "Università di Bologna", "University of Bologna", "University of Rome Tor Vergata", "University of Stockholm"]
},

It sounds like you're wondering why this entry was scored higher than the entry below it. I'd recommend running the explain API against that particular document with your query. I don't see the output of that explanation anywhere.

I'd also experiment with the standard analyzer(s) first, check those work as you'd expect, then move up to a custom one.

Yes, I tried the standard one, but because of the special character I could not search for "università di bologna". Even when I index the content as "universita di bologna", replacing the special character "à" with "a", I get the same behavior as with the custom analyzer.

Is there a way to upload a file with the output of "explain"? I could not post it here because of the size limit.