Issue with nGram Analyzer on ECE

Hi, I'm creating an index on ECE with an nGram analyzer, but the query never works. This is my index:

{
	"settings": {
		"analysis": {
			"analyzer": {
				"ailabs_analyzer": {
					"type": "stop",
					"stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with", "any", "than", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"]
				},
				"ngram_analyzer": {
					"type": "custom",
					"tokenizer": "ngram_tokenizer",
					"filter": [
						"lowercase"
					]
				}
			},
			"tokenizer": {
				"ngram_tokenizer": {
					"type": "nGram",
					"min_gram": 3,
					"max_gram": 4
				}
			}
		}
	},
	"mappings": {
		"properties": {
			"documentId": {
				"type": "keyword"
			},
			"documentName": {
				"type": "text",
				"fielddata": true,
				"fields": {
					"keyword": {
						"type": "keyword"
					}
				}
			},
			"documentType": {
				"type": "text",
				"fields": {
					"sortable": {
						"type": "keyword"
					}
				}
			},
			"dateCreated": {
				"type": "keyword"
			},
			"userCreated": {
				"type": "text",
				"fields": {
					"sortable": {
						"type": "keyword"
					}
				}
			},
			"language": {
				"type": "text",
				"fields": {
					"sortable": {
						"type": "keyword"
					}
				}
			},
			"unitOfAnalysis": {
				"properties": {
					"uoaId": {
						"type": "text"
					},
					"unitOfAnalysis": {
						"type": "text",
						"analyzer": "ngram_analyzer"
					},
					"page": {
						"type": "integer"
					},
					"index": {
						"type": "integer"
					},
					"percent": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"duration": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"org": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"date": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"cardinal": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"ordinal": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"gpe": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"person": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"work_of_art": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"time": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"law": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"money": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"loc": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"frequency": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"fac": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"norp": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"quantity": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					},
					"language": {
						"type": "text",
						"analyzer": "ailabs_analyzer",
						"fielddata": true,
						"fields": {
							"keyword": {
								"type": "keyword"
							}
						}
					}
				}
			}
		}
	}
}

This is my query:

{
	"query": {
		"query_string": {
			"query": "unitOfAnalysis.unitOfAnalisys:memb",
			"fields": []
		}
	}
}

If I search for 'memb' I get no results; if I search for 'member' I get 12 results; and if I search for 'members' I get 17 results. Any ideas about what I'm doing wrong?

What does the document you are expecting to match with your query look like? There also seems to be a typo: in your schema you have specified the field as unitOfAnalysis, while in the query it is unitOfAnalisys.
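
A quick way to catch this kind of mismatch is to resolve the queried field path against the mapping. The following is only a minimal Python sketch, not an Elasticsearch API: `resolve_field` is a hypothetical helper, `MAPPING` is a trimmed copy of the mapping posted above, and multi-fields declared under `fields` are ignored for simplicity.

```python
# Trimmed copy of the mapping from the first post (assumption: only the
# parts needed for this check are reproduced here).
MAPPING = {
    "properties": {
        "documentId": {"type": "keyword"},
        "unitOfAnalysis": {
            "properties": {
                "uoaId": {"type": "text"},
                "unitOfAnalysis": {"type": "text", "analyzer": "ngram_analyzer"},
                "page": {"type": "integer"},
            }
        },
    }
}


def resolve_field(mapping, dotted_path):
    """Return the mapping entry for a dotted field path, or None if unmapped.

    Hypothetical helper: walks nested "properties" only and does not look
    at multi-fields declared under "fields".
    """
    node = mapping
    for part in dotted_path.split("."):
        properties = node.get("properties", {})
        if part not in properties:
            return None
        node = properties[part]
    return node


for path in ("unitOfAnalysis.unitOfAnalisys", "unitOfAnalysis.unitOfAnalysis"):
    entry = resolve_field(MAPPING, path)
    if entry is None:
        print(path, "-> not in the explicit mapping")
    else:
        print(path, "-> analyzer:", entry.get("analyzer"))
```

Only the correctly spelled path resolves to an entry that carries `ngram_analyzer`; the misspelled one is not part of the explicit mapping at all.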

I have fixed the query, but now I don't get any results:

{
	"query": {
		"query_string": {
			"query": "unitOfAnalysis.unitOfAnalysis:memb",
			"fields": []
		}
	}
}

{
	"query": {
		"query_string": {
			"query": "memb",
			"fields": ["unitOfAnalysis.unitOfAnalysis"]
		}
	}
}

What does the document you expect to match look like? Please provide a minimal example that reproduces the issue.

For example, if I search without specifying fields, I get the following response:

{
	"query": {
		"query_string": {
			"query": "member",
			"fields": []
		}
	}
}

Result:

{
  "took": 995,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 12,
      "relation": "eq"
    },
    "max_score": 8.96235,
    "hits": [
      {
        "_index": "document",
        "_type": "_doc",
        "_id": "5dcc82289f6c34f13f75275a",
        "_score": 8.96235,
        "_source": {
          "documentId": "35",
          "documentName": "GB020_Hull _MARINA_COURT_CASTLE_STREET_Nov 2012.PDF",
          "documentType": "real estate",
          "unitOfAnalysis": {
            "uoaId": "5dcc82289f6c34f13f75275a",
            "unitOfAnalisys": "Designated Member 49 g:\\031793-140534\\01752511.doc",
            "page": 53,
            "index": 3,
            "percent": null,
            "duration": null,
            "org": [
              "g:\\031793"
            ],
            "date": null,
            "cardinal": [
              "49"
            ],
            "ordinal": null,
            "gpe": null,
            "person": null,
            "workOfArt": null,
            "time": null,
            "law": null,
            "money": null,
            "loc": null,
            "frequency": null,
            "fac": null,
            "norp": null,
            "quantity": null,
            "language": null
          },
          "entities": null,
          "language": "en",
          "dateCreated": 1573665516381,
          "userCreated": "",
          "file": null
        }
      },
      {
        "_index": "document",
        "_type": "_doc",
        "_id": "5dcc5c700c03bfac18f27d63",
        "_score": 7.921951,
        "_source": {
          "documentId": "33",
          "documentName": "GB_Telford_FullerHouse_Headlease_12.011.pdf",
          "documentType": "real estate",
          "unitOfAnalysis": {
            "uoaId": "5dcc5c700c03bfac18f27d63",
            "unitOfAnalisys": "3.10.9 The foregoing provisions of this clause 3.10 shall not apply to any parting with possession or occupation or the sharing of occupation or sub-division of the Demised Premises to or with any member of a group of companies of which the Tenant is itself a member upon the conditions that:-",
            "page": 14,
            "index": 2,
            "percent": null,
            "duration": null,
            "org": [
              "the Demised Premises",
              "Tenant"
            ],
            "date": null,
            "cardinal": [
              "3.10"
            ],
            "ordinal": null,
            "gpe": null,
            "person": null,
            "workOfArt": null,
            "time": null,
            "law": null,
            "money": null,
            "loc": null,
            "frequency": null,
            "fac": null,
            "norp": null,
            "quantity": null,
            "language": null
          },
          "entities": null,
          "language": "en",
          "dateCreated": 1573655586490,
          "userCreated": "",
          "file": null
        }
      },
      {
        "_index": "document",
        "_type": "_doc",
        "_id": "5dcc82279f6c34f13f752759",
        "_score": 7.749855,
        "_source": {
          "documentId": "35",
          "documentName": "GB020_Hull _MARINA_COURT_CASTLE_STREET_Nov 2012.PDF",
          "documentType": "real estate",
          "unitOfAnalysis": {
            "uoaId": "5dcc82279f6c34f13f752759",
            "unitOfAnalisys": "Signed as a deed by XYZ LLP ) acting by two designated members and ) delivered at the date hereof: ) ) Designated Member",
            "page": 53,
            "index": 2,
            "percent": null,
            "duration": null,
            "org": [
              "XYZ LLP"
            ],
            "date": [
              "the date hereof"
            ],
            "cardinal": [
              "two"
            ],
            "ordinal": null,
            "gpe": null,
            "person": null,
            "workOfArt": null,
            "time": null,
            "law": null,
            "money": null,
            "loc": null,
            "frequency": null,
            "fac": null,
            "norp": null,
            "quantity": null,
            "language": null
          },
          "entities": null,
          "language": "en",
          "dateCreated": 1573665516381,
          "userCreated": "",
          "file": null
        }
      }, ...

Since I tokenize into 3- and 4-character grams, I expect to get results when querying the field that has the analyzer, for example with this query:

{
	"query": {
		"query_string": {
			"query": "memb",
			"fields": ["unitOfAnalysis.unitOfAnalysis"]
		}
	}
}

I also expect more results, because words such as "members" or "membership" should be matched too, since they contain tokens related to the query.
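
That expectation can be sketched in plain Python. This is a simplified model of a character n-gram tokenizer with `min_gram` 3 and `max_gram` 4 followed by lowercasing, not Elasticsearch's actual implementation; `char_ngrams` is a hypothetical helper introduced only for illustration.

```python
def char_ngrams(text, min_gram=3, max_gram=4):
    """Return every substring of length min_gram..max_gram, lowercased.

    Simplified model of an n-gram tokenizer plus a lowercase filter
    (assumption: all characters are kept, none are treated as separators).
    """
    text = text.lower()
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


for word in ("member", "members", "membership"):
    print(word, "contains the gram 'memb':", "memb" in char_ngrams(word))
```

In this model, "memb" is one of the grams produced for all three words, which is why a field that really is analyzed this way would be expected to match the partial term.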

The field in your document has the same spelling issue and does not match the field you specified the ngram analyzer for in your mapping. If you look at the mappings for the index, I believe you will find that the field you are querying has the default mapping, which means it is a standard text field without ngrams. This explains why the full word matches but partials do not.
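
To illustrate the difference, here is a rough Python sketch that compares whole-word tokens with 3- and 4-character grams for a fragment of one of the documents posted above. Both helpers are hypothetical simplifications: `whole_word_tokens` only approximates what a default text field indexes, and `char_ngrams` only approximates the n-gram analyzer from the index settings.

```python
import re

# Fragment of the text of the third hit in the response posted above.
SAMPLE = (
    "acting by two designated members and ) delivered at the date hereof: "
    ") ) Designated Member"
)


def whole_word_tokens(text):
    """Rough stand-in for a default text field: lowercased whole words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def char_ngrams(text, min_gram=3, max_gram=4):
    """Simplified n-gram model: all lowercased substrings of 3 to 4 chars."""
    text = text.lower()
    return [
        text[start:start + size]
        for start in range(len(text))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(text)
    ]


words = whole_word_tokens(SAMPLE)
grams = char_ngrams(SAMPLE)

for term in ("memb", "member", "members"):
    print(term, "| whole-word match:", term in words, "| n-gram match:", term in grams)
```

With whole-word tokens, "member" and "members" are found but "memb" is not, which mirrors the behaviour reported in this thread; with grams, "memb" is found as well.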

I am also not convinced your mappings are valid, as you have the same field name defined as an object as well as a string, although at different points in the hierarchy. Which version of Elasticsearch are you on?
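
On the hierarchy point, a small Python sketch can list the full dotted path of every field. It is not an Elasticsearch API: `field_paths` is a hypothetical helper and `MAPPING` is a trimmed copy of the mapping posted above. It only shows that the repeated names end up at different full paths; it does not prove that a given Elasticsearch version accepts the mapping.

```python
# Trimmed copy of the posted mapping (assumption: only the repeated names
# are reproduced here).
MAPPING = {
    "properties": {
        "language": {"type": "text"},
        "unitOfAnalysis": {
            "properties": {
                "unitOfAnalysis": {"type": "text", "analyzer": "ngram_analyzer"},
                "language": {"type": "text", "analyzer": "ailabs_analyzer"},
            }
        },
    }
}


def field_paths(node, prefix=""):
    """Yield (full_path, kind) for every field in a mapping fragment."""
    for name, spec in node.get("properties", {}).items():
        path = prefix + name
        if "properties" in spec:
            # Object field: report it, then descend into its children.
            yield path, "object"
            yield from field_paths(spec, path + ".")
        else:
            yield path, spec.get("type")


paths = list(field_paths(MAPPING))
for path, kind in paths:
    print(path, "->", kind)
```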
