Wrong number of results using match_phrase_prefix

Hi all
I have an on-premise ELK instance with an index of roughly 50,000 documents with 30 columns. One of the columns is name, indexed as text.
I have 4500 documents named VFCXXXX, where each X is a digit, e.g. VFC1234, VFC2345.

When I search with a match_phrase_prefix with query: "VFC", I only get 70 results, all starting with VFC1XXX.
I was under the impression that fields of type text would match every value with the prefix VFC, even if there is only one value in the field?

Thanks in advance
/Martin

It'd be good if you could share some example data, the query and the response.

Hi Mark
I will post as much detail as possible:
Mapping of property name:

      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }

Data (this is just a subset, but it should give you an idea of the structure of the data in the name field; in total there are 4888 documents where name looks like this):

name
VFC3679
VFC1378
VFC3607
VFC3028
VFC3027
VFC3026
VFC3024
VFC850
VFC341
VFC327
VFC302
VFC301
VFC294
VFC293
VFC291
VFC3642
VFC3641
VFC3609
VFC3455
VFC3454
Query:
{
    "query": {
        "match_phrase_prefix": {
            "name": "vfc"
        }
    },
    "fields": [
        "name"
    ],
    "_source": false
}

Result:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 63,
            "relation": "eq"
        },
        "max_score": 12.9442215,
        "hits": [
            {
                "_index": "chcc_staging_idx",
                "_id": "32821",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33699",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1000"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33700",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1001"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33701",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1002"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33702",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1003"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33703",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1004"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33704",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1005"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33705",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1006"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33706",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1007"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33707",
                "_score": 12.9442215,
                "fields": {
                    "name": [
                        "VFC1008"
                    ]
                }
            }
        ]
    }
}

@warkolm Is the provided example data sufficient?

Hi @warkolm
Any idea on this issue?

/Martin

You did not state which version of Elasticsearch you are using. This is something you should always provide.

If you are on a recent version, it would help if you could rerun the query with track_total_hits set to true.
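
As a rough sketch of what that rerun could look like, the request body from earlier in the thread can be built with track_total_hits added at the top level. The helper name build_query is made up for illustration; the field name and query text are the ones posted above.

```python
import json


def build_query(prefix, track_total_hits=True):
    """Build the search body from the thread, plus track_total_hits.

    build_query is a hypothetical helper used only for this illustration.
    """
    return {
        # Ask Elasticsearch for an exact hit count instead of a capped one.
        "track_total_hits": track_total_hits,
        "query": {"match_phrase_prefix": {"name": prefix}},
        "fields": ["name"],
        "_source": False,
    }


body = build_query("vfc")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the _search request for the index in question.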

Hi Christian
Sorry for the missing information, I'm using version 8.4.3 of ELK.
Below is the response with "track_total_hits": true.
I've also put in a picture from Kibana searching for VFC*.

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 63,
            "relation": "eq"
        },
        "max_score": 12.865288,
        "hits": [
            {
                "_index": "chcc_staging_idx",
                "_id": "32821",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33699",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1000"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33700",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1001"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33701",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1002"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33702",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1003"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33703",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1004"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33704",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1005"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33705",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1006"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33706",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1007"
                    ]
                }
            },
            {
                "_index": "chcc_staging_idx",
                "_id": "33707",
                "_score": 12.865288,
                "fields": {
                    "name": [
                        "VFC1008"
                    ]
                }
            }
        ]
    }
}

[Image: screenshot from Kibana searching for VFC*]

I think I forgot to reply to your message.

OK. What if you set size to something greater than the hit count in your query? Can you show the JSON of one of the hits (at least the full relevant queried field) as well as one of the documents you expect to match but that does not? That way we could try to reproduce it.

Hi @Gorelldk

I did a simple test by adding 100 documents with the name "VPC"+index.
When I ran the match_phrase_prefix with the term "vpc" I got a total of 50 documents found.
After I changed the "max_expansions" parameter to 100, I managed to have 100 documents found.
I don't know if it will work for you but that was my experience.
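
For what it's worth, here is a small Python simulation of why this probably happens. It is not Elasticsearch code, only a sketch under two assumptions: that the names run from VFC1 to VFC4500 and are lowercased at index time, and that the prefix is expanded to the first max_expansions terms (default 50) of the sorted term dictionary, which is how I understand the match_phrase_prefix documentation. The function expand_prefix is made up for illustration.

```python
# Illustrative simulation only; this does not call Elasticsearch.
# Assumption: names VFC1..VFC4500, lowercased by the analyzer.
terms = sorted(f"vfc{i}" for i in range(1, 4501))


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return the first max_expansions terms that start with prefix.

    Hypothetical stand-in for the expansion of the last query term.
    """
    matches = [t for t in sorted_terms if t.startswith(prefix)]
    return matches[:max_expansions]


expanded = expand_prefix(terms, "vfc")
print(len(expanded))
print(expanded[:5])
# Lexicographic order puts every "vfc1..." term before "vfc2...".
print(all(t.startswith("vfc1") for t in expanded))
```

Because the terms are sorted as strings, the first 50 all begin with vfc1, which would fit the results in the original post. Passing a larger max_expansions to the sketch lets terms such as vfc2009 through as well.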

Setting size did not change the outcome.
This is one of the documents that does not show up when searching for VFC, but does come up when searching for VFC2. Below it is a document that does come up when searching just for VFC.

{
  "_index": "chcc_staging_idx",
  "_id": "38151",
  "_version": 1,
  "_score": 0,
  "_source": {
    "contractname": "CCP case no. 1349",
    "mutagenesiseln": null,
    "chcc_search": "38151",
    "isgmo": false,
    "homologousrecombinationtype": null,
    "genusname": "Propionibacterium",
    "speciesname": "freudenreichii",
    "@version": "1",
    "genomesequence": true,
    "biosafetylevel": "NotDetermined",
    "materialnumber": null,
    "provider": null,
    "chplasmidids": null,
    "externalid": null,
    "type": "culture",
    "depositnumber": null,
    "updateddate": "1969-12-31T23:00:01Z",
    "biovar": null,
    "singleormulti": "Single",
    "status": "Deposited",
    "serovar": null,
    "antibioticresistance": null,
    "cbd": "NagoyaCleared",
    "originchcc": "34663",
    "culturetypedescription": "Strain Supply Chain",
    "companyname": null,
    "migrationbiovarserovar": "",
    "unix_ts_in_secs": 1,
    "detailsofcontractualobligations": null,
    "iscompetitorculture": false,
    "@timestamp": "2022-11-25T08:53:01.987292400Z",
    "name": "VFC2009",
    "othername": "P1125/1; VAL_ID:P1125/1",
    "genotypicspeciesdescription": "Partial 16S rDNA Sequence",
    "sampletype": null,
    "mutagenesistype": null,
    "genotypicspecies": "Partial 16S rDNA Sequence",
    "location": "6-26-3-2",
    "productname": null,
    "gmotype": null,
    "label": null,
    "deposittype": null,
    "gmoconstructioneln": null,
    "contractualobligation": "None",
    "country": null,
    "mutantconstructiontype": null,
    "subspeciesname": null,
    "addresssamplingsite": null,
    "chcc": 38151,
    "details": null,
    "depositdescription": null,
    "derivationtype": null,
    "pfge": false,
    "origindetails": "CHCC34663 <- Valio <- Finnish dairies/Anneli Rauhamäki; Finland; 1979",
    "lotnumber": null
  },
  "fields": {
    "migrationbiovarserovar": [
      ""
    ],
    "culturetypedescription": [
      "Strain Supply Chain"
    ],
    "genusname": [
      "Propionibacterium"
    ],
    "speciesname": [
      "freudenreichii"
    ],
    "status.keyword": [
      "Deposited"
    ],
    "type": [
      "culture"
    ],
    "originchcc.keyword": [
      "34663"
    ],
    "speciesname.keyword": [
      "freudenreichii"
    ],
    "originchcc": [
      "34663"
    ],
    "genotypicspecies": [
      "Partial 16S rDNA Sequence"
    ],
    "cbd.keyword": [
      "NagoyaCleared"
    ],
    "biosafetylevel.keyword": [
      "NotDetermined"
    ],
    "othername": [
      "P1125/1; VAL_ID:P1125/1"
    ],
    "type.keyword": [
      "culture"
    ],
    "contractualobligation": [
      "None"
    ],
    "@version": [
      "1"
    ],
    "genomesequence": [
      true
    ],
    "chcc_search.keyword": [
      "38151"
    ],
    "genotypicspeciesdescription.keyword": [
      "Partial 16S rDNA Sequence"
    ],
    "contractname.keyword": [
      "CCP case no. 1349"
    ],
    "chcc_search": [
      "38151"
    ],
    "pfge": [
      false
    ],
    "migrationbiovarserovar.keyword": [
      ""
    ],
    "isgmo": [
      false
    ],
    "cbd": [
      "NagoyaCleared"
    ],
    "iscompetitorculture": [
      false
    ],
    "genusname.keyword": [
      "Propionibacterium"
    ],
    "contractname": [
      "CCP case no. 1349"
    ],
    "@version.keyword": [
      "1"
    ],
    "name.keyword": [
      "VFC2009"
    ],
    "culturetypedescription.keyword": [
      "Strain Supply Chain"
    ],
    "chcc": [
      38151
    ],
    "unix_ts_in_secs": [
      1
    ],
    "origindetails": [
      "CHCC34663 <- Valio <- Finnish dairies/Anneli Rauhamäki; Finland; 1979"
    ],
    "biosafetylevel": [
      "NotDetermined"
    ],
    "genotypicspeciesdescription": [
      "Partial 16S rDNA Sequence"
    ],
    "@timestamp": [
      "2022-11-25T08:53:01.987Z"
    ],
    "name": [
      "VFC2009"
    ],
    "updateddate": [
      "1969-12-31T23:00:01.000Z"
    ],
    "location": [
      "6-26-3-2"
    ],
    "singleormulti": [
      "Single"
    ],
    "genotypicspecies.keyword": [
      "Partial 16S rDNA Sequence"
    ],
    "singleormulti.keyword": [
      "Single"
    ],
    "contractualobligation.keyword": [
      "None"
    ],
    "location.keyword": [
      "6-26-3-2"
    ],
    "othername.keyword": [
      "P1125/1; VAL_ID:P1125/1"
    ],
    "status": [
      "Deposited"
    ],
    "origindetails.keyword": [
      "CHCC34663 <- Valio <- Finnish dairies/Anneli Rauhamäki; Finland; 1979"
    ]
  }
}
{
  "_index": "chcc_staging_idx",
  "_id": "39739",
  "_version": 1,
  "_score": 0,
  "_source": {
    "contractname": "CCP case no. 1349",
    "mutagenesiseln": null,
    "chcc_search": "39739",
    "isgmo": false,
    "homologousrecombinationtype": null,
    "genusname": "Streptococcus",
    "speciesname": "thermophilus",
    "@version": "1",
    "genomesequence": true,
    "biosafetylevel": "BSI",
    "materialnumber": null,
    "provider": null,
    "chplasmidids": null,
    "externalid": null,
    "type": "culture",
    "depositnumber": null,
    "updateddate": "1969-12-31T23:00:01Z",
    "biovar": null,
    "singleormulti": "Single",
    "status": "Deposited",
    "serovar": null,
    "antibioticresistance": null,
    "cbd": "NagoyaCleared",
    "originchcc": "32915",
    "culturetypedescription": "Strain Supply Chain",
    "companyname": null,
    "migrationbiovarserovar": "",
    "unix_ts_in_secs": 1,
    "detailsofcontractualobligations": null,
    "iscompetitorculture": false,
    "@timestamp": "2022-11-25T08:53:04.471721400Z",
    "name": "VFC100",
    "othername": "Str 200; VAL_ID:0158",
    "genotypicspeciesdescription": "Partial 16S rDNA Sequence",
    "sampletype": null,
    "mutagenesistype": null,
    "genotypicspecies": "Partial 16S rDNA Sequence",
    "location": "6-29-15-14",
    "productname": null,
    "gmotype": null,
    "label": null,
    "deposittype": null,
    "gmoconstructioneln": null,
    "contractualobligation": "None",
    "country": null,
    "mutantconstructiontype": null,
    "subspeciesname": null,
    "addresssamplingsite": null,
    "chcc": 39739,
    "details": null,
    "depositdescription": null,
    "derivationtype": null,
    "pfge": false,
    "origindetails": "CHCC32915 <- Valio collection <- Hansen quark starter; 1969",
    "lotnumber": null
  },
  "fields": {
    "migrationbiovarserovar": [
      ""
    ],
    "culturetypedescription": [
      "Strain Supply Chain"
    ],
    "genusname": [
      "Streptococcus"
    ],
    "speciesname": [
      "thermophilus"
    ],
    "status.keyword": [
      "Deposited"
    ],
    "type": [
      "culture"
    ],
    "originchcc.keyword": [
      "32915"
    ],
    "speciesname.keyword": [
      "thermophilus"
    ],
    "originchcc": [
      "32915"
    ],
    "genotypicspecies": [
      "Partial 16S rDNA Sequence"
    ],
    "cbd.keyword": [
      "NagoyaCleared"
    ],
    "biosafetylevel.keyword": [
      "BSI"
    ],
    "othername": [
      "Str 200; VAL_ID:0158"
    ],
    "type.keyword": [
      "culture"
    ],
    "contractualobligation": [
      "None"
    ],
    "@version": [
      "1"
    ],
    "genomesequence": [
      true
    ],
    "chcc_search.keyword": [
      "39739"
    ],
    "genotypicspeciesdescription.keyword": [
      "Partial 16S rDNA Sequence"
    ],
    "contractname.keyword": [
      "CCP case no. 1349"
    ],
    "chcc_search": [
      "39739"
    ],
    "pfge": [
      false
    ],
    "migrationbiovarserovar.keyword": [
      ""
    ],
    "isgmo": [
      false
    ],
    "cbd": [
      "NagoyaCleared"
    ],
    "iscompetitorculture": [
      false
    ],
    "genusname.keyword": [
      "Streptococcus"
    ],
    "contractname": [
      "CCP case no. 1349"
    ],
    "@version.keyword": [
      "1"
    ],
    "name.keyword": [
      "VFC100"
    ],
    "culturetypedescription.keyword": [
      "Strain Supply Chain"
    ],
    "chcc": [
      39739
    ],
    "unix_ts_in_secs": [
      1
    ],
    "origindetails": [
      "CHCC32915 <- Valio collection <- Hansen quark starter; 1969"
    ],
    "biosafetylevel": [
      "BSI"
    ],
    "genotypicspeciesdescription": [
      "Partial 16S rDNA Sequence"
    ],
    "@timestamp": [
      "2022-11-25T08:53:04.471Z"
    ],
    "name": [
      "VFC100"
    ],
    "updateddate": [
      "1969-12-31T23:00:01.000Z"
    ],
    "location": [
      "6-29-15-14"
    ],
    "singleormulti": [
      "Single"
    ],
    "genotypicspecies.keyword": [
      "Partial 16S rDNA Sequence"
    ],
    "singleormulti.keyword": [
      "Single"
    ],
    "contractualobligation.keyword": [
      "None"
    ],
    "location.keyword": [
      "6-29-15-14"
    ],
    "othername.keyword": [
      "Str 200; VAL_ID:0158"
    ],
    "status": [
      "Deposited"
    ],
    "origindetails.keyword": [
      "CHCC32915 <- Valio collection <- Hansen quark starter; 1969"
    ]
  }
}

It works for me if I set "max_expansions" to 10,000; then it returns all the documents in question. I don't know if this is the right approach, but it works. However, I see that search-ui, which we are also using, does not implement this.
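
For reference, a sketch of the two request bodies involved. The first is the match_phrase_prefix query with an explicit max_expansions, as described above. The second is an alternative that was not tried in this thread: a prefix query on the name.keyword sub-field from the mapping, which as far as I know is not limited by max_expansions. Keyword fields are not analyzed, so the value has to be upper case there. Both helper names are made up for illustration.

```python
import json


def phrase_prefix_query(field, text, max_expansions=50):
    """match_phrase_prefix body with an explicit max_expansions."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {"query": text, "max_expansions": max_expansions}
            }
        }
    }


def keyword_prefix_query(field, value):
    """Alternative (not from this thread): prefix query on a keyword field."""
    return {"query": {"prefix": {field: {"value": value}}}}


expanded = phrase_prefix_query("name", "vfc", max_expansions=10000)
alternative = keyword_prefix_query("name.keyword", "VFC")
print(json.dumps(expanded, indent=2))
print(json.dumps(alternative, indent=2))
```

Whether search-ui can be configured to send either of these is a separate question that this sketch does not answer.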
