Term Query for short keywords that include spaces

Hi there,
stuck on a simple problem.

I have an index containing various RAM sizes (1GB, 2GB, 3GB, 4GB...24GB)

Is there a way to use a term query to check whether a given RAM size exists in the index?

Searching for "4 GB" (notice the space) should return exactly the document that contains 4GB.

Here is a simple index with some test documents:

PUT ram

POST _bulk
{ "index": { "_index": "ram"} }
{ "thefield": "1GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "2GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "3GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "4GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "5GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "6GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "7GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "8GB"}
{ "index": { "_index": "ram"} }
{ "thefield": "24GB"}
This returns no results:
GET ram/_search
{
  "query": {
    "term": {
      "thefield": {
        "value": "4 GB"
      }
    }
  }
}
This returns 1 match, exactly what I need:
GET ram/_search
{
  "query": {
    "term": {
      "thefield.keyword": {
        "value": "4GB"
      }
    }
  }
}
But this does not:
GET ram/_search
{
  "query": {
    "term": {
      "thefield.keyword": {
        "value": "4gb"
      }
    }
  }
}
This also returns no results:
GET ram/_search
{
  "query": {
    "term": {
      "thefield.keyword": {
        "value": "4 GB"
      }
    }
  }
}

I tried my best with analyzers, tokenizers, filters, etc. from the documentation and some online courses, but couldn't get it to work. Before you slam me with "did you google for it?": I did, and the results are all about performance issues or hardware requirements for Elasticsearch. That is probably because my specific need is to match a RAM size written with or without a space.

A term query returns documents that contain an exact term in the provided field.
So with a term query on thefield.keyword, where the indexed term is exactly 4GB, searching for 4gb or 4 gb will not hit 4GB.

The simplest way is to convert 4gb and 4 gb to 4GB before sending the query to ES.
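
A minimal sketch of that client-side conversion, assuming the values are always indexed in the compact uppercase form shown in the test documents (1GB, 4GB, 24GB). The helper names normalize_ram and ram_term_query are hypothetical and only for illustration. The idea is to strip whitespace and uppercase the input, then put the result into the same term query on thefield.keyword that already works:

```python
import re


def normalize_ram(text: str) -> str:
    """Turn user input such as '4 gb' or '4Gb' into the indexed form '4GB'.

    Assumption: indexed values contain no whitespace and are uppercase,
    as in the test documents from the question.
    """
    # Drop every run of whitespace, then uppercase the remainder.
    return re.sub(r"\s+", "", text).upper()


def ram_term_query(text: str) -> dict:
    """Build the request body for GET ram/_search from raw user input."""
    return {
        "query": {
            "term": {
                "thefield.keyword": {"value": normalize_ram(text)}
            }
        }
    }
```

Calling ram_term_query("4 gb") produces the same body as the working query above with "value": "4GB". This only covers spacing and case; other spellings of the same attribute would need their own rules.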

Thanks for replying.
The thing is, there will eventually be more "attributes", so this will end up as a huge problem. I will have to hire an ML expert to work on feature extraction from 10M HTML documents.
