Efficient query with multiple boolean flags

Hello, in the "real world" the entities I need to search have many yes/no characteristics which I need to filter in AppSearch (for example, "is retailer", "accepts credit card", "accepts crypto", etc).
Users may filter throug one o many of this characteristics. If more than one is selected, all of them must be meet.

What's the most efficient way to store and filter in this scenario?

The obvious way is to have one field for each characteristic with a "true"/"false" or 1/0 value, but that will be a problem each time I add a new characteristic.

I thought storing the characteristics in a single CSV field, but the text filter doesn't allow wildcards. I also thought in single field numeric alternatives, but the filter doesn't allow basic math operators like MOD or LOG to disaggregate that number

Thank you in advance

1 Like

Hey @gpribi,

I wrote a response to this a few days ago but it looks like I forgot to submit it.

I recommend storing these in a single field like:

{
  "attributes": ["is retailer", "accepts credit card"]
}

Users may filter throug one o many of this characteristics. If more than one is selected, all of them must be meet.

This is just an "All" filter in our API

 "filters": {
    "all": [
      { "attributes": "is retailer" },
      { "attributes": "accepts credit card" }
    ]
  }

I didn't test that syntax but that is the gist ^

1 Like

Hello @JasonStoltz thank you, I like your approach.

I modified Logstash and now I have a string array field called "caracteristicas", with an array of the characteristics met by each shop.

Lets pick one of them, the 150001, as seen on the document section of appsearch.
ScreenHunter_263 Apr. 21 00.26

As you suggested, if I query by one of this characteristics, like "may" or "env", this shop should be in the results. In fact I have a lot of documents with the same characteristics.

However, when I make the query, the results are empty.

REQUEST
{"query": "", "page": {"size": 1000,"current": 1}, "result_fields": {"id": { "raw": {} },"caracteristicas2": { "raw": {} }}, "filters" : { "all": [ { "caracteristicas2": "may" } ] } }

RESPONSE
{ "meta": { "alerts": [], "warnings": [], "page": { "current": 1, "total_pages": 0, "total_results": 0, "size": 1000 }, "engine": { "name": "productos", "type": "default" }, "request_id": "4c579fe5-71e0-49f4-8368-3b52e35bdc47" }, "results": [] }

I also tried escaping "may", wrote an "all" filter with all the characteristics of that shop, added more true criterias to the filter, with no luck.

Am I doing something wrong?

Everything you're doing looks correct. I reproduced what you created with the following commands as was able to execute your search query with no issues. So your syntax looks correct, not sure what else would be going on.

Can you filter by "id" on a query and retrieve the document?

# Create engine productos
POST /api/as/v1/engines
{
  "name": "productos"
}

# Insert document from your example
POST /api/as/v1/engines/productos/documents
[
  {
     "caracteristicas2": ["may", "min", "env", "prob"]
  }
]

# Query
POST /api/as/v1/engines/productos/search
{
    "query": "",
    "page": {
        "size": 1000,
        "current": 1
    },
    "result_fields": {
        "id": {
            "raw": {}
        },
        "caracteristicas2": {
            "raw": {}
        }
    },
    "filters": {
        "all": [
            {
                "caracteristicas2": "may"
            }
        ]
    }
}

# Query response
{
    "meta": {
        "alerts": [],
        "warnings": [],
        "page": {
            "current": 1,
            "total_pages": 1,
            "total_results": 1,
            "size": 1000
        },
        "request_id": "2270a20a89a684ea91ee733f58a118f7"
    },
    "results": [
        {
            "id": {
                "raw": "doc-5e9ed83e82ecd388c30662b9"
            },
            "caracteristicas2": {
                "raw": [
                    "may",
                    "min",
                    "env",
                    "prob"
                ]
            },
            "_meta": {
                "score": 1.0
            }
        }
    ]
}

Sure!

REQUEST
{"query": "", "page": {"size": 1000,"current": 1}, "result_fields": {"id": { "raw": {} },"caracteristicas2": { "raw": {} }}, "filters" : { "all": [ { "id": "150001" } ] } }

RESPONSE
{ "meta": { "alerts": [], "warnings": [], "page": { "current": 1, "total_pages": 1, "total_results": 1, "size": 1000 }, "engine": { "name": "productos", "type": "default" }, "request_id": "f4e6a4b9-486b-4a2c-a40c-8c7a80bccc38" }, "results": [ { "_meta": { "engine": "productos", "id": "150001", "score": 1 }, "id": {"raw": "150001"}, "caracteristicas2": {"raw": "[\"may\", \"min\", \"env\", \"prob\"]"} }] }

Oh I see. Your issue is that you have caracteristicas2 indexed as a string:

"[\"may\", \"min\", \"env\", \"prob\"]"

Not an array of strings

["may", "min", "env", "prob"]
1 Like

Thank you! I applied the json filter to convert that string into an array in appsearch and it worked

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.