Problem with searching and deleting data in Elasticsearch using term and regexp

I am currently developing a user data analysis platform and have a problem with searching and deleting data in Elasticsearch using term and regexp queries. I would be very grateful if someone could check the solutions below and tell me which of them do not work, or whether even the working ones should be done in a more optimal way. Thank you very much in advance.

I insert my data into Elasticsearch this way:

es.index(index='files', body={
                        'user_id': user_id,
                        'path': str(path_to_file),  # for example "/folder1/image.jpg"
                        'text1': "example text1",
                        'text2': "example text2",
                        'text3': "example text3"
                    })

When I want to display all documents for the specified user I do something like this (it seems to be working but maybe should be done in a more optimal way?):

    body = {"size": 10000}
    body['post_filter'] = {"match": {"user_id": user_id}}

    res = es.search(index="files", body=body)

    output = []

    for hit in res['hits']['hits']:
        output.append(
            {"text1": hit["_source"]['text1'], "path": hit["_source"]['path'], "text2": hit["_source"]['text2'],
             "text3": hit["_source"]['text3']})


    return Response(
        output,
        status=status.HTTP_200_OK)
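
For comparison, the same listing can be written with the user condition inside the query itself instead of in post_filter. This is only a minimal sketch under assumptions: the helper names build_user_files_query and hits_to_output are hypothetical, and it relies on user_id being mapped as a numeric field (see the mapping at the end) so that a term clause matches it exactly. A clause under bool/filter runs in filter context, so it is not scored.

```python
FIELDS = ("text1", "path", "text2", "text3")


def build_user_files_query(user_id, size=10000):
    """Build a search body that returns the documents of one user.

    Hypothetical helper: the user condition sits in a bool filter clause
    (filter context, no scoring) instead of post_filter.
    """
    return {
        "size": size,
        # Only fetch the fields that are actually returned to the client.
        "_source": list(FIELDS),
        "query": {"bool": {"filter": [{"term": {"user_id": user_id}}]}},
    }


def hits_to_output(res):
    """Convert a search response dict into the list returned to the client."""
    return [
        {field: hit["_source"].get(field) for field in FIELDS}
        for hit in res["hits"]["hits"]
    ]


# Usage with an existing client object `es` (not created here):
# res = es.search(index="files", body=build_user_files_query(user_id))
# output = hits_to_output(res)
```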

When I want to access only one field, for example text2, I do something like this (it seems to be working but maybe should be done in a more optimal way?):

    body = {
        "_source": ["text2"],
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "path": str(path_to_file),  # For example "/folder2/document.txt"
                        }
                    },
                    {
                        "match": {
                            "user_id": user_id
                        }
                    }
                ]
            }
        }
    }

    try:
        # Take text2 from the first hit; no hits or a missing field means 404.
        res = es.search(index="files", body=body)["hits"]["hits"][0]["_source"]["text2"]
    except (IndexError, KeyError):
        return Response(
            {'text': 'There is no text2 for this file.'},
            status=status.HTTP_404_NOT_FOUND)

The problem arises when I try to filter data based on the values in certain fields using specified patterns. The query should return all files of the given user that meet every condition for the specified fields.

body = {
    "query": {
        "bool": {
            "must": [
                {
                    "regexp": {
                        "path": str(path)  # For example "/example_folder/.*"
                    }
                },
                {
                    "regexp": {
                        "text1": ".*example_word.*"
                    }
                },
                {
                    "regexp": {
                        "text2": "example sentence at the beginning.*"
                    }
                },
                {
                    "regexp": {
                        "text3": ".*example sentence in the middle.*"
                    }
                }
            ]
        }
    },
    "post_filter": {
        "match": {
            "user_id": user_id
        }
    }
}

res = es.search(index="files", body=body)

output = []
for hit in res['hits']['hits']:
    output.append(
        {"text1": hit["_source"]['text1'], "path": hit["_source"]['path'], "text2": hit["_source"]['text2'],
         "text3": hit["_source"]['text3']})

return Response(output, status=status.HTTP_200_OK)
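
One variant worth comparing against is to run the patterns on the keyword sub-fields from the mapping shown at the end. The regexp query is a term-level query, so on an analyzed text field it is matched against individual tokens, and a pattern containing spaces or slashes may never match a single token. The sketch below assumes that this is the cause; the helper name build_pattern_query is hypothetical. Note that the keyword sub-fields use ignore_above: 256, so longer values are not indexed there and would not match.

```python
def build_pattern_query(user_id, path_pattern, text_patterns):
    """Build a search body matching whole field values against regexps.

    Hypothetical helper. `text_patterns` maps a field name such as "text1"
    to a regexp; each pattern is applied to the "<field>.keyword" sub-field,
    where the regexp has to match the entire stored string.
    """
    must = [{"regexp": {"path.keyword": path_pattern}}]
    for field, pattern in text_patterns.items():
        must.append({"regexp": {field + ".keyword": pattern}})
    return {
        "query": {
            "bool": {
                # Exact user match in filter context instead of post_filter.
                "filter": [{"term": {"user_id": user_id}}],
                "must": must,
            }
        }
    }


body = build_pattern_query(
    user_id=1,
    path_pattern="/example_folder/.*",
    text_patterns={
        "text1": ".*example_word.*",
        "text2": "example sentence at the beginning.*",
        "text3": ".*example sentence in the middle.*",
    },
)
```

The resulting body can be passed to es.search(index="files", body=body) in the same way as above.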

Definitely the biggest problems occur with deleting documents using delete_by_query and filtering fields using regexp and term. In this case we have two scenarios.

  1. I want to delete all files of the specified user from the specified folder. I try to do it this way:

     doc = {
         "query": {
             "bool": {
                 "must": [
                     {
                         "term": {
                             "user_id": user_id
                         }
                     },
                     {
                         "regexp": {
                             "path": str(path)  # For example "/example_folder/.*"
                         }
                     }
                 ]
             }
         }
     }
    
  2. I want to delete one specific file of the specified user. I try to do it this way:

         doc = {
             "query": {
                 "bool": {
                     "must": [
                         {
                             "term": {
                                 "user_id": user_id
                             }
                         },
                         {
                             "term": {
                                 "path": str(path)  # For example "/example_folder/example_file.jpg"
                             }
                         }
                     ]
                 }
             }
         }
    

Of course, in both cases I call es.delete_by_query(index='files', body=doc). The problem is that neither regexp nor term works properly in these scenarios. I suspect it may be related to incorrect field mapping, as in this case: https://stackoverflow.com/a/33053556/9644040. Does anyone have an idea how to solve this problem? I would also be very grateful to hear if anyone thinks any of these solutions is incorrect and should be done in a different, more optimal way. Thank you very much in advance!

Index mapping:

{
  "files": {
    "mappings": {
      "properties": {
        "text3": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "path": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "text2": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "user_id": {
          "type": "long"
        },
        "text1": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        }
      }
    }
  }
}
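
Given this mapping, path is an analyzed text field with an exact-value path.keyword sub-field. A term query on path compares the unanalyzed input with analyzed tokens, which would explain why a full path never matches. The sketch below shows both delete scenarios rewritten against path.keyword. It assumes the mapping suspicion is correct; the helper names are hypothetical, and a prefix query is swapped in for the regexp in the folder case.

```python
def build_delete_folder_query(user_id, folder):
    """Scenario 1: every file of one user below a folder (hypothetical helper)."""
    # Ensure a trailing slash so "/example_folder" does not also match
    # "/example_folder_backup/...".
    prefix = folder if folder.endswith("/") else folder + "/"
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"prefix": {"path.keyword": prefix}},
                ]
            }
        }
    }


def build_delete_file_query(user_id, path):
    """Scenario 2: one exact file of one user (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    # Exact match against the unanalyzed keyword value.
                    {"term": {"path.keyword": path}},
                ]
            }
        }
    }


# Usage with an existing client object `es` (not created here):
# es.delete_by_query(index="files", body=build_delete_folder_query(user_id, "/example_folder"))
# es.delete_by_query(index="files", body=build_delete_file_query(user_id, "/example_folder/example_file.jpg"))
```

Running the same body through es.search first is a cheap way to check which documents would be deleted.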
