Elasticsearch query for user data overrides

I need to design a query that supports user-specific document edits. The sample document below shows one way to store this data. It has a root-level Description property, which should be searched for all users except Eric. Eric has customized Description, so a search executed by Eric should use his custom Description value, stored in the nested UserData array, instead of the root value. A search executed by Alex should use the root Description field, since Alex has not customized that field.

For my use case, a user may customize anywhere from 0 to 50 root document properties. For any root document property that a user has customized, only the custom value for that property should be searched for that user. Each document property will need to support up to 3000 user customizations.

Sample document

POST index1/_doc/TestDoc1
{ 
  "Name": "TestDoc1 Base Name",
  "Description":"TestDoc1 Base Desc abc",
  "Spec":"TestDoc1 Base Spec",
   ...up to 50 more fields
  "UserData":[
  {
    "Owner": "Eric",
    "Description": "Desc entered by Eric def",
    "Spec":"Eric's custom spec"
  },
  {
    "Owner": "Alex",
    "Spec": "Spec entered by Alex"
    ...notice that Alex did not customize the Description field
  },
  ....up to 3000 more user customizations]
}
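
For the nested queries in this thread to work, UserData has to be mapped as a nested field before any documents are indexed; with dynamic mapping it would become a plain object field and the entries would be flattened together. Below is a minimal sketch of such a mapping, written as a Python dict whose JSON form could be used as the body of PUT index1. The field types are assumptions (in particular, Owner as keyword), and only the three fields shown in the sample are listed.

```python
import json

# Hypothetical index body for index1. The key point is "type": "nested" on
# UserData, so each entry keeps its own Owner/field pairing at query time.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "Name": {"type": "text"},
            "Description": {"type": "text"},
            "Spec": {"type": "text"},
            "UserData": {
                "type": "nested",
                "properties": {
                    # Assumption: owners are matched exactly, so keyword is used.
                    "Owner": {"type": "keyword"},
                    "Description": {"type": "text"},
                    "Spec": {"type": "text"},
                },
            },
        }
    }
}

# Print the JSON that would be sent as the request body.
print(json.dumps(INDEX_BODY, indent=2))
```

Each nested entry is indexed as its own hidden document, and the index setting index.mapping.nested_objects.limit caps the number of nested objects per document (10000 by default), so 3000 entries per document fits under the default but is worth keeping in mind when testing scale.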

If user Eric were to search for text "abc", no result should be returned because user Eric has customized the Description field, overwriting the base description text which includes "abc". If user Eric searches for text "def", then a result should be returned, since "def" is found in Eric's custom Description.
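
To make the expected behavior concrete, here is a small in-memory sketch of the override rule, independent of Elasticsearch. The helper names effective_value and matches are hypothetical, and the whitespace token match is only a rough stand-in for an analyzed match query. It could serve as a reference when checking query results against test documents.

```python
def effective_value(doc, owner, field):
    """Return the value of `field` that `owner` should search against."""
    for entry in doc.get("UserData", []):
        # An override applies only if this owner's entry contains the field.
        if entry.get("Owner") == owner and field in entry:
            return entry[field]
    # No override for this owner: fall back to the root document value.
    return doc.get(field)


def matches(doc, owner, field, text):
    """Rough stand-in for a match query: case-insensitive token lookup."""
    value = effective_value(doc, owner, field)
    return value is not None and text.lower() in value.lower().split()


doc = {
    "Name": "TestDoc1 Base Name",
    "Description": "TestDoc1 Base Desc abc",
    "Spec": "TestDoc1 Base Spec",
    "UserData": [
        {"Owner": "Eric", "Description": "Desc entered by Eric def",
         "Spec": "Eric's custom spec"},
        {"Owner": "Alex", "Spec": "Spec entered by Alex"},
    ],
}

print(matches(doc, "Eric", "Description", "abc"))  # Eric's override hides "abc"
print(matches(doc, "Eric", "Description", "def"))  # found in Eric's override
print(matches(doc, "Alex", "Description", "abc"))  # Alex falls back to the root
```

Note the Alex case: he has a UserData entry, but it carries no Description, so the root Description still applies to him.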

I posted this question on Stack Overflow, where one other person and I have been suggesting some solutions: https://stackoverflow.com/questions/59524627/elasticsearch-query-for-user-data-overrides

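At the moment, I'm investigating an approach that uses the document structure above, with UserData mapped as a nested field. The query below has two branches. The first matches documents where a single UserData entry both belongs to Eric and contains the search text in its Description. The second matches the root Description, but only when Eric has no UserData entry with a Description of his own. Both conditions need to sit inside one nested query, because separate nested clauses can each be satisfied by a different user's entry, and excluding every document where Eric has any entry at all would also hide documents where he customized only other fields. This appears to give me the results that I need. Next, I'm testing whether it scales to the requirements mentioned above.

GET index1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "UserData",
            "query": {
              "bool": {
                "must": [
                  { "match": { "UserData.Owner": "Eric" } },
                  { "match": { "UserData.Description": "abc" } }
                ]
              }
            }
          }
        },
        {
          "bool": {
            "must": [
              { "match": { "Description": "abc" } }
            ],
            "must_not": [
              {
                "nested": {
                  "path": "UserData",
                  "query": {
                    "bool": {
                      "must": [
                        { "match": { "UserData.Owner": "Eric" } },
                        { "exists": { "field": "UserData.Description" } }
                      ]
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}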
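
Since up to 50 fields may be customized, a query of this kind would probably be generated rather than written by hand. Below is a sketch of one way to do that in Python. The function names are hypothetical, and it only builds the request body as a dict; it does not call any client library. This version keeps the owner and field conditions inside one nested clause so they are evaluated against the same UserData entry, and uses an exists query to detect whether the owner has overridden the field.

```python
def build_field_clauses(owner, field, text):
    """Return the two 'should' branches for one overridable field."""
    nested_field = f"UserData.{field}"
    # Branch 1: the owner's own entry contains the text in this field.
    custom_branch = {
        "nested": {
            "path": "UserData",
            "query": {"bool": {"must": [
                {"match": {"UserData.Owner": owner}},
                {"match": {nested_field: text}},
            ]}},
        }
    }
    # Branch 2: the root field contains the text, and the owner has no
    # entry that overrides this particular field.
    base_branch = {
        "bool": {
            "must": [{"match": {field: text}}],
            "must_not": [{
                "nested": {
                    "path": "UserData",
                    "query": {"bool": {"must": [
                        {"match": {"UserData.Owner": owner}},
                        {"exists": {"field": nested_field}},
                    ]}},
                }
            }],
        }
    }
    return [custom_branch, base_branch]


def build_override_query(owner, fields, text):
    """Build a search body that applies the override rule to every field."""
    should = []
    for field in fields:
        should.extend(build_field_clauses(owner, field, text))
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = build_override_query("Eric", ["Description", "Spec"], "abc")
print(len(body["query"]["bool"]["should"]))
```

With this shape the number of clauses grows as two per searched field, which may matter when testing at 50 fields and 3000 entries per document.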

Welcome to the forums! I think this nested fields approach is very promising. Please let us know if it has trouble with scale.

-William
