Parent and child fields in single document

Hi,

I am planning to map a relational table which has parent and child relationship into Elastic index.

SQL table Structure:

pk_id | parent_id | child_id |

100 | 1 | 2 |

101 | 2 | 3 |

102 | 3 | 4 |

103 | 5 | 6 |

104 | 6 | 7 |

What I understand from Join type in Elastic is, we need to create separate documents for parent and child and then relate them using join type. But even if I try to convert above table into linear data, I am not aware of how many exact children a parent can possess, in above example we have two relationship chains: 1-2-3-4 and 5-6-7..

Can you please suggest how this can be implemented in Elastic ?

The join datatype allows you to model 1:n relationships, but cannot do n:m relationships. From what I see in your example this would work.

That said the join datatype is basically a shortcut for indexing less data, that is more expensive at query time. If you have the possibility to merge all your data into a single JSON document and the means to update all the children when a parent gets changed, then that might actually be a better route to go.

If you have issues configuring the join datatype, please come up with what you tried in your mapping and how you are indexing your documents an a fully reproducible example.

Thanks!

1 Like

Thanks Alexander for the reply. :slight_smile:

As I have 1:N case, I created the below mapping:

put /rel_test_idx
{
   "mappings": {
    "properties": {
      "pk_id": {
        "type": "long"
      },
     "hier_id": {
        "type": "long"
      },
      "parent_child": {
        "type": "join",
        "relations": {
          "parent_id": "child_id"
        }
      }
    }
  }
}

Then as step1, I tried to load all the parent level information from the sql:

select  pk_id,
 parent_id, 
 from rel_table;

Step 2, next I tried to load all child records something like below, but I am now afraid that I will overwrite pk_id, as some of child ids can be parent as well, so it will overwrite the data..

SELECT pk_id,
child_id
from rel_table;

As I am not aware how depth the data is (it can 1-2-3 OR 4-5-6-7-8 , etc), I am confused to represent this relationship in elasticsearch.

If would have been easier for me if there was an option to link hierarchy between two fields of same document. Something like this:

"pk_id": { "type": "long" }, 
"parent_id": { "type": "long" },
 "child_id": { "type": "long" } ",
 "group_parent_child": { "type": "join", "relations": { "parent_id": "child_id" } }

Is there any sort of solution here ?

having deeper 1:n relationships requires you to index each document with the proper parent, while also use the routing key of the upmost parent to make sure everything gets stored in the same shard.

maybe you can explain your use-case. bit more what you are trying to model... always hoping for an alternative to this strict hierarchical structure.

1 Like

Okay, I manage account information. One account can have many sub-accounts and further sub-accounts may have their own children. So the depth is unknown, some larger accounts may have 7-8 levels.

Now, I want to generate a reporting system, where I can traverse top-down or bottom-up, based on a account number. I can take care of routing to same shard at logstash stage, but I am confused about how to represent this design in terms of elastic.

Adding my Oracle table structure again for reference:

pk_id | parent_account | child_account |

100 | 1                              | 2                       |

101 | 2                              | 3                       |

102 | 3                              | 4                       |

103 | 5                              | 6                       |

104 | 6                              | 7                        |

I see. Let me suggest completely different approach to this, if all that you are after is traversing all the accounts from a top account.

Check out the path_hierarchy tokenizer

What if you modeled your account structure in a single field like

"account_hierarchy": "1-2-3"

# another document
"account_hierarchy": "4-5-6-7-8"

That tokenizer is able to store the data in a way, that for 1-2-3, there are several tokens stored:

  • 1
  • 1-2
  • 1-2-3

This means, that searching for all accounts below the 1 account, all you would have to do is to execute a terms query like this

{
  "query" : {
    "term" : { "account_hierarchy":"1" } 
  }
}

getting all the accounts from a sub account?

{
  "query" : {
    "term" : { "account_hierarchy":"1-2-3" } 
  }
}

all this with out any joins on query time, so this is blazing fast. You 'only' need to change your index strategy and data modeling.

Full text search approaches sometimes require you to think a bit different then SQL, but the differences can be dramatic.

Hope this helps you to rethink a little bit how to structure and index your data :slight_smile:

2 Likes

Thanks Alexander for the wonderful solution!

This is totally a new dimension than the convention one and it surely helps. Only issue for me now is, using complex hierarchical SQL query at the source to pull data in this prescribed format through logstash, which I want to avoid considering source being a transactional SQL database.

My SQL query:

 SELECT pk_id, CONNECT_BY_ROOT parent_id
                            || '/'
                            || LTRIM (SYS_CONNECT_BY_PATH (child_id, '/'), '/') AS account_hierarchy
                    FROM account_tbl 
                   WHERE CONNECT_BY_ISLEAF = 1
                     AND LEVEL >= 2
              CONNECT BY NOCYCLE parent_id = PRIOR child_id

Hi Alexander,
Based on the output from path hierarchy we defined, I want to visualize the tree structure using Kibana. I see we have a Vega plugin for this in Kibana, but the examples I found in google are based on parent-child joins.

Can this be implemented using path hierarchy tokenizer option?

Thanks in Advance.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.