Opinion needed about Document design for recursive sub-objects/documents


(Luca Belluccini) #1

Hello,
I am looking for advice on a document design that makes my use case easier
to handle in Elasticsearch.

I am using ES to extract stats from ES. The frontend is the well-known
Kibana.
The main differences from the typical setup are:

  • I have almost 30M log lines each hour
  • I need to search not only for all log lines containing a value or
    matching a query, but also for all the "consequent" lines
    • E.g.: I search for a request containing a specific payload; I want
      to produce a facet not only on those lines, but also on the lines
      generated by the requests that the "master" request depends on

My idea is to create a document tree:

  • SERVICE A Y
    • SERVICE B Y
      • SERVICE C Y
      • SERVICE D N
        • SERVICE E Y
        • SERVICE F Y
          • SERVICE G N

As a result, I would like to be able to search for SERVICE D and get,
without issuing any other query:

  • SERVICE D N
    • SERVICE E Y
    • SERVICE F Y
      • SERVICE G N

And be able to perform a facet on them:

  • Matched 4
  • Facet
    • Y 2
    • N 2
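To make the desired result concrete, here is a minimal Python sketch of the subtree facet described above. It runs purely in memory and does not touch Elasticsearch; the tuple layout `(name, flag, children)` and the function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical in-memory model of the tree above: (name, flag, children).
TREE = ("A", "Y", [
    ("B", "Y", [
        ("C", "Y", []),
        ("D", "N", [
            ("E", "Y", []),
            ("F", "Y", [
                ("G", "N", []),
            ]),
        ]),
    ]),
])


def find(node, name):
    """Return the first node called `name` in a depth-first walk, or None."""
    if node[0] == name:
        return node
    for child in node[2]:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def walk(node):
    """Yield the node itself and then every descendant."""
    yield node
    for child in node[2]:
        yield from walk(child)


def subtree_facet(root, name):
    """Count the flag values over the subtree rooted at `name`."""
    start = find(root, name)
    if start is None:
        return 0, Counter()
    flags = [flag for _, flag, _ in walk(start)]
    return len(flags), Counter(flags)


matched, facet = subtree_facet(TREE, "D")
print(matched, dict(facet))
```

For SERVICE D this counts four nodes, two flagged Y and two flagged N, which is the result the query should ideally produce in a single round trip.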

First attempt, CHILD MAPPING.

Document structure:
{
  "gcx": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "oid": "LON6X0100",
  "sap": "1ASAP",
  "chld": [
    {
      "trxnb": "44",
      "t": "2013/09/28 11:39:01.123456",
      "sn": "PT*",
      "st": "C",
      "app": "ROC",
      "be": "RI",
      "d": "OBE",
      "chld": [
        {
          "trxnb": "44-1",
          "t": "2013/09/28 11:39:01.223456",
          "sn": "PT*",
          "st": "C",
          "app": "CPL",
          "be": "PI",
          "d": "OBE",
          "chld": [
            {
              "trxnb": "44-1-1",
              "t": "2013/09/28 11:39:01.323456",
              "sn": "PT*",
              "st": "C",
              "app": "CPL",
              "be": "ACU",
              "d": "OBE",
              "chld": [
                {
                  "trxnb": "44-1-1-1",
                  "t": "2013/09/28 11:39:01.423456",
                  "sn": "PEAUDQ",
                  "st": "E",
                  "app": "ELT",
                  "be": "MPP",
                  "d": "OBE",
                  "chld": [
                    {
                      "trxnb": "44-1-1-1-1",
                      "t": "2013/09/28 11:39:01.523456",
                      "sn": "PNRADD",
                      "st": "E",
                      "app": "ROC",
                      "be": "DI",
                      "d": "TPF",
                      "chld": []
                    },
                    {
                      "trxnb": "44-1-1-1-2",
                      "t": "2013/09/28 11:39:01.623456",
                      "sn": "TFOPCQ",
                      "st": "E",
                      "app": "FOP",
                      "be": "FPP",
                      "d": "OBE",
                      "chld": []
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
Mapping
{
  "template": "aggregate_index*",
  "mappings": {
    "default": {
      "dynamic_templates": [
        {
          "chld_template": {
            "mapping": { "type": "nested" },
            "path_match": "*.chld"
          }
        }
      ]
    }
  }
}

Nested queries fail due to some parsing issue. Furthermore, they are
designed to return the parent object.

Attempt 2, PARENT-CHILD mapping:
{
  "template": "aggregate_index*",
  "mappings": {
    "aggregate": {
      "_parent": {
        "type": "aggregate"
      },
      "_routing": {
        "required": true
      }
    }
  }
}

I am unable to index, since the parent is the object itself and it is
recursive.

Any advice?

Do I need to implement my own custom query and add it to ES?

Best regards,
Luca

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Martijn Van Groningen) #2

What parsing error occurred when you were using nested? Can you also check
that the mappings of the nested object fields are actually created based on
your dynamic template? I think there is no need to use path_match;
match=chld* should be sufficient.

--
Kind regards,

Martijn van Groningen



(Luca Belluccini) #3

With the example above, there is no way to get the data back.
I tried changing the data model:

curl -XPUT http://NCE00259:9200/_template/aggregate_template -d'
{
  "template": "aggregate_index*",
  "mappings": {
    "aggregate": {
      "properties": {
        "chld": {
          "type": "nested",
          "include_in_parent": false
        }
      }
    }
  }
}
'

curl -XPOST http://NCE00259:9200/aggregate_index/aggregate -d'
{
  "gcx": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "oid": "LON6X0100",
  "sap": "1ASAP",
  "chld": [
    {
      "trxnb": "44",
      "t": "2013/09/28 11:39:01.123456",
      "sn": "PT*",
      "st": "C",
      "app": "ROC",
      "be": "RI",
      "d": "OBE"
    },
    {
      "trxnb": "44-1",
      "t": "2013/09/28 11:39:01.223456",
      "sn": "PT*",
      "st": "C",
      "app": "CPL",
      "be": "PI",
      "d": "OBE"
    },
    {
      "trxnb": "44-1-1",
      "t": "2013/09/28 11:39:01.323456",
      "sn": "PT*",
      "st": "C",
      "app": "CPL",
      "be": "ACU",
      "d": "OBE"
    },
    {
      "trxnb": "44-1-1-1",
      "t": "2013/09/28 11:39:01.423456",
      "sn": "PEAUDQ",
      "st": "E",
      "app": "ELT",
      "be": "MPP",
      "d": "OBE"
    },
    {
      "trxnb": "44-1-1-1-1",
      "t": "2013/09/28 11:39:01.523456",
      "sn": "PNRADD",
      "st": "E",
      "app": "ROC",
      "be": "DI",
      "d": "TPF"
    },
    {
      "trxnb": "44-1-1-1-2",
      "t": "2013/09/28 11:39:01.623456",
      "sn": "TFOPCQ",
      "st": "E",
      "app": "FOP",
      "be": "FPP",
      "d": "OBE"
    }
  ]
}
'

But this kind of setup returns the whole document, not the chld document.
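One possible workaround with this flat layout is to pick the wanted entries out of the returned document on the client side. The Python sketch below assumes that a descendant's trxnb is always its ancestor's trxnb followed by "-<n>", as the sample data suggests; the function name `subtree` is hypothetical and the chld list is a trimmed copy of the document above.

```python
from collections import Counter


def subtree(chld, root_trxnb):
    """Keep the entry whose trxnb equals root_trxnb plus all its descendants.

    Assumes a descendant's trxnb is its ancestor's trxnb followed by "-<n>".
    """
    # Appending "-" prevents "44-1" from also matching an entry like "44-10".
    prefix = root_trxnb + "-"
    return [c for c in chld
            if c["trxnb"] == root_trxnb or c["trxnb"].startswith(prefix)]


# Trimmed copy of the "chld" array from the document indexed above.
chld = [
    {"trxnb": "44", "st": "C"},
    {"trxnb": "44-1", "st": "C"},
    {"trxnb": "44-1-1", "st": "C"},
    {"trxnb": "44-1-1-1", "st": "E"},
    {"trxnb": "44-1-1-1-1", "st": "E"},
    {"trxnb": "44-1-1-1-2", "st": "E"},
]

hits = subtree(chld, "44-1-1-1")
facet = Counter(c["st"] for c in hits)
print(len(hits), dict(facet))
```

This only post-processes a document that has already been fetched, so it does not give a server-side facet over the subtree.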



(Martijn Van Groningen) #4

Yes, with the nested type and query you only get back the complete document,
not the individual inner objects.

In that case I would look into using the _parent field instead: index each
chld inner object as a separate child document and index the rest of the
document as the parent document. You mentioned in your first email that you
were unable to index because of the recursion. Can you elaborate a bit more
on that? I see that you have a hierarchy, but this doesn't have to be an
issue.
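As a rough illustration of that split, the Python sketch below turns one recursive document into a parent document plus a flat list of child documents. It only prepares the data and does not call Elasticsearch; `parent_id`, the `ancestors` field and the `parent`/`source` keys are assumptions introduced here to keep the hierarchy visible after flattening, not part of the original data model.

```python
def flatten(doc, parent_id):
    """Split one recursive document into a parent doc and flat child docs.

    `parent_id` and the `ancestors` field are illustrative assumptions.
    """
    # Everything except the recursive "chld" array stays on the parent.
    parent = {k: v for k, v in doc.items() if k != "chld"}
    children = []

    def visit(entries, ancestors):
        for entry in entries:
            child = {k: v for k, v in entry.items() if k != "chld"}
            # Record the chain of trxnb values leading to this entry.
            child["ancestors"] = list(ancestors)
            children.append({"parent": parent_id, "source": child})
            visit(entry.get("chld", []), ancestors + [entry["trxnb"]])

    visit(doc.get("chld", []), [])
    return parent, children


# Trimmed, hypothetical document with the same shape as the one above.
doc = {
    "gcx": "X", "oid": "LON6X0100", "sap": "1ASAP",
    "chld": [{
        "trxnb": "44", "st": "C",
        "chld": [{
            "trxnb": "44-1", "st": "C",
            "chld": [
                {"trxnb": "44-1-1", "st": "E", "chld": []},
                {"trxnb": "44-1-2", "st": "E", "chld": []},
            ],
        }],
    }],
}

parent, children = flatten(doc, parent_id="req-1")
for c in children:
    print(c["parent"], c["source"]["trxnb"], c["source"]["ancestors"])
```

Every child points at the same top-level parent, so the parent type is no longer its own parent, and a subtree could then be selected by filtering children on the `ancestors` values.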

--
Kind regards,

Martijn van Groningen


