Question about multiple inputs

Hi, I'm migrating orders and their itemsin a joined SQL and using aggregate filter to list items as a nested field under each order. Each document's id is the respective orders table id.

Now, in another table I have order_shipments and another one for order_history.
Both tables have an order_id column that I want to find in my index and add shipments and history nested fields.

I know you can have multiple jdbc {} blocks, but would I be able to make such an update to the existing documents in the index? If now, how can I achieve this?

When you define the elasticseach output you can also set the "_id" field manually. If you insert a document with an "_id" that already exists then it will be updated.

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_id

Also, please try not to use nested fields. They just created headaches.

Thanks for the hint! In this case though, I've got a JSON column called order_metadata for example, which is a growing/shrinking list of key values. How would you implement something like that without the nested type?

Another example is, order_notes or discount_list their values from the main storage look like:
[{"name":"Discount","code":"","amount":"22.00"}]

I'm not saying you can't use nested field, it just makes dealing with Elasticsearch and Kibana more difficult as you have to create nested data types and do nested aggregations and if you don't you can get strange results.

What happens in the background is that lucene flattens the JSON object and you lose the relationship between the fields in the array since each of they keys and values are put into their own arrays. So when you go to search you may get more results of documents. For example:

You have two documents:

{
"transaction":"ABC123",
"discount" : [
    {"name":"DiscountA","code":"","amount":"4.00"},
    {"name":"Coupon","code":"","amount":"22.00"}
    ]
}

{
"transaction":"DEF456",
"discount" : [
    {"name":"DiscountB","code":"","amount":"22.00"}
    ]
}

If you write a query:

{
"query": {
          "bool": {
                  "must": [
                           { "match": { "discount.name": "Discount" } },
                           { "match": { "discount.amount": "22.00" } }
                    ]
              }
      }
}

You would expect to only get document DEF456 but instead you will get both documents. Try it.

I see your point, indeed. The kind of queries we would make use of this field would always include the top level document's id btw. (order id) (in your example it's equal to having the "transaction_id" in the query and we would be more interested in counting the total of the "amount" field you see there for example, or distinct "codes" found for a document.

I guess what we could do as another option would be saving a custom top level field for the discount totals per document? Not sure about the "code" => "amount" relationship though...

Another very common search we do is looking for a specific key's value in the order_metadata, which looks like this:
{"custom":"foo","another":"835-407-ABC","shipped":"2019-09-25 17:12:26"}

The keys for this field differ in every document. We'd even need to raise the default 10K limit I think because we allow any amount of key-value pairs for this.

We would be searching for example; whether "custom" was set as "foo" or whether the "shipped" exists for a given document id.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.