Combining three different indices into one that can be queried over ODBC with Tableau

Hello,

I have three different indices called: first, second, third.
first is the main index; second references first, and third references second, using id fields.
The sample data looks like this:

first - 2 sample documents (two different ids)

{"first.id": 1,
"first.desc": "first_1"}

{"first.id": 2,
"first.desc": "first_2"}

second - 2 sample documents (referencing two different first.id values)

{"second.id": 1,
"second.desc": "second_1",
"first.id": 1}

{"second.id": 2,
"second.desc": "second_2",
"first.id": 2}

third - 2 sample documents (referring to the same second.id document)

{"third.id": 1,
"second.desc": "third_1",
"second.id": 1}

{"third.id": 2,
"second.desc": "third_2",
"second.id": 1}

I would like to automatically create an index that contains the documents from all three indices and is updated whenever a new document is indexed in any of them. A sample of such an index would contain the records below:

combined index - would contain 3 documents in total.

{"first.id": 1,
"first.desc": "first_1",
"second.id": 1,
"second.desc": "second_1",
"third.id": 1,
"second.desc": "third_1"}

{"first.id": 1,
"first.desc": "first_1",
"second.id": 1,
"second.desc": "second_1",
"third.id": 2,
"second.desc": "third_2"}

{"first.id": 2,
"first.desc": "first_2",
"second.id": 2,
"second.desc": "second_2"}

This is to make sure our data is denormalized so that Tableau can pull it over ODBC.
Is that possible using an existing mechanism?
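For reference, the desired result is essentially a left join of first → second → third on the id fields. A minimal Python sketch (not part of the original setup, purely to illustrate the expected fan-out from the sample data above):

```python
# Sketch: simulating the desired denormalized "combined" index with a
# client-side join over the three sample data sets.
first = [
    {"first.id": 1, "first.desc": "first_1"},
    {"first.id": 2, "first.desc": "first_2"},
]
second = [
    {"second.id": 1, "second.desc": "second_1", "first.id": 1},
    {"second.id": 2, "second.desc": "second_2", "first.id": 2},
]
third = [
    {"third.id": 1, "third.desc": "third_1", "second.id": 1},
    {"third.id": 2, "third.desc": "third_2", "second.id": 1},
]

def combine(first, second, third):
    """Left-join first -> second -> third on their id fields."""
    docs = []
    for f in first:
        for s in (s for s in second if s["first.id"] == f["first.id"]):
            matches = [t for t in third if t["second.id"] == s["second.id"]]
            if not matches:
                docs.append({**f, **s})       # no matching third row: keep the pair
            for t in matches:
                docs.append({**f, **s, **t})  # fan out once per matching third row
    return docs

combined = combine(first, second, third)
print(len(combined))  # 3 documents, matching the sample combined index above
```

The two third rows that share second.id 1 each produce their own combined document, while first_2/second_2 yields a single document with no third fields.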

You might be able to do that with https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html

Thank you @warkolm,

I've looked at the link you sent and came up with a simple transform using this data:

PUT first
{"mappings":{"properties":{"first_id":{"type":"short"},"first_desc":{"type":"keyword"}}}}

PUT second
{"mappings":{"properties":{"second_id":{"type":"short"},"second_desc":{"type":"keyword"},"first_id":{"type":"short"}}}}

PUT third
{"mappings":{"properties":{"third_id":{"type":"short"},"third_desc":{"type":"keyword"},"second_id":{"type":"short"}}}}

PUT combined_index
{"mappings":{"properties":{"first_id":{"type":"short"},"first_desc":{"type":"keyword"},"second_id":{"type":"short"},"second_desc":{"type":"keyword"},"third_id":{"type":"short"},"third_desc":{"type":"keyword"}}}}

Sample data:

POST _bulk
{"index": {"_index": "first", "_id": "1"}}
{"first_id": 1, "first_desc": "first_1"}
{"index": {"_index": "first", "_id": "2"}}
{"first_id": 2, "first_desc": "first_2"}
{"index": {"_index": "second", "_id": "1"}}
{"second_id": 1, "second_desc": "second_1", "first_id": 1}
{"index": {"_index": "second", "_id": "2"}}
{"second_id": 2, "second_desc": "second_2", "first_id": 2}
{"index": {"_index": "third", "_id": "1"}}
{"third_id": 1, "third_desc": "third_1", "second_id": 1}
{"index": {"_index": "third", "_id": "2"}}
{"third_id": 2, "third_desc": "third_2", "second_id": 1} 

I created this basic transform:

PUT _transform/test_combined
{
  "source": {
    "index": ["first", "second", "third"]
  },
  "pivot": {
    "group_by": {
      "test": {
        "terms": {
          "field": "first_id"
        }
      }
    },
    "aggregations": {
      "min_id": {
        "min": {
          "field": "first_id"
        }
      }
    }
  },
  "description": "Transform test",
  "dest": {
    "index": "combined_index"
  }
}

(There is an error in this transform related to my fields, but I'll fix that by expanding the index with additional fields.)
Here is the result of GET combined_index/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "combined_index",
        "_type" : "_doc",
        "_id" : "AGvuZWuqqz7c5ytICzX5Z74AAAAAAAAA",
        "_score" : 1.0,
        "_source" : {
          "min_id" : 1.0,
          "test" : 1
        }
      },
      {
        "_index" : "combined_index",
        "_type" : "_doc",
        "_id" : "AA3tqz9zEwuio1D73_EArycAAAAAAAAA",
        "_score" : 1.0,
        "_source" : {
          "min_id" : 2.0,
          "test" : 2
        }
      }
    ]
  }
}
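A quick way to sanity-check this result: the pivot groups every source document by first_id and computes the min aggregation per bucket; documents without a first_id (everything in the third index) simply drop out of a terms group_by, which is why only two documents appear. A rough Python approximation over the bulk data above:

```python
# Sketch: approximating what the pivot transform computes over the bulk data.
docs = [
    {"first_id": 1, "first_desc": "first_1"},
    {"first_id": 2, "first_desc": "first_2"},
    {"second_id": 1, "second_desc": "second_1", "first_id": 1},
    {"second_id": 2, "second_desc": "second_2", "first_id": 2},
    {"third_id": 1, "third_desc": "third_1", "second_id": 1},
    {"third_id": 2, "third_desc": "third_2", "second_id": 1},
]

buckets = {}
for d in docs:
    key = d.get("first_id")
    if key is None:  # a terms group_by skips documents missing the field
        continue
    buckets.setdefault(key, []).append(d)

# One output document per bucket, with the min aggregation applied.
result = [{"test": k, "min_id": min(d["first_id"] for d in v)}
          for k, v in sorted(buckets.items())]
print(result)  # two buckets, matching the two hits in the search response
```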

Now I'm trying to figure out how exactly to match values from the other indices based on the id fields. I haven't found anything so far, except for this code from an ML webinar: https://gist.github.com/stevedodson/c80d245a5dc6ae8cc93e9f9d25897aef
But understanding how exactly "Transform raw data into an entity-centric index" works is difficult to grasp without some baseline.
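One limitation worth keeping in mind (my understanding, not stated in the linked material): a pivot emits exactly one output document per unique group_by key, so grouping by first_id can never fan out into two combined documents for the two third rows that share second.id 1. A minimal Python sketch of the problem, assuming hypothetically that first_id had already been denormalized onto the third documents:

```python
# Sketch: a pure per-key merge keeps one document per first_id, so the
# second "third" row (third_id 2) overwrites the first instead of
# producing a separate combined document.
docs = [
    {"first_id": 1, "first_desc": "first_1"},
    {"second_id": 1, "second_desc": "second_1", "first_id": 1},
    # hypothetical: first_id copied onto the third documents
    {"third_id": 1, "third_desc": "third_1", "second_id": 1, "first_id": 1},
    {"third_id": 2, "third_desc": "third_2", "second_id": 1, "first_id": 1},
]

merged = {}
for d in docs:
    merged.setdefault(d["first_id"], {}).update(d)

print(merged[1]["third_id"])  # 2 -- the first third row's value was overwritten
```

Producing all three desired combined documents therefore needs some form of lookup or join (for example, done client-side as a denormalization step) rather than a plain pivot on one key.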
