How do I combine fields from 2 different docs (and indices) within Kibana?

Hello,

We are trying to get an index with docs combining fields from 2 different (already existing) indices. We tried using the transform, but got an index with both docs, still separate however. We are not sure if we used this transform function the right way, so this might still be a possibility. imgur image of our transform attempt

I already tried creating an index pattern which includes both the indices I need the fields from. In visualizations the fields I need are appearing, but I can't filter data from fields from index a on a fields from index b (within this same index pattern). It will show no data when I try this.

Are there some things we need to change in our transform, or are there different ways to achieve the aforementioned goal (an index with docs which have fields from the 2 already exisiting indices we have)?

We are on Kibana v7.8.0. If any other information is needed, let me know!
Kind regards.

Hi @Luukv ,
Can you please share sample data from both indexes and mention fields which you need combine in one log

Hello, thanks for you response. See below for the samples.

Index 1, Clicks:
_id
id number

_index
company_click_out

_score
0

_type
_doc

click_box_title
specific category

click_out_id
click id number

click_price
1.00

clickbox_category_id
366

company_id
1111

company_location_city
Utrecht

company_location_city_keyword
Company A Utrecht

company_location_country
ENG

company_location_geo_point
14.422657, 17.94846

company_location_id
85 642

company_location_name
Franchise A

company_location_postcode
43253

company_name
Company A

created_at
Timefield when the doc is added

day_of_week_keyword
monday


hour_of_day
1

is_free_click_int
0


is_paid_click_int
1

missed_revenue
0,00

monetization_rate
100%

month_keyword
january

number_of_clicks
1

occurrences_times

original_price
1.00

paid_clicks_percentage
100%

parent_category_title
Parent category

type
booking

updated_at
Jan 1, 2020, @ 00:00:00

visit_id
id of the session

visitor_ip_address
ip address of the user

visitor_ip_address_string
ip address of the user in string type

visitor_user_agent
device information of the user

week_number
1


-- Above is index a, Clicks, below this line starts index b, conversions.

Index 2, conversion:


_id
123

_index
company_conversion

_score
0

_type
_doc

clickbox_category_id
123

clickbox_category_name
Category

conversion_id
123

created_at
Jun 4, 2021 @ 14:00:00.000

experiments
{
String with some experimental data
}

offer_date_from
Start date of the offer

offer_date_until
End date of the offer

permanent_visitor_id
27346278_a424b

rental_item_id
1435

rental_item_name
specific item name

rental_service_provider_id
23452

rental_service_provider_location_id
23552

rental_service_provider_location_name
Company A Utrecht

rental_service_provider_name
Franchise A - Company A Utrecht

type
booking

updated_at
Jun 4, 2021 @ 14:00:00.000

visit_id
id of the session

The bold text is the field with some sample data which I would like in the combined doc. These are from the 2 indices, clicks and conversions, both have the field 'visit_id', which might add some possibilities when it comes to combining docs.

Hope this information is an answer to your request and you can help us out!

If someone were to come across this post unsure if it's still relevant/active, please do share your thoughts, tips or maybe some sources which might be relevant I could check out. This issue has not been resolved and is holding me back from setting up the dashboards we need, so any help is appreciated!

Are you getting 2 documents per visit_id?

This is means you have a mapping problem for index a and b. Check the index mappings for both. From the screenshot it seems you have a multi-field that maps visit_id.keyword to keyword. Does it look the same for both?

Or: Are you getting the 1 document but from the scripted metric you are using, you get a list of output documents:

    "conversions": [
    {
        # doc from index a
    },
    {
        # doc from index b
    }
    ]

This is a problem of your used script, you are collapsing the docs in a list.

A generic way of a join with a scripted_metric:

        "scripted_metric": {
          "init_script": "state.join = new HashMap()",
          "map_script": "String[] fields = new String[] {'click_box_title', 'click_price', 'clickbox_category_name'}; for (e in fields) { if (doc.containsKey(e) && doc[e].size() > 0) {state.join.put(e, doc[e])}}",
          "combine_script": "return state.join",
          "reduce_script": "String[] fields = new String[] {'click_box_title', 'click_price', 'clickbox_category_name'}; Map j=new HashMap(); for (s in states) {for (e in fields) { if (s.containsKey(e)) {j.put(e, s[e].get(0))}}} return j;"
        }

I only took 3 of your fields as an example, please add the remaining field names yourself.

Alternatively you can use aggregations like min, max for numeric or date fields like this:

"offer_date_from": {
    "min": {
        "field": "offer_date_from"
    }
}

For keyword fields there is unfortunately no easy way at the moment (support for top_metrics - available in a future release - will solve this gap). That means for keyword you have to use scripted_metric.

For fields that are available and have the same value in both indices like type you can put it into the group_by section.

Hello, thanks for your response! I'm not sure if I am misunderstanding what you're saying or that maybe we're not completely on the same page. I'll try to explain the situation a bit more clear, as I see my initial post is a bit messy, apologies for that.

We have 2 indices, clicks and conversions. Both of these indices contain the visit_id field. In our case not all clicks turn into conversions, therefore the clicks index will have more docs, and thus visit_id records. For the clicks that do turn into a conversion, a doc is created in the conversions index, with the visit_id of said conversion.

There is no data regarding clicks in the conversion index, and no data regarding conversions in the click index. The clicks index creates docs with fields as listed above, and the same goes for the conversions index respectively. So both these indices have 1 doc structure each, I listed this structure for both indices above (the sample data).

What we are looking for is a 3rd index with a new doc structure, which has fields from both the clicks index as the conversions index. In the sample data above, there are some fields of whicht the text I made bold, these are the fields that we want in the new, 3rd index.

If your post is a solution to the request I described above, I am misunderstanding you. In this case, could you please elaborate what a 'scripted_metric' exactly is, and what the output of the script you shared in your response would be?

The solution I provided is based on your image from the 1st post: Imgur: The magic of the Internet

Maybe you can share your config as text, than I can reuse it and mark the parts.

There you group on visit_id and you use a scripted_metric aggregation to collapse fields. The script concatenates the docs from both indices in a list, see my example. Maybe you can post the output of the _preview command to confirm that.

That's why I think you want 1 structure with 1 value for each field, either from index a or b. The script I provided is a replacement for the scripted metric of the transform in the image. It takes the 1st occurrence of a value of a field and puts it into the output structure.

A possible output:

    "unique_id": "a", 
    "conversions": {
        "click_box_title": "title",
        "click_price": 22,
        ...
    }

In case a field is not found (no conversion for a click), the field isn't created.

I repeat what I said about numeric/date fields, you can use a min or max aggregation instead of a scripted_metric for those fields.

POST _transform/_preview
{
"source": {...},
"pivot": {
    "group_by": {
        "unique-id": {...}
    },
    "aggregations": {
        "offer_date_from": {
            "min": {
                "field": "offer_date_from"
            }
        },
        "original_price": {
            "min": {
                "field": "original_price"
            }
        }
    }
}

should return 1 document with 2 joined fields from a and b and the field unique-id.

Only for non-numeric fields you need the scripted_metric.

@Hendrik_Muhs Thanks for the response, I would like to discuss what you mentioned with a colleague who is unavailable at the moment. I'll get back to you asap.