Grouping and counting with filters using transformations API

I have created these two transformations:

query = {
  "source": {
    "index": "posts",
    "query": {
        "bool": {
          "filter": {
              "match" : {"PostTypeId": "1"}
          }
        }
    }
  },
  "dest": {
    "index": "rq3"
  },
  "pivot": {
    "group_by": {
      "OwnerUserId": {
        "terms": {
          "field": "OwnerUserId"
        }
      }
    },
    "aggregations": {
      "count_question": {
        "value_count": {
          "field": "OwnerUserId"
        }
      }
    }
  }
}

and

query = {
  "source": {
    "index": "posts",
    "query": {
        "bool": {
          "filter": {
              "match" : {"PostTypeId": "2"}
          }
        }
    }
  },
  "dest": {
    "index": "rq3"
  },
  "pivot": {
    "group_by": {
      "OwnerUserId": {
        "terms": {
          "field": "OwnerUserId"
        }
      }
    },
    "aggregations": {
      "count_answer": {
        "value_count": {
          "field": "OwnerUserId"
        }
      }
    }
  }
}

They both add documents to the same index "rq3" and perform a "group_by" and "count" action over an attribute called "OwnerUserId". However, the first transformation only performs this actions over the documents with "PostTypeId" equals to 1 and the second, only over "PostTypeId" equals to 2.

The problem is that, when I review the "rq3" index, there are not "OwnerUserId" with "count_questions" and with "count_answers" (and should be). It seems like the second transformation is updating the repeated "OwnerUserId" and removing those equals to the first transformation.

How can I count "PostTypeId" equals to 1 and "PostTypeId" equals to 2 and relating these two values with their "OwnerUserId" using transformations?

Hi,

2 transforms should not write into the same index. As it happened for you: they might overwrite documents from each other. In addition to that, transform always writes new documents, it can not update old ones. You should use 1 transform instead.

To implement your usecase you need to use a scripted metric aggregation. A similar usecase to yours is available on the transform examples page. Have a look how responses.counts works. I think you should be able to do something similar for count_answer and count_question.

We are working on improving the transform API, adding support for filter aggregations - which I think would be the easiest solution for your usecase - is on our backlog.

I tried my best to apply that example to my use case, but I could not make the transform work. It is because I want to count not only each type of post (answer or question) but each type of post per user (number of answers of user 1, number of questions of user 1, number of answers of user 2, number of questions of user 2 ... Can you show me an example of how my use case could be solved? @Hendrik_Muhs

@Sergio_Blanco

Did you find a solution? Due to holidays I had no time to look into it. The solution I had in mind was to create 1 counter for answers, 1 for questions and use a if on PostTypeId to increment them. The example for response codes is similar to that.

I tried to do something similar to that example, but I could not make it work. @Hendrik_Muhs

Hi,

see attached example which hopefully helps you to progress on this:


PUT /userqanda
{
  "mappings": {
    "properties": {
      "PostTypeId": {
        "type": "long"
      },
      "OwnerUserId": {
        "type": "keyword"
      },
      "question": {
        "type": "text"
      },
      "answer": {
        "type": "text"
      }
    }
  }
}

# questions
PUT userqanda/_doc/1
{
  "PostTypeId": 1,
  "OwnerUserId": "user-1",
  "question": "Where are my glasses?"
}

PUT userqanda/_doc/2
{
  "PostTypeId": 1,
  "OwnerUserId": "user-1",
  "question": "Where are my socks?"
}

PUT userqanda/_doc/3
{
  "PostTypeId": 1,
  "OwnerUserId": "user-2",
  "question": "Where should we eat breakfast?"
}

PUT userqanda/_doc/4
{
  "PostTypeId": 1,
  "OwnerUserId": "user-2",
  "question": "Where should we eat lunch?"
}

PUT userqanda/_doc/5
{
  "PostTypeId": 1,
  "OwnerUserId": "user-2",
  "question": "Where should we eat dinner?"
}

# answers
PUT userqanda/_doc/11
{
  "PostTypeId": 2,
  "OwnerUserId": "user-1",
  "answer": "In the bathroom."
}

PUT userqanda/_doc/12
{
  "PostTypeId": 2,
  "OwnerUserId": "user-1",
  "question": "Lets take the burger place"
}

PUT userqanda/_doc/13
{
  "PostTypeId": 2,
  "OwnerUserId": "user-3",
  "question": "42"
}

PUT userqanda/_doc/14
{
  "PostTypeId": 2,
  "OwnerUserId": "user-3",
  "question": "77"
}

PUT userqanda/_doc/15
{
  "PostTypeId": 2,
  "OwnerUserId": "user-3",
  "question": "999"
}

POST _transform/_preview
{
  "source": {
    "index": "userqanda"
  },
  "pivot": {
    "group_by": {
      "OwnerUserId": {
        "terms": {
          "field": "OwnerUserId"
        }
      }
    },
    "aggregations": {
      "total_q_and_a": {
        "value_count": {
          "field": "OwnerUserId"
        }
      },
      "q_and_a": {
        "scripted_metric": {
          "init_script": "state.qanda = ['questions':0L,'answers':0L]",
          "map_script": """
            def postType = doc['PostTypeId'].value;
            if (postType == 1L) {
              state.qanda.questions += 1 ;
            } else if(postType == 2L) {
              state.qanda.answers+= 1;
            }
            """,
          "combine_script": "state.qanda",
          "reduce_script": """
            def counts = ['questions':0L,'answers':0L];
            for (qanda in states) {
              counts.questions += qanda['questions'];
              counts.answers += qanda['answers'];
            }
            return counts;
            """
        }
      }
    }
  }
}

Hope that helps!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.