How to index an interview

(András Hinkel) #1

Hi all,

I would like to index interviews into Elasticsearch.

The interview is made up of paragraphs, each said by either the interviewer or the interviewee. My problem is that I am only interested in what the interviewee says; I don't want to index the interviewer's questions.

As far as I understand, if I add "index": false to a given property when creating the mapping, that field won't be indexed or searchable. However, when I run a search query, I need the whole interview as a result, so I somehow need to return the interviewer's questions in the response as well.

Let me clarify it with an example:

This is an interview:

  • [interviewer]: How are you?
  • {interviewee}: I am fine. And you?
  • [interviewer]: Fine, thanks.

I only want to make the sentence "I am fine. And you?" searchable, because it is the only paragraph said by the interviewee, but when Elasticsearch returns a response, I want to get the whole interview back.

My question is, how should I go about achieving this? I am not asking for the exact mapping (though any input there is also appreciated), but rather which properties I should have in the mapping. How can I make one part of the conversation searchable and the other part not searchable, while search responses still contain the whole interview?

For reference, here is the mapping I came up with so far:

PUT /interviews
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {"type": "text"},
        "lead": {"type": "text"},
        "body": {
          "type": "text",
          "analyzer": "hungarian",
          "index_phrases": true
        },
        "interviewerBody": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}

The problem with this mapping is that I put the questions and the answers into separate properties, so I do not know in which order the questions by the interviewer and the answers by the interviewee came.

Here is an example interview for reference. The p elements with a direct em child element are said by the interviewer; the p elements without any child elements are said by the interviewee.
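
To make the markup rule above concrete, here is a minimal Python sketch that turns such a page into an ordered list of speaker-tagged paragraphs. The class name InterviewParser, the function parse_interview, and the "speaker"/"text" keys are made up for illustration, and the sketch simplifies the rule by treating any em element inside a p as the interviewer marker rather than checking that it is a direct child.

```python
from html.parser import HTMLParser


class InterviewParser(HTMLParser):
    """Collects <p> elements as {"speaker", "text"} dicts, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._has_em = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start a fresh paragraph buffer.
            self._in_p = True
            self._has_em = False
            self._text = []
        elif tag == "em" and self._in_p:
            # Simplification: any <em> inside the <p> marks the interviewer.
            self._has_em = True

    def handle_data(self, data):
        if self._in_p:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace and skip empty paragraphs.
            text = " ".join("".join(self._text).split())
            if text:
                speaker = "interviewer" if self._has_em else "interviewee"
                self.paragraphs.append({"speaker": speaker, "text": text})
            self._in_p = False


def parse_interview(html):
    """Return the interview paragraphs in their original order."""
    parser = InterviewParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = (
    "<p><em>How are you?</em></p>"
    "<p>I am fine. And you?</p>"
    "<p><em>Fine, thanks.</em></p>"
)
paragraphs = parse_interview(sample)
```

Keeping the paragraphs in a single ordered list, instead of two separate properties, preserves the order of questions and answers.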

(Mark Walkom) #2

You probably want to look at something like parent/child, where the parent is the interview "event" and the children are the questions and answers. Then you can control things with a lot more ease.
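
A rough sketch of how that could look with the join datatype, written as Python dicts that can be sent as request bodies. The field names (relation, speaker, position, text) and the relation names interview/paragraph are assumptions for illustration, not something from the original mapping. Note that in this sketch the interviewer's text is still indexed; it is only excluded at query time by the filter on speaker.

```python
import json

# Hypothetical mapping: one index, with a join field that links
# "interview" parents to "paragraph" children.
JOIN_MAPPING = {
    "mappings": {
        "_doc": {
            "properties": {
                "title": {"type": "text"},
                "speaker": {"type": "keyword"},
                "position": {"type": "integer"},
                "text": {"type": "text", "analyzer": "hungarian"},
                "relation": {
                    "type": "join",
                    "relations": {"interview": "paragraph"},
                },
            }
        }
    }
}


def child_docs(interview_id, paragraphs):
    """Build one child document per paragraph; `position` keeps the order."""
    return [
        {
            "speaker": speaker,
            "position": position,
            "text": text,
            "relation": {"name": "paragraph", "parent": interview_id},
        }
        for position, (speaker, text) in enumerate(paragraphs)
    ]


def has_child_query(terms):
    """Find interviews where an interviewee paragraph matches `terms`."""
    return {
        "query": {
            "has_child": {
                "type": "paragraph",
                "query": {
                    "bool": {
                        "must": [{"match": {"text": terms}}],
                        "filter": [{"term": {"speaker": "interviewee"}}],
                    }
                },
            }
        }
    }


children = child_docs(
    "interview-1",
    [
        ("interviewer", "How are you?"),
        ("interviewee", "I am fine. And you?"),
        ("interviewer", "Fine, thanks."),
    ],
)
query = has_child_query("fine")
print(json.dumps(query, indent=2))
```

Child documents have to be indexed with routing set to the parent's ID so that they land on the same shard as the parent. A has_child query returns the parent documents, so rebuilding the full conversation would need either inner_hits or a second lookup of the children sorted by position.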

(András Hinkel) #3

After a bit of research, I found that I needed to use the nested type, which is intended for indexing arrays of objects.
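
For anyone landing here later, this is roughly the shape of it, sketched as Python dicts for the request bodies. The field names paragraphs, speaker, and text are placeholders chosen for this example, and the query is one possible way to restrict matching to the interviewee's paragraphs, not necessarily the only one.

```python
import json

# Hypothetical mapping: each paragraph is a nested object that records
# who said it and what was said.
NESTED_MAPPING = {
    "mappings": {
        "_doc": {
            "properties": {
                "title": {"type": "text"},
                "lead": {"type": "text"},
                "paragraphs": {
                    "type": "nested",
                    "properties": {
                        "speaker": {"type": "keyword"},
                        "text": {"type": "text", "analyzer": "hungarian"},
                    },
                },
            }
        }
    }
}


def build_document(title, paragraphs):
    """One document per interview; the array keeps the paragraph order."""
    return {
        "title": title,
        "paragraphs": [
            {"speaker": speaker, "text": text} for speaker, text in paragraphs
        ],
    }


def interviewee_query(terms):
    """Match `terms` only inside paragraphs said by the interviewee."""
    return {
        "query": {
            "nested": {
                "path": "paragraphs",
                "query": {
                    "bool": {
                        "must": [{"match": {"paragraphs.text": terms}}],
                        "filter": [
                            {"term": {"paragraphs.speaker": "interviewee"}}
                        ],
                    }
                },
            }
        }
    }


doc = build_document(
    "Example interview",
    [
        ("interviewer", "How are you?"),
        ("interviewee", "I am fine. And you?"),
        ("interviewer", "Fine, thanks."),
    ],
)
query = interviewee_query("fine")
print(json.dumps(query, indent=2))
```

Because the whole interview is a single document, a hit returns the complete paragraphs array from _source in its original order, questions included, while the nested query only matches against the interviewee's text.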
