How do I return only one result per type?

coogle · March 16, 2017, 9:35pm

Hello,

I have a nested mapping as follows:

{
  "fbs": {
    "mappings": {
      "institution": {
        "properties": {
          "document": {
            "type": "nested",
            "properties": {
              "document_id": {
                "type": "long"
              },
              "expiration_date": {
                "type": "date"
              },
              "flags": {
                "type": "text",
                "norms": false,
                "analyzer": "comma"
              },
              "is_active": {
                "type": "boolean"
              },
              "is_current": {
                "type": "boolean"
              },
              "name": {
                "type": "text"
              },
              "section": {
                "type": "nested",
                "properties": {
                  "created_at": {
                    "type": "date"
                  },
                  "data": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "file": {
                    "type": "nested",
                    "properties": {
                      "author": {
                        "type": "text",
                        "fields": {
                          "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                          }
                        }
                      },
                      "content": {
                        "type": "text",
                        "analyzer": "standard",
                        "search_analyzer": "fbs_search_analyzer"
                      },
                      "content_length": {
                        "type": "long"
                      },
                      "content_type": {
                        "type": "text",
                        "fields": {
                          "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                          }
                        }
                      },
                      "date": {
                        "type": "date"
                      },
                      "keywords": {
                        "type": "text",
                        "fields": {
                          "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                          }
                        }
                      },
                      "language": {
                        "type": "text",
                        "fields": {
                          "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                          }
                        }
                      },
                      "title": {
                        "type": "text",
                        "fields": {
                          "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                          }
                        }
                      }
                    }
                  },
                  "filename": {
                    "type": "text",
                    "norms": false
                  },
                  "fingerprint": {
                    "type": "text",
                    "norms": false
                  },
                  "flags": {
                    "type": "text",
                    "norms": false,
                    "analyzer": "comma"
                  },
                  "is_active": {
                    "type": "boolean"
                  },
                  "name": {
                    "type": "text"
                  },
                  "section_id": {
                    "type": "long"
                  },
                  "updated_at": {
                    "type": "date"
                  }
                }
              },
              "start_date": {
                "type": "date"
              }
            }
          },
          "institution_id": {
            "type": "long"
          },
          "name_en": {
            "type": "text",
            "norms": false
          },
          "name_fr": {
            "type": "text",
            "norms": false
          },
          "region_id": {
            "type": "long"
          }
        }
      }
    }
  }
}

I am currently searching on document.section.file.content, for example, to pull out documents that match my keywords. However, I want to limit the results to a single match (the most relevant one) per unique document_id.

For example, a single logical "document" is represented by 10 PDF "sections". I search those sections, but I only want to return the top result per document (that is, at most one result per document_id).

Is there a way to coax ES into doing this, or am I better off just post-processing the results?

John
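
For reference, the post-processing route asked about above could look roughly like the sketch below. It assumes the matching sections have already been flattened client-side into plain dicts carrying `document_id`, `section_id` and `_score`; that shape and the function name `top_hit_per_document` are illustrative assumptions, not something Elasticsearch returns in this form.

```python
from typing import Any, Dict, List


def top_hit_per_document(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the highest-scoring hit for each document_id."""
    best: Dict[Any, Dict[str, Any]] = {}
    for hit in hits:
        doc_id = hit["document_id"]
        # Replace the stored hit only when this one scores strictly higher.
        if doc_id not in best or hit["_score"] > best[doc_id]["_score"]:
            best[doc_id] = hit
    # Return the survivors ordered by relevance, best first.
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


# Hypothetical, already-flattened section hits (assumed shape).
hits = [
    {"document_id": 1, "section_id": 10, "_score": 2.5},
    {"document_id": 1, "section_id": 11, "_score": 3.1},
    {"document_id": 2, "section_id": 20, "_score": 1.7},
]
deduped = top_hit_per_document(hits)
```

This only dedupes the hits that were actually fetched, so the request `size` has to be large enough to cover every document of interest.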

nik9000 · March 16, 2017, 10:05pm

I'd use msearch for that.

coogle · March 17, 2017, 12:45am

I'm not sure how msearch would be used to limit the results to one section per document?

nik9000 · March 17, 2017, 2:34pm

Ah, I reread your problem; I thought you wanted something else. I don't know nested documents very well, so I can't help offhand. There might be something special for nested documents that I don't know about, but this problem looks similar to field collapsing, which is implemented in the next minor version of Elasticsearch.
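
As a rough illustration of the field collapsing mentioned above, the sketch below builds a search body that uses the `collapse` parameter. Collapsing works on a single-valued keyword or numeric field, so this assumes a hypothetical flat index in which each section is its own top-level document with a `content` text field and a numeric `document_id`. That is not the nested mapping from the question, and the helper name `build_collapse_search` is made up for the example.

```python
import json
from typing import Any, Dict


def build_collapse_search(keywords: str, page: int = 0, page_size: int = 10) -> Dict[str, Any]:
    """Build a search body returning at most one hit per document_id.

    Assumes a flat index (hypothetical) where `content` and `document_id`
    are top-level fields of each section document.
    """
    return {
        "from": page * page_size,  # regular from/size paging over collapsed hits
        "size": page_size,
        "query": {"match": {"content": keywords}},
        # Keep only the top-ranked hit for each distinct document_id.
        "collapse": {"field": "document_id"},
    }


body = build_collapse_search("annual report", page=1, page_size=5)
print(json.dumps(body, indent=2))
```

The resulting dict would be sent as the body of a `_search` request; with the nested mapping as posted, the sections would first need to be indexed as separate documents for this to apply.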

coogle · March 18, 2017, 6:31am

In the end I was able to mostly solve my problem by using a top_hits aggregation. Unfortunately, doing so means I've lost the ability to page results. Thankfully, my data set is small enough that I can page programmatically once I get the results back without too much trouble, but I really wish ES had some way of doing that internally.
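
The programmatic paging described above might be sketched as follows. It assumes a terms aggregation keyed on the document ID with a `top_hits` sub-aggregation of size 1. The aggregation names `by_document` and `top_section`, the helper `page_top_hits` and the sample response are illustrative assumptions rather than the actual request used here.

```python
from typing import Any, Dict, List


def page_top_hits(response: Dict[str, Any], agg_name: str, top_hits_name: str,
                  page: int, page_size: int) -> List[Dict[str, Any]]:
    """Return one page of per-bucket top hits, ordered by score (best first)."""
    buckets = response["aggregations"][agg_name]["buckets"]
    tops = []
    for bucket in buckets:
        inner = bucket[top_hits_name]["hits"]["hits"]
        if inner:  # skip buckets that carry no hit
            tops.append(inner[0])
    tops.sort(key=lambda h: h["_score"], reverse=True)
    start = page * page_size
    return tops[start:start + page_size]


# Trimmed-down, made-up response in the usual terms + top_hits shape.
response = {"aggregations": {"by_document": {"buckets": [
    {"key": 1, "doc_count": 3, "top_section": {"hits": {"hits": [{"_id": "a", "_score": 1.2}]}}},
    {"key": 2, "doc_count": 1, "top_section": {"hits": {"hits": [{"_id": "b", "_score": 4.0}]}}},
    {"key": 3, "doc_count": 2, "top_section": {"hits": {"hits": [{"_id": "c", "_score": 2.3}]}}},
]}}}

first_page = page_top_hits(response, "by_document", "top_section", page=0, page_size=2)
second_page = page_top_hits(response, "by_document", "top_section", page=1, page_size=2)
```

Every page still requires the full set of buckets to be fetched, which is why this only stays practical for a small data set.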

system · April 15, 2017, 6:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
