Searching and Aggregating Terms


(Andrew Kowalik) #1

I am trying to build a search query that returns the top matching values per field, which can then be used as term filters. Here is the workflow.

  • The user wants to filter a list of data. The filters should be Elasticsearch term filters, i.e. exact matches.
  • The user enters the search string "John Doe". What I want back is the top hits by field. For example, the results might include:
      1) full_name: "John Doe"
      2) email: "john.doe@gmail.com"
      3) title: "something something doe something john"

Is there a nice recipe for this type of search? So far I have been playing around with a match query followed by an aggregation, but I am not 100% sure this is the best route.


(Dan Tuffery) #2

If I have understood you correctly, it is possible to do what you want using a filter aggregation with a terms aggregation for every field.

To search on the email field you'll need to add a mapping that indexes the email using the simple analyzer so that the parts of the address are searchable, e.g., john.doe@gmail.com is tokenized as 'john', 'doe', 'gmail', 'com'. You'll also need to store untokenized (not_analyzed) copies of the fields in the index to use as the aggregation values. Here is a simple example:

Create the index with the mapping:

POST /example
{
    "mappings": {
        "doc": {
            "properties": {
                "full_name": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                },
                "email": {
                    "type": "string",
                    "fields": {
                        "email_name": {
                            "type": "string",
                            "analyzer": "simple"
                        },
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                },
                "title": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}
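
To see why the email.email_name sub-field makes "John Doe" match john.doe@gmail.com, here is a rough Python sketch of what the simple analyzer does (split on anything that is not a letter, then lowercase). It only approximates the analyzer for illustration; the function names are made up for this example and are not part of any Elasticsearch client.

```python
import re


def simple_analyze(text):
    """Approximate the simple analyzer: letter-only tokens, lowercased."""
    # [^\W\d_]+ matches runs of letters (no digits, no underscore).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def matches_all_terms(query, field_value):
    """Mimic a match query with operator AND: every query token must appear."""
    field_tokens = set(simple_analyze(field_value))
    return all(token in field_tokens for token in simple_analyze(query))


print(simple_analyze("john.doe@gmail.com"))
print(matches_all_terms("John Doe", "john.doe@gmail.com"))
print(matches_all_terms("John Doe", "test@email.com"))
```

The not_analyzed raw sub-fields skip this step entirely, which is why the terms aggregation on them returns whole values such as "john.doe@gmail.com" rather than individual tokens.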

Create some documents:

POST /example/doc/1
{
    "full_name": "John Doe",
    "email": "john.doe@gmail.com",
    "title": "something something"
}

POST /example/doc/2
{
    "full_name": "John Doe",
    "email": "john.doe@gmail.com",
    "title": "something something"
}

POST /example/doc/3
{
    "full_name": "Joe Smith",
    "email": "test@email.com",
    "title": "something something doe something john"
}

Execute the search:

POST /example/doc/_search?search_type=count
{
    "aggregations": {
        "name_filter_query": {
            "filter": {
                "query": {
                    "match": {
                        "full_name": {
                            "query": "John Doe",
                            "operator": "AND"
                        }
                    }
                }
            },
            "aggregations": {
                "full_name": {
                    "terms": {
                        "field": "full_name.raw"
                    }
                }
            }
        },
        "email_filter_query": {
            "filter": {
                "query": {
                    "match": {
                        "email.email_name": {
                            "query": "John Doe",
                            "operator": "AND"
                        }
                    }
                }
            },
            "aggregations": {
                "full_name": {
                    "terms": {
                        "field": "email.raw"
                    }
                }
            }
        },
        "title_filter_query": {
            "filter": {
                "query": {
                    "match": {
                        "title": {
                            "query": "John Doe",
                            "operator": "AND"
                        }
                    }
                }
            },
            "aggregations": {
                "full_name": {
                    "terms": {
                        "field": "title.raw"
                    }
                }
            }
        }
    }
}
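
Since the three aggregations differ only in the field names, the request body could also be generated rather than written by hand. The following is a minimal sketch, assuming Python on the client side; build_field_aggs and the FIELDS table are hypothetical names for this example, and the email sub-field is addressed by its full path email.email_name. The nested terms aggregation keeps the name "full_name" only to match the request in this thread.

```python
import json


def build_field_aggs(query_text, fields, terms_name="full_name"):
    """Build one filter aggregation wrapping a terms aggregation per field.

    `fields` maps a label to a (search_field, raw_field) pair.
    """
    aggs = {}
    for label, (search_field, raw_field) in fields.items():
        aggs[label + "_filter_query"] = {
            # Keep only documents where this field matches every query term.
            "filter": {
                "query": {
                    "match": {
                        search_field: {"query": query_text, "operator": "AND"}
                    }
                }
            },
            # Bucket the surviving documents by the untokenized value.
            "aggregations": {terms_name: {"terms": {"field": raw_field}}},
        }
    return {"aggregations": aggs}


# Hypothetical field table mirroring the mapping in this thread.
FIELDS = {
    "name": ("full_name", "full_name.raw"),
    "email": ("email.email_name", "email.raw"),
    "title": ("title", "title.raw"),
}

body = build_field_aggs("John Doe", FIELDS)
request_json = json.dumps(body, indent=4)
```

The resulting request_json has the same shape as the hand-written search body, so adding another searchable field is one more entry in FIELDS.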

The search request above returns this response:

"aggregations": {
    "email_filter_query": {
        "doc_count": 2,
        "full_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "john.doe@gmail.com",
                    "doc_count": 2
                }
            ]
        }
    },
    "title_filter_query": {
        "doc_count": 1,
        "full_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "something something doe something john",
                    "doc_count": 1
                }
            ]
        }
    },
    "name_filter_query": {
        "doc_count": 2,
        "full_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "John Doe",
                    "doc_count": 2
                }
            ]
        }
    }
}
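
To get from this response to the exact-match filters described in the question, each bucket key can be turned into a term filter on the matching raw field. This is only a sketch, assuming the response has been parsed into a Python dict; collect_filter_terms, to_term_filter and RAW_FIELDS are hypothetical names, and the sample data is a trimmed copy of the response above.

```python
def collect_filter_terms(aggregations, terms_name="full_name"):
    """Return {aggregation name: [(key, doc_count), ...]} from the response."""
    options = {}
    for agg_name, agg in aggregations.items():
        buckets = agg.get(terms_name, {}).get("buckets", [])
        options[agg_name] = [(b["key"], b["doc_count"]) for b in buckets]
    return options


def to_term_filter(raw_field, key):
    """Build an exact-match term filter for one suggested value."""
    return {"term": {raw_field: key}}


# Which not_analyzed field each aggregation was bucketed on.
RAW_FIELDS = {
    "name_filter_query": "full_name.raw",
    "email_filter_query": "email.raw",
    "title_filter_query": "title.raw",
}

# Trimmed copy of the "aggregations" section of the response in this thread.
response_aggs = {
    "email_filter_query": {
        "doc_count": 2,
        "full_name": {"buckets": [{"key": "john.doe@gmail.com", "doc_count": 2}]},
    },
    "title_filter_query": {
        "doc_count": 1,
        "full_name": {
            "buckets": [
                {"key": "something something doe something john", "doc_count": 1}
            ]
        },
    },
    "name_filter_query": {
        "doc_count": 2,
        "full_name": {"buckets": [{"key": "John Doe", "doc_count": 2}]},
    },
}

options = collect_filter_terms(response_aggs)
term_filters = [
    to_term_filter(RAW_FIELDS[agg_name], key)
    for agg_name, terms in options.items()
    for key, _count in terms
]
```

Each entry in term_filters is one candidate the user could pick, for example a term filter on full_name.raw with the value "John Doe".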
