I am designing an index for logs from a search API. Each log entry records the type and count of searches as well as the type and count of results served in the search response.
Sample Information in the Log:
{
  SearchId: "12345678",
  SearchLocation: "India",
  Results: [
    { position: 1,
      result_accuracy: "high"
    },
    { position: 2,
      result_accuracy: "medium"
    }
  ]
}
For this information, I want to create an index to visualize the following in Kibana:
Use Cases:
1. Count of Searches.
2. Count of Results with medium accuracy at position 1.
3. Count of Searches with medium accuracy results at position 1.
Approach
I can't use an object field for "Results": an array of objects is flattened internally, so the link between position and result_accuracy within one result is lost, and use cases 2 and 3 would return false matches.
I can't use nested or parent-child field types for the results either, because they are not yet supported in Kibana visualizations.
One possible solution is to flatten the results and copy the search-level information into each result document. The use cases then map to:
Count of Searches with some filtering criteria --> Count Unique of SearchId
Count of Results with medium accuracy at position 1 --> Count of docs with filtering on result_accuracy and position
Count of Searches with medium accuracy results at position 1 --> Count Unique of SearchId with some filters
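To make the mapping above concrete, here is a minimal Python sketch that emulates the three counts over a handful of flattened documents. It runs in plain Python, not against Elasticsearch, and the second search (SearchId "12345679") is invented purely for illustration.

```python
# Flattened result documents: one per result, each carrying its search's fields.
# The search with SearchId "12345679" is made up for this illustration.
docs = [
    {"SearchId": "12345678", "SearchLocation": "India", "position": 1, "result_accuracy": "high"},
    {"SearchId": "12345678", "SearchLocation": "India", "position": 2, "result_accuracy": "medium"},
    {"SearchId": "12345679", "SearchLocation": "India", "position": 1, "result_accuracy": "medium"},
    {"SearchId": "12345679", "SearchLocation": "India", "position": 2, "result_accuracy": "medium"},
]


def is_medium_at_position_1(doc):
    """Filter shared by use cases 2 and 3."""
    return doc["position"] == 1 and doc["result_accuracy"] == "medium"


# Use case 1: count of searches -> unique count of SearchId.
search_count = len({d["SearchId"] for d in docs})

# Use case 2: count of results -> plain document count after filtering.
result_count = sum(1 for d in docs if is_medium_at_position_1(d))

# Use case 3: count of searches -> unique count of SearchId after filtering.
filtered_search_count = len({d["SearchId"] for d in docs if is_medium_at_position_1(d)})

print(search_count, result_count, filtered_search_count)
```

The set-based counts here are exact; in Elasticsearch the equivalent unique count would come from the cardinality aggregation, which is approximate.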
The SearchId field will have high cardinality: 2 million searches per day with 10 results per search means 20 million docs per day with 2 million unique SearchIds.
Challenge 1: The unique count from the cardinality aggregation is approximate, so it carries some percentage error.
Challenge 2: When these result documents are distributed across multiple shards, the number of unique searches will be inflated, because unique counts will be calculated with respect to each shard.
It sounds to me like you want to search and analyse 2 different things - searches and results. One way to do this without requiring parent-child or nested mappings (not supported by Kibana) might be to create 2 separate indices, one for searches and one for results.
The searches index might look something like this:
Here I concatenated each result into a single string, which would be analyzed using a whitespace analyzer as well as mapped as a keyword field. You can use different types of queries depending on what you are looking for.
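As a rough sketch of such a searches document, the Python below builds one document per search and joins its results into a single whitespace-separated string. The token format (`<position>_<accuracy>`), the field name `results`, and the `keyword` sub-field name are assumptions made for illustration; the mapping uses the typeless layout of recent Elasticsearch versions, and older versions nest `properties` under a type name.

```python
sample_log = {
    "SearchId": "12345678",
    "SearchLocation": "India",
    "Results": [
        {"position": 1, "result_accuracy": "high"},
        {"position": 2, "result_accuracy": "medium"},
    ],
}


def build_search_doc(log):
    """Build one document per search, with all results joined into one string."""
    # Assumed token format: "<position>_<accuracy>", e.g. "1_high".
    tokens = [f'{r["position"]}_{r["result_accuracy"]}' for r in log["Results"]]
    return {
        "SearchId": log["SearchId"],
        "SearchLocation": log["SearchLocation"],
        "results": " ".join(tokens),
    }


# Assumed mapping: analyzed with the whitespace analyzer, plus a keyword sub-field.
searches_mapping = {
    "mappings": {
        "properties": {
            "SearchId": {"type": "keyword"},
            "SearchLocation": {"type": "keyword"},
            "results": {
                "type": "text",
                "analyzer": "whitespace",
                "fields": {"keyword": {"type": "keyword"}},
            },
        }
    }
}

# Query body for "searches with a medium accuracy result at position 1".
medium_at_1_query = {"query": {"match": {"results": "1_medium"}}}

search_doc = build_search_doc(sample_log)
print(search_doc["results"])
```

With one document per search, counting searches becomes a plain document count on this index, so no unique count on SearchId is needed for use cases 1 and 3.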
In the results index you would index each result separately with the search information denormalised as in your example:
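One way the denormalised result documents could be produced is sketched below: each result is merged with its search-level fields, and the documents are serialized in the newline-delimited format that the Elasticsearch bulk API accepts. The index name `results` and the helper names are hypothetical.

```python
import json

sample_log = {
    "SearchId": "12345678",
    "SearchLocation": "India",
    "Results": [
        {"position": 1, "result_accuracy": "high"},
        {"position": 2, "result_accuracy": "medium"},
    ],
}


def flatten_results(log):
    """Return one document per result, each carrying the search-level fields."""
    search_fields = {k: v for k, v in log.items() if k != "Results"}
    return [{**search_fields, **result} for result in log["Results"]]


def to_bulk_payload(docs, index_name):
    """Serialize documents as bulk NDJSON: an action line, then a source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


result_docs = flatten_results(sample_log)
bulk_payload = to_bulk_payload(result_docs, "results")  # "results" is an assumed index name
print(bulk_payload)
```

Use case 2 then reduces to a document count on this index, filtered on position and result_accuracy.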