Hi,
I am building a chess game explorer and need to index a database of millions of games.
The aim is to query a sequence of opening moves (e.g. "e4 e5 Nf3 Nc6") and get back the 10 most popular next moves, along with how often each was played and how many of those games were won by white, won by black, or drawn.
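To make the target output concrete, here is a minimal plain-Python sketch (not Elasticsearch) of the result I'm after. The sample games, the winner labels "white"/"black"/"draw" and the function name next_move_stats are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (space-separated moves, winner) pairs.
GAMES = [
    ("e4 e5 Nf3 Nc6 Bb5", "white"),
    ("e4 e5 Nf3 Nf6", "draw"),
    ("e4 e5 Nf3 Nc6 Bc4", "black"),
    ("e4 e5 Bc4 Nf6", "white"),
    ("e4 c5 Nf3 d6", "black"),
    ("d4 d5 c4 e6", "draw"),
]


def next_move_stats(games, opening, size=10):
    """Return the `size` most played next moves after `opening`, with result counts."""
    prefix = opening.split()
    played = Counter()
    results = defaultdict(Counter)
    for moves, winner in games:
        tokens = moves.split()
        # Match whole moves only, and require at least one move after the prefix.
        if len(tokens) > len(prefix) and tokens[:len(prefix)] == prefix:
            nxt = tokens[len(prefix)]
            played[nxt] += 1
            results[nxt][winner] += 1
    return [
        {
            "move": move,
            "played": count,
            "white": results[move]["white"],
            "black": results[move]["black"],
            "draw": results[move]["draw"],
        }
        for move, count in played.most_common(size)
    ]


for row in next_move_stats(GAMES, "e4 e5"):
    print(row)
```

With this sample data, "e4 e5" gives Nf3 (played 3 times, one win each way plus one draw) followed by Bc4 (played once, a white win). That is the shape of response I would like the aggregation to return.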
Here is what I have so far:
The Analyzer:
PUT game/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "moves_analyzer": {
          "tokenizer": "moves_tokenizer"
        }
      },
      "tokenizer": {
        "moves_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [ "letter", "digit", "whitespace", "punctuation" ]
        }
      }
    }
  }
}
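As a rough way to see what this tokenizer emits, here is a plain-Python approximation. It reflects my understanding of edge n-grams rather than Elasticsearch itself, and the function name edge_ngrams is hypothetical. It assumes every character in the input belongs to one of the listed token_chars classes, so the text is never split into separate words.

```python
def edge_ngrams(text, min_gram=2, max_gram=3):
    """Approximate an edge n-gram tokenizer whose token_chars cover all of `text`.

    Since letters, digits, whitespace and punctuation are all kept, the text is
    treated as one word, and grams are taken from the start of the whole string.
    """
    return [text[:n] for n in range(min_gram, max_gram + 1) if n <= len(text)]


print(edge_ngrams("e4 e5 Nf3 Nc6"))  # ['e4', 'e4 ']
print(edge_ngrams("e4 c5 Nf3 d6"))   # ['e4', 'e4 ']
```

If this approximation is right, every game that starts with e4 is indexed under the same two tokens regardless of the later moves, which may be part of why my results look off.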
The Mapping:
PUT game/game/_mapping
{
  "properties": {
    "winner": {
      "type": "string"
    },
    "moves": {
      "type": "string",
      "analyzer": "moves_analyzer"
    }
  }
}
The Query and Aggregation (hardcoded to search for 'e4 e5'):
GET game/game/_search
{
  "query": {
    "query_string": {
      "query": "moves:e4 e5*",
      "analyzer": "moves_analyzer"
    }
  },
  "aggs": {
    "nextmoves": {
      "terms": {
        "field": "moves",
        "script": "_value.split(' ')[2]",
        "size": 10
      },
      "aggs": {
        "winners": {
          "terms": {
            "field": "winner"
          }
        }
      }
    }
  }
}
I'm getting some strange results. What I want is pure prefix matching, with no fuzziness to account for typos. I also don't like relying on a script in the aggregation. Is there another way of doing this?
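To be precise about what I mean by pure prefix matching, here is a small plain-Python sketch contrasting a character-level prefix test with a move-level one. The function names and the sample line are made up for illustration. The two tests differ when the last queried move is itself a prefix of another move, such as O-O and O-O-O.

```python
def char_prefix_match(moves, opening):
    """Plain string prefix test; it can match in the middle of a move."""
    return moves.startswith(opening)


def move_prefix_match(moves, opening):
    """Prefix test on whole moves; the game must begin with exactly these moves."""
    prefix = opening.split()
    return moves.split()[:len(prefix)] == prefix


# Hypothetical line in which White can castle on either side on move 8.
common = "d4 d5 Nc3 Nf6 Bf4 e6 Qd2 Be7 e3 O-O Nf3 c5 Bd3 Nc6"
short_castle = common + " O-O Qa5"
long_castle = common + " O-O-O Qa5"
opening = common + " O-O"

print(char_prefix_match(short_castle, opening))  # True
print(char_prefix_match(long_castle, opening))   # True, although the move differs
print(move_prefix_match(short_castle, opening))  # True
print(move_prefix_match(long_castle, opening))   # False
```

The move-level behaviour is the one I want from the index.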
Any input on a better way to do this would be appreciated.