Most of my experience with Elasticsearch has been storing time-based data from logs we track on the network. We are now trying to expand its use as a database for a web-based application that is being written. I am looking for any tips on what to study to learn how to do this better. Here is the current issue I am working on:
Data being indexed (coming from a different application)
example:
sites = [
  {
    sys_id: "13c04e140f92b500d55ae498b1050e8a",
    name: "Receivables Performance Management, LLC"
  },
  {
    sys_id: "13c04e140f92b500d55ae498b1050e8b",
    name: "ROCHESTER NY"
  },
  {
    sys_id: "13c04e140f92b500d55ae498b1050e8c",
    name: "ROSELAND NJ"
  },
  {
    sys_id: "17c04e140f92b500d55ae498b1050e8a",
    name: "LAYTON UT"
  }
]
To make the data sortable and searchable, I have set this up as a "nested" type, with a multi-field on the name purely so that the keyword sub-field can be used for sorting. Here are the current mapping and settings:
PUT /sev_sites
{
  "settings": {
    "analysis": {
      "analyzer": {
        "site_analyzer": {
          "type": "custom",
          "tokenizer": "site_tokenizer",
          "filter": "lowercase"
        }
      },
      "tokenizer": {
        "site_tokenizer": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "object": {
      "properties": {
        "site": {
          "type": "nested",
          "properties": {
            "sys_id": {
              "type": "text"
            },
            "name": {
              "type": "text",
              "analyzer": "site_analyzer",
              "fields": {
                "raw": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
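To make concrete what I expect site_analyzer to produce, here is a rough Python approximation of an n-gram tokenizer with min_gram 3 and max_gram 20 followed by a lowercase filter. It is only a sketch: the ngrams function is a hypothetical helper, not part of any Elasticsearch client, and it assumes the tokenizer keeps every character (spaces included), which I believe is the default when no token_chars are configured.

```python
def ngrams(text, min_gram=3, max_gram=20):
    """Approximate an n-gram tokenizer followed by a lowercase filter.

    Assumption: every character, including spaces, is kept in the grams.
    """
    lowered = text.lower()
    tokens = []
    for start in range(len(lowered)):
        # The longest gram from this position is capped by max_gram
        # and by the number of characters remaining.
        longest = min(max_gram, len(lowered) - start)
        for size in range(min_gram, longest + 1):
            tokens.append(lowered[start:start + size])
    return tokens


tokens = ngrams("ROSELAND NJ")
print("rose" in tokens)      # partial term is present as its own token
print("roseland" in tokens)  # the whole word is present too
print(len(tokens))           # total number of grams emitted
```

If the field being queried really were analyzed this way, a term such as "rose" would exist in the index as a token of its own, so a plain match on it should find "ROSELAND NJ".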
Now for the problem: I am trying to set up a query that returns partial matches. Can someone explain the "proper" way to set up this index, and which approach is best for returning partial matches on this data set? I set up the analyzer following the Elasticsearch documentation, but a simple "match" query still returns only exact matches. I have been experimenting with the fuzzy and regexp queries, which seem to be working well. If anyone could tell me whether I am setting this up in a logical way, and/or what would work better, I would really appreciate it.
Also, I am still trying to get sorting to work correctly. If anyone could help me sort by the "raw" keyword sub-field, I would definitely appreciate it.
My current tests of the fuzzy search are working "ok":
GET /sev_sites/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "roseland",
        "boost": 1.0,
        "fuzziness": 2,
        "prefix_length": 2,
        "max_expansions": 100
      }
    }
  }
}
Returns:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 3.6795852,
    "hits": [
      {
        "_index": "sev_sites",
        "_type": "object",
        "_id": "AV4qoQ8DsabpF_YTeXzo",
        "_score": 3.6795852,
        "_source": {
          "sys_id": "dec04e140f92b500d55ae498b1050e2c",
          "name": "ROCKLAND"
        }
      },
      {
        "_index": "sev_sites",
        "_type": "object",
        "_id": "AV4qoQXDsabpF_YTeXvT",
        "_score": 3.5934165,
        "_source": {
          "sys_id": "13c04e140f92b500d55ae498b1050e8c",
          "name": "ROSELAND NJ"
        }
      }
    ]
  }
}
However, this query does not return what I had hoped:
GET /sev_sites/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "rose",
        "boost": 1.0,
        "fuzziness": 2,
        "prefix_length": 2,
        "max_expansions": 100
      }
    }
  }
}
Returns:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4354112,
    "hits": [
      {
        "_index": "sev_sites",
        "_type": "object",
        "_id": "AV4qoQ77sabpF_YTeXzn",
        "_score": 1.4354112,
        "_source": {
          "sys_id": "dec04e140f92b500d55ae498b1050e2b",
          "name": "ROCK TAVERN NY"
        }
      }
    ]
  }
}
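One possible reading of both result sets, offered as a sketch rather than a confirmed diagnosis: a fuzzy query compares the search value against whole indexed terms by edit distance, after requiring the first prefix_length characters to match. The Python below checks that idea under the assumption that the name field being queried holds whole lowercase words such as "roseland" and "rock" rather than n-grams. I cannot confirm that from the output, although the returned documents do carry name at the top level instead of under site. The levenshtein and fuzzy_matches helpers are hypothetical and use plain Levenshtein distance, which may differ from Elasticsearch's own distance measure when characters are transposed.

```python
def levenshtein(a, b):
    """Plain Levenshtein edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(value, term, fuzziness=2, prefix_length=2):
    """Rough stand-in for the fuzzy query's per-term check."""
    if value[:prefix_length] != term[:prefix_length]:
        return False
    return levenshtein(value, term) <= fuzziness


# Assumed whole-word terms taken from the site names in the results above.
for value, term in [("roseland", "roseland"), ("roseland", "rockland"),
                    ("rose", "rock"), ("rose", "roseland")]:
    print(value, term, levenshtein(value, term), fuzzy_matches(value, term))
```

Under that assumption, "rose" is two edits from "rock" but four edits from "roseland", which would account for "ROCK TAVERN NY" being returned while "ROSELAND NJ" is not.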