PUT test
{
  "mappings" : {
    "type1" : {
      "properties" : {
        "field" : { "type" : "keyword" }
      }
    }
  }
}
PUT test/type1/1
{
  "field" : "<html><body>ANY CONTENTS LONGER TO MAKE THE TOTAL LENGTH MORE THAN 32KB which may contains any kind of information, numbers, names, words, sentences, html code, javascript code, xml code, you name it1234567890</body></html>"
}
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Document contains at least one immense term in field=\"field\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 48]...', original message: bytes can be at most 32766 in length; got 51000"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Document contains at least one immense term in field=\"field\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 48]...', original message: bytes can be at most 32766 in length; got 51000",
    "caused_by": {
      "type": "max_bytes_length_exceeded_exception",
      "reason": "bytes can be at most 32766 in length; got 51000"
    }
  },
  "status": 400
}
As you can see, indexing this document into a keyword field hits the 32 KB term-length limit (32766 bytes).
The actual content can be any HTML document. I don't have permission to share the real documents, but they are general HTML, with a different language in each field.
If I can't put long documents in a keyword field, does that mean I can't use any partial-matching techniques?
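To make the limit from the error above concrete, here is a minimal Python sketch that checks a value against it before indexing. The 32766 figure comes from the error message. The helper names (`utf8_length`, `exceeds_term_limit`) and the sample strings are hypothetical, chosen only for illustration.

```python
# Maximum size of a single indexed term, as reported in the error message above.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Return the size of `value` in bytes once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if `value`, taken as one term, is larger than `limit` bytes."""
    return utf8_length(value) > limit


# Hypothetical stand-in for the real document: 51000 ASCII digits,
# the same size as the "got 51000" in the error message.
sample = "1234567890" * 5100

# Hypothetical non-ASCII value: 20000 characters, but 2 bytes each in UTF-8.
accented = "é" * 20000

if __name__ == "__main__":
    print(utf8_length(sample), exceeds_term_limit(sample))
    print(len(accented), utf8_length(accented), exceeds_term_limit(accented))
```

The limit is measured in bytes, not characters, so for non-ASCII text (relevant here, since each field is in a different language) a value can exceed it with far fewer than 32766 characters.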
Next, I tried a text field with "index": false:
PUT test1
{
  "mappings" : {
    "type1" : {
      "properties" : {
        "field" : { "type" : "text", "index": false }
      }
    }
  }
}
PUT test1/type1/1
{
  "field" : "<html><body>ANY CONTENTS LONGER TO MAKE THE TOTAL LENGTH MORE THAN 32KB which may contains any kind of information, numbers, names, words, sentences, html code, javascript code, xml code, you name it1234567890</body></html>"
}
Indexing succeeds, but searching fails:
GET /test1/_search
{
  "query": {
    "wildcard" : { "field" : "*234*" }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n \"wildcard\" : {\n \"field\" : {\n \"wildcard\" : \"*234*\",\n \"boost\" : 1.0\n }\n }\n}",
        "index_uuid": "vPNInbApSWCqnJJijHSrGQ",
        "index": "test1"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test1",
        "node": "ajoCRgAOR0icCeAMmR7LYw",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n \"wildcard\" : {\n \"field\" : {\n \"wildcard\" : \"*234*\",\n \"boost\" : 1.0\n }\n }\n}",
          "index_uuid": "vPNInbApSWCqnJJijHSrGQ",
          "index": "test1",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [field] since it is not indexed."
          }
        }
      }
    ]
  },
  "status": 400
}
As you can see, with "index": false the document is stored, but I can't run a wildcard query against the field.