Hi,
Representative Index:
PUT /test_index
{
  "mappings": {
    "_source": {
      "excludes": [
        "bigtext"
      ]
    },
    "properties": {
      "bigtext": {
        "type": "text"
      },
      "orgids": {
        "type": "integer"
      },
      "folder": {
        "type": "keyword"
      },
      "docname": {
        "type": "keyword"
      },
      "docgroupid": {
        "type": "keyword"
      }
    }
  }
}
A representative search query is below.
The "filter" is a complex "bool" query built from our organization hierarchy and the roles users are assigned on each organization; each role carries different folder permissions. Document groups sit under certain organizations, and users can also be given direct access to document group folders.
GET /test_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "folder": {
                    "value": "A"
                  }
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "folder": {
                          "value": "B"
                        }
                      }
                    },
                    {
                      "term": {
                        "orgids": {
                          "value": 111
                        }
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "folder": {
                          "value": "C"
                        }
                      }
                    },
                    {
                      "term": {
                        "orgids": {
                          "value": 123
                        }
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "folder": {
                          "value": "B"
                        }
                      }
                    },
                    {
                      "terms": {
                        "docgroupid": [
                          "XXX",
                          "YYY"
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ],
      "must": [
        {
          "simple_query_string": {
            "query": "***USER_SEARCH_INPUT***",
            "fields": [
              "bigtext"
            ]
          }
        }
      ]
    }
  }
}
The problem I am having is that the _update_by_query command below is extremely slow. "orgids" has to be updated whenever a document group is moved to another organization. "bigtext" never changes, but _update_by_query still has to read the source of every document in the document group and re-index every "bigtext" value.
In my single-node test, running the _update_by_query on a document group of 75 documents (10 of them large, averaging 8 million characters each) takes 90 seconds.
POST /test_index/_update_by_query
{
  "script": {
    "source": "ctx._source.orgids=[ 843, 43, 974 ]",
    "lang": "painless"
  },
  "query": {
    "term": {
      "docgroupid": "ZZZ"
    }
  }
}
Questions: Is there any way to avoid this unnecessary reading and re-indexing of "bigtext"? I also need highlighting on "bigtext", so any proposed solution has to support that.
Is there some other technique, such as multiple indices or child documents, that would allow fast updates of "orgids"?
I tried increasing the refresh_interval, but the _update_by_query is still slow.
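For reference, the refresh_interval change was applied along these lines. The "30s" value is only illustrative, not necessarily the exact value I tested with.

```console
# Illustrative value only: raise the refresh interval on the test index
PUT /test_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```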
I also tried excluding "bigtext" from _source, as you can see in the representative mapping. This sped up the _update_by_query drastically, but the updated documents then lose "bigtext" and can no longer be searched on it.
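To make the child-document idea concrete, here is an untested sketch of what I have in mind. It assumes "orgids" is the same for every document in a document group, so it could live on one parent document per group while the documents holding "bigtext" become its children. The index name test_index_join, the join field name doc_relation, the relation names docgroup and document, and the child id doc1 are all hypothetical.

```console
# Hypothetical mapping: one parent per document group, one child per document
PUT /test_index_join
{
  "mappings": {
    "properties": {
      "bigtext": { "type": "text" },
      "orgids": { "type": "integer" },
      "folder": { "type": "keyword" },
      "docname": { "type": "keyword" },
      "docgroupid": { "type": "keyword" },
      "doc_relation": {
        "type": "join",
        "relations": { "docgroup": "document" }
      }
    }
  }
}

# Parent: holds only the fields that change when the group moves
PUT /test_index_join/_doc/ZZZ
{
  "docgroupid": "ZZZ",
  "orgids": [ 111 ],
  "doc_relation": "docgroup"
}

# Child: holds "bigtext"; routing is set to the parent id so both land on the same shard
PUT /test_index_join/_doc/doc1?routing=ZZZ
{
  "folder": "B",
  "docname": "example",
  "bigtext": "...",
  "doc_relation": { "name": "document", "parent": "ZZZ" }
}

# Moving the group would then touch only the small parent document
POST /test_index_join/_update/ZZZ
{
  "doc": { "orgids": [ 843, 43, 974 ] }
}
```

As far as I understand, the "orgids" clauses in the filter would then have to be wrapped in has_parent queries, and the hits would still be the child documents, so highlighting on "bigtext" should keep working. I have not measured what the join costs at search time.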
Thanks,
Jeff