I use Elasticsearch to store a large amount of log data, so I'm particularly concerned about storage costs.
This is a Lucene segment in my ES index:
1.5K _3ee.fdm
2.4G _3ee.fdt
172K _3ee.fdx
3.4K _3ee.fnm
172M _3ee.kdd
518K _3ee.kdi
555B _3ee.kdm
720B _3ee.si
279M _3ee_Lucene80_0.dvd
2.0K _3ee_Lucene80_0.dvm
390M _3ee_Lucene84_0.doc
1.2G _3ee_Lucene84_0.pos
452M _3ee_Lucene84_0.tim
6.8M _3ee_Lucene84_0.tip
2.3K _3ee_Lucene84_0.tmd
It shows that the .pos file accounts for ~25% of the total size, so I want to use index_options: docs to save the space taken by the .pos file and part of the .doc file.
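As a rough check of that ~25% figure, here is a minimal Python sketch that sums the sizes from the listing above. It assumes the suffixes are binary units (1K = 1024 bytes), as printed by tools like ls -h or du -h; the names UNITS, SEGMENT_FILES and to_bytes are only illustrative.

```python
# Sizes copied verbatim from the segment listing above.
# Assumption: K/M/G are binary units (1024-based), as with `ls -h` or `du -h`.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

SEGMENT_FILES = {
    "_3ee.fdm": "1.5K",
    "_3ee.fdt": "2.4G",
    "_3ee.fdx": "172K",
    "_3ee.fnm": "3.4K",
    "_3ee.kdd": "172M",
    "_3ee.kdi": "518K",
    "_3ee.kdm": "555B",
    "_3ee.si": "720B",
    "_3ee_Lucene80_0.dvd": "279M",
    "_3ee_Lucene80_0.dvm": "2.0K",
    "_3ee_Lucene84_0.doc": "390M",
    "_3ee_Lucene84_0.pos": "1.2G",
    "_3ee_Lucene84_0.tim": "452M",
    "_3ee_Lucene84_0.tip": "6.8M",
    "_3ee_Lucene84_0.tmd": "2.3K",
}


def to_bytes(size: str) -> float:
    """Convert a human-readable size such as '2.4G' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


total = sum(to_bytes(size) for size in SEGMENT_FILES.values())
pos_share = to_bytes(SEGMENT_FILES["_3ee_Lucene84_0.pos"]) / total
print(f".pos share of the segment: {pos_share:.1%}")
```

The stored fields file (.fdt) is still the largest single file, so dropping positions would remove roughly a quarter of this segment at most, plus whatever part of the .doc file holds term frequencies.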
My question is: without the position data, is there any alternative to a phrase query (or to LIKE "%xxx yyy%" in MySQL)?
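To make the problem concrete, here is a toy Python sketch (not how Lucene is implemented, just the idea) of what is lost without positions. A docs-only postings list can tell that both terms occur in a document, but not that they are adjacent. The function names and the whitespace tokenizer are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build two toy inverted indexes from {doc_id: text}, tokenizing on whitespace."""
    docs_only = defaultdict(set)                         # term -> {doc_id}
    positional = defaultdict(lambda: defaultdict(list))  # term -> {doc_id: [positions]}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            docs_only[term].add(doc_id)
            positional[term][doc_id].append(pos)
    return docs_only, positional


def match_all_terms(docs_only, phrase):
    """Best a docs-only index can do: documents containing every term, in any order."""
    terms = phrase.lower().split()
    return set.intersection(*(docs_only.get(t, set()) for t in terms))


def match_phrase(positional, phrase):
    """Phrase match: the terms must appear at consecutive positions."""
    terms = phrase.lower().split()
    hits = set()
    for doc_id, starts in positional.get(terms[0], {}).items():
        for start in starts:
            if all(
                start + i in positional.get(t, {}).get(doc_id, [])
                for i, t in enumerate(terms)
            ):
                hits.add(doc_id)
                break
    return hits


docs = {1: "error xxx yyy timeout", 2: "yyy then xxx"}
docs_only, positional = build_index(docs)
print(match_all_terms(docs_only, "xxx yyy"))  # both documents contain both terms
print(match_phrase(positional, "xxx yyy"))    # only document 1 has them adjacent
```

So with index_options: docs, a term-level AND can only narrow the candidates; confirming the phrase still needs a second pass over the original text, which is what the script approach below attempts.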
For example, I have tried using a script that accesses the _source field and uses contains in place of the phrase query:
GET my_index/_search
{
  "script_fields": {
    "source_field": {
      "script": {
        "source": "params['_source']['msg']"
      }
    },
    "source_field_contains_xxx_yyy": {
      "script": {
        "source": "params['_source']['msg'].contains('xxx yyy')"
      }
    }
  }
}
But it seems that the _source field can only be accessed in the field context and can't be accessed in the filter context.