Hi,
I'm trying to migrate a text search tool from Oracle to Elasticsearch. It works, but searching is a lot slower than in the Oracle version, and I'm hoping for clues as to what can be done about it.
We're indexing millions of text documents, but rather than indexing the text directly, we index a number of fingerprints for each document. Each fingerprint is an integer of approximately 50 bits, and each document produces on the order of hundreds of fingerprints (occasionally thousands).
The query finds documents that are similar to an input document. Two documents are similar if there is any overlap in their fingerprints, so the query is a terms query with the input document's fingerprints as the terms.
This query is the slow part.
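To make the similarity rule concrete, here is a minimal Python sketch of it (not the actual client code, which uses NEST). The function names and the grouping of fingerprints into documents are made up for illustration; the fingerprint values come from the sample query further down.

```python
def shared_fingerprints(doc_a, doc_b):
    """Return the set of fingerprints that both documents contain."""
    return set(doc_a) & set(doc_b)


def is_similar(doc_a, doc_b):
    """Two documents count as similar if they share at least one fingerprint."""
    return not set(doc_a).isdisjoint(doc_b)


# Hypothetical documents, each reduced to a list of fingerprints.
input_doc = [458555129998123, 426113387683010, 1047636941817882]
candidate = [1047636941817882, 339603061195725]
unrelated = [496322075763366, 898270748861439]

print(is_similar(input_doc, candidate))           # True
print(is_similar(input_doc, unrelated))           # False
print(shared_fingerprints(input_doc, candidate))  # {1047636941817882}
```

The terms query is the index-side equivalent of this: it matches every document that contains at least one of the listed fingerprints.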
The problem is mainly CPU usage. When Oracle and ES/Windows have the data in cache, a single query takes Oracle 5 to 25 ms, while it takes ES 200 to 350 ms.
Suggestions as to how to make it faster will be much appreciated.
Setup:
Windows Server 2012 R2
Single node ES cluster
Elasticsearch 2.2.0
ES memory pool: 10GB
Client: NEST 1 / .NET
CPU: E5-2620 (same for Oracle and ES)
Sample query:
POST /myindex-v8/document/_search
{
  "size": 5000,
  "fields": [ "properties" ],
  "filter": {
    "terms": {
      "fingerprints": [
        458555129998123,
        426113387683010,
        1047636941817882,
        339603061195725,
        496322075763366,
        898270748861439,
        715145004425481,
        1083745791071972,
        364744235080270,
        [ quite a few more of these ]
        1042455538904631,
        582951484394277,
        264684524292891,
        827184922645852
      ]
    }
  }
}
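For anyone who wants to reproduce the request, a body of this shape can be put together roughly as follows. This is a Python sketch rather than the NEST code actually in use; `build_similarity_query` is a hypothetical helper, and deduplicating and sorting the terms is an assumption of the sketch, not something the real client is known to do.

```python
import json


def build_similarity_query(fingerprints, size=5000):
    """Build a search body matching documents that share any fingerprint.

    Mirrors the sample query: a top-level terms filter on "fingerprints",
    returning only the stored "properties" field.
    """
    # Assumption: duplicate terms add nothing, so drop them; sorting just
    # makes the generated body deterministic.
    terms = sorted(set(fingerprints))
    return {
        "size": size,
        "fields": ["properties"],
        "filter": {"terms": {"fingerprints": terms}},
    }


body = build_similarity_query(
    [458555129998123, 426113387683010, 458555129998123]
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the POST to /myindex-v8/document/_search shown above.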
Mapping:
{
  "myindex-v8": {
    "mappings": {
      "document": {
        "_all": {
          "enabled": false
        },
        "properties": {
          "fingerprints": {
            "type": "double",
            "precision_step": 2147483647
          },
          "properties": {
            "type": "string",
            "index": "no"
          },
          "refKey": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
Mapping notes:
The "precision_step" setting in the mapping provided a substantial indexing performance boost.
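A side note on the "double" type: the fingerprints are integers, and a double has a 53-bit significand, so integers of about 50 bits are stored exactly. A quick Python sketch to check this (Python floats are IEEE 754 doubles; `survives_double` is a made-up helper, and the first two samples are taken from the query above):

```python
def survives_double(n):
    """True if integer n converts to a double and back without change."""
    return int(float(n)) == n


# Two fingerprints from the sample query plus the largest 50-bit value.
samples = [458555129998123, 1083745791071972, 2**50 - 1]

for n in samples:
    print(n, survives_double(n))

# Beyond 2**53 not every integer is representable any more.
print(survives_double(2**53 + 1))  # False
```

So the mapping does not lose precision as long as the fingerprints stay below 2^53.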
Query notes:
Most of these queries return 1 to 20 results.