I have a case where each document is likely to introduce some specific
& often unique properties -- leading to millions or more fields.
The values stored under these should be searchable. But the mapping
structure itself is not used in queries.
Having Elasticsearch treat these as dynamic fields works -- but how
well does Elasticsearch cope with very large mappings?
Am I right in thinking each property adds a field to the mapping,
which is shared & held in memory by all nodes?
So a mapping can become too large and create performance issues. E.g.
100 million documents might lead to gigabytes of RAM locked up
modelling the mapping.
Is that correct?
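To make the concern concrete, here is a rough sketch of how the number of distinct field paths grows as documents introduce their own properties. It is plain Python and does not talk to Elasticsearch; the function names and sample property names are made up for illustration, and the count is only a proxy for mapping size, not a memory measurement.

```python
from typing import Iterable, Set


def field_paths(doc: dict, prefix: str = "") -> Set[str]:
    """Return the dotted paths of all leaf fields in a (possibly nested) document."""
    paths: Set[str] = set()
    for name, value in doc.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # Nested objects contribute their sub-fields
            paths |= field_paths(value, prefix=path + ".")
        else:
            paths.add(path)
    return paths


def distinct_fields(docs: Iterable[dict]) -> Set[str]:
    """Union of leaf field paths across documents: a rough proxy for mapping size."""
    seen: Set[str] = set()
    for doc in docs:
        seen |= field_paths(doc)
    return seen


# Hypothetical documents, each adding its own property names
docs = [
    {"title": "a", "props": {"colour_1234": "red"}},
    {"title": "b", "props": {"weight_5678": 12}},
    {"title": "c", "props": {"colour_1234": "blue", "owner_42": "dan"}},
]
fields = distinct_fields(docs)
print(len(fields), sorted(fields))
```

If most documents bring new property names, the set grows roughly linearly with the number of documents, which is the situation described above.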
You can switch off dynamic fields altogether -- but then I think the
content stored within them becomes unsearchable.
Is there a way to have dynamic fields accessible to search, without
modelling them as part of the mapping?
What is your data model?
It seems you could live with search on a single field only.
A relevant point I left out: I have concurrent non-conflicting edits going
on, which I want to merge. I.e. several users setting their own dynamic
fields at the same time.
Using separate fields with the update command makes this painless.
NB: I'm assuming ES handles concurrent updates nicely, but I haven't
stress-tested this yet.
If we used a single field, we'd need to introduce versioning and conflict
handling.
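A small sketch of why separate fields make the merge painless. It is a plain-Python imitation of a partial-document update that merges nested objects into the existing document; it is an assumption-laden model, not Elasticsearch code, and `merge_partial` and the field names are hypothetical.

```python
import copy


def merge_partial(source: dict, partial: dict) -> dict:
    """Return a copy of `source` with `partial` merged in, recursing into nested objects."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars (and anything that is not an object) are replaced outright
            merged[key] = copy.deepcopy(value)
    return merged


doc = {"title": "report", "props": {}}
edit_alice = {"props": {"alice_note": "check figures"}}
edit_bob = {"props": {"bob_tag": "urgent"}}

# Disjoint fields: the result is the same whichever edit lands first
a_then_b = merge_partial(merge_partial(doc, edit_alice), edit_bob)
b_then_a = merge_partial(merge_partial(doc, edit_bob), edit_alice)

# Single shared field: the later write silently replaces the earlier one
single = {"title": "report", "notes": "original"}
last_write_wins = merge_partial(
    merge_partial(single, {"notes": "alice"}), {"notes": "bob"}
)
```

With one field per user the two edits commute; with a single shared field one edit is lost unless something detects the conflict.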
Then I conclude: why not use a key/value stream, search on the value, and
use highlighting to look up the key(s) in the result doc?
I'm not sure I understand what you're proposing here.
NB: The alternatives I'm considering are:
1. Lots of dynamic fields -- with the concerns talked about in this thread.
2. In my code, splitting edits to the "concurrent fields" off from other
   edits. Then using normal put and update for normal fields, and MVEL update
   scripts for edits to concurrent fields.
3. Versioning and conflict-handling.
4. Writing a plugin for the ES backend to allow for custom merge logic on a
   single field.
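For the versioning and conflict-handling option, the usual shape is optimistic concurrency: read the document with its version, apply the edit, write back only if the version is unchanged, and retry otherwise. The sketch below uses a toy in-memory store to show that loop; `VersionedStore`, `update_with_retry` and `set_field` are hypothetical names, and nothing here calls Elasticsearch.

```python
import copy
from typing import Callable, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Toy in-memory store: each document carries a version that increments per write."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}

    def get(self, doc_id: str) -> Tuple[int, dict]:
        version, doc = self._docs.get(doc_id, (0, {}))
        return version, copy.deepcopy(doc)

    def put(self, doc_id: str, doc: dict, expected_version: int) -> int:
        current, _ = self._docs.get(doc_id, (0, {}))
        if current != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, copy.deepcopy(doc))
        return current + 1


def update_with_retry(store: VersionedStore, doc_id: str,
                      edit: Callable[[dict], dict], max_retries: int = 5) -> int:
    """Re-read and re-apply `edit` until the write succeeds or retries run out."""
    for _ in range(max_retries):
        version, doc = store.get(doc_id)
        try:
            return store.put(doc_id, edit(doc), expected_version=version)
        except VersionConflict:
            continue  # someone else wrote first: fetch the fresh copy and try again
    raise VersionConflict(f"gave up on {doc_id} after {max_retries} attempts")


def set_field(name: str, value: str) -> Callable[[dict], dict]:
    def edit(doc: dict) -> dict:
        doc["fields"][name] = value
        return doc
    return edit


store = VersionedStore()
store.put("doc1", {"fields": {}}, expected_version=0)

# Bob reads, then Alice writes before Bob does
stale_version, stale_doc = store.get("doc1")
update_with_retry(store, "doc1", set_field("alice_note", "check figures"))

try:
    store.put("doc1", set_field("bob_tag", "urgent")(stale_doc),
              expected_version=stale_version)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True

# Retrying from a fresh read keeps both edits
final_version = update_with_retry(store, "doc1", set_field("bob_tag", "urgent"))
```

The retry only works because the edit is expressed as a function of the current document, so it can be re-applied to whatever the latest version turns out to be.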
Best regards,
Daniel
On Monday, 28 April 2014 08:13:17 UTC+1, Jörg Prante wrote:
What is your data model?
It seems you could live with search on a single field only.
Then I conclude: why not use a key/value stream, search on the value, and
use highlighting to look up the key(s) in the result doc?
{
  "key" : "...",
  "value" : "..."
}
Jörg
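One way to read the key/value suggestion, sketched in Python: the dynamic properties are turned into a list of key/value objects, so the mapping only ever contains the two fixed fields however many distinct keys exist. The container field name `props` and the helper names are assumptions for illustration. The request body assumes `props` is mapped as a plain object array rather than a nested type, and whether highlighting is enough to recover the matching key has not been verified here.

```python
from typing import Any, Dict, List


def to_key_value_pairs(properties: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn {"name": value, ...} into [{"key": name, "value": str(value)}, ...]."""
    return [{"key": k, "value": str(v)} for k, v in sorted(properties.items())]


def build_document(fixed_fields: Dict[str, Any],
                   dynamic_properties: Dict[str, Any]) -> Dict[str, Any]:
    """Combine the ordinary fields with the key/value list under `props` (hypothetical name)."""
    doc = dict(fixed_fields)
    doc["props"] = to_key_value_pairs(dynamic_properties)
    return doc


def build_value_query(text: str) -> Dict[str, Any]:
    """Search body: match on the value field and ask for highlights on it."""
    return {
        "query": {"match": {"props.value": text}},
        "highlight": {"fields": {"props.value": {}}},
    }


doc = build_document({"title": "report"}, {"owner_42": "dan", "colour_1234": "red"})
query = build_value_query("red")
```

The trade-off is that per-key queries now need to match on `props.key` as well, and concurrent edits to the shared `props` list are no longer independent field updates.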
On Sun, Apr 27, 2014 at 1:15 PM, Daniel Winterstein <daniel.wi...@gmail.com> wrote: