Dynamic mapping with *lots* of fields?


(Daniel Winterstein) #1

Hello,

I have a case where each document is likely to introduce some specific
& often unique properties -- leading to millions or more fields.
The values stored under these should be searchable. But the mapping
structure itself is not used in queries.

Having Elasticsearch treat these as dynamic fields works -- but how
well does Elasticsearch cope with very large mappings?

Am I right in thinking each property adds a field to the mapping,
which is shared & held in memory by all nodes?
If so, a mapping can become too large and create performance issues. E.g.
100 million documents might lead to gigabytes of RAM locked up
modelling the mapping.
Is that correct?
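
To make the concern concrete, here is a rough sketch (plain Python, not an Elasticsearch API) that counts how many distinct leaf fields a set of documents would introduce if every property were mapped dynamically. The helper names `field_paths` and `count_mapped_fields` and the sample documents are made up for illustration; the sketch only counts field names and says nothing about actual memory use.

```python
from typing import Iterable, Set


def field_paths(doc: dict, prefix: str = "") -> Set[str]:
    """Return the dotted paths of all leaf fields in a nested document."""
    paths: Set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects and collect their leaf paths.
            paths |= field_paths(value, path)
        else:
            paths.add(path)
    return paths


def count_mapped_fields(docs: Iterable[dict]) -> int:
    """Count the distinct leaf fields seen across all documents."""
    seen: Set[str] = set()
    for doc in docs:
        seen |= field_paths(doc)
    return len(seen)


# Hypothetical documents: each one adds its own unique property.
docs = [
    {"title": "a", "props": {"user_1_tag": "x"}},
    {"title": "b", "props": {"user_2_tag": "y"}},
    {"title": "c", "props": {"user_3_tag": "z"}},
]
print(count_mapped_fields(docs))  # 4: "title" plus one field per unique property
```

With one unique property per document, the field count grows linearly with the number of documents, which is the growth pattern asked about above.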

You can switch off dynamic fields altogether -- but then I think the
content stored within them becomes unsearchable.
Is there a way to have dynamic fields accessible to search, without
modelling them as part of the mapping?

Thank you &
Best regards,

- Daniel

--
Dr Daniel Winterstein
Director

A: CodeBase Argyle House, Edinburgh, EH3 9DR
M: +44 (0)772 5172 612
http://winterwell.com http://sodash.com

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEmLStmrGs%2BHaW%3De1XChi%3DFQ_LKEPEntRUvT8zt_BJKPUESkeg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

What is your data model?

It seems you could live with search on a single field only.

In that case, why not use a key/value stream, search on the value, and use
highlighting to look up the key(s) in the result doc?

{
"key" : "...",
"value" : "..."
}
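
One possible reading of this suggestion, sketched in plain Python: move the per-document properties into a list of key/value pairs before indexing, so the mapping only ever needs a `key` and a `value` field no matter how many distinct property names exist. The function name `to_key_value_doc`, the `props` input field and the `kv` output field are assumptions for illustration, not names from this thread or from Elasticsearch.

```python
from typing import Any, Dict, List


def to_key_value_doc(doc: Dict[str, Any], dynamic_field: str = "props") -> Dict[str, Any]:
    """Replace the object of arbitrary properties with a list of key/value pairs."""
    # Keep every field except the one holding the arbitrary properties.
    fixed = {k: v for k, v in doc.items() if k != dynamic_field}
    pairs: List[Dict[str, Any]] = [
        {"key": key, "value": value}
        for key, value in sorted(doc.get(dynamic_field, {}).items())
    ]
    fixed["kv"] = pairs
    return fixed


# Hypothetical input document with two arbitrary properties.
original = {"title": "a", "props": {"size": "XL", "colour": "red"}}
print(to_key_value_doc(original))
```

The property names become data rather than field names, so adding a new property does not add a new field.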

Jörg



(Daniel Winterstein) #3

Hello Jörg,

> What is your data model?
> It seems you could live with search on a single field only.

A relevant point I left out: I have concurrent, non-conflicting edits going
on, which I want to merge, i.e. several users setting their own dynamic
fields at the same time.
Using separate fields with the update command makes this painless.
NB: I'm assuming ES handles concurrent updates nicely, but I haven't
stress-tested this yet.
If we had a single field, we'd need to introduce versioning and conflict
handling.

> Then I conclude why not use key/value stream, search on value, and use
> highlighting for looking up the key(s) in the result doc?

I'm not sure I understand what you're proposing here.

NB: The alternatives I'm considering are:

  1. Lots of dynamic fields -- with the concerns talked about in this thread.
  2. In my code, splitting edits to the "concurrent fields" off from other
    edits. Then using normal put and update for normal fields, and MVEL update
    scripts for edits to concurrent fields.
  3. Versioning and conflict-handling.
  4. Writing a plugin for the ES backend to allow for custom merge logic on a
    single field.
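
For reference, option 3 could look roughly like the following. This is a minimal in-memory sketch of optimistic versioning with retry, assuming a store that rejects writes made against a stale version; `VersionedStore`, `VersionConflict` and `update_with_retry` are hypothetical names and stand in for the real backend rather than reproducing any Elasticsearch API.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    """Toy in-memory document store with a version number per document."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        version, doc = self._docs.get(doc_id, (0, {}))
        return version, dict(doc)  # return a copy so callers cannot mutate the store

    def put(self, doc_id: str, doc: Dict[str, Any], expected_version: int) -> int:
        current, _ = self._docs.get(doc_id, (0, {}))
        if current != expected_version:
            raise VersionConflict(f"expected v{expected_version}, found v{current}")
        self._docs[doc_id] = (current + 1, dict(doc))
        return current + 1


def update_with_retry(store: VersionedStore, doc_id: str,
                      changes: Dict[str, Any], max_retries: int = 3) -> int:
    """Re-read, re-apply the changes and retry whenever the write is rejected."""
    for _ in range(max_retries):
        version, doc = store.get(doc_id)
        doc.update(changes)
        try:
            return store.put(doc_id, doc, version)
        except VersionConflict:
            continue
    raise VersionConflict(f"gave up on {doc_id} after {max_retries} attempts")


store = VersionedStore()
update_with_retry(store, "doc1", {"alice_tag": "x"})
stale_version, stale_doc = store.get("doc1")        # a second user reads here
update_with_retry(store, "doc1", {"bob_tag": "y"})  # a third user writes first
try:
    store.put("doc1", {**stale_doc, "carol_tag": "z"}, stale_version)
except VersionConflict:
    # The stale write is rejected; retrying merges it with the newer state.
    update_with_retry(store, "doc1", {"carol_tag": "z"})
print(store.get("doc1"))
```

The retry loop is the extra conflict handling that option 1 avoids, since separate fields never overwrite each other.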

Best regards,

- Daniel


