FAST stored the source data on distributed machines; only the control API
was not distributed (similar to ES HTTP curl requests, which also connect
to a single host).
Of course, you could index the raw JSON into a preparer index with a single
field, with _all disabled and the field set to "not indexed", so that Lucene
does no analysis or indexing work on that field. This preparer index could
also hold the mappings, in special documents, for the indexing runs.
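
As a rough illustration of such a preparer index, here is a minimal Python
sketch that only builds the request bodies; it does not talk to a cluster.
The type name "raw" and the field name "payload" are made up for this
example, and the mapping syntax assumes ES 1.x (string type with
"index": "no").

```python
import json


def preparer_index_body(doc_type="raw", field="payload"):
    """Build an index-creation body for a 'preparer' index.

    Assumption: ES 1.x mapping syntax. The type and field names are
    hypothetical placeholders, not anything prescribed by ES.
    """
    return {
        "mappings": {
            doc_type: {
                # No catch-all field, so nothing gets analyzed into _all.
                "_all": {"enabled": False},
                "properties": {
                    # The raw JSON is kept as one opaque, unindexed string.
                    field: {"type": "string", "index": "no"}
                },
            }
        }
    }


def wrap_raw(raw_json, field="payload"):
    """Wrap a raw JSON string into a single-field preparer document."""
    json.loads(raw_json)  # raises ValueError if the input is not valid JSON
    return {field: raw_json}
```

The body returned by preparer_index_body() would be sent when creating the
index, and each incoming record would be indexed as wrap_raw(record).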
The data duplication factor depends on the complexity of the mapping(s)
and on the characteristics of the data (dictionary size, analyzer/tokenizer
output, norms, etc.).
A plugin would do no magic at all. It could bundle the calls that a client
would otherwise have to execute remotely, and add some convenience
commands for managing the prepare stage (e.g. suspend/resume) and for
showing the current state of indexing.
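
To make the suspend/resume idea concrete, here is a small in-memory Python
sketch. It is not a plugin and not how FAST worked internally; the class
PrepareStage and all of its method names are invented for illustration.
The point is only that input is always accepted, while the indexing step
can be paused and resumed, and the current state can be inspected.

```python
from collections import deque


class PrepareStage:
    """Hypothetical prepare stage: always accepts input, indexes on demand."""

    def __init__(self, index_fn):
        self._pending = deque()    # received but not yet indexed documents
        self._index_fn = index_fn  # callback that does the actual indexing
        self._suspended = False
        self._indexed = 0

    def receive(self, doc):
        # Input is accepted even while indexing is suspended.
        self._pending.append(doc)

    def suspend(self):
        self._suspended = True

    def resume(self):
        self._suspended = False

    def run(self, max_docs=None):
        """Index pending documents until empty, suspended, or max_docs hit."""
        done = 0
        while self._pending and not self._suspended:
            if max_docs is not None and done >= max_docs:
                break
            self._index_fn(self._pending.popleft())
            done += 1
        self._indexed += done
        return done

    def state(self):
        return {
            "suspended": self._suspended,
            "pending": len(self._pending),
            "indexed": self._indexed,
        }
```

In a real setup, index_fn would send the document to the search index, and
the pending queue would be the preparer index rather than process memory.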
If redundant data is a no-go, then the whole approach is counterintuitive.
Jörg
On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana asthanaamish@gmail.com
wrote:
With the existing Elasticsearch, I can think of an architecture like this:
Index: indexForDataDump, with no mapping (is that possible?) or a minimal
mapping. It is used only to dump data from the external system. There is
some primary key.
There are different search indexes with different mappings: search-index1,
search-index2, etc.
These indexes get populated from indexForDataDump using the technique
described here:
http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/.
This way I can drop a search index as desired and create a new one with
a new mapping.
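
A rough Python sketch of the two request bodies this flow needs: bulk lines
that copy documents from the dump index into a new search index, and the
alias actions that switch searches from the old index to the new one in a
single step. Nothing here calls a cluster. The alias name "search" and the
type name "doc" are assumptions for the example, and the bulk metadata
assumes ES 1.x (which still uses _type).

```python
import json


def bulk_lines(docs, target_index, doc_type="doc"):
    """Build a bulk request body from (id, source) pairs.

    Assumption: docs were read from the dump index beforehand, e.g. via
    scan/scroll; that step is not shown here.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": target_index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return "\n".join(lines) + "\n"


def alias_swap_actions(alias, old_index, new_index):
    """Build the aliases body that moves an alias in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, alias_swap_actions("search", "search-index1", "search-index2")
would be sent once search-index2 is fully populated, after which
search-index1 can be dropped.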
Are there any pros/cons or issues with this approach? There will be data
duplication, but I am hoping it is minimal. (Is there any way to quantify it?)
regards and thanks
amish
On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
I am not aware of FAST, but the idea looks promising.
However, it might not be that easy to just have a plugin for ES, as the data
itself is distributed across different machines.
So it will not be possible to have just one server holding the data, as it
would become a single point of failure.
regards and thanks
amish
On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
I know that the FAST search engine, ten years ago, had a two-phase
commit for distributed search and indexing. One server could listen on the
API and keep the (compressed) input stored, and all the other indexing
servers were supplied with this input in another phase to create binary
indexes, either automatically or by manual operation, called the
"suspend/resume indexing API".
The advantage was that data could be received continuously via the API while
FAST indexing could be stopped temporarily in order to balance between
indexing and search performance on limited hardware.
Are you thinking of something like that for Elasticsearch as well? This
architecture could be implemented as a plugin.
Jörg
On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana asthan...@gmail.com
wrote:
Hi
Is there a way we can decouple the data and the associated mapping/indexing
in Elasticsearch itself?
Basically, store the raw data as source (JSON or some other format), so that
various mappings/indexes can be used on top of it.
I understand that one can use an outside database or file system, but
can it be achieved natively in ES itself?
Basically, we are trying to see how our ES instance will work when we
have to change the mapping of existing and continuously incoming data
without any downtime for the end user.
We have an added wrinkle: our indexing has to be edit-aware for
versioning purposes, unlike ES, where each edit is a new record.
regards and thanks
amish
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.