Cloud plugin for Windows Azure

Hi,

Our startup is planning to use elasticsearch as our cloud search
solution. However, we are primarily a .NET shop running on Windows
Azure, which elasticsearch doesn't currently have a plugin for, so we
need to build our own.

I have a budget set aside for it, but in order to reduce time to
market (TTM) I would prefer it done by someone who's already familiar
with the codebase, so I'd rather outsource it to someone in this group
for a reasonable fee. Naturally, the resulting code will be contributed
back for everyone else to use.

We need a solid solution for discovery, efficient persistence of
indices, and configuration/logging via standard Windows Azure means.

Any takers?

Regards,
Ronen.

Just a note that I think requires emphasis. Running elasticsearch on a
"cloud" does not require a specific cloud plugin. The AWS cloud plugin
simplifies the deployment of elasticsearch on AWS, but you can run
elasticsearch without it.

  1. Discovery: The AWS plugin simplifies the discovery of nodes, but it's
    not required. One can use unicast discovery and list the hosts.

  2. S3 gateway: Most users on AWS use the default local gateway, which is
    more lightweight compared to S3.

-shay.banon
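
For reference, the unicast setup described above could be generated at
instance start-up roughly as follows. This is only a minimal sketch: it
assumes the zen discovery setting names used by elasticsearch releases
of this period (check them against your version), and the function name
and IP addresses are hypothetical.

```python
def render_unicast_settings(hosts, port=9300):
    """Return elasticsearch.yml lines that disable multicast and list seed hosts.

    `hosts` is a list of host names or IPs known to be alive; how that list
    is obtained (e.g. from the cloud provider) is outside this sketch.
    9300 is the default elasticsearch transport port.
    """
    if not hosts:
        raise ValueError("at least one seed host is required")
    entries = ", ".join('"%s:%d"' % (host, port) for host in hosts)
    return "\n".join([
        "discovery.zen.ping.multicast.enabled: false",
        "discovery.zen.ping.unicast.hosts: [%s]" % entries,
    ])


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(render_unicast_settings(["10.0.0.4", "10.0.0.5"]))
```

The output would be appended to the node's config file before the node
is started.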

Hi Shay,

It is indeed possible to run elasticsearch on Azure as is (we are
actually doing it right now). However, it is rather awkward, and to
make it a first-class citizen I feel a proper plugin is needed. To be
more specific:

  1. Config: while it is certainly possible to work with the current config
    files, Azure provides built-in configuration mechanisms, including
    events on config changes, and it would provide a much smoother
    experience if we could take advantage of them.

  2. Discovery: The whole idea of the cloud is that instances come and go,
    and to manually list hosts kind of defeats the purpose. We have some
    rather convoluted logic to pre-seed the config file with currently
    alive hosts before launching the instance, but, again, it's very
    awkward.

  3. S3 gateway: I'm less familiar with AWS, but on Azure I really need
    a persistent way to store indices, unless I want to do a full index
    rebuild on each cluster restart. When I push a new version out, VMs are
    rebuilt squeaky clean, so local persistence would not work (unless I
    want to do a rolling upgrade and wait for each new instance to get
    its data from its peers, which, for a large cluster, could take quite
    a while and would complicate the upgrade significantly).
    Another option is to use an Azure blob-backed local drive (an EBS
    analog). However, it too has its problems. First, with the local
    gateway elasticsearch uses synchronous persistence, which has a
    negative impact on performance (as we are actually doing cloud
    storage reads/writes behind the scenes). Furthermore, local disk
    access tends to be rather chatty compared to remote access, so this
    could potentially impact performance as well, and rack up the usage
    charges (Azure blob storage is billed, among other things, per
    request). Finally, and correct me if I’m wrong here, the EBS option
    would store “everything” in the cloud, per instance, including shard
    replicas, which would just waste a lot of space and IO for nothing.
    All this can be rectified by a proper gateway implementation that
    would provide a solid asynchronous backend, optimized in terms of
    performance, traffic and storage usage. (For example, we have used an
    alternative search implementation based on Lucene.NET, which uses a
    composite Directory object that is backed by either an FSDirectory or
    even a RAMDirectory and also concurrently/asynchronously updates
    Azure storage with proper request batching. This makes a significant
    difference in terms of both performance and usage cost.)

To summarize, while it is indeed possible to run elasticsearch on
Azure as-is (and we are doing that), to make it truly shine I believe
a proper backend plugin is required.

Ronen.

I did not fully understand point 3, or your reasoning. First, regarding
the Lucene.NET solution: FSDirectory is synchronous, so against which file
system did you run it? Second, I don't understand your concept of storing
on the "cloud"; it seems to be all over the place. EBS is storing on the
cloud, but what is not? An ephemeral file system? You can use the local
gateway on both, and replicas make sense in both cases. Sure, storing
"replicas" does not make sense in the case of the S3 gateway. Maybe it's
because I am not familiar with Azure and what features/options it has.

Hi Shay,

I want to benefit from both worlds here: the low latency and low
cost of local storage (i.e. storage local to the machine running
the instance, like its physical HDD or RAM) and the reliability and
persistence of cloud storage (and by cloud storage I mean persistent
remote storage, like Azure blobs or S3). So, the idea is to use a
composite Directory object which routes reads/writes to two
locations, local disk/RAM and cloud storage, where local access is
synchronous (but I don't care, as compared to cloud storage access
times it's still very fast) and cloud storage access is buffered,
asynchronous and happens in the background. You can think of it as
asynchronously replicating local storage to cloud storage.
Furthermore, we can be smart about what to persist to the cloud and
how to do it (e.g. op batching, optimistic concurrency, etc.).

I’m not saying that a composite Directory object is the right way to
implement it in ES, but the idea behind it is still valid, imho.

Ronen.
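
To make the idea above concrete, here is a minimal sketch of the
write-behind pattern: writes land on local disk synchronously and are
copied to remote storage in batches by a background thread. It is not
how elasticsearch or Lucene.NET implement anything. All class and
method names are hypothetical, and the "remote" side is an in-memory
stand-in rather than a real Azure or S3 client.

```python
import os
import queue
import tempfile
import threading


class InMemoryRemote:
    """Hypothetical stand-in for a blob store; counts batched requests."""

    def __init__(self):
        self.blobs = {}
        self.requests = 0
        self._lock = threading.Lock()

    def put_batch(self, items):
        # One call models one (billed) remote request carrying many blobs.
        with self._lock:
            self.requests += 1
            self.blobs.update(items)


class WriteBehindStore:
    """Synchronous local writes, asynchronous batched remote copies."""

    def __init__(self, local_dir, remote, batch_size=16):
        self.local_dir = local_dir
        self.remote = remote
        self.batch_size = batch_size
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def write(self, name, data):
        # The caller only waits for the fast local write.
        with open(os.path.join(self.local_dir, name), "wb") as f:
            f.write(data)
        self._pending.put((name, data))

    def read(self, name):
        # Reads are always served from local storage.
        with open(os.path.join(self.local_dir, name), "rb") as f:
            return f.read()

    def _drain(self):
        while True:
            batch = [self._pending.get()]  # block until there is work
            while len(batch) < self.batch_size:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            self.remote.put_batch(dict(batch))
            for _ in batch:
                self._pending.task_done()

    def flush(self):
        # Wait until everything queued so far has reached the remote store.
        self._pending.join()


if __name__ == "__main__":
    remote = InMemoryRemote()
    store = WriteBehindStore(tempfile.mkdtemp(), remote)
    for i in range(10):
        store.write("segment_%d" % i, b"x" * i)
    store.flush()
    print(len(remote.blobs), "blobs stored in", remote.requests, "remote request(s)")
```

The trade-off is that anything still in the queue is lost if the
machine dies before the background thread catches up, so a real
implementation would need to decide how much lag is acceptable.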

What you mention is the idea behind the shared gateway (with an S3
implementation, for example). It's not done using a Directory wrapper
(that's the wrong place to integrate it with Lucene, btw), but it has
similar features.

Hi Ronen and Shay, and everyone else:

We are also very interested in running Elasticsearch (which is awesome!) on
Azure.

Ronen - if you could share your experiences so far and how you set it
up, that would be great.

If anyone else in the community has already installed and run on Azure, I
would love to hear how you did it.

From what I can gather thus far, similar to Solr, we'd write a C# program
that runs at startup to mount a drive backed by blob storage and then poke
into the Elasticsearch config files to tell it the path to the data/log
files before starting up Elasticsearch. (Mounted drive letters are not
persistent, so this would have to happen every time the instance starts up.)

I would very much benefit from any and all experiences you have had running
Elasticsearch on Azure.

Thank you.

-Pete
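
The "poke into the config files" step described above might look
something like this. It is a rough sketch only: `path.data` and
`path.logs` are the standard elasticsearch settings for the data and
log directories, but the function name and the drive letter are
hypothetical, and mounting the drive itself is not shown.

```python
def set_paths(config_text, data_path, logs_path):
    """Return config text with path.data and path.logs set to the given paths.

    Existing path.data / path.logs lines are dropped and fresh ones appended;
    every other line (including comments) is kept unchanged.
    """
    overrides = {"path.data": data_path, "path.logs": logs_path}
    kept = [
        line for line in config_text.splitlines()
        if line.split(":", 1)[0].strip() not in overrides
    ]
    kept += ["%s: %s" % (key, value) for key, value in sorted(overrides.items())]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    original = "cluster.name: demo\npath.data: /old/data\n"
    # "F:" stands for whatever letter the mounted drive received on this boot.
    print(set_paths(original, "F:/es/data", "F:/es/logs"))
```

Because the drive letter can change between boots, the startup program
would call this with the freshly mounted letter each time and write the
result back before launching the node.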

Has anyone been able to run Elasticsearch on Windows Azure? I would
appreciate any help you can give.

Regards
