Elasticsearch CDH Hadoop Integration

I have a problem integrating Elasticsearch with Hadoop.
I am new to Elasticsearch and have installed it on a single-node CDH cluster.
Could someone suggest what we should set for
path.data: /path to data

That is, should the indexed data location be on HDFS or in a local directory?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b750ba78-360f-47af-8d8b-cb79e9e346f9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

For best results and performance, point Elasticsearch to your local storage.
It's just like any other service (MySQL, Postgres, etc.).

On 4/1/15 11:48 AM, Ravi sai kumar wrote:

I have a problem integrating Elasticsearch with Hadoop.
I am new to Elasticsearch and have installed it on a single-node CDH cluster. Could someone suggest what we should set for
path.data: /path to data

That is, should the indexed data location be on HDFS or in a local directory?

--
Costin

Thanks, Costin Leau.
My data will be in the terabytes, so local mode may not suit my requirement.
If I set the Hadoop path, are there any changes we have to make to gateway.type?

The existing property is
gateway.type: local

On Wednesday, April 1, 2015 at 5:14:16 PM UTC+8, Costin Leau wrote:

For best results and performance, point Elasticsearch to your local storage.
It's just like any other service (MySQL, Postgres, etc.).

Elasticsearch has no problem handling terabytes of data and automatically handles data sharding, distribution and
replication across multiple machines. You don't need to have that space on one node; ES automatically 'spreads' the data
across all its nodes. This is the whole premise of scaling out.
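
As a rough illustration of how documents end up spread over shards, here is a minimal Python sketch. It assumes the usual routing idea (a hash of the document id, modulo the number of primary shards). zlib.crc32 is only a stand-in for whatever hash Elasticsearch uses internally, and pick_shard, place_documents, the generated ids and the shard count of 5 are hypothetical names and values chosen for the example.

```python
import zlib


def pick_shard(doc_id, num_primary_shards):
    """Map a document id to a shard number.

    crc32 is a stand-in hash for illustration only; it is not the
    function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_documents(doc_ids, num_primary_shards):
    """Group document ids by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


# Hypothetical ids, modelled on the "al.1" style used in this thread.
doc_ids = ["al.%d" % n for n in range(1, 1001)]
shards = place_documents(doc_ids, 5)

for shard, members in sorted(shards.items()):
    print("shard", shard, "holds", len(members), "documents")
```

Each shard can live on a different node, which is why no single machine needs enough local disk for the whole data set.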

However, assuming you still don't want to do this, note that HDFS is not a file system in the traditional sense:
you can't just point things at it. You can, however, expose it as a local file system through NFS (check your Hadoop
documentation). Even then, HDFS is significantly slower than a local FS and, in the case of
writes, might lead to inconsistent or corrupted data, as mentioned here [1]. That's not to say it cannot work; rather, it's an
unsupported scenario due to the many variables involved outside Elasticsearch itself.

Cheers,

[1] Support for storing indices on HDFS · Issue #9072 · elastic/elasticsearch · GitHub

On 4/1/15 12:23 PM, Ravi sai kumar wrote:

Thanks, Costin Leau.
My data will be in the terabytes, so local mode may not suit my requirement. If I set the Hadoop path,
are there any changes we have to make to gateway.type?

The existing property is
gateway.type: local

--
Costin

Thank you so much, Costin. I will configure it locally.

On Wednesday, April 1, 2015 at 6:00:02 PM UTC+8, Costin Leau wrote:

Elasticsearch has no problem handling terabytes of data and automatically handles data sharding, distribution and
replication across multiple machines. You don't need to have that space on one node; ES automatically 'spreads' the data
across all its nodes. This is the whole premise of scaling out.

However, assuming you still don't want to do this, note that HDFS is not a file system in the traditional sense:
you can't just point things at it. You can, however, expose it as a local file system through NFS (check your Hadoop
documentation). Even then, HDFS is significantly slower than a local FS and, in the case of
writes, might lead to inconsistent or corrupted data, as mentioned here [1]. That's not to say it cannot work; rather, it's an
unsupported scenario due to the many variables involved outside Elasticsearch itself.

Cheers,

[1] Support for storing indices on HDFS · Issue #9072 · elastic/elasticsearch · GitHub

Hi Costin,
Does it support exact-value search, rather than a '% like %' style search?

On Wednesday, April 1, 2015 at 6:08:45 PM UTC+8, Ravi sai kumar wrote:

Thank you so much, Costin. I will configure it locally.

Hi Costin,
Does it support exact-value search, rather than a '% like %' style search?

E.g., I have two JSON records. I only want the records whose eventtype is exactly
‘Detection/Aband22oned’, but the records whose
eventtype is ‘Detection/Aband22oned ddddd’ are also returned.
What condition should I use here?

{
  "_index" : "alarms",
  "_type" : "alarm",
  "_id" : "1",
  "_score" : 1.0,
  "_source" : {
    "format" : "CAPAlarm",
    "id" : "al.1",
    "eventtype" : " Detection/Aband22oned ",
    "note" : "Detection/Abandoned at some place",
    "infos" : [ {
      "format" : "CAPInfo",
      "language" : "en-US",
      "description" : "Detection of Abandoned"
    } ]
  }
}, {
  "_index" : "alarm",
  "_type" : "alarm",
  "_id" : "AUx34CqkvZjDPm2oNyN0",
  "_score" : 1.0,
  "_source" : {
    "format" : "CAPAlarm",
    "id" : "al.1",
    "eventtype" : "Detection/Aband22oned ddddd",
    "note" : "Detection/Abandoned at some place",
    "infos" : [ {
      "format" : "CAPInfo",
      "language" : "en-US",
      "description" : "Detection of Abandoned"
    } ]
  }
}
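
One possible approach to the exact-match question, sketched in Python under stated assumptions. String fields in Elasticsearch releases of this era are analyzed into tokens by default, so a full-text match on 'Detection/Aband22oned' can hit any record sharing a token with it. A common remedy is to map the field as not_analyzed and search it with a term query. The mapping and query bodies below are illustrative only and have not been run against this poster's cluster; rough_tokens is a crude stand-in for a real analyzer, and analyzed_match and exact_match are hypothetical helpers that only imitate the two behaviours locally.

```python
import json
import re

# Hypothetical mapping for the "alarm" type: keep eventtype as one
# unanalyzed term (1.x-style syntax; later releases use a "keyword" type).
mapping = {
    "alarm": {
        "properties": {
            "eventtype": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# A term query compares against the stored term without analyzing the input.
exact_query = {"query": {"term": {"eventtype": "Detection/Aband22oned"}}}


def rough_tokens(text):
    """Crude analyzer stand-in: lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def analyzed_match(field_value, query_text):
    """Imitates an OR-style full-text match: any shared token is a hit."""
    return bool(rough_tokens(field_value) & rough_tokens(query_text))


def exact_match(field_value, query_text):
    """Imitates a term query on an unanalyzed field: whole-value equality."""
    return field_value == query_text


docs = ["Detection/Aband22oned", "Detection/Aband22oned ddddd"]
wanted = "Detection/Aband22oned"

analyzed_hits = [d for d in docs if analyzed_match(d, wanted)]
exact_hits = [d for d in docs if exact_match(d, wanted)]

print(json.dumps(mapping, indent=2))
print(json.dumps(exact_query, indent=2))
print("analyzed:", analyzed_hits)
print("exact:", exact_hits)
```

Two caveats apply. The first record above stores eventtype with leading and trailing spaces (" Detection/Aband22oned "), so an exact comparison would miss it unless the value is trimmed before indexing. Also, an analysis setting on an existing field generally cannot be changed in place, so the data would likely need to be reindexed into an index created with the new mapping.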
