Re: Digest for elasticsearch@googlegroups.com - 25 Messages in 11 Topics

On Fri, Dec 28, 2012 at 11:14 AM, elasticsearch@googlegroups.com wrote:

Today's Topic Summary

Group: http://groups.google.com/group/elasticsearch/topics

  • cloud-aws not working 0.20.1 and 0.19.12? [1 Update]
  • Sample geo region facet [1 Update]
  • How can I handle duplicate data in Elasticsearch? [2 Updates]
  • IntelliJ settings for Elasticsearch (JAR shading issue) [5 Updates]
  • nested type [1 Update]
  • load JSONs from filesystem [2 Updates]
  • Need some help to get started. [1 Update]
  • Elastic search and hadoop [9 Updates]
  • Join in elasticsearch [1 Update]
  • ElasticSearch Java TransportClient: Need to set TransportAddress [1 Update]
  • Any suggestion for duplicate data? [1 Update]

cloud-aws not working 0.20.1 and 0.19.12? http://groups.google.com/group/elasticsearch/t/27e62668aab401d5

Brian Schrock schrock.brian@gmail.com Dec 28 10:41AM -0800

All,

I previously had 0.19.11 (I am not sure of the exact 0.19 version)
installed and working with the aws-plugin for EC2 discovery. I have
re-installed into a new AMI, and with both 0.19.12 and 0.20.1, if I put
"discovery.type: ec2" into the config file, Elasticsearch will not
start. With that line commented out, it starts.

My problem is that I don't know where to begin troubleshooting. Not much
is written to the logs, even though I set as much as I could to "debug"
in logging.yml. The only thing I see in the elasticsearch log is
[INFO ][plugins ] [ec2-hostname] loaded [], sites [cloud-aws]
and after that nothing.

How do I go about tracking down whether this is an issue with my EC2
setup or AMI (security groups, or some other setting I am not aware of)
or something wrong with the plugin?

Thank you for any suggestions or help offered.

P.S. I know the security keys are correct since I am using them to
execute ec2-* commands.

Sample geo region facet http://groups.google.com/group/elasticsearch/t/e410c7308c80dcc2

Alexander Reelsen alr@spinscale.de Dec 28 04:20PM +0100

Hi,

Holidays, at your parents for too long, split from your better half...
inevitably leads to code :)

So I hacked up
https://github.com/spinscale/elasticsearch-facet-georegion

It is a small sample facet, which everyone is encouraged to use for
their own projects. The georegion facet groups results that have a geo
location stored in the document by arbitrary region. My sample includes
a document for many countries in the world, but you could change that to
reflect only the states in Germany, or just the continents, by pointing
the configuration to a different file that includes the data you need
(in GeoJSON format, if I am not mistaken).

When indexing a few towns like Paris, Munich, Berlin and Cologne with
their correct locations (as a geo_point type), the facet will return
"Germany:3" and "France:1"...

It is slow as hell, so performance improvements, or better integration
with the not too well documented geo features of elasticsearch, would be
great as minor tips :) (however, I do not intend to develop it any
further).

Have fun, and reuse whatever you want from the code.

--Alexander

How can I handle duplicate data in Elasticsearch? http://groups.google.com/group/elasticsearch/t/1878f33cea810e4c

Radu Gheorghe radu.gheorghe@sematext.com Dec 28 05:05PM +0200

Hello,

Assuming you don't have to update rooms and flats too often, you might
be better off with nested documents.

Back to your original question, you could store the ID of the parent in
the child when you insert it. Then, you could use the GET or Multi
Get[0] API to retrieve the parents once you have the children.

Another possibility, which is just as hackish, is to use the IDs of the
children that you search for, and use the has_child filter[1] to get the
parent with that ID. You might want to use an IDs query[2] within that.

[0] http://www.elasticsearch.org/guide/reference/api/multi-get.html
[1] http://www.elasticsearch.org/guide/reference/query-dsl/has-child-filter.html
[2] http://www.elasticsearch.org/guide/reference/query-dsl/has-child-filter.html

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene


Radu Gheorghe radu.gheorghe@sematext.com Dec 28 05:18PM +0200

Hi again :)

Actually, you don't need a filter to find the parent, since you have the
parent IDs when you search for children. You just have to add "_parent"
to the list of fields you want returned when you search. For example:

curl -XPOST 'localhost:9200/test/room/_search?pretty=true' -d '{
  "fields": ["_source", "_parent"],
  "query": {
    "match_all": {}
  }
}'

This may return hits like this:

{
  "_index" : "test",
  "_type" : "room",
  "_id" : "1",
  "_score" : 1.0, "_source" : {"name": "room1", "floor": 2},
  "fields" : {
    "_parent" : "1"
  }
}

More details here:
http://www.elasticsearch.org/guide/reference/api/search/fields.html

So once you have the "child" results, you can use the Multi Get API to
get the needed parents. No need to duplicate the parent ID in another
field or other such hacks.
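
A minimal sketch of that second step, assuming the parent IDs have
already been collected from the "_parent" field of each hit. The helper
name build_mget_body and the parent type "flat" in the commented usage
line are assumptions made for illustration; neither is stated in this
thread. The body uses the "ids" form of the Multi Get API.

```shell
#!/bin/sh
# Build a Multi Get request body from a list of parent IDs.
# Usage: build_mget_body ID [ID...]
# Note: IDs are inserted as-is, so they must not contain double quotes.
build_mget_body() {
  body='{"ids":['
  sep=''
  for id in "$@"; do
    body="${body}${sep}\"${id}\""
    sep=','
  done
  printf '%s]}\n' "$body"
}

# Hypothetical usage ("flat" as the parent type is an assumption):
# curl -XPOST 'localhost:9200/test/flat/_mget?pretty=true' \
#      -d "$(build_mget_body 1 7)"
```

De-duplicating the IDs before building the body keeps the request small
when many children share the same parent.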

Best regards,
Radu


IntelliJ settings for Elasticsearch (JAR shading issue) http://groups.google.com/group/elasticsearch/t/5304c147ee19f76c

David Pilato david@pilato.fr Dec 28 12:18PM +0100

Hi there,

This is a question a little bit outside the ES scope, but not so far...

I'm trying to play with IntelliJ (instead of Eclipse) for Elasticsearch.
Everything is fine when I compile/test the 0.20 branch in IntelliJ.

My problem comes when I try to have 2 Maven projects in the same
workspace (I should say IntelliJ project).

When I launch tests from the spore project, I get the following errors
when starting ES nodes (in Java):

java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonParser
    at org.elasticsearch.common.xcontent.XContentType$3.xContent(XContentType.java:85)
    at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:110)
    ...

I think it's only a configuration problem, as Jackson is shaded in the
elasticsearch jar, which does not have this class in the original
package. How do you set things up in IntelliJ to add breakpoints in
Elasticsearch when launching tests from another project?

Sorry to be a little bit outside the ES scope.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Itamar Syn-Hershko itamar@code972.com Dec 28 01:29PM +0200

Can't you do that by adding ES via Maven to your project, and attach
sources? IIRC this worked for me before

David Pilato david@pilato.fr Dec 28 02:32PM +0100

Thanks Itamar, it works fine without importing the ES project.

The fact is that I cannot modify the ES source code directly this way.
But it's probably bad practice to launch tests from a project other than
the ES project itself!

Thanks again.


Itamar Syn-Hershko itamar@code972.com Dec 28 03:50PM +0200

Yeah, I know, I tried that too with a few hacks, but it kept confusing
IntelliJ.

My workflow is to have ES open in another IntelliJ instance. Whenever I
need to change ES, I change it in that other instance, do mvn install,
and have the first IntelliJ instance import it again.

Visual Studio allows you to do that; I just couldn't figure out how this
can be done with IntelliJ (yet?).

David Pilato david@pilato.fr Dec 28 03:12PM +0100

I was wondering if it's a good idea to shade libraries when building the
artifact instead of doing this in the generate-resources phase.
What can be done (I will check that) is to shade all libraries in the
generate-resources phase into the target dir. They will then be
available when compiling, and that way, in the ES project, imports will
be done on the target package, not on the original one.

I don't know yet whether it can be done, or whether it's a best practice
or not. I will try it.


nested type http://groups.google.com/group/elasticsearch/t/2875326d3c3cf3cd

Nicolas Ippolito ippolito.nicolas@gmail.com Dec 28 04:31AM -0800

Hi,

I search for "foo" in a nested object with:

curl -XGET 'http://localhost:9200/horyou_fr/member/_search' -d '{
  "query": {
    "nested": {
      "path": "contents",
      "query": {
        "bool" : {
          "must" : [
            {
              "text" : {"contents.content" : "foo"}
            }
          ]
        }
      }
    }
  }
}'

The response is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2.098612,
    "hits": [
      {
        "_index": "horyou_fr",
        "_type": "member",
        "_id": "member-1",
        "_score": 2.098612,
        "_source": {
          "slug": "jacque-selere",
          "firstname": "Jacque",
          "lastname": "Selere",
          "__description": null,
          "stars": 0,
          "picture": "5a9a8f24d0131d2ef1f3a1eb75d82a4a/:format/default.jpg",
          "contents": [
            {
              "content": "foo"
            },
            {
              "content": "bar"
            }
          ]
        }
      }
    ]
  }
}

Is it possible to remove "content": "bar" (which doesn't match my query)
from the response?

Thanks!

load JSONs from filesystem http://groups.google.com/group/elasticsearch/t/98490d96f52ccc31

Valentin pletzer@gmail.com Dec 28 02:38AM -0800

Hi,

I know there is a filesystem river plugin. But can it index JSON files
as well? It sounds like it is for PDF, DOC, etc. I have a directory with
changing content (JSON files get overwritten, deleted, added ...) and I
would love to index that in the way the filesystem plugin describes.

Greetings
Valentin

David Pilato david@pilato.fr Dec 28 11:44AM +0100

No. The FSRiver plugin only indexes documents through the mapper
attachment plugin. But it really makes sense to add this new feature.
Could you open an issue in the FSRiver plugin to track it?

Thanks for the idea.


Need some help to get started. http://groups.google.com/group/elasticsearch/t/7e77a15940b317ed

Charly Koza cka@f4-group.com Dec 28 11:30AM +0100

Hi there,

I'm new to Elasticsearch and I'm not sure how to proceed.

I have a lot of documents indexed, and I'd like to search them by their
name field, which is composed of several words, using some fuzziness to
tolerate misspellings. I'd like to sort by closeness between the query
and the name: a document whose name exactly matches the query should
come first (a name with additional words should not rank before it). On
top of that, I'd like some custom boosting (negative or positive) based
on other fields of the documents.

I'd also like to search in the title field if name is not specified, but
I don't want documents that match in both title and name to get a higher
score.

And, if possible, I'd like to remove any results that score less than
half of the first result's score.

Example:

doc1: {name:'market tower toilet', title:'market tower toilet', amenity:'toilet'}
doc2: {name:'market tower', tourism:'attraction'}
doc3: {title:'tower market viewpoint'}

I want a negative boost on amenity:'toilet' and a positive one on
tourism:'attraction'.

So a search for "market" should yield: doc2/doc3/doc1

Right now I have:

query: {
  boosting: {
    positive: {
      bool: {
        should: [
          {
            custom_boost_factor: {
              query: {
                match_phrase: {
                  name: search
                }
              },
              boost_factor: 2.0
            }
          },
          {
            fuzzy_like_this: {
              fields: ["name"],
              like_text: search,
              boost: 1.0
            }
          }
        ]
      }
    },
    negative: {
      term: {
        amenity: 'toilet'
      }
    },
    negative_boost: 0.2
  }
}

but I feel like this is not how I'm supposed to do it.

I'd like some pointers to start in the right direction.

Thanks!
Charly

Elastic search and hadoop http://groups.google.com/group/elasticsearch/t/c4188e12847781f7

Tan Chween Tah jundatan85@gmail.com Dec 28 12:11PM +0700

Hi, thanks, but the attachment doesn't seem to be working either.

David Pilato david@pilato.fr Dec 28 07:51AM +0100

If you need help, you have to give some details, as "doesn't seem to
work" is not enough to find your problem.

So, what did you do?
What can you see? Are there any errors in the logs?

--
David ;)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Tan Chween Tah jundatan85@gmail.com Dec 28 02:19PM +0700

I created a file.sh file containing this code:

#!/bin/sh

http://cloud.github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.10.zip
#unzip elasticsearch-0.19.10.zip
#elasticsearch-0.19.10/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.6.0

../../usr/local/elasticsearch-0.20.1/bin/elasticsearch
sleep 10

host=localhost:9200

curl -X DELETE "${host}/test"

curl -X PUT "${host}/test" -d '{
  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'

curl -X GET "${host}/_cluster/health?wait_for_status=green&pretty=1&timeout=5s"

curl -X PUT "${host}/test/attachment/_mapping" -d '{
  "attachment" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }
        }
      }
    }
  }
}'

curl -C - -O http://www.intersil.com/content/dam/Intersil/documents/fn67/fn6742.pdf

coded=$(cat fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)')
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file

curl -X POST "${host}/test/attachment/" -d @json.file
echo

curl -XPOST "${host}/_refresh"

curl "${host}/_search?pretty=true" -d '{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "amplifier"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}'

When I run file.sh from my terminal, it prints:

./file.sh: 31: ./file.sh: curl: not found
./file.sh: 39: ./file.sh: curl: not found
./file.sh: 43: ./file.sh: curl: not found
./file.sh: 71: ./file.sh: curl: not found
cat: fn6742.pdf: No such file or directory
./file.sh: 81: ./file.sh: curl: not found
./file.sh: 87: ./file.sh: curl: not found
./file.sh: 91: ./file.sh: curl: not found
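
One fragile part of the script above is the step that turns the PDF into
a JSON document. Here is a minimal sketch of that step in isolation. It
assumes the `base64` utility is installed (the script itself uses perl),
and the function name attachment_json is made up for illustration.
Unlike the perl one-liner, it encodes the whole file in one pass and
strips the line wrapping, so the JSON string contains no raw newlines.

```shell
#!/bin/sh
# Print a {"file": "<base64>"} JSON document for the file given as $1.
# Assumes the base64 utility is on the PATH.
attachment_json() {
  # base64 may wrap its output across lines; tr removes the newlines.
  encoded=$(base64 < "$1" | tr -d '\n')
  printf '{"file":"%s"}\n' "$encoded"
}

# Hypothetical usage, mirroring the script above:
# attachment_json fn6742.pdf > json.file
# curl -X POST "${host}/test/attachment/" -d @json.file
```

The base64 alphabet contains no characters that need escaping inside a
JSON string, which is why plain printf is enough here.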

David Pilato david@pilato.fr Dec 28 08:36AM +0100

So you don't have curl on your system.
Install it or add it in your Path...

--
David ;)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Tan Chween Tah jundatan85@gmail.com Dec 28 02:41PM +0700

So you mean curl does not come with the elasticsearch-0.20.1 version?

Tan Chween Tah jundatan85@gmail.com Dec 28 02:54PM +0700

I searched online for curl and found php-curl. Do I need to install that?

Ivan Brusic ivan@brusic.com Dec 27 11:58PM -0800

curl is a command-line utility. You should use your operating system's
package manager (yum/apt-get) to install it. Do not install php-curl.
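
A quick way to confirm this before running a script like file.sh is to
check that the command resolves on the PATH. This is only a sketch using
the POSIX `command -v` builtin; the helper names have_cmd and
require_cmds are assumptions, not part of any tool mentioned in this
thread.

```shell
#!/bin/sh
# Return 0 if the named command can be found, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check every command given as an argument; report the first missing one.
require_cmds() {
  for cmd in "$@"; do
    if ! have_cmd "$cmd"; then
      echo "missing required command: $cmd" >&2
      return 1
    fi
  done
}

# Hypothetical usage at the top of a script:
# require_cmds curl perl || exit 1
```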

Please use gist (https://gist.github.com/) when posting large amounts of
code. It is easier for everyone to read.

--
Ivan

Tan Chween Tah jundatan85@gmail.com Dec 28 04:01PM +0700

Thanks. After installing curl on my Linux system, I ran file.sh and got
a mapping error:

{"ok":true,"acknowledged":true}{"ok":true,"acknowledged":true}{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}{"error":"MapperParsingException[No handler for type [attachment] declared on field [file]]","status":400}** Resuming transfer from byte position 416315
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  406k    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.

{"ok":true,"_index":"test","_type":"attachment","_id":"5vC4T4jOTvKqDzZqqBpG9w","_version":1}
{"ok":true,"_shards":{"total":1,"successful":1,"failed":0}}{
  "took" : 78,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }

Any idea?

David Pilato david@pilato.fr Dec 28 10:32AM +0100

The attachment plugin is not correctly set up.
You probably didn't execute
bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.6.0
and restart your ES instance.
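
A script such as file.sh keeps going after the mapping request fails, so
the later steps only hide the real error. Below is a small guard,
sketched under stated assumptions: the helper name response_has_error
and the mapping.json file in the commented usage are hypothetical, and
the check is a plain text match for an "error" key, not a real JSON
parse.

```shell
#!/bin/sh
# Return 0 if the response text contains an "error" key, non-zero otherwise.
# This is a text match only; it does not parse the JSON.
response_has_error() {
  printf '%s' "$1" | grep -q '"error"[[:space:]]*:'
}

# Hypothetical usage inside a script such as file.sh:
# resp=$(curl -s -X PUT "${host}/test/attachment/_mapping" -d @mapping.json)
# if response_has_error "$resp"; then
#   echo "mapping failed: $resp" >&2
#   exit 1
# fi
```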

Have a look at the plugin doc:
https://github.com/elasticsearch/elasticsearch-mapper-attachments

Also, look at this tutorial:
http://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html

David.


Join in elasticsearch http://groups.google.com/group/elasticsearch/t/f0af92855ccd28f5

David Pilato david@pilato.fr Dec 28 10:28AM +0100

For information, the same thread is open on the French mailing list:
https://groups.google.com/d/topic/elasticsearch-fr/N7zxurhUKjI/discussion

David


ElasticSearch Java TransportClient: Need to set TransportAddress http://groups.google.com/group/elasticsearch/t/33614bc3de08da63

karan jaskaran1981@gmail.com Dec 28 01:08AM -0800

Hi David ,

Thanks a lot. I'll look at the Spring-based project.

thanks,
Karan


Any suggestion for duplicate data? http://groups.google.com/group/elasticsearch/t/6f92678a838b26b6

"Burak Emre Kabakcı" emrekabakci@gmail.com Dec 27 06:56PM -0800

Here is the mapping of my index: https://gist.github.com/4394060

I have used parent/child mapping to normalize the data but, as far as I
understand, there is no way to get any fields from the _parent document.
Now I'm trying to find the best way of storing artist_name and
release_name in the song type. I won't query these fields, but I should
be able to get them when I query other fields like name. There will be
millions of songs and I don't have much memory, so I suspect that these
duplicate values may cause out-of-memory errors. For now, the
artist_name and release_name fields have the "index": "no" property and
I turned on compression for the _source field. Do you have any efficient
suggestion for avoiding duplicate values, such as running multiple
queries or a hacky way to get fields from the _parent document, or is
denormalized data the only way to handle this kind of problem?

You received this message because you are subscribed to the Google Group
elasticsearch.
You can post via email elasticsearch@googlegroups.com.
To unsubscribe from this group, send an empty message to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/elasticsearch/topics.
