Elastic search questions


(flowdave) #1

Hello List!

I am very interested in using Elastic Search! I've read the archived
mailing list, as well as the documentation, and I have to say that I
am impressed by the features it offers.
That said, I have some questions that I have not been able to answer
for myself, though I've tried.

Let me first say that I would prefer to use the Java client rather
than the REST/JSON API.

Here is a description of my situation:

I have objects that I would like to index. These objects are all of
the same class, but can have any number of dynamically generated
fields. These fields can only be of certain fixed types.

Here is an example:

Object 1:

"tweet1":{
"type":"tweet",
"guid":{"type":"GUID", "value":"1234"},
"author":{"type":"email_string","value":"me@example.com"},
"description":{"type":"string","value":"a description of the tweet"},
"text":{"type":"string","value":"the actual tweet, 140 characters"},
"length":{"type":"int","value":"28"},
"user_tagged_relev_score":{"type":"double","value":"3.4"}
}

Object 2:

"tweet2":{
"type":"tweet",
"guid":{"type":"string", "value":"5678"},
"author":{"type":"string","value":"Jeff"},
"description":{"type":"string","value":"a description of the tweet"},
"text":{"type":"string","value":"the actual tweet, 140 characters"},
"location":{"type":"location","value":{"lat":"3.4","long":"2.3"}}
}

You can see from these examples that Object 1 and Object 2 are both of
type "tweet", but they have different fields, and fields of the same
name can have different types (as with the "guid" field, which here
has types "GUID" and "string"). Additionally, some fields can be
"complex", as in the case of the "location" field, which is a lat/long
tuple.
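
As a hedged illustration of the typed-field layout described above, the following Python sketch converts each {"type", "value"} pair into a native value before indexing, so that numbers are stored as numbers rather than as strings. The names CONVERTERS and coerce_fields are hypothetical, and the mapping from type names to Python types is an assumption that would need adapting to the real set of fixed types.

```python
# Hypothetical converters for the fixed field types used in the examples.
# Type names that are not listed here keep their original string value.
CONVERTERS = {
    "int": int,
    "double": float,
    "location": lambda v: {"lat": float(v["lat"]), "long": float(v["long"])},
}


def coerce_fields(tweet):
    """Return a copy of a tweet whose typed fields carry native values.

    Plain entries such as "type": "tweet" are copied unchanged.
    """
    result = {}
    for name, field in tweet.items():
        if isinstance(field, dict) and "type" in field and "value" in field:
            convert = CONVERTERS.get(field["type"], lambda v: v)
            result[name] = {"type": field["type"], "value": convert(field["value"])}
        else:
            result[name] = field
    return result


# A shortened version of Object 1 from the example above.
tweet1 = {
    "type": "tweet",
    "guid": {"type": "GUID", "value": "1234"},
    "length": {"type": "int", "value": "28"},
    "user_tagged_relev_score": {"type": "double", "value": "3.4"},
}

coerced = coerce_fields(tweet1)
print(coerced)
```

Note that Elasticsearch keeps a single mapping per field path within an index, so a field whose declared type changes from one document to the next may still need an explicit mapping or a different field layout.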

I have been able to index these, as follows:

$ curl -XPOST localhost:9200/twitter/tweet/1 -d '{ "tweet":{
"type" : "tweet",
"guid": {"type":"GUID", "value":"1234"},
"author": {"type":"email_string", "value":"me@example.com"},
"description":{"type":"string", "value":"this is a tweet"},
"text": {"type":"string", "value":"the actual tweet 140 characters"},
"length": {"type":"int", "value":"28"},
"score": {"type":"double", "value":"3.4"}
}}'

$ curl -XPOST localhost:9200/twitter/tweet/2 -d '{ "tweet":{
"type": "tweet",
"guid": {"type":"string", "value":"5678"},
"author": {"type":"EMAIL", "value":"you@example.com"},
"description":{"type":"string", "value":"this is a different tweet"},
"text": {"type":"string", "value":"some test tweet"},
"score": {"type":"int", "value":"1"},
"location": {"type":"location", "value":{"lat":"2.4","long":"4.0"}}
}}'

$ curl localhost:9200/twitter/tweet/1
{"_index":"twitter","_type":"tweet","_id":"1", "_source" : { "tweet":{
"type" : "tweet",
"guid": {"type":"GUID", "value":"1234"},
"author": {"type":"email_string", "value":"me@example.com"},
"description":{"type":"string", "value":"this is a tweet"},
"text": {"type":"string", "value":"the actual tweet 140 characters"},
"length": {"type":"int", "value":"28"},
"score": {"type":"double", "value":"3.4"}
}}}

$ curl -XGET localhost:9200/twitter/tweet/2
{"_index":"twitter","_type":"tweet","_id":"2", "_source" : { "tweet":{
"type": "tweet",
"guid": {"type":"string", "value":"5678"},
"author": {"type":"EMAIL", "value":"you@example.com"},
"description":{"type":"string", "value":"this is a different tweet"},
"text": {"type":"string", "value":"some test tweet"},
"score": {"type":"int", "value":"1"},
"location": {"type":"location", "value":{"lat":"2.4","long":"4.0"}}
}}}

My question now is about performing searches for these items.
In some cases, I want to be able to search for:

- "tweets that have a location field", which should return 2 and
  not 1
- "tweets that have a 'guid' field of type 'GUID'", which should
  return 1 and not 2
- "any field that has type='email_string' and
  value='me@example.com'", which should return 1 and not 2
- "a tweet with a 'score' field of any type, so long as it is below
  4", which should return both 1 and 2
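
To make the four searches above concrete, here is a minimal Python sketch that builds one query body per search, using the exists, term, bool and range clauses from the query DSL of recent Elasticsearch releases (clause names and availability may differ in the version being run). The helper names are hypothetical. The sketch assumes the "type" and "value" sub-fields are indexed without analysis so that term clauses match exactly, that numeric values are indexed as numbers rather than as JSON strings so that the range comparison is numeric, and that the candidate field names for the "any field" search are known in advance.

```python
import json


def has_field(name):
    """Match tweets that contain the named field.

    Every typed field carries a "type" sub-field, so its presence is
    used here as the marker that the field itself is present.
    """
    return {"exists": {"field": f"tweet.{name}.type"}}


def field_of_type(name, type_name):
    """Match tweets whose named field declares the given type."""
    return {"term": {f"tweet.{name}.type": type_name}}


def typed_value_in_any(names, type_name, value):
    """Match tweets where any listed field has both the type and the value."""
    per_field = [
        {"bool": {"must": [
            {"term": {f"tweet.{n}.type": type_name}},
            {"term": {f"tweet.{n}.value": value}},
        ]}}
        for n in names
    ]
    return {"bool": {"should": per_field, "minimum_should_match": 1}}


def value_below(name, limit):
    """Match tweets whose named field has a value below the limit."""
    return {"range": {f"tweet.{name}.value": {"lt": limit}}}


searches = {
    "has_location": has_field("location"),
    "guid_is_GUID": field_of_type("guid", "GUID"),
    # Tweet 1 was indexed with author type "email_string".
    "email_value": typed_value_in_any(
        ["author", "guid"], "email_string", "me@example.com"
    ),
    "score_below_4": value_below("score", 4),
}

for label, clause in searches.items():
    print(label, json.dumps({"query": clause}))
```

Each printed body could be sent to the _search endpoint of the twitter index; whether it returns the expected tweets depends on the mapping assumptions listed above.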

I have not been able to get these searches to work correctly. I have
not found on this list, or in the documentation, any obvious way to
deal with the notion of typed fields.
Perhaps I am looking in the wrong place, but after looking for a week
now, I'm willing to ask the list for help.
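
One possible way to handle typed fields, offered only as a sketch and not as an established Elasticsearch convention, is to fold the declared type into the field name before indexing, so that "guid" as GUID and "guid" as string land in separate index fields. The function flatten_typed_fields and the "__" separator are hypothetical, and the sketch assumes the flattened document is indexed directly, without the outer "tweet" wrapper.

```python
import json


def flatten_typed_fields(tweet, sep="__"):
    """Fold each field's declared type into its name.

    {"guid": {"type": "GUID", "value": "1234"}} becomes
    {"guid__GUID": "1234"}; plain entries are copied unchanged.
    """
    flat = {}
    for name, field in tweet.items():
        if isinstance(field, dict) and "type" in field and "value" in field:
            flat[name + sep + field["type"]] = field["value"]
        else:
            flat[name] = field  # e.g. "type": "tweet"
    return flat


tweet1 = {
    "type": "tweet",
    "guid": {"type": "GUID", "value": "1234"},
    "author": {"type": "email_string", "value": "me@example.com"},
}
tweet2 = {
    "type": "tweet",
    "guid": {"type": "string", "value": "5678"},
    "author": {"type": "EMAIL", "value": "you@example.com"},
}

flat1 = flatten_typed_fields(tweet1)
flat2 = flatten_typed_fields(tweet2)

# Query bodies against the flattened layout (field names are hypothetical).
# "Has a guid field of type GUID" becomes a plain exists check.
guid_is_guid = {"query": {"exists": {"field": "guid__GUID"}}}
# "Any field of a given type with a given value" can use a wildcard
# in the field list of a multi_match query.
any_email_string = {
    "query": {
        "multi_match": {"query": "me@example.com", "fields": ["*__email_string"]}
    }
}

print(json.dumps(flat1))
print(json.dumps(flat2))
```

The trade-off is that a search for a field "of any type" then has to list or wildcard the type-suffixed names, and multi_match analyzes its query text, so exact matching of e-mail addresses would still depend on how those fields are mapped.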

I hope I have included enough information to explain my problem here.

Thank you very much.

