How to create a template based on a Lucene document?

Hello everyone,
I'm new to Elasticsearch and I need to migrate an application that
currently indexes documents with Lucene to ES.
I have set up the environment and run a few tests.
Coming from Lucene, I have already defined my whole index schema (that is,
the documents) and I want to keep exactly the same indexing approach.

This is the Lucene document:

    public static Document getDocument(ResultSet rs) throws SQLException {
        Document doc = new Document();
        String postcode = rs.getString("POSTCODE");
        doc.add(new Field("postcode", postcode,
                Field.Store.YES, Field.Index.ANALYZED));

        doc.add(new Field("longitude",
                Util.encodeLongitudeForIndex(rs.getDouble("LONGITUDE")),
                Field.Store.YES, Field.Index.NOT_ANALYZED));

        doc.add(new Field("latitude",
                Util.encodeLatitudeForIndex(rs.getDouble("LATITUDE")),
                Field.Store.YES, Field.Index.NOT_ANALYZED));

        return doc;
    }

I am having trouble understanding how to map this in ES, and what the
syntax for creating a schema template means.
I created a template like this:

    curl -XPUT localhost:9200/_template/postcode -d '
    {
        "template" : "postcode",
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "postcode" : {
                "_source" : { "enabled" : false },
                "properties" : {
                    "postcode" : { "type" : "string", "index" : "analyzed", "store" : "yes" },
                    "longitude" : { "type" : "string", "index" : "not_analyzed", "store" : "yes" },
                    "latitude" : { "type" : "string", "index" : "not_analyzed", "store" : "yes" }
                }
            }
        }
    }'

Postcode is the type of the document in Lucene.
Now it looks like several things in ES share the same name, "postcode":

  1. I need to use postcode in the URL so that ES knows what I am talking
    about. This should identify the name of the template.
  2. I need to tell ES that calls to postcode should be mapped to this
    template.
  3. When I declare the mapping again, it looks like I have to provide an ID.

Am I doing it right?
Once I have indexed a document, is there a way to inspect it, similar to
what Luke does for Lucene?

Thank you for your help,
Tullio

--

Hiya

I'm not completely sure what you're trying to achieve in your code, but I
think you might have misunderstood some terminology.

In Elasticsearch:

  • An "index" is like a database
  • A "type" is like a table in the database
  • A "mapping" is like the column definition in the table
  • A "document" is like a row in the table

So presumably, you have a "type" called "postcode". You create a mapping
for that type (once).
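
To make the "create a mapping once" step concrete, here is a minimal sketch of what the mapping request for the "postcode" type could look like, with the Lucene flags from the question translated into mapping properties (ANALYZED / NOT_ANALYZED to "index", Store.YES / Store.NO to "store"). The class and method names are made up for illustration, the index name "my_index" is only the example name used in this reply, and the JSON is built by hand on the assumption that field names need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any Elasticsearch client library.
public class PostcodeMapping {

    // JSON definition of one string field, mirroring Lucene's Field.Index / Field.Store flags.
    public static String field(boolean analyzed, boolean stored) {
        return "{\"type\":\"string\",\"index\":\"" + (analyzed ? "analyzed" : "not_analyzed")
                + "\",\"store\":\"" + (stored ? "yes" : "no") + "\"}";
    }

    // Body for a put-mapping request: { "<type>": { "properties": { ... } } }.
    // Assumes field names are plain identifiers that need no JSON escaping.
    public static String mappingBody(String type, Map<String, String> fields) {
        StringJoiner props = new StringJoiner(",");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            props.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        return "{\"" + type + "\":{\"properties\":{" + props + "}}}";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("postcode", field(true, false));    // was Field.Index.ANALYZED
        fields.put("longitude", field(false, false));  // was Field.Index.NOT_ANALYZED
        fields.put("latitude", field(false, false));   // was Field.Index.NOT_ANALYZED
        // The body would be sent once, e.g. with curl -XPUT, to this path:
        System.out.println("PUT /my_index/postcode/_mapping");
        System.out.println(mappingBody("postcode", fields));
    }
}
```

The sketch sets "store" to "no" on the assumption that _source stays enabled; pass true as the second argument to reproduce Field.Store.YES instead.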

Then you index documents into that type, specifying /index/type/ID,
e.g. "/my_index/postcode/12345".
You can then retrieve that document using the same parameters.

The ID could be the postcode itself.
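
As a rough illustration of the /index/type/ID addressing, the sketch below builds the document path with the postcode as the ID. The class name and the sample postcode "SW1A 1AA" are assumptions made for the example; the only point is that the same path is used both to index the document (PUT) and to retrieve it (GET), and that an ID containing a space has to be percent-encoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Elasticsearch client library.
public class DocumentPath {

    // Percent-encode one path segment. URLEncoder produces form encoding
    // (space becomes "+"), so "+" is rewritten to "%20" for use in a path.
    // Requires Java 10+ for the encode(String, Charset) overload.
    public static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Build "/index/type/id", the address of a single document.
    public static String documentPath(String index, String type, String id) {
        return "/" + encodeSegment(index) + "/" + encodeSegment(type) + "/" + encodeSegment(id);
    }

    public static void main(String[] args) {
        // Same path for indexing (PUT with the document as body) and retrieval (GET).
        System.out.println(documentPath("my_index", "postcode", "12345"));
        System.out.println(documentPath("my_index", "postcode", "SW1A 1AA"));
    }
}
```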

Apologies if I haven't understood what you're trying to achieve (e.g. I'm
not sure why you're trying to use templates here).

Re examining the document: the easiest way is probably just to use the
terms facet to see what values have been indexed:
http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
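
For a Luke-like look at what ended up in the index, a terms facet request could be sketched as below. This assumes the facets API described at the link above, a match_all query, and the example names "my_index" and "postcode"; the facet name "indexed_postcodes" and the class itself are invented for the illustration. Because the facet works on indexed terms, an analyzed field shows up as the tokens the analyzer produced rather than as the original strings.

```java
// Hypothetical helper, not part of any Elasticsearch client library.
public class TermsFacetRequest {

    // Search endpoint for one type within one index.
    public static String searchPath(String index, String type) {
        return "/" + index + "/" + type + "/_search";
    }

    // Request body asking for the most frequent indexed terms of a field.
    // Assumes facetName and field are plain identifiers that need no JSON escaping.
    public static String body(String facetName, String field, int size) {
        return "{\"query\":{\"match_all\":{}},"
                + "\"facets\":{\"" + facetName + "\":{\"terms\":{\"field\":\""
                + field + "\",\"size\":" + size + "}}}}";
    }

    public static void main(String[] args) {
        // The body would be sent, e.g. with curl -XPOST, to the printed path.
        System.out.println("POST " + searchPath("my_index", "postcode"));
        System.out.println(body("indexed_postcodes", "postcode", 20));
    }
}
```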

Also, you're disabling _source and then setting all of your fields to
stored. This is an anti-pattern. It is almost always more efficient to
retrieve the single _source field than to retrieve a list of stored fields,
as each field requires a disk seek.

clint

On Mon, Jan 21, 2013 at 3:52 PM, Tullio Coppotelli <coppotelli@gmail.com> wrote:

--

Hi Clint,
Thank you for your reply. You actually helped a lot.
You were totally right, I was confused about the terminology. Moreover, I
was using a template for no reason.
I was not familiar with the concept of _source, since it does not exist in
Lucene. Disabling it was only a copy-and-paste error. I'm happy you pointed
it out, because now I'm starting to think about its advantages.

In the end, what I wanted was something like this:
POST to http://localhost:9200/myproject/

    {
        "settings" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 1
        },
        "mappings" : {
            "postcode" : {
                "_source" : { "enabled" : true },
                "properties" : {
                    "postcode" : { "type" : "string", "index" : "analyzed", "store" : "no" },
                    "combined_postcode" : { "type" : "string", "index" : "analyzed", "store" : "no" },
                    "longitude" : { "type" : "string", "index" : "not_analyzed", "store" : "no" },
                    "latitude" : { "type" : "string", "index" : "not_analyzed", "store" : "no" }
                }
            }
        }
    }

Now I'm moving forward and looking at percolation.
Thank you again,
Tullio

On Tuesday, January 22, 2013 10:57:27 AM UTC, Clinton Gormley wrote:

--