Hi,
I'm trying to index documents in Elasticsearch. I'm using Elasticsearch
1.2.1 through the Java API.
My cluster is remote: 3 nodes on 3 servers (one node per server),
optimised for indexing (one shard per node, no replication).
To do this, I read a CSV file and generate a mapping from it.
For performance reasons, I try to disable _source. That part works: the
mapping I read back after index creation is correct.
The problem is that after inserting data, I have nothing in my docs except
the id generated by ES. If I enable the _source field, my data appears only
in the _source field.
Here is how I generate the mapping:
    XContentBuilder mapping = jsonBuilder()
        .startObject()
            .startObject("record")
                //.startObject("_source").field("enabled", false).endObject()
                .startObject("properties");
    // One stored, analyzed field per CSV column
    for (ColumnMetadata column : dataset.getMetadata().getColumns()) {
        mapping.startObject(column.getName())
            .field("type", ESColumnTypeHelper.getESType(column.getType()))
            .field("store", "yes")
            .field("index", "analyzed")
            .endObject();
    }
    mapping.endObject()   // properties
        .endObject()      // record
        .endObject();     // root object
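To make the intended mapping explicit, here is a minimal standalone sketch (plain Java, no Elasticsearch dependency) that prints the JSON shape I expect the builder above to produce when _source is disabled. The class name MappingSketch, the method buildMapping and the column names are hypothetical, every field is assumed to be of type "string", and column names are assumed to need no JSON escaping:

```java
import java.util.Arrays;
import java.util.List;

public class MappingSketch {

    // Builds the mapping JSON for one type with plain string concatenation.
    // Assumption: every column is mapped as a stored, analyzed string.
    static String buildMapping(String type, List<String> columns, boolean sourceEnabled) {
        StringBuilder json = new StringBuilder();
        json.append("{\"").append(type).append("\":{");
        if (!sourceEnabled) {
            json.append("\"_source\":{\"enabled\":false},");
        }
        json.append("\"properties\":{");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(columns.get(i)).append("\":")
                .append("{\"type\":\"string\",\"store\":\"yes\",\"index\":\"analyzed\"}");
        }
        json.append("}}}"); // closes properties, the type object and the root object
        return json.toString();
    }

    public static void main(String[] args) {
        // Hypothetical column names borrowed from the sample data, for illustration only.
        List<String> columns = Arrays.asList("city_id", "companyname");
        System.out.println(buildMapping("record", columns, false));
    }
}
```

Comparing this output with what `curl -XGET .../_mapping/record` returns is a quick way to check that the field names in the mapping match the field names used at indexing time.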
Then I create the index:
    Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", storeClusterName)
        .build();
    CreateIndexRequestBuilder createIndexRequestBuilder =
        client.admin().indices().prepareCreate(datasetName).addMapping("record", mapping);
    CreateIndexRequest request = createIndexRequestBuilder.request();
    try {
        CreateIndexResponse createResponse =
            client.admin().indices().create(request).actionGet();
        if (!createResponse.isAcknowledged()) {
            logger.log(Level.SEVERE, "Index creation not acknowledged.");
        } else {
            logger.log(Level.INFO, "Index creation acknowledged.");
        }
    } catch (IndexAlreadyExistsException iae) {
        logger.log(Level.SEVERE, "Index already exists...");
    }
And this is how I index using the Bulk API:
    BulkRequestBuilder bulkRequest = client.prepareBulk();
    try {
        logger.log(Level.INFO, "Creating records");
        for (Record record : records) {
            IndexRequestBuilder builder = client.prepareIndex(datasetName, "record");
            XContentBuilder data = jsonBuilder();
            data.startObject();
            for (ColumnMetadata column : dataset.getMetadata().getColumns()) {
                Object value = record.getCell(column.getName()).getValue();
                // Treat the literal string "NULL" as a missing value
                if (value == null || (value instanceof String && value.equals("NULL"))) {
                    value = null;
                }
                data.field(column.getNormalizedName(), value);
            }
            data.endObject();
            builder.setSource(data);
            bulkRequest.add(builder);
            logger.log(Level.INFO, "Creating records");
        }
        logger.log(Level.INFO, "Created " + bulkRequest.numberOfActions() + " records");
        BulkResponse bulkResponse = bulkRequest.execute().actionGet();
        if (bulkResponse.hasFailures()) {
            logger.log(Level.SEVERE, "Could not index : " + bulkResponse.buildFailureMessage());
        }
        // (end of the try block and its catch are not shown in this excerpt)
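The NULL handling inside the loop can be pulled out into a small helper so it can be checked in isolation. This is only a sketch: the class and method names (CellValues, normalize) are hypothetical and not part of my actual code.

```java
public class CellValues {

    // Returns null for a missing cell or for the literal string "NULL";
    // any other value is returned unchanged.
    static Object normalize(Object value) {
        if (value == null || "NULL".equals(value)) {
            return null;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(normalize("NULL"));
        System.out.println(normalize("Colombes"));
        System.out.println(normalize(17988));
    }
}
```

With such a helper, the loop body would reduce to `data.field(column.getNormalizedName(), CellValues.normalize(value))`.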
Now for the resulting data. First, the mapping as read back with cURL (in
this one, _source is left at its default, i.e. enabled):
curl -XGET 'http://myserver:9200/realestateagencies/_mapping/record'
    {
      "realestateagencies" : {
        "mappings" : {
          "record" : {
            "properties" : {
              "agencystatus" : { "store" : true, "type" : "string" },
              "cardnumber" : { "store" : true, "type" : "string" },
              "city_id" : { "store" : true, "type" : "string" },
              "companyname" : { "store" : true, "type" : "string" },
              [REMOVED SOME MAPPING TO SHORTEN DISPLAY]
              "streetlabel" : { "store" : true, "type" : "string" },
              "streetnumber" : { "store" : true, "type" : "string" },
              "streetnumbercomplement" : { "store" : true, "type" : "string" },
              "streettype" : { "store" : true, "type" : "string" },
              "summarizedagency_id" : { "store" : true, "type" : "string" },
              "updatedate" : { "store" : true, "type" : "string" },
              "websiteurl" : { "store" : true, "type" : "string" }
            }
          }
        }
      }
    }
I can't see any "index" setting in the returned mapping, even though I
explicitly set it ( .field("index", "analyzed") ), but I suppose that's
because index : analyzed is the default value.
After that, a query on my index type returns this:
    {
      took: 7
      timed_out: false
      _shards: {
        total: 3
        successful: 3
        failed: 0
      }
      hits: {
        total: 100000
        max_score: 1
        hits: [
          {
            _index: realestateagencies
            _type: record
            _id: c2yWW2S3TyKkJGGFpgVS4g
            _score: 1
            _source: {
              id: 83163
              crawlsource: 1
              deletedate: null
              updatedate: null
              dealerkind: 1
              email: null
              name: Agence Principale - Colombes
              phonenumber: 0142423333
              latitude: 0
              longitude: 0
              normalized: 6RUEGABRIELPERI92700COLOMBES
              original: - 6 RUE GABRIEL PERI 92700 COLOMBES
              street1: 6 rue Gabriel Peri
              street2: null
              streetlabel: Gabriel Peri
              streetnumber: 6
              streetnumbercomplement: 0
              streettype: 20
              cardnumber: null
              companyname: null
              createdate: 1969-12-31T23:00:00.000Z
              faxnumber: null
              logourl: null
              normalizedname: AGENCEPRINCIPALECOLOMBES
              rcsnumber: null
              sirennumber: 450499298
              siretnumber: null
              websiteurl: null
              agencystatus: 1
              keyperportal: 484737|AGENCEPRINCIPALECOLOMBES
              reconciliationpolicy: 1
              city_id: 17988
              summarizedagency_id: 408837
            }
          },
          [REST IS REMOVED]
And of course, if I disable the _source field, the resulting docs are empty.
I can't see what I'm doing wrong here. Can anyone spot the problem?