Input file with custom delimiter

Gopimanikandan_Sengo · January 7, 2015, 11:40am

Hi All,

We are planning to load data into Elasticsearch from a delimited file.

The file uses 0x88 (ˆ) as its delimiter.

Can you please let me know how to load the delimited file into Elasticsearch?

Also, please let me know the best and fastest way to load millions of
records into Elasticsearch.

SAMPLE:

XXXXXˆYYYYYYˆZZZZ

Thanks,
Gopi
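A note on the delimiter: byte 0x88 is not the ASCII caret ^ (0x5E). In the windows-1252 code page, 0x88 decodes to U+02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT), which renders as ˆ. If a tool is configured to split on ASCII ^, each line stays in a single column. The minimal sketch below assumes the file is windows-1252 encoded; `CaretDelimiterDemo` and `splitLine` are illustrative names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

public class CaretDelimiterDemo
{
  // Assumption: the file is encoded as windows-1252, where byte 0x88 decodes to
  // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT rather than ASCII '^' (0x5E).
  static final Charset FILE_CHARSET = Charset.forName("windows-1252");
  static final String DELIMITER = "\u02C6";

  // Splits one decoded line into fields; the -1 limit keeps trailing empty fields.
  static String[] splitLine(String line)
  {
    return line.split(Pattern.quote(DELIMITER), -1);
  }

  public static void main(String[] args)
  {
    // Rebuild the raw bytes of the sample line, using '|' as a stand-in for 0x88.
    byte[] raw = "XXXXX|YYYYYY|ZZZZ".getBytes(StandardCharsets.US_ASCII);
    for (int i = 0; i < raw.length; i++)
      if (raw[i] == '|')
        raw[i] = (byte) 0x88;

    String line = new String(raw, FILE_CHARSET);
    System.out.println(Arrays.toString(splitLine(line))); // [XXXXX, YYYYYY, ZZZZ]
    System.out.println(line.split("\\^").length);          // 1: ASCII '^' never matches
  }
}
```

If the file is instead decoded as UTF-8, a lone 0x88 byte is invalid and becomes a replacement character, so the encoding has to be chosen explicitly either way.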

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/26e8c669-ec2a-481f-86dc-4c7fe4e1039a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · January 7, 2015, 12:21pm

Have a look at Logstash. It will help you here.

My 2 cents.

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Gopimanikandan_Sengo · January 7, 2015, 12:55pm

Hi David,

Thanks for your suggestions.

I tried Logstash, but this delimiter is not working: everything was loaded
into a single column instead of multiple columns.


brian_yoder · January 7, 2015, 3:30pm

Gopi,

What you really have is a CSV file that uses ˆ (0x88) instead of a comma as its delimiter.

I happened to write my own CSV-to-JSON converter, with the options I
needed (including specifying or auto-detecting numbers, normalizing date
formats, automatically creating the action-and-metadata line, and so on).
I did this before stumbling across Logstash, but I still find it easier to
write and maintain this code myself.
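Brian's converter is not shared in this thread, but its core step can be sketched. The minimal Java sketch below assumes field names come from a header row and that any value matching a plain JSON number pattern should be emitted unquoted; `RecordToJson`, `toJson`, `quote`, and the number rule are hypothetical, and date normalization is left out.

```java
import java.util.regex.Pattern;

public class RecordToJson
{
  // Hypothetical rule: a value is emitted as a JSON number if it matches JSON number syntax.
  private static final Pattern NUMBER =
      Pattern.compile("-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

  // Escapes a string per JSON rules (quotes, backslashes, control characters).
  static String quote(String s)
  {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray())
    {
      switch (c)
      {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20)
            sb.append(String.format("\\u%04x", (int) c));
          else
            sb.append(c);
      }
    }
    return sb.append('"').toString();
  }

  // Builds one single-line JSON object from parallel header/value arrays.
  static String toJson(String[] header, String[] values)
  {
    StringBuilder sb = new StringBuilder("{");
    for (int i = 0; i < header.length && i < values.length; i++)
    {
      if (i > 0)
        sb.append(',');
      sb.append(quote(header[i])).append(':');
      String v = values[i];
      sb.append(NUMBER.matcher(v).matches() ? v : quote(v));
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args)
  {
    String[] header = {"name", "city", "count"};
    String[] values = {"XXXXX", "YYYYYY", "42"};
    System.out.println(toJson(header, values)); // {"name":"XXXXX","city":"YYYYYY","count":42}
  }
}
```

Under this rule, values with leading zeros (such as 00123) stay strings, which is usually what you want for codes and IDs.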

Choose whatever language you wish: I wrote the first version of mine in C++
and a later version in Java. I also wrote a bulk load client in Java to
avoid the limitations of curl (and the fact that curl isn't available on
some platforms).
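Brian's bulk load client isn't available, but the essential job of one is to group documents into `_bulk` requests of bounded size and POST each newline-delimited body. The sketch below uses only the JDK (`HttpURLConnection`), assumes a node at `http://localhost:9200`, and treats the batch size as a tuning knob; `BulkBatcher`, `add`, `flush`, and `post` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkBatcher
{
  private final int maxDocs;               // documents per _bulk request (a tuning knob)
  private final Consumer<String> sink;     // receives each complete NDJSON body
  private final StringBuilder body = new StringBuilder();
  private int docs = 0;

  public BulkBatcher(int maxDocs, Consumer<String> sink)
  {
    this.maxDocs = maxDocs;
    this.sink = sink;
  }

  // Appends one document as an action-and-metadata line followed by its source line.
  public void add(String actionLine, String sourceLine)
  {
    body.append(actionLine).append('\n').append(sourceLine).append('\n');
    if (++docs >= maxDocs)
      flush();
  }

  // Hands off whatever is buffered; each body already ends with the newline _bulk requires.
  public void flush()
  {
    if (docs == 0)
      return;
    sink.accept(body.toString());
    body.setLength(0);
    docs = 0;
  }

  // POSTs one body to a _bulk endpoint, e.g. "http://localhost:9200/_bulk" (assumed address).
  public static void post(String endpoint, String ndjson)
  {
    try
    {
      HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
      conn.setRequestMethod("POST");
      conn.setDoOutput(true);
      // Newer Elasticsearch versions require an explicit Content-Type.
      conn.setRequestProperty("Content-Type", "application/x-ndjson");
      try (OutputStream out = conn.getOutputStream())
      {
        out.write(ndjson.getBytes(StandardCharsets.UTF_8));
      }
      int status = conn.getResponseCode();
      conn.disconnect();
      if (status != 200)
        throw new IOException("bulk request failed with HTTP " + status);
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args)
  {
    // Collect bodies in memory here; in real use pass body -> post("http://localhost:9200/_bulk", body).
    List<String> sent = new ArrayList<>();
    BulkBatcher batcher = new BulkBatcher(2, sent::add);
    for (int i = 1; i <= 5; i++)
      batcher.add("{\"index\":{\"_index\":\"test\"}}", "{\"n\":" + i + "}");
    batcher.flush();
    System.out.println(sent.size()); // 3 requests: 2 + 2 + 1 documents
  }
}
```

An HTTP 200 from `_bulk` does not mean every item succeeded: the response carries an `errors` flag and per-item statuses that a real client should check before moving on.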

(Logstash is much better for log files; my converter is much better for
generic CSV.)

I know this isn't exactly the pre-written tool you are looking for. But
converting the CSV (with the option to override the delimiter values) into
JSON isn't very hard to do. And once that's done, it's an easy matter to
add the action and meta data and have a bulk-ready data stream.

Brian


Gopimanikandan_Sengo · January 7, 2015, 3:41pm

Thank you, Brian. I will change my approach accordingly, as you suggested.
Could you possibly share the bulk load client and the CSV-to-JSON converter?

brian_yoder · January 7, 2015, 4:05pm

I wish I could, but I'm currently prohibited from sharing them. However, I
can point you to some very good Java libraries:

The CSV parser supplied by the Apache project works well:

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html

You can override the delimiter using the static CSVFormat newFormat(char
delimiter) method which creates a new CSV format with the specified
delimiter:

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html
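Putting those two Commons CSV pieces together: a minimal sketch, assuming the commons-csv library is on the classpath and the text has already been decoded (for example as windows-1252, where byte 0x88 becomes U+02C6). `CaretCsv` and `parse` are illustrative names; `CSVFormat.newFormat`, `CSVParser.parse(String, CSVFormat)`, and iteration over `CSVRecord` are the library's own API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CaretCsv
{
  // Assumption: the text was decoded as windows-1252, so byte 0x88 is now U+02C6.
  static final char DELIMITER = '\u02C6';

  // Splits decoded text into rows of fields. newFormat() sets only the delimiter,
  // so quoting, escaping and header handling are all left disabled.
  static List<List<String>> parse(String text)
  {
    List<List<String>> rows = new ArrayList<>();
    try (CSVParser parser = CSVParser.parse(text, CSVFormat.newFormat(DELIMITER)))
    {
      for (CSVRecord record : parser)
      {
        List<String> fields = new ArrayList<>();
        for (String value : record)
          fields.add(value);
        rows.add(fields);
      }
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
    return rows;
  }

  public static void main(String[] args)
  {
    // The sample line from the question plus a second, made-up row.
    String text = "XXXXX\u02C6YYYYYY\u02C6ZZZZ\nAAAAA\u02C6BBBBBB\u02C6CCCC";
    System.out.println(parse(text)); // [[XXXXX, YYYYYY, ZZZZ], [AAAAA, BBBBBB, CCCC]]
  }
}
```

To read the file directly instead of a string, `CSVParser.parse(File, Charset, CSVFormat)` takes the charset explicitly, e.g. `Charset.forName("windows-1252")`.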

Then call jsonBuilder() (a static method of XContentFactory) to create an
XContentBuilder, and use it to convert each record to single-line JSON.

For example, the action-and-metadata object I use is based on the
following enum and toString() method, which emits it as JSON. I've left out
the parts that live in other custom libraries, which let Java code easily set
up this information or populate it from a search response or a get-by-id
response:

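```java
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

import java.io.IOException;
import java.util.Locale;

import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.VersionType;

/* Holds the action and metadata for one bulk item (class name is illustrative) */
public class BulkActionMetaData
{
  public enum OpType
  {
    CREATE,
    INDEX,
    DELETE
  }

  /* Operation type (action): "create" or "index" or "delete" */
  private OpType opType = OpType.INDEX;

  /* Metadata that this object supports */
  private String index = null;
  private String type = null;
  private String id = null;
  private long version = 0;
  private VersionType versionType = VersionType.INTERNAL;
  private TimeValue ttl = null;

  /* Emits the action-and-metadata line as single-line JSON,
     e.g. {"index":{"_index":"...","_type":"...","_id":"..."}} */
  @Override
  public String toString()
  {
    try
    {
      XContentBuilder cb = jsonBuilder();
      cb.startObject();

      cb.field(opType.toString().toLowerCase(Locale.ROOT));
      cb.startObject();

      cb.field("_index", index);
      cb.field("_type", type);
      if (id != null)
        cb.field("_id", id);

      if (version > 0)
      {
        cb.field("_version", version);
        if (versionType == VersionType.EXTERNAL)
          cb.field("_version_type", "external");
      }

      if (ttl != null)
        cb.field("_ttl", ttl);

      cb.endObject();

      cb.endObject();
      return cb.string();
    }
    catch (IOException e)
    {
      return "null";
    }
  }
}
```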

And the actual data line that would follow is similarly constructed using
the content builder.

I wish I could help you more.

Brian


Gopimanikandan_Sengo · January 7, 2015, 4:08pm

Thank you so much, Brian. I will make use of this in my project.
