I am trying to index raw Twitter data. Twitter returns dates in the format "Sun Sep 13 19:12:56 +0000 2015", so I PUT a mapping on my index with the date format "EEE MMM dd HH:mm:ss +ZZZZZ yyyy". But I am getting this error:
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:621)
at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:549)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:235)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:406)
... 12 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "Sun Sep 13 19:12:56 +0000 2015" is malformed at "0000 2015"
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:615)
... 15 more
[2015-09-13 15:12:56,866][WARN ][river.twitter ] There was failures while executing bulk
I also tried "EEE MMM dd HH:mm:ss ZZZZZ yyyy", "date", and "dateOptionalTime", but nothing works.
Been stuck on this for a while now! I'd appreciate any help.
Regardless of whether I use the river or Logstash, the question is the same: what mapping will successfully ingest the date returned by Twitter? With the default truncated data, this is taken care of. However, when I want the entire stream (the "raw" or "full_tweet" option), the date is ingested as a string.
The river uses Twitter4J.
Twitter4J returns a java.util.Date [1], so perhaps the question should be more about how Twitter4J parses whatever Twitter provides in its API?
I think the difference is that Twitter4J uses Java's built-in date parsing, while we use Joda, which handles timezones differently [1].
If you are working with raw Twitter records rather than going through the Twitter river/Twitter4J/Java date parsing call stack, you run into this discrepancy over timezones.
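One way to sidestep the pattern question when you handle the raw records yourself is to normalize the timestamp before indexing. Here is a minimal Python sketch of that idea, assuming you control the ingest step. The helper name `twitter_date_to_iso` is hypothetical, and the parsing relies on the default C/English locale for the day and month names.

```python
from datetime import datetime

# Layout of Twitter's created_at value, e.g. "Sun Sep 13 19:12:56 +0000 2015".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def twitter_date_to_iso(value):
    """Parse a Twitter created_at string and return it as an ISO 8601 string."""
    return datetime.strptime(value, TWITTER_FORMAT).isoformat()


print(twitter_date_to_iso("Sun Sep 13 19:12:56 +0000 2015"))
```

An ISO 8601 string of this shape should be accepted by the "dateOptionalTime" format, so no custom date format would be needed in the mapping.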
OK, so it seems to be indexing now that I have set the mapping format to "EEE MMM dd HH:mm:ss +0000 yyyy".
Now one question I have is: will I need to search in the same format? That would be useless, right?
I tried searching for "2015-09-14" in the field, but it gave me an error:
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[5mltrJcdSMmx-qQon8_m7A][twittertool_untruncated][0]: SearchParseException[[twittertool_untruncated][0]: from[-1],size[-1]:
Parse Failure [Failed to parse source [{\n \"query\": {\n \"match\": {\n \"created_at\": \"2015-09-14\"\n }\n }\n}\n]]]; nested: ElasticsearchParseException[failed to parse date field [2015-09-14], tried both date format [EEE MMM dd HH:mm:ss +0000 yyyy], and timestamp number]; nested: IllegalArgumentException[Invalid format: \"2015-09-14\"]; }
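The error says the field tried "both date format [...] and timestamp number", which suggests that a numeric epoch timestamp is accepted whatever the mapping format is. Assuming that holds, a whole day can be searched with a range query over epoch milliseconds. The helper name `day_range_query` is hypothetical, and this is a sketch rather than something tested against a cluster.

```python
import json
from datetime import datetime, timedelta, timezone


def day_range_query(field, day):
    """Build a range query covering one UTC day, expressed in epoch milliseconds."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "query": {
            "range": {
                field: {
                    # Inclusive lower bound, exclusive upper bound.
                    "gte": int(start.timestamp() * 1000),
                    "lt": int(end.timestamp() * 1000),
                }
            }
        }
    }


print(json.dumps(day_range_query("created_at", "2015-09-14"), indent=2))
```

A range query is used instead of a match query because "2015-09-14" names a whole day, not a single instant.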
I have the same problem as you, and nothing works. I tried the mapping you posted, but it does not work either. Can you post your entire mapping so I can check that we are doing the same thing?
Thanks
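For comparison, here is a sketch of what a complete mapping body might look like. It is not the mapping the original poster used. It assumes that Joda's single "Z" parses an offset without a colon such as "+0000" (three or more Z's denote a zone id instead), and that "||" can be used to list an alternative format so that "2015-09-14" also parses. The type name "status" and the helper `build_mapping` are hypothetical; "created_at" is the field from the query above.

```python
import json

# Assumption: Joda "Z" matches an offset without a colon, such as "+0000".
# "||" separates alternative formats; the second one allows "2015-09-14".
DATE_FORMAT = "EEE MMM dd HH:mm:ss Z yyyy||yyyy-MM-dd"


def build_mapping(type_name="status", field="created_at"):
    """Return a mapping body declaring `field` as a date with DATE_FORMAT."""
    return {
        type_name: {
            "properties": {
                field: {"type": "date", "format": DATE_FORMAT}
            }
        }
    }


# Print the JSON body that would be sent with a PUT mapping request.
print(json.dumps(build_mapping(), indent=2))
```

The mapping has to be in place before any tweets are indexed, because the type of an existing field cannot be changed afterwards.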