(Note: the JSON file is the direct result of a Twitter search, so it is genuine data.)
The errors I got:
type: illegal_argument_exception / Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING], status 400
curl: (3) [globbing] unmatched brace in column 1
curl: (6) could not resolve host: flu,
curl: (6) could not resolve host: _type
curl: (6) could not resolve host: tweets,
curl: (6) could not resolve host: _id
curl: (3) [globbing] unmatched close brace/bracket in column 2
curl: (6) could not resolve host: \n
curl: (3) [globbing] unmatched close brace in column 1
curl: (3) [globbing] unmatched close brace/bracket in column 11
curl: (6) could not resolve host: \n
{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"
Validation Failed: 1: no requests added;"}],"type":"action_request_validation_ex
ception","reason":"Validation Failed: 1: no requests added;"},"status":400}
and when I try this curl:
c:>curl -XPOST "http://localhost:9200/_bulk" --data-binary @I:\ES\flu_tweet_file.json
I get this error:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]"}],"type":"illegal_argument_exception","reason":"Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]"},"status":400}
In short, I still can't get ES to "import" (index) a bulk JSON file.
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"Malformed content, found extra data after parsing: START_OBJECT"}},"status":400}
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"Malformed content, found extra data after parsing: START_OBJECT"}},"status":400}
Please point me to the appropriate docs you are referring to. Thanks.
Yes, I did read that document. Several times. Please look at my posts again: I have made attempts to follow the document. It's when the attempts fail that I try other things.
Here are the issues as I see / understand them:
When I use Kibana Sense (ES 2.1 / Kibana 4.3), I get the same errors as when I use command-line curl.
When I use curl, I have not found a way to add a new line and complete the {action} statements (what I understand the expected format to be is shown just below this list).
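For reference, this is what I understand a single entry in the bulk file is supposed to look like per the bulk docs (the index/type names match my curl attempt above; the _id value and the document fields are just placeholders of mine):
{"index":{"_index":"flu","_type":"tweets","_id":"1"}}
{"text":"example tweet text"}
i.e., one action/metadata line followed by the document source on its own line, each terminated by a newline; that newline is exactly what I have not been able to produce from the Windows command line.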
Based on the posts above and the documents you referred me to (which should be simple enough, but haven't worked out that way for me):
a) What am I missing?
b) Is my JSON file in the wrong format? (Is it pretty-printed? I didn't design it that way; I just used the search result as is. A quick check I plan to run is sketched after these questions.)
c) Do you need to see the Sense statements and results?
d) What else can I do?
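Regarding b), here is the quick check I have in mind (just a sketch, not run yet) to see whether each tweet ended up on a single line or was pretty-printed:

    # peek at the first line of the generated file
    with open('new_tweet_file.json') as infile:
        first_line = infile.readline()
    # a one-tweet-per-line file should start with '{' and run very long;
    # a pretty-printed file would show just '{' or a short fragment here
    print(repr(first_line[:120]))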
Thanks.
Thank you for your ongoing interest in helping me out.
Task: generate a JSON file, from a Twitter topic search, to be indexed later in ES. The Twitter search is performed using Python. The resulting JSON file is "new_tweet_file.json".
Python code is posted below. Search term is "H3N2"
I am open to other ways of performing the same task, so long as it yields a correct-format JSON file "importable" to ES for indexing.
Thanks in advance.
from __future__ import division, print_function
import twitter  # work with Twitter APIs
import json  # methods for working with JSON data

windows_system = True  # set to True if this is a Windows computer
if windows_system:
    line_termination = '\r\n'  # Windows line termination
else:
    line_termination = '\n'  # Unix/Linux/Mac line termination

json_filename = 'new_tweet_file.json'
full_text_filename = 'new_tweet_review_file.txt'
partial_text_filename = 'new_tweet_text_file.txt'

def oauth_login():
    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    OAUTH_TOKEN = ''
    OAUTH_TOKEN_SECRET = ''
    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY, CONSUMER_SECRET)
    twitter_api = twitter.Twitter(auth=auth)
    return twitter_api

def twitter_search(twitter_api, q, max_results=200, **kw):
    search_results = twitter_api.search.tweets(q=q, count=100, **kw)
    statuses = search_results['statuses']
    max_results = min(1000, max_results)
    for _ in range(10):  # 10*100 = 1000
        try:
            next_results = search_results['search_metadata']['next_results']
        except KeyError:  # no more results when next_results doesn't exist
            break
        kwargs = dict([kv.split('=')
                       for kv in next_results[1:].split("&")])
        search_results = twitter_api.search.tweets(**kwargs)
        statuses += search_results['statuses']
        if len(statuses) > max_results:
            break
    return statuses

twitter_api = oauth_login()
print(twitter_api)  # verify the connection

q = "*H3N2*"  # one of many possible search strings
results = twitter_search(twitter_api, q, max_results=200)  # limit to 200 tweets
print('\n\ntype of results:', type(results))
print('\nnumber of results:', len(results))
print('\ntype of results elements:', type(results[0]))

item_count = 0  # initialize count of objects dumped to file
with open(json_filename, 'w') as outfile:
    for dict_item in results:
        json.dump(dict_item, outfile, encoding='utf-8')
        item_count = item_count + 1
        if item_count < len(results):
            outfile.write(line_termination)  # new line between items

item_count = 0  # initialize count of objects dumped to file
with open(full_text_filename, 'w') as outfile:
    for dict_item in results:
        outfile.write('Item index: ' + str(item_count) +
                      ' -----------------------------------------' + line_termination)
        # indent for pretty printing
        outfile.write(json.dumps(dict_item, indent=4))
        item_count = item_count + 1
        if item_count < len(results):
            outfile.write(line_termination)  # new line between items

item_count = 0  # initialize count of objects dumped to file
with open(partial_text_filename, 'w') as outfile:
    for dict_item in results:
        outfile.write(json.dumps(dict_item['text']))
        item_count = item_count + 1
        if item_count < len(results):
            outfile.write(line_termination)  # new line between text items
The next step is to index the resulting JSON file in ES:
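One approach I am considering, based on my reading of the bulk docs: rewrite new_tweet_file.json into action/source pairs before POSTing it. A rough Python 2 sketch (the flu/tweets index and type names match my earlier curl attempt; the output filename and using the line number as _id are just my own choices):

    import json

    json_filename = 'new_tweet_file.json'       # file produced by the script above
    bulk_filename = 'new_tweet_bulk_file.json'  # output name is my own choice

    with open(json_filename) as infile, open(bulk_filename, 'w') as outfile:
        for doc_id, line in enumerate(infile):
            line = line.strip()
            if not line:
                continue  # skip any blank lines left by the line terminators
            # one action/metadata line per tweet ...
            action = {"index": {"_index": "flu", "_type": "tweets", "_id": doc_id}}
            outfile.write(json.dumps(action) + '\n')
            # ... followed by the tweet itself on its own line
            outfile.write(line + '\n')

and then, if I have understood the docs correctly, the same curl as before but pointing at the new file:
curl -XPOST "http://localhost:9200/_bulk" --data-binary @new_tweet_bulk_file.json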
Great reference; I've been using it. I ordered another one on the same subject that's not yet released (Dec 8, I'm told by Amazon). But I think it is aimed at developers.
We need one for real beginners and non-developers, who may be data scientists or data enthusiasts.
Analogy: teaching what a driver can do on the freeway or on a race course or obstacle course (a lot of spectacular things) versus teaching or including how to get on the freeway / race course / obstacle course in the first place.
Thanks for your patience. If you have other books, I'm certain to get them.