Hi, I am using the bulk API to load a pandas DataFrame into an Elasticsearch index. I first convert the DataFrame to a list of dicts and then pass it to bulk.
My .csv data file has a column named "start_date", which I converted to datetime with to_datetime. Some of its rows are empty, and when I ran bulk I got this error:
After that I converted the empty values ('') to pd.NaT using the replace function, but I still get the same error.
Please help me resolve this issue. Also, does NaT need any special handling when loading data into Elasticsearch?
Thanks
It is unclear what the question is here: if you try to insert the empty string (or pd.NaT, for that matter) into a date field, Elasticsearch will complain that the input is not a date, which is indeed true.
The question for you is: what do you want Elasticsearch to do for the rows where you have no data for "start_date"? One possible answer is to simply not have the field in those documents. If that is what you want, you must delete the "start_date" key from the dict for those documents. Elasticsearch will croak when asked to index the dict {"other_data" : "blabla", "start_date": ""} (for example), but it will be quite happy with the dict {"other_data" : "blabla"}, even if other documents do have the "start_date" field.
But, ultimately, the answer depends on what you want from a solution.
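As a rough sketch of the "drop the key" approach, something like the following could work. The names `is_missing`, `to_actions` and `my-index` are made up for illustration, and the sketch assumes every cell holds a scalar value. It also converts dates to ISO 8601 strings, which is just one reasonable choice that a date field with the default format should accept.

```python
from datetime import datetime

import pandas as pd


def is_missing(value):
    """True for None, empty strings and NaN/NaT (assumes scalar cell values)."""
    if value is None or (isinstance(value, str) and value == ""):
        return True
    return bool(pd.isna(value))


def to_actions(df, index_name):
    """Yield one bulk action per row, leaving out keys whose value is missing."""
    for record in df.to_dict(orient="records"):
        source = {}
        for key, value in record.items():
            if is_missing(value):
                continue  # omit the field entirely instead of sending '' or NaT
            if isinstance(value, datetime):
                value = value.isoformat()  # pd.Timestamp -> ISO 8601 string
            source[key] = value
        yield {"_index": index_name, "_source": source}


# Tiny example frame standing in for the CSV data (hypothetical values).
df = pd.DataFrame({"other_data": ["blabla", "foo"],
                   "start_date": ["2020-01-01", ""]})
df["start_date"] = pd.to_datetime(df["start_date"], errors="coerce")

actions = list(to_actions(df, "my-index"))
```

The resulting actions should then be usable with `elasticsearch.helpers.bulk(client, actions)`, assuming `client` is an already configured `Elasticsearch` instance.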
Hi @ftr, thanks for your response. I am assuming that if I put pd.NaT in the rows with a missing date, I should not get any error when I load the data into Elasticsearch. Is that assumption correct?