I am sorry if this is too basic. Essentially, I am using pandas to load a huge CSV file and then converting it to a NumPy array for post-processing. I appreciate any help!
The issue is that parts of some strings go missing during the conversion from a pandas DataFrame to a NumPy array. For example, the strings in the "abstract" column are complete in the DataFrame (see the output of print datafile["abstract"][0] below). However, once I convert the DataFrame to a NumPy array, only the beginning of each string is left (see the output of print df_all[0,3] below).
import pandas as pd
import csv
import numpy as np
datafile = pd.read_csv(path, header=0)
df_all = pd.np.array(datafile, dtype='string')
header_t = list(datafile.columns.values)
Strings were complete in the pandas DataFrame:
print datafile["abstract"][0]
In order to test the widely held assumption that homeopathic medicines contain negligible quantities of their major ingredients, six such medicines labeled in Latin as containing arsenic were purchased over the counter and by mail order and their arsenic contents measured. Values determined were similar to those expected from label information in only two of six and were markedly at variance in the remaining four. Arsenic was present in notable quantities in two preparations. Most sales personnel interviewed could not identify arsenic as being an ingredient in these preparations and were therefore incapable of warning the general public of possible dangers from ingestion. No such warnings appeared on the labels.
Strings were incomplete in the NumPy array:
print df_all[0,3]
In order to test the widely held assumption that homeopathic me
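A minimal sketch of one mechanism that can produce this kind of cut-off, offered as a possible explanation rather than a confirmed diagnosis: NumPy string dtypes are fixed-width, so any element longer than the width is truncated, whereas an object array keeps references to the original Python strings. The snippet is written for Python 3, the small DataFrame is made-up sample data standing in for the real CSV, and the width of 20 characters is an arbitrary choice to make the effect visible.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
long_text = (
    "In order to test the widely held assumption that homeopathic "
    "medicines contain negligible quantities of their major ingredients"
)
sample = pd.DataFrame({"id": [1], "abstract": [long_text]})

# dtype=object stores references to the original Python objects,
# so the full string survives the conversion.
as_object = sample.to_numpy(dtype=object)

# A fixed-width unicode dtype keeps at most 20 characters per element;
# the width is an assumption chosen only for illustration.
fixed_width = as_object.astype("U20")

print(len(long_text))           # length of the original string
print(len(as_object[0, 1]))     # same length: nothing was cut off
print(repr(fixed_width[0, 1]))  # only the first 20 characters remain
```

If this is what is happening in the original code, converting with an object dtype (or simply calling datafile.values on a DataFrame that holds text columns) would be one way to keep the full strings, at the cost of losing a compact fixed-width layout.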