I have a CSV file in which each word of a sentence is stored in its own cell, with a null cell between sentences.
My problem is with the run_id column. After loading the CSV file with pandas, I split it into sentences using the function get_sents_from_df. I then have an assertion that double-checks that each sentence has exactly one unique run_id, but it fails because the "Null" separator is treated as a sentence of its own (a "Null sentence").
Below is a snippet of my code; I hope you can help.
Note: I am working with T="test_RE".
    import pandas as pd

    def load_dataset(fn, T):
        if T == "test_RE":
            df = pd.read_csv(fn,
                             sep=";",
                             header=0,
                             keep_default_na=False)
            # Drop the unnamed columns
            df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)
            # Convert the id columns to nullable integers; non-numeric cells become <NA>
            df.word_id = pd.to_numeric(df.word_id, errors='coerce').astype('Int64')
            df.run_id = pd.to_numeric(df.run_id, errors='coerce').astype('Int64')
            df.sent_id = pd.to_numeric(df.sent_id, errors='coerce').astype('Int64')
            df.head_pred_id = pd.to_numeric(df.head_pred_id, errors='coerce').astype('Int64')
        else:
            df = pd.read_csv(fn,
                             sep="\t",
                             header=0,
                             keep_default_na=False)
        print(df.dtypes)
        if T == "train":
            encoder.fit(df.label.values)
            print('this is the IF cond')
            print('df.label.values. shape', df.label.values.shape)
        sents = get_sents_from_df(df)
        print('shape of sents 0', sents[0].shape)
        print('sents[0]', sents[0])
        print('shape of sents 1', sents[1].shape)
        print('sents[1]', sents[1])
        # make sure that all sents agree on run_id
        assert all(len(set(sent.run_id.values)) == 1
                   for sent in sents)  # ERROR HERE
The function get_sents_from_df:

    def get_sents_from_df(df):
        # Split a data frame by rows according to the sentences
        return [df[df.run_id == run_id]
                for run_id
                in sorted(set(df.run_id.values))]
The shape of sents[0] is (10, 8), which is correct, and the contents of sents[0] are correct.
But the shape of sents[1] is (0, 8), and of course nothing is printed for sents[1] because it is empty. I expect sents[1] to have shape (6, 8). Any help?
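To make the problem easier to reproduce, here is a minimal sketch with made-up data. The column names word_id, run_id, word and all the values are assumptions for illustration, not my real file. The separator row seems to end up with a missing run_id after the numeric conversion, and an equality filter on a missing value selects no rows. The helper get_sents_skipping_separators is hypothetical: it shows one possible way to drop the separator rows before splitting, and I have not confirmed it against the real data.

```python
import io
import pandas as pd

# Hypothetical sample data: two sentences separated by an empty row.
CSV_TEXT = """word_id;run_id;word
0;1;The
1;1;cat
;;
0;2;It
1;2;sleeps
2;2;now
"""

def load_sample():
    # Same read options as in load_dataset for T == "test_RE"
    df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", header=0, keep_default_na=False)
    for col in ("word_id", "run_id"):
        # Empty cells cannot be parsed as numbers, so they become <NA>
        df[col] = pd.to_numeric(df[col], errors="coerce").astype("Int64")
    return df

def get_sents_skipping_separators(df):
    # Hypothetical variant of get_sents_from_df:
    # drop rows with a missing run_id, then split by run_id.
    words = df.dropna(subset=["run_id"])
    return [sent for _, sent in words.groupby("run_id", sort=True)]

df = load_sample()
print(df["run_id"].isna().sum())       # number of separator rows
sents = get_sents_skipping_separators(df)
print([sent.shape for sent in sents])  # [(2, 3), (3, 3)]
```

With this sample, every frame in sents has exactly one run_id, so the same kind of assertion passes.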
Image of Output of print statements:


[check image below]: there is no image, and besides that, it's always better to post a sample of your input data.