I have a CSV file in which each word of a sentence is stored in its own cell, with an empty (null) cell between sentences.

CSV snippet file

My problem is with the run_id column. After loading the CSV file with pandas, I split it into sentences using the function get_sents_from_df. An assertion then checks that each sentence has exactly one unique run_id, but it fails because the null rows are treated as a "null sentence".

Below is a snippet of my code; I hope you can help.

Note: I am working with T="test_RE".

import pandas as pd

def load_dataset(fn, T):
    if T == "test_RE":
        df = pd.read_csv(fn, sep=";", header=0, keep_default_na=False)
        # Drop any "Unnamed" columns
        df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)
        # Convert the id columns to nullable integers; blank cells become missing values
        df.word_id = pd.to_numeric(df.word_id, errors='coerce').astype('Int64')
        df.run_id = pd.to_numeric(df.run_id, errors='coerce').astype('Int64')
        df.sent_id = pd.to_numeric(df.sent_id, errors='coerce').astype('Int64')
        df.head_pred_id = pd.to_numeric(df.head_pred_id, errors='coerce').astype('Int64')
    else:
        df = pd.read_csv(fn, sep="\t", header=0, keep_default_na=False)
    print(df.dtypes)

    if T == "train":
        # encoder is defined outside this snippet
        encoder.fit(df.label.values)
        print('this is the IF cond')
        print('df.label.values. shape', df.label.values.shape)

    sents = get_sents_from_df(df)

    print('shape of sents 0', sents[0].shape)
    print('sents[0]', sents[0])
    print('shape of sents 1', sents[1].shape)
    print('sents[1]', sents[1])

    # Make sure that all sents agree on run_id
    assert all(len(set(sent.run_id.values)) == 1 for sent in sents)  # ERROR HERE

The function:

def get_sents_from_df(df):
    # Split a data frame by rows according to the sentences
    return [df[df.run_id == run_id]
            for run_id
            in sorted(set(df.run_id.values))]

The shape of sents[0] is (10, 8), which is correct, and the contents of sents[0] are correct.

But the shape of sents[1] is (0, 8), and of course nothing is printed for it because it is empty. I expected sents[1] to have shape (6, 8). Any help?

Image of Output of print statements:

Output of print statements

  • [check image below]: there is no image, and besides that, it's always better to post a sample of your input data. Commented Jun 20, 2019 at 8:17
  • Yes, I added it. Commented Jun 20, 2019 at 8:18
  • Can you post the output? Commented Jun 20, 2019 at 8:20
  • @SebastienD I posted an image of the output. Commented Jun 20, 2019 at 8:27
  • In the first place, always prefer code to screenshots. Second, what is your code supposed to do? What is the desired output? Commented Jun 20, 2019 at 8:32

1 Answer

To skip the blank rows (which contain both None values and empty strings), why not just do:

df = df[df.word.apply(lambda x : len(x)>0)]
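
For context, here is a minimal sketch of how that filter could fit into the loading step. The sample data is made up for illustration (the real file has more columns), and load_sentences is a hypothetical helper name, not something from the question. It also swaps the lambda for the vectorized .str.len(), which assumes the word column holds only strings (as it does when keep_default_na=False is used), and it uses groupby in place of the list comprehension in get_sents_from_df.

```python
import io
import pandas as pd

# Hypothetical sample: two sentences separated by a blank row (";;").
CSV_TEXT = """word_id;word;run_id
0;Hello;0
1;world;0
;;
0;Good;1
1;morning;1
2;all;1
"""

def load_sentences(source):
    # keep_default_na=False keeps blank cells as empty strings, as in the question
    df = pd.read_csv(source, sep=";", header=0, keep_default_na=False)
    # Drop the separator rows, whose "word" cell is an empty string
    df = df[df["word"].str.len() > 0].copy()
    # Convert the id columns only after the blank rows are gone
    for col in ("word_id", "run_id"):
        df[col] = pd.to_numeric(df[col], errors="coerce").astype("Int64")
    # One data frame per run_id, in sorted run_id order
    return [group for _, group in df.groupby("run_id")]

sents = load_sentences(io.StringIO(CSV_TEXT))
for sent in sents:
    print(sent.shape)
```

Because the blank rows are removed before the split, no sentence ends up empty, so the assertion that each sentence has exactly one run_id should hold for data shaped like this sample.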