I am trying to create a data-frame from a list which has varying lengths for each row.

A sample of the list looks like this (which is how I would like it to look):

[(dwstweets gop, broadened, base people), 1]
[(bushs campaign video, features, kat), 2]
[3]
[4]
[5]
[(president obama, wants, york), 6]
[(jeb bush, talked, enforcement), (lets, see, plan), 7]

The code I am using to try to append each row's list and create the data-frame is:

count = 0
df2 = pd.DataFrame()
for index, row in df1.iterrows():
    doc = nlp(unicode(row))
    text_ext = textacy.extract.subject_verb_object_triples(doc)
    mylist = list(text_ext) + [index]
    count += 1
    df2 = df2.append(mylist, ignore_index=True)

However I get the error:

TypeError: object of type 'int' has no len()

I saw there are several questions with this error but as far as I can see they are not caused by the same thing.

How would I go about creating a data-frame with 7 columns that is unique on the index? (I know that many of the columns will be empty: at least 3 of them in every row, and in some rows all columns except the index.)

Thanks.

2 Answers

I suggest first building a list of the per-row lists of tuples, without adding [index] to each one, and then calling the DataFrame constructor once at the end:

L = []
for index, row in df1.iterrows():
    # unicode() is Python 2 only; use str() on Python 3
    doc = nlp(unicode(row))
    text_ext = textacy.extract.subject_verb_object_triples(doc)
    # keep only the tuples; do not add the index to the list
    mylist = list(text_ext)
    # collect one list of tuples per row
    L.append(mylist)

# build the DataFrame once, reusing the index of df1
df2 = pd.DataFrame(L, index=df1.index)
print(df2)
                                         0                  1
1  (dwstweets gop, broadened, base people)               None
2    (bushs campaign video, features, kat)               None
3                                     None               None
4                                     None               None
5                                     None               None
6           (president obama, wants, york)               None
7          (jeb bush, talked, enforcement)  (lets, see, plan)
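
To try the idea without spaCy or textacy installed, here is a minimal sketch that uses hand-written tuples in place of the extractor output. The name triples_by_index and its contents are stand-ins copied from the sample in the question, not real output of subject_verb_object_triples. It also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-in for what list(text_ext) would hold for each row;
# the keys play the role of df1.index.
triples_by_index = {
    1: [("dwstweets gop", "broadened", "base people")],
    2: [("bushs campaign video", "features", "kat")],
    3: [],
    4: [],
    5: [],
    6: [("president obama", "wants", "york")],
    7: [("jeb bush", "talked", "enforcement"), ("lets", "see", "plan")],
}

# One list of tuples per row; the rows have different lengths.
L = [list(triples) for triples in triples_by_index.values()]

# The constructor pads shorter rows with missing values, so the number of
# columns equals the length of the longest row.
df2 = pd.DataFrame(L, index=list(triples_by_index))
print(df2)
```

Because the index is passed to the constructor separately, it never ends up mixed into a row together with the tuples.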

3 Comments

Thanks for your answer - I get the error: ValueError: Shape of passed values is (1, 2), indices imply (1, 3214) - where 3214 is the total number of rows in my sample dataset (though in the future it will be much bigger). How would I resolve this? Otherwise this looks very close to working perfectly!
It is hard to know what the problem is without the data. One idea: how does it work with mylist = list((text_ext))?
I ran it again today and it worked perfectly. No clue what the difference was but thank you so much!

I believe the error could be in your for loop line in the code:

for index, row in df1.iterrows():

DataFrame.iterrows() returns an iterator object which cannot be used for defining a for loop at least in this case.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html

1 Comment

Hi - it works fine if I use mylist = list(text_ext) instead so I don't think this is the case.
