I have a pandas DataFrame whose values are strings such as 'P1', 'P2', 'P3', ..., along with some nulls.

When I try to concatenate this data frame with another, all of the strings get replaced with 'NaN'.

See my code below:

import operator

import pandas as pd

descriptions = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
descriptions = descriptions.reset_index(drop=True)
# Take the first entry of each cell and pull out its 'what' field
descriptions['desc'] = descriptions.short_desc.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
f1 = pd.DataFrame(descriptions['desc'])

bugPrior = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/priority.json')
bugPrior = bugPrior.reset_index(drop=True)
bugPrior['priority'] = bugPrior.priority.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
f2 = pd.DataFrame(bugPrior['priority'])

df = pd.concat([f1, f2])
print(df.head())

The output is as follows:

                                                desc priority
0    Usability issue with external editors (1GE6IRL)      NaN
1             API - VCM event notification (1G8G6RR)      NaN
2  Would like a way to take a write lock on a tea...      NaN
3  getter/setter code generation drops "F" in ".....      NaN
4  Create Help Index Fails with seemingly incorre...      NaN

Any ideas as to how I might stop this from happening?

Ultimately, my goal is to have everything in a single data frame so that I can remove all rows with "null" values. Having one data frame would also help later on in the code.

Thanks.

2 Answers

Assuming you want to concatenate those columns horizontally, you'll need to pass axis=1 to pd.concat, because by default, concatenation is vertical.

df = pd.concat([f1,f2], axis=1)
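
To see why the default produces NaN, here is a minimal sketch on toy data. The frames below are made-up stand-ins for f1 and f2, not the real dataset. Stacking vertically takes the union of the columns, so each frame's rows are missing in the other frame's column; axis=1 aligns on the index and puts the columns side by side.

```python
import pandas as pd

# Toy stand-ins for the question's f1 and f2 (values are invented for illustration)
f1 = pd.DataFrame({"desc": ["bug a", "bug b", None]})
f2 = pd.DataFrame({"priority": ["P3", None, "P1"]})

# Default axis=0: rows are stacked, so f1's rows have no 'priority'
# and f2's rows have no 'desc'
vertical = pd.concat([f1, f2])
print(vertical.shape)  # (6, 2)

# axis=1: rows are aligned on the index and columns sit side by side
horizontal = pd.concat([f1, f2], axis=1)
print(horizontal.shape)  # (3, 2)
```

This is also why df.head() in the question shows only NaN under priority: the first rows of the stacked result all come from f1.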

To drop the rows that contain NaN, use df.dropna, then call df.reset_index(drop=True) to renumber the index.

df = pd.concat([f1, f2], axis=1)
df = df.dropna().reset_index(drop=True)
print(df.head(10))
                                                desc priority
0  Create Help Index Fails with seemingly incorre...       P3
1  Internal compiler error when compiling switch ...       P3
2  Default text sizes in org.eclipse.jface.resour...       P3
3  [Presentations] [ViewMgmt] Holding mouse down ...       P3
4  Parsing of function declarations in stdio.h is...       P2
5  CCE in RenameResourceAction while renaming ele...       P3
6  Option to prevent cursor from moving off end o...       P3
7        Tasks section in the user doc is very stale       P3
8  Importing existing project with different case...       P3
9  Workspace in use --> choose new workspace but ...       P3
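
As a small illustration of the dropna and reset_index steps, the sketch below uses an invented four-row frame rather than the real data. It assumes the nulls are actual missing values (None or NaN), which is what dropna removes; a literal string such as "null" would not be dropped.

```python
import pandas as pd

# Invented data: rows 1 and 2 each have one missing value
df = pd.DataFrame({
    "desc": ["bug a", "bug b", None, "bug d"],
    "priority": ["P3", None, "P1", "P2"],
})

# dropna removes every row that has at least one missing value,
# but the surviving rows keep their original index labels (0 and 3)
kept = df.dropna()
print(list(kept.index))

# reset_index(drop=True) renumbers from 0 and discards the old labels
cleaned = kept.reset_index(drop=True)
print(list(cleaned.index))
```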

Printing out df.priority.unique(), we see there are 5 unique priorities:

print(df.priority.unique())
array(['P3', 'P2', 'P4', 'P1', 'P5'], dtype=object)

1 Comment

Thank you for your help, this dataset is driving me nuts already, and this is just the data import!

I think it is best not to create new DataFrames from the columns, but to work with the Series directly:

import operator

import pandas as pd

descriptions = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
descriptions = descriptions.reset_index(drop=True)

# Extract the descriptions as a Series, f1
f1 = descriptions.short_desc.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
print(f1.head())

bugPrior = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/priority.json')
bugPrior = bugPrior.reset_index(drop=True)

# Extract the priorities as a Series, f2
f2 = bugPrior.priority.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
print(f2.head())

Then use the same solution as in cᴏʟᴅsᴘᴇᴇᴅ's answer:

df = pd.concat([f1, f2], axis=1).dropna().reset_index(drop=True)
print(df.head())
                                          short_desc priority
0  Create Help Index Fails with seemingly incorre...       P3
1  Internal compiler error when compiling switch ...       P3
2  Default text sizes in org.eclipse.jface.resour...       P3
3  [Presentations] [ViewMgmt] Holding mouse down ...       P3
4  Parsing of function declarations in stdio.h is...       P2
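
Note that the first column here is called short_desc rather than desc: when Series are concatenated with axis=1, each Series name becomes a column label. A minimal sketch with made-up values shows this, plus an optional rename if the desc label is preferred (the rename is an illustration, not part of the answer above).

```python
import pandas as pd

# Made-up Series standing in for f1 and f2; the names mimic the source columns
s1 = pd.Series(["bug a", "bug b"], name="short_desc")
s2 = pd.Series(["P3", "P2"], name="priority")

# The Series names become the column labels of the result
combined = pd.concat([s1, s2], axis=1)
print(list(combined.columns))

# Optional: rename the column afterwards
renamed = combined.rename(columns={"short_desc": "desc"})
print(list(renamed.columns))
```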

5 Comments

This is exactly my answer. :)
It's okay. You didn't have to make that edit, but thanks, I appreciate it.
@jezrael Thanks for your answer. I think I might apply your recommendation and just create columns.
@jezrael There are no problems. I actually upvoted your answer.
@cᴏʟᴅsᴘᴇᴇᴅ Thanks ;)
