I'm very new to Python, so there may be a simple solution here. I'm trying to clean a data set about rent prices and square footage in a pandas DataFrame. My bedrooms column contains both the bedroom count and the square footage. Most entries are formatted like "/ 1br - 950ft²", but some are "/ 1br" and some are "/950ft²". I want to create a clean column with just the bedrooms, but because the formatting varies, I can't simply split the string after a certain character.

I've decided I need a function that tests whether the string contains "br", but I'm getting an error.

Here's my code:

def cleaned_bedrooms(x):
    if df[df['bedrooms'].str.contains('br')]:
        df['bedrooms'] = df['bedrooms'].str.split('-').str[0]
    else:
        return None
df['bedrooms'].map(cleaned_bedrooms)

I seem to have set up a boolean test somewhere (I assume it is triggered by the if statement), because the error I'm getting is "ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." It is raised on the line containing .map(cleaned_bedrooms).

  • Could you edit your question to include the full traceback you're getting? Commented Oct 2, 2017 at 19:30

1 Answer

If this is your dataframe,

    bedrooms
0   / 1br - 950ft²
1   / 1br
2   /950ft²

You can use str.extract with a regular expression to pull out the bedroom part. A raw string keeps Python from treating \d as a string escape:

df['bedrooms'] = df['bedrooms'].str.extract(r'(\d+br)', expand=False)

You get

    bedrooms
0   1br
1   1br
2   NaN
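
As for the error in the question: `df[df['bedrooms'].str.contains('br')]` is a whole DataFrame, and `if` cannot reduce a DataFrame to a single True/False, hence the ValueError. A function passed to `.map` receives one cell value at a time, so it should inspect its argument rather than `df`. Below is a minimal sketch of that per-row approach, plus an optional numeric variant. The column names `bedrooms_clean`, `beds` and `sqft` are made up for illustration, and the sample frame simply mirrors the three formats from the question.

```python
import re

import pandas as pd

# Sample data mirroring the three formats described in the question.
df = pd.DataFrame({"bedrooms": ["/ 1br - 950ft²", "/ 1br", "/950ft²"]})


def cleaned_bedrooms(value):
    """Return the bedroom token (e.g. '1br') from a single cell, or None."""
    if not isinstance(value, str):
        # Missing values (NaN) are not strings; nothing to extract.
        return None
    match = re.search(r"\d+br", value)
    return match.group(0) if match else None


# .map calls the function once per cell and builds a new Series,
# so the result has to be assigned back to a column.
df["bedrooms_clean"] = df["bedrooms"].map(cleaned_bedrooms)

# Optional: capture only the digits and convert them to numbers.
# Rows without a match become NaN, so both columns end up as floats.
df["beds"] = pd.to_numeric(df["bedrooms"].str.extract(r"(\d+)br", expand=False))
df["sqft"] = pd.to_numeric(df["bedrooms"].str.extract(r"(\d+)ft", expand=False))

print(df)
```

The vectorized `str.extract` version is usually the shorter option; the `.map` version is shown mainly because it is closest to what the question attempted.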