
I have a list of columns in a dataframe that I want to loop through, performing the same operation on each one. The columns hold either datetimes or nothing (missing values).

For each column in the list, I would like to trim every value that contains "20" to its first 10 characters, and leave every other value as is.

I've tried this a few ways, but get a variety of errors or imperfect results.

The following version throws the error "'str' object has no attribute 'apply'", but if I drop ".astype(str)", I get "argument of type 'datetime.datetime' is not iterable" instead.

df_combined[dateColumns] = df_combined[dateColumns].fillna(notFoundText).astype(str)
print(dateColumns)
for column in dateColumns:
    for row in range(len(column)):
        print(df_combined[column][row])
        if "20" in (df_combined[column][row]):
            df_combined[column][row].apply(lambda x: x[:10], axis=1)
        print(df_combined[column][row])

Halp. Thanks in advance.
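A minimal sketch of where both errors come from. It assumes the date columns are object-dtype columns holding raw datetime.datetime values (which is what the error message suggests); the frame and its "start" column are hypothetical stand-ins for df_combined. df_combined[column][row] returns a single scalar cell, not a Series, so there is no .apply to call on it, and even if there were, the result would not be written back into the frame. Note also that range(len(column)) counts the characters in the column *name*, not the rows.

```python
import datetime

import pandas as pd

# Hypothetical frame: an object-dtype column of raw datetime.datetime values
df = pd.DataFrame({
    "start": pd.Series([datetime.datetime(2017, 5, 1, 13, 45), None], dtype=object),
})

# df[column][row] yields one scalar cell, not a Series
cell = df["start"][0]
print(type(cell))

# `in` needs a container or iterable; a datetime is neither, so this raises TypeError
try:
    "20" in cell
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# After .astype(str) each cell is a plain Python str, which has no .apply method
as_text = df["start"].astype(str)[0]
print(repr(as_text), hasattr(as_text, "apply"))
```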


2 Answers


Row-by-row loops are considered an anti-pattern in pandas: they are slow and unidiomatic. I'd recommend a vectorized approach instead, using str.contains + np.where:

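import numpy as np

for c in df.columns:  # or iterate over your own list, e.g. dateColumns
    # df[c] = df[c].astype(str)  # uncomment if the columns don't already hold strings (e.g. datetimes)
    df[c] = np.where(df[c].str.contains("20", regex=False), df[c].str[:10], df[c])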
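Applied to the question's setup, the whole pipeline might look like the sketch below. The sample data and the names not_found_text and date_columns are hypothetical stand-ins for notFoundText and dateColumns; it assumes the date columns are object dtype with raw datetimes and missing values, and that the placeholder text itself does not contain "20".

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's df_combined, dateColumns and notFoundText
not_found_text = "Not found"
df = pd.DataFrame({
    "start": pd.Series([datetime.datetime(2017, 5, 1, 13, 45), None], dtype=object),
    "end": pd.Series([None, datetime.datetime(2018, 1, 2, 8, 0)], dtype=object),
    "note": ["keep me", "untouched"],
})
date_columns = ["start", "end"]

# Fill the gaps first, then turn every cell into a string so the .str accessor works
df[date_columns] = df[date_columns].fillna(not_found_text).astype(str)

for c in date_columns:
    # Literal substring test (regex=False), one boolean per row
    has_20 = df[c].str.contains("20", regex=False)
    # np.where takes the truncated text where the mask is True, the original elsewhere
    df[c] = np.where(has_20, df[c].str[:10], df[c])

print(df)
```

Only the listed columns are touched; "note" keeps its values.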

IIUC, you want to do this over the entire dataframe.
If so, here is a vectorized way that uses NumPy to process the whole dataframe at once.

Setup

import pandas as pd

df = pd.DataFrame([
    ['xxxxxxxx20yyyy', 'z' * 14, 'wwwwwwww20vvvv'],
    ['k' * 14, 'dddddddd20ffff', 'a' * 14]
], columns=list('ABC'))

df

                A               B               C
0  xxxxxxxx20yyyy  zzzzzzzzzzzzzz  wwwwwwww20vvvv
1  kkkkkkkkkkkkkk  dddddddd20ffff  aaaaaaaaaaaaaa

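Solution
Using np.char.find and np.where (np.char.find is the same function as numpy.core.defchararray.find, but numpy.core is a private path that is deprecated as of NumPy 2.0)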

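import numpy as np

v = df.to_numpy().astype(str)                # 2-D array of fixed-width strings
i, j = np.where(np.char.find(v, '20') > -1)  # row/column indices of cells containing "20"

v[i, j] = v[i, j].astype('<U10')             # casting to <U10 keeps only the first 10 characters

df.loc[:] = v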

df

                A               B               C
0      xxxxxxxx20  zzzzzzzzzzzzzz      wwwwwwww20
1  kkkkkkkkkkkkkk      dddddddd20  aaaaaaaaaaaaaa
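The two NumPy tricks doing the work can be checked in isolation. This small sketch (toy array, not from the answer) shows that np.char.find returns -1 for cells without the substring, and that casting to a narrower fixed-width unicode dtype truncates. It uses a boolean mask instead of the (i, j) index arrays; both select the same cells.

```python
import numpy as np

v = np.array([["xxxxxxxx20yyyy", "zzzzzzzzzzzzzz"]])

# Index of the first "20" in each cell, or -1 when it is absent
positions = np.char.find(v, "20")

# Boolean-mask form of the same selection that np.where(...) turns into (i, j)
mask = positions > -1

# Casting to '<U10' keeps the first 10 characters; assigning back into the wider
# '<U14' array stores the shortened strings
v[mask] = v[mask].astype("<U10")
print(positions)
print(v)
```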

If you don't want to overwrite the original dataframe, skip the df.loc[:] = v step and build a new one from v instead:

pd.DataFrame(v, df.index, df.columns)

                A               B               C
0      xxxxxxxx20  zzzzzzzzzzzzzz      wwwwwwww20
1  kkkkkkkkkkkkkk      dddddddd20  aaaaaaaaaaaaaa
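For comparison, a pandas-only sketch that also leaves df untouched. This swaps in DataFrame.mask rather than NumPy string functions, and it assumes every column already holds strings. It is likely slower than the NumPy version on wide frames, since it runs the .str methods column by column.

```python
import pandas as pd

df = pd.DataFrame([
    ['xxxxxxxx20yyyy', 'z' * 14, 'wwwwwwww20vvvv'],
    ['k' * 14, 'dddddddd20ffff', 'a' * 14]
], columns=list('ABC'))

# Boolean frame: True where the cell contains the literal substring "20"
mask = df.apply(lambda s: s.str.contains("20", regex=False))
# Every cell cut to its first 10 characters
truncated = df.apply(lambda s: s.str[:10])

# DataFrame.mask takes values from `truncated` where mask is True; df is not modified
result = df.mask(mask, truncated)
print(result)
```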
