0

I have a pandas dataframe read from file, some of whose columns contain strings, some of which in turn contain substrings separated by semicolons. My goal is to turn the semicolon-separated substrings into lists of strings and put those back in the dataframe.

When I use df.iloc[-1][-1] = df.iloc[-1][-1].split(';'); on a cell that contains a string with semicolons, there's no error but the value df.iloc[-1][-1] is not changed.

When I use

newval          = df.iloc[-1,-1];              newval
newval          = df.iloc[-1,-1].split( ';' ); newval
df.iloc[-1][-1] = newval;                      df.iloc[-1][-1]

It shows the original string for the first line and the list of substrings for the second, but then the original string again for the third. It looks as if nothing has been assigned -- but there was no error message either.

My first guess was that it was not allowed to put a list of strings in a cell that contains strings but a quick test showed me that that is OK:

>>> df = pd.DataFrame([["a", "a;b"], ["a;A", "a;b;A;B"]], index=[1, 2], columns=['A', 'B']);
>>> df
     A        B
1    a      a;b
2  a;A  a;b;A;B
>>> for row in range ( df.shape [ 0 ] ):
...     for col in range ( df.shape [ 1 ] ):
...         value = df.iloc[row][col];        
...         if ( type ( value ) == str ):
...             value = value.split( ';' );
...             df.iloc[row][col] = value;      
>>> df
        A             B
1     [a]        [a, b]
2  [a, A]  [a, b, A, B]

So I'm puzzled why (i) the assignment works in the example but not for my CSV-imported dataframe, and (ii) why python does not give an error message?

1 Answer 1

1

Honestly, you can simplify your code avoiding the loops with a simple applymap. Loops should be avoided with pandas. Here applymap won't necessarily be faster, but it's definitely much easier to use and understand.

out = df.applymap(lambda x: x.split(';'))

output:

        A             B
1     [a]        [a, b]
2  [a, A]  [a, b, A, B]
why your approach failed

You're using df.iloc[row][col] = value which can cause setting value on a copy, you should use df.iloc[row, col] = value instead. Did you get a SettingWithCopyWarning?

not all values are strings:
df.applymap(lambda x: x.split(';') if isinstance(x, str) else x)

Example:

df = pd.DataFrame([["a", 2], ["a;A", "a;b;A;B"]], index=[1, 2], columns=['A', 'B'])
df.applymap(lambda x: x.split(';') if isinstance(x, str) else x)

        A             B
1     [a]             2
2  [a, A]  [a, b, A, B]
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.