3

I am reading a .csv file and creating a pandas DataFrame. From this DataFrame I fetch a value that is supposed to be a list of comma-separated values, but it comes out as a string, so I have to split it on a separator.

For example, I have a string variable named column_names with the following value:

column_names = "First_Name, Last_Name,Middle_Name"
column_names = column_names.split(',')

Note the space before the second value. When I print this variable, the second element has a leading space, which causes trouble later when I use these names to extract values.

print(column_names)

['First_Name', ' Last_Name', 'Middle_Name']

To work around this, I tried including the space in the separator (here ', '), but then the values are not split properly, as seen below:

column_names = "First_Name, Last_Name,Middle_Name"
column_names = column_names.split(', ')
print(column_names)

['First_Name', 'Last_Name,Middle_Name']

Notice the space to the right of the comma in the separator. With this separator I get only two values instead of three.

My problem is that the variable may contain comma-separated values with a space to the left or right of the comma, or with no space at all. I have to handle all of these cases with a single command (if possible), something like passing multiple separators to split.

For example: column_names.split(','|', '|' ,').

I am not sure whether anything like this exists, but any pointers would be helpful.

3 Answers

4

This is a common issue with CSVs. Fortunately, you can nip this in the bud, simply by reading your CSV properly, so you don't have to do all this unnecessary post-processing later.

When reading your dataframe with read_csv, pass a regex to the sep (or delimiter) parameter:

df = pd.read_csv(..., sep=r'\s*,\s*', engine='python')

Now df.columns should hold the column names as clean strings, with the whitespace around each comma removed.
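
To make the effect concrete, here is a minimal sketch that reads a small in-memory CSV instead of a file. The sample data and the use of io.StringIO are assumptions for illustration only; the sep and engine arguments are the ones from the answer above.

```python
import io

import pandas as pd

# Hypothetical sample data: spaces appear before and/or after some commas.
data = "First_Name , Last_Name,Middle_Name\nAnn , Lee,May\n"

# The regex separator consumes any whitespace around each comma.
# Regex separators need the Python parsing engine.
df = pd.read_csv(io.StringIO(data), sep=r'\s*,\s*', engine='python')

columns = list(df.columns)
first_row = df.iloc[0].tolist()
print(columns)
print(first_row)
```

Both the header names and the cell values should come out without the stray spaces, since the same separator is applied to every line.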

4 Comments

Thanks @coldspeed. This helps. But just for my own knowledge, leaving pandas aside: in order to split a string variable with multiple separators, can I use any of the answers mentioned here?
@JKC No. You will have to use regular expressions in that case.
@coldspeed I think your answer should contain pd.read_csv instead of pd.DataFrame
Thanks @coldspeed :-)
3

You can use the skipinitialspace=True parameter:

df = pd.read_csv(filename, sep=',', skipinitialspace=True)

skipinitialspace : boolean, default False

Skip spaces after delimiter.

NOTE: this parameter takes care only of spaces after delimiter, so @cᴏʟᴅsᴘᴇᴇᴅ's answer is more generic.
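
A small sketch of that limitation, assuming hypothetical in-memory sample data: spaces after a comma are skipped, while spaces before a comma stay attached to the preceding field. Stripping the names afterwards with df.columns.str.strip() is one possible cleanup, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample data with a space BEFORE the first comma on each line.
data = "First_Name , Last_Name,Middle_Name\nAnn , Lee,May\n"

df = pd.read_csv(io.StringIO(data), sep=',', skipinitialspace=True)

# The space after the comma is skipped, but 'First_Name ' keeps its trailing space.
raw_columns = list(df.columns)

# One way to clean up the header afterwards: strip each column name.
df.columns = df.columns.str.strip()
clean_columns = list(df.columns)

print(raw_columns)
print(clean_columns)
```

The cell values in the first column keep their trailing spaces as well, so string columns would need a similar strip if that matters.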

6 Comments

By the way, does this also work if the space exists before the delimiter?
@cᴏʟᴅsᴘᴇᴇᴅ, no, it'll take care only of spaces after delimiter
Oh, ok. I think that should be fine, but might just want to make it clear that "First_Name , Last_Name ,Middle_Name" for example will retain trailing spaces.
@cᴏʟᴅsᴘᴇᴇᴅ, well Galen's answer would help OP to deal with the list of columns, so it does answer the original question. BUT OP would face a next problem (spaces in values) immediately after that... ;-)
Thanks @MaxU. Good one. But I have to go with Coldspeed's answer as it is more generic.
0
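import re

column_names = "First_Name , Last_Name,Middle_Name"
# \s*,\s* matches a comma together with any whitespace on either side of it,
# so the whitespace is consumed as part of the separator.
l = re.compile(r"\s*,\s*").split(column_names)
print(l)  # ['First_Name', 'Last_Name', 'Middle_Name']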
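
For plain strings outside pandas, the same idea can be wrapped in a small helper. The sketch below is illustrative only: the function names are hypothetical, and the second variant shows a regex-free alternative that splits on the bare comma and strips each piece.

```python
import re


def split_names(text):
    """Split on commas, treating whitespace around each comma as part of the separator."""
    # strip() first, so leading/trailing whitespace of the whole string is dropped too.
    return re.split(r"\s*,\s*", text.strip())


def split_names_no_regex(text):
    """Same result without a regex: split on ',' and strip every piece."""
    return [part.strip() for part in text.split(",")]


sample = "First_Name , Last_Name,Middle_Name"
print(split_names(sample))
print(split_names_no_regex(sample))
```

Both helpers handle a space to the left of the comma, to the right of it, on both sides, or none at all.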

2 Comments

Please add some explanation as to how this solves the question.
@Masudul Hasan This is good for splitting string variables, but please add an explanation when you provide an answer so that other users can understand it easily. Thanks
