
Hello, I have a CSV file with 2 columns and more than 1000 rows. I want each comma-separated value in a cell to go into its own new column, instead of all of them sitting in one column.

Here is an example of my csv:

print(df4)

 keys                                                env
0         FIT-2990                                          3000.0010
1         FIT-2918                                          3000.0004
2         FIT-2854                               2110.0070, 2110.0071
3    UXSCIENCE-640                                          1808.0001
4         FIT-2814                    1135.0017, 1135.0018, 1135.0019
5         FIT-2766                               1908.0043, 1908.0044
6         FIT-2760  1901.0012, 1903.0045, 1906.0020, 1922.0032, 19...
7         FIT-2725                                          0147.0001
8         FIT-2706                               1903.0045, 1922.0032
9         FIT-2554                               1802.0024, 1805.0028
10        FIT-2383                                             , 1910
11        FIT-2339                                          2113.0021
12   UXSCIENCE-438                    4000.0237, 4000.0238, 4000.0339
13        FIT-2201                    2023.0013, 2016.0013, 2019.0013

I want to split values such as 2110.0070, 2110.0071 into separate columns (2110.0070 | 2110.0071) for the entire CSV.

What I have so far:

df5 = df4.join(df4.apply(lambda x: Series(x.split(', '))))
print(df5)
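The attempt above raises an error because DataFrame.apply, with its default axis=0, passes each whole column (a Series) to the lambda, and a Series has no .split method. Applying the lambda to the env column alone gives it one string per call. A minimal sketch, assuming a semicolon-separated file with keys and env columns; csv_text is a hypothetical sample built from a few rows of the question, and dtype=str is an added assumption that keeps values like 0147.0001 as text:

```python
import io

import pandas as pd

# Hypothetical sample: a few rows copied from the question's data
csv_text = """keys;env
FIT-2990;3000.0010
FIT-2854;2110.0070, 2110.0071
FIT-2814;1135.0017, 1135.0018, 1135.0019
"""
df4 = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=str)

# Apply to the 'env' column only: each cell is a string, so .split works.
# Returning a Series from the lambda makes apply build one column per piece;
# rows with fewer pieces get missing values in the extra columns.
parts = df4["env"].apply(lambda s: pd.Series(s.split(", ")))

# join aligns on the index, so each row keeps its own split values
df5 = df4.join(parts)
print(df5)
```

The new columns are labelled 0, 1, 2, ... and sit next to the original keys and env columns.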

Answer

You can try str.split and concat:

import pandas as pd
import numpy as np
import io

temp1=u"""keys;env
FIT-2990;3000.0010
FIT-2918;3000.0004
FIT-2854;2110.0070, 2110.0071
UXSCIENCE-640;1808.0001
FIT-2814;1135.0017, 1135.0018, 1135.0019
FIT-2766;1908.0043, 1908.0044
FIT-2760;1901.0012, 1903.0045, 1906.0020, 1922.0032, 19...
FIT-2725;0147.0001
FIT-2706;1903.0045, 1922.0032
FIT-2554;1802.0024, 1805.0028
FIT-2383;, 1910
FIT-2339;2113.0021
UXSCIENCE-438;4000.0237, 4000.0238, 4000.0339
FIT-2201;2023.0013, 2016.0013, 2019.0013"""

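# after testing, replace io.StringIO(temp1) with the filename
df = pd.read_csv(io.StringIO(temp1), sep=";", index_col=None)
print(df)

# faster: build the frame from a list of lists; split on ", " so the
# pieces have no leading space (shorter rows are padded with None)
df1 = pd.DataFrame([x.split(', ') for x in df['env'].tolist()])
# slower alternative with the same result
df1 = df['env'].str.split(', ', expand=True)
print(pd.concat([df['keys'], df1], axis=1))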
             keys          0           1           2           3       4
0        FIT-2990  3000.0010        None        None        None    None
1        FIT-2918  3000.0004        None        None        None    None
2        FIT-2854  2110.0070   2110.0071        None        None    None
3   UXSCIENCE-640  1808.0001        None        None        None    None
4        FIT-2814  1135.0017   1135.0018   1135.0019        None    None
5        FIT-2766  1908.0043   1908.0044        None        None    None
6        FIT-2760  1901.0012   1903.0045   1906.0020   1922.0032   19...
7        FIT-2725  0147.0001        None        None        None    None
8        FIT-2706  1903.0045   1922.0032        None        None    None
9        FIT-2554  1802.0024   1805.0028        None        None    None
10       FIT-2383                   1910        None        None    None
11       FIT-2339  2113.0021        None        None        None    None
12  UXSCIENCE-438  4000.0237   4000.0238   4000.0339        None    None
13       FIT-2201  2023.0013   2016.0013   2019.0013        None    None 
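Since the question asks to split the values for the entire CSV, the split columns can also be given readable names and written back out as CSV. A minimal sketch, assuming the same semicolon-separated input; the env_ column prefix and writing to an in-memory string are illustrative choices, not something from the question:

```python
import io

import pandas as pd

# A few rows from the question's sample; replace io.StringIO(csv_text)
# with the real file path when reading from disk
csv_text = """keys;env
FIT-2854;2110.0070, 2110.0071
FIT-2725;0147.0001
FIT-2383;, 1910
"""

# dtype=str keeps values such as 0147.0001 as text (no float conversion)
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=str)

# Split on ", " so the pieces carry no leading space, then name the
# new columns env_0, env_1, ... instead of 0, 1, ...
env_cols = df["env"].str.split(", ", expand=True).add_prefix("env_")

out = pd.concat([df["keys"], env_cols], axis=1)

# With no path argument, to_csv returns the CSV text; missing values
# become empty fields
csv_out = out.to_csv(index=False)
print(csv_out)
```

Passing a file path as the first argument of to_csv writes the same content to disk instead of returning it.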