Create a dataframe from a list with multiple columns

Question

I want to create a DataFrame from a list, but the column names are also stored in that list.

List:

['Input_file_column_name,Is_key,Config_file_column_name,Value\nEmployee ID,Y,identifierValue,identityTypeCode:001\nCumb ID,N,identifierValue,identityTypeCode:002\nFirst Name,N,first_Name \nLast Name,N,last_Name   \nEmail,N,email_Address   \nEntityID,N,entity_Id,entity_Id:01\nSourceCode,N,sourceCode,sourceCode:AHRWB\n']

Resulting dataframe:

Input_file_column_name Is_key Config_file_column_name                 Value
0            Employee ID      Y         identifierValue  identityTypeCode:001
1                Cumb ID      N         identifierValue  identityTypeCode:002
5               EntityID      N               entity_Id          entity_Id:01
6             SourceCode      N              sourceCode      sourceCode:AHRWB

How do I convert it? Do I need to convert the list to a dictionary first, or can it be done directly?

Code:

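import pandas as pd

with open('onboard_config.txt') as myFile:
    text = myFile.read()
# "regex" is the placeholder separator from the original post
result = text.split("regex")
print(result)

# the constructor must be called with parentheses, not indexed with brackets
df = pd.DataFrame([sub.split(",") for sub in result])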

Your data is a list containing a single string. Is that the intended input format?
– Dani Mesejo, Commented Dec 7, 2018 at 18:09
No, it is not. The file is comma-separated, but I also need some other text from the file that should not be in the DataFrame. Question updated with code.
– forsaken, Commented Dec 7, 2018 at 18:11

BENY · Accepted Answer · answered 2018-12-07 18:11 · edited by user3483203 2018-12-07 18:14:01Z

2

It seems you need to break the string into lines with splitlines, then split each line on commas with Series.str.split:

df=pd.Series(l[0].splitlines()).str.split(',',expand=True).T.set_index(0).T.dropna()
df
Out[1183]: 
0 Input_file_column_name          ...                          Value
1            Employee ID          ...           identityTypeCode:001
2                Cumb ID          ...           identityTypeCode:002
6               EntityID          ...                   entity_Id:01
7             SourceCode          ...               sourceCode:AHRWB
[4 rows x 4 columns]
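
The one-liner above packs several steps into one chain. As a rough sketch of what each step does, the same result can be built in stages. The intermediate names (lines, table, body) are made up here for illustration, and l is assumed to be the one-element list from the question.

```python
import pandas as pd

# The one-element list from the question, copied as posted.
l = ['Input_file_column_name,Is_key,Config_file_column_name,Value\nEmployee ID,Y,identifierValue,identityTypeCode:001\nCumb ID,N,identifierValue,identityTypeCode:002\nFirst Name,N,first_Name \nLast Name,N,last_Name   \nEmail,N,email_Address   \nEntityID,N,entity_Id,entity_Id:01\nSourceCode,N,sourceCode,sourceCode:AHRWB\n']

# 1. One Series element per line of the string.
lines = pd.Series(l[0].splitlines())

# 2. One column per comma-separated field; lines with fewer
#    fields get a missing value in the trailing columns.
table = lines.str.split(',', expand=True)

# 3. Promote the first row to column names (what .T.set_index(0).T does).
body = table.iloc[1:].copy()
body.columns = table.iloc[0].tolist()

# 4. Drop the rows that have a missing field.
df = body.dropna()
print(df)
```

Building the columns from a plain list in step 3 also avoids the leftover "0" label on the column axis that the transpose trick produces.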

edited Dec 7, 2018 at 18:14

user3483203


answered Dec 7, 2018 at 18:11

BENY


4 Comments

forsaken Over a year ago

What if I do not want to remove "na" values?

BENY Over a year ago

@ManasJani df=pd.Series(l[0].splitlines()).str.split(',',expand=True).T.set_index(0).T

forsaken Over a year ago

How do I not have the header column with index=0?

BENY Over a year ago

@ManasJani del df.columns.name
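
Deleting the attribute may not work on every pandas version. A small sketch of two alternatives that clear the same label follows; the tiny frame is invented just to reproduce the leftover "0" name and is not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in frame; the name 0 mimics what set_index(0) leaves behind.
df = pd.DataFrame({'Input_file_column_name': ['Employee ID'], 'Is_key': ['Y']})
df.columns.name = 0

# Option 1: return a new frame whose column axis has no name.
cleaned = df.rename_axis(None, axis=1)

# Option 2: clear the name in place.
df.columns.name = None

print(cleaned)
```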

Vikika · Accepted Answer · 2018-12-07 18:17:57Z

0

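    import pandas as pd

    # lst is the one-element list from the question; naming it `list`
    # would shadow the built-in
    rows = [line.split(',') for line in lst[0].splitlines()]

    columns = rows[0]   # the first line holds the column names
    df = pd.DataFrame(rows[1:], columns=columns)

This gives a DataFrame with the first line as column names. Rows without a Value keep None in that column; call df.dropna() to match the output shown in the question.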
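
Since the string is already CSV text, another option is to let pandas parse it. This is a sketch of that alternative, not what either answer proposes; it wraps the string in io.StringIO so pd.read_csv can treat it like a file, and l is again assumed to be the list from the question.

```python
import io

import pandas as pd

# The one-element list from the question, copied as posted.
l = ['Input_file_column_name,Is_key,Config_file_column_name,Value\nEmployee ID,Y,identifierValue,identityTypeCode:001\nCumb ID,N,identifierValue,identityTypeCode:002\nFirst Name,N,first_Name \nLast Name,N,last_Name   \nEmail,N,email_Address   \nEntityID,N,entity_Id,entity_Id:01\nSourceCode,N,sourceCode,sourceCode:AHRWB\n']

# read_csv takes the first line as the header and fills short rows with NaN.
full = pd.read_csv(io.StringIO(l[0]))

# Drop the rows with a missing Value to match the output in the question.
df = full.dropna()
print(df)
```

Because the header line is consumed by the parser, the data rows are numbered from 0, so the remaining index labels line up with the frame shown in the question.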

answered Dec 7, 2018 at 18:17

Vikika
