I currently have a text data file that is just a list of strings separated by newlines. The data is arranged in groups, so that all values belonging to one group are listed beneath the group title, and the groups follow one another consecutively. When I load it with the following basic code, I end up with something like this:
import pandas as pd
df = pd.read_csv('file.txt', header=None, sep='\n')

    0
0  Group One []
1  John
2  Jacob
3  James
4  Group Two []
5  Mary
6  Molly
7  Group Three []
8  Anthony
9  Alan

The Group labels should actually be their own column and correspond to the values underneath them. So the format I am trying to get would look like this:

    0            1
0  Group One    John
1  Group One    Jacob
2  Group One    James
3  Group Two    Mary
4  Group Two    Molly
5  Group Three  Anthony
6  Group Three  Alan

I am struggling to figure out how to accomplish this.

1 Answer

Use Series.str.startswith to flag the Group rows, replace the non-matching values with missing values using Series.where, and forward-fill them so that each group label repeats down its rows. Finally, remove the rows where both columns hold the same value (the original header rows) and rename column 0:

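# Keep the text only on rows starting with 'Group', then forward-fill it downwards
df.insert(0, 'Group', df[0].where(df[0].str.startswith('Group')).ffill())
# Drop the header rows (where both columns match) and name the value column
df = df[df[0].ne(df['Group'])].rename(columns={0:'Data'})
print(df)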
            Group     Data
1    Group One []     John
2    Group One []    Jacob
3    Group One []    James
5    Group Two []     Mary
6    Group Two []    Molly
8  Group Three []  Anthony
9  Group Three []     Alan
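
The output above still carries the trailing " []" and the original row numbers, while the question asks for clean labels and a consecutive index. Below is a minimal, self-contained sketch of the same where/ffill idea that also covers those two points. It rests on some assumptions: the inline string `raw` stands in for file.txt, the names `lines`, `is_header` and `result` are illustrative, and every header line starts with "Group". The lines are read in plain Python rather than with sep='\n', since some pandas versions reject a newline as the separator.

```python
import io
import pandas as pd

# Sample data standing in for file.txt (contents copied from the question).
raw = """Group One []
John
Jacob
James
Group Two []
Mary
Molly
Group Three []
Anthony
Alan
"""

# Read one string per non-empty line; with a real file, open('file.txt')
# could be used in place of io.StringIO(raw).
lines = [line.strip() for line in io.StringIO(raw) if line.strip()]
df = pd.DataFrame({'Data': lines})

# Flag header rows, keep their text only on those rows, then forward-fill
# so each member row inherits the header above it.
is_header = df['Data'].str.startswith('Group')
df['Group'] = df['Data'].where(is_header).ffill()

# Drop the header rows themselves, reorder the columns, renumber the index.
result = df.loc[~is_header, ['Group', 'Data']].reset_index(drop=True)

# Strip the literal " []" suffix from the group labels (no regex involved).
result['Group'] = result['Group'].str.replace(' []', '', regex=False)

print(result)
```

Filtering with the boolean mask `is_header` instead of comparing the two columns is a design choice: it avoids depending on a header row being equal to its own forward-filled label, though both approaches select the same rows for this data.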