
I am using Python 3.9 in Spyder. I receive DataFrames from a source, and I cannot control how the data arrives. I do know that each value belongs under a certain header, but sometimes it lands under the wrong one, and my attempts to regroup the data with pandas are failing. Below is an example of the received DataFrames and the output I need.

[Image: example of the received DataFrames]

Below is how I want it to be arranged.

[Image: the desired arrangement]

Any ideas on how I can achieve this? Note that I have a very large amount of data, so I am looking for a method with low memory usage.

Edit: I had a typo in the Name and Age values. I also added that the actual headers are not Name and Age; they are different but known, such as column1 and column2.

  • To be clear: if the name and age are both mis-labelled as the other, then swap the values; otherwise don't change the row? Commented Mar 31, 2021 at 22:46
  • RegEx perhaps? Think that would be the easiest as you have a set format of the incoming data, just look for 0-9 on the right of the = Commented Mar 31, 2021 at 22:46
  • If you don't actually receive a dataframe and instead receive a file that contains "Name: Value" and "Age:Value", then read JSON might point you in the right direction. Commented Mar 31, 2021 at 22:47
  • Yes, there is a typo. I also edited my question regarding the headers and values. Commented Apr 1, 2021 at 1:24

2 Answers

1

Assuming your main DataFrame is the traditional variable df:

# Work from a copy so that both masks and values come from the original data
df2 = df.copy()

# Rows where the Age field has a non-numeric right-hand side (i.e. a name)
name_in_age = df2["Age"].str.match(r"^\w+=\D+$")
# Rows where the Name field has a numeric right-hand side (i.e. an age)
age_in_name = df2["Name"].str.match(r"^\w+=\d+$")

# Move each misplaced value into the column it belongs in
df.loc[name_in_age, "Name"] = df2.loc[name_in_age, "Age"]
df.loc[age_in_name, "Age"] = df2.loc[age_in_name, "Name"]

Output of df:

          Name     Age
0     Age=John  Age=25
1     Name=Roy  Age=36
2   Name=Smith  Age=19
3  Name=Donald  Age=12
4   Name=jason  Age=57
5     Name=joe   Age=1
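
Since the question mentions a large dataset and headers such as column1 and column2, here is a minimal sketch of the same regex idea wrapped in a function that takes the column names as arguments and copies only the misplaced values rather than the whole frame. The function name `fix_swapped` and the three-row sample frame are hypothetical, made up for illustration; the real data was only shown as images.

```python
import pandas as pd


def fix_swapped(df, name_col, age_col):
    """Swap values in place on rows where the two columns are interchanged.

    Hypothetical helper: assumes every cell looks like "Key=value" and that
    ages are purely numeric while names contain no digits.
    """
    # Compute both masks from the original values before overwriting anything.
    name_in_age = df[age_col].str.match(r"^\w+=\D+$")
    age_in_name = df[name_col].str.match(r"^\w+=\d+$")

    # Copy only the misplaced values instead of the whole DataFrame.
    misplaced_names = df.loc[name_in_age, age_col].copy()
    misplaced_ages = df.loc[age_in_name, name_col].copy()

    df.loc[name_in_age, name_col] = misplaced_names
    df.loc[age_in_name, age_col] = misplaced_ages
    return df


# Made-up sample data; the second row has its two values interchanged.
sample = pd.DataFrame(
    {
        "column1": ["Name=John", "Age=36", "Name=Smith"],
        "column2": ["Age=25", "Name=Roy", "Age=19"],
    }
)
fix_swapped(sample, "column1", "column2")
print(sample)
```

Whether this actually lowers peak memory on a given dataset would need to be measured; the sketch only avoids the explicit full copy.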

2 Comments

In the case where the headers are different from the content, as in the edit I just added, what would be the workaround to achieve this?
I'm not sure what you're asking, but I would just replace Name and Age with the relevant header names. If you need something to automatically sense which columns should be Name and Age, then that might be a little more work than flipping column values :-)
1

If every value begins with Name=... or Age=..., maybe a simple .transform() will help:

df.loc[:, ["Name", "Age"]] = df.loc[:, ["Age", "Name"]].transform(
    sorted, axis=1
)
print(df)

Prints:

          Name     Age
0    Name=John  Age=25
1     Name=Roy  Age=36
2   Name=Smith  Age=19
3  Name=Donald  Age=12
4   Name=Jason  Age=57
5     Name=Joe   Age=1

P.S.: I'm assuming first row should be Name=John, not Age=John (but the code should be the same).
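
For a very large frame, sorting every row in Python may be slow. As a possible alternative (not the approach used above), here is a sketch that finds the swapped rows with a vectorized prefix check and exchanges the two columns only on those rows. It assumes every cell starts with either `Name=` or `Age=`; the headers column1/column2 follow the question's edit, and the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample data; the second row has its two values interchanged.
df = pd.DataFrame(
    {
        "column1": ["Name=John", "Age=36", "Name=Smith"],
        "column2": ["Age=25", "Name=Roy", "Age=19"],
    }
)

# True on rows where the "name" column actually holds an age.
# na=False treats missing values as "not swapped".
swapped = df["column1"].str.startswith("Age=", na=False)

# Build both corrected columns from the original values, then assign them.
fixed_col1 = df["column1"].mask(swapped, df["column2"])
fixed_col2 = df["column2"].mask(swapped, df["column1"])
df["column1"] = fixed_col1
df["column2"] = fixed_col2

print(df)
```

Both corrected columns are computed before either is assigned, so the second `mask` call still sees the original values.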

4 Comments

Interesting. You're sorting alphabetically and flip-flopping the columns while doing so? How did you get Name=John to appear at the first Name spot?
@MarkMoretto It was Age=John, but I'm assuming it's a typo (every other row has Name and Age)
For sure. I should have added a wink next to my comment to indicate sarcasm, lol. Nice solution, I'm going to have to keep .transform() in mind for future problems.
Yes, there is a typo. I also edited the question to say that my headers are different from Name and Age but are known, as in the example in the edit.
