0

I'd like to do the following steps: 1. merge all csv in the same directory 2. create as Dataframe 3. assign columns and drop a column then set one of column ('Type') as index 4. for all file,s I would like to melt column D to end column as rows

file_list = glob.glob("*.csv")

for file in file_list:
    merged_file = pd.read_csv(file)
    print(merged_file)

merged_file = pd.DataFrame()
df.columns = ['Type', 'Country', 'Source']
df = df.drop('Source', axis=1)
df = df.set_index('Type',drop=False).stack().reset_index()

agg_df = pd.melt(df, id_vars = [df, 'Source', 'Country', 'Type'])

df = df.sort_values('Type').reset_index(drop=True)
print(df)

The clear expected output (aligned to the left):

Mineral name - Type - Country - Prod_t_2021
Mineral name - Type - Country - Prod_t_2022
Mineral name - Type - Country - Reserves_t
Mineral name - Type - Country - Reserves_notes

Mineral name could be extracted from Type as string

The source is World.zip from URL: https://www.sciencebase.gov/catalog/item/63b5f411d34e92aad3caa57f

1
  • 1
    Can you show a clear expected output of the final df ? Commented Feb 7, 2023 at 0:42

1 Answer 1

0

IIUC, use this :

df = (
        pd.concat([pd.read_csv(f) for f in file_list], ignore_index=True)
            .melt(id_vars = ["Source", "Country", "Type"])
            .set_index("Type")
            .sort_index()
      )
​

Output :

                              Source          Country         variable                   value
Type                                                                                          
 Mine production: Palladium  MCS2023    United States  Prod_t_est_2022                     NaN
 Mine production: Palladium  MCS2023     South Africa  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023           Russia  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023           Canada  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023    United States  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023    United States      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023           Canada      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023           Russia      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023     South Africa      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023  Other countries   Reserves_notes  Included with platinum
Sign up to request clarification or add additional context in comments.

3 Comments

for instance, how can I add value for prod_t_est_2022 and prod_t_est_2021 as new raws for Palladium production in the United States? (applicable to all minerals and countries)
That's why you need to add a clear expected output in your question.
please refer to the edited question

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.