I have many CSV files in subdirectories of one folder. They all contain tweets plus other metadata. I want to remove most of that metadata and keep only the tweets themselves and their time. I used glob to read the files, and the removal part seems to work fine. However, I am not sure how to save the output so that every file is saved under its original file name.
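```python
import glob
import os

import pandas as pd

path = r'D:\tweets'
# recursive=True lets '**' descend into every subdirectory
myfiles = glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
for f in myfiles:
    df = pd.read_csv(f)
    # drop the metadata columns I don't need
    df = df.drop(["name", "id", "conversation_id", "created_at", "date"], axis=1)
    # drop tweets in unwanted languages (single '|' between alternatives)
    df = df[df["language"].str.contains("bn|ca|ckbu|id|zh") == False]
df.to_csv("output_filename.csv", index=False, encoding='utf8')
```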
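A caution on the language filter: in a regular expression, a doubled `|` (as in a pattern like `bn|ca|ckbu|id||zh`) creates an empty alternative. The empty string matches inside every value, so `str.contains` flags every row and `== False` keeps nothing. The toy sketch below uses made-up language values to compare that pattern with a single-`|` pattern and with an exact `isin` match, which also avoids accidental substring hits if the column ever holds longer tags.

```python
import pandas as pd

# Made-up language codes standing in for the "language" column
langs = pd.Series(["en", "bn", "zh", "fr", "ca"])

# "||" adds an empty alternative; the empty string matches every value,
# so every row counts as a match and filtering with == False keeps nothing.
broken = langs.str.contains("bn|ca|ckbu|id||zh")

# With single "|" separators only the listed codes match (as substrings).
fixed = langs.str.contains("bn|ca|ckbu|id|zh")

# Exact comparison against the list of codes, no regex involved.
exact = langs.isin(["bn", "ca", "ckbu", "id", "zh"])

print(broken.tolist())         # [True, True, True, True, True]
print(langs[~fixed].tolist())  # ['en', 'fr']
print(langs[~exact].tolist())  # ['en', 'fr']
```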
Since you are iterating over `myfiles`, the `to_csv` call needs to be part of the `for` loop. I don't think you need to overwrite the original file; something like this will help (it needs `import os`): `df.to_csv(os.path.splitext(f)[0] + "_transformed.csv", index=False)`
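Put together, the loop could look like the minimal sketch below, assuming every CSV has the columns from the question. `clean_tweets`, `DROP_COLS`, `SKIP_LANGS` and the `_transformed` suffix are illustrative names, not anything from pandas, and the sketch swaps the regex filter for an exact `isin` match on the language code (assuming the column holds bare codes). Each output is written next to its input under `<original name>_transformed.csv`, and files that already carry the suffix are skipped so a second run does not reprocess its own output.

```python
import glob
import os

import pandas as pd

# Hypothetical constants, copied from the question's code; adjust to your data.
DROP_COLS = ["name", "id", "conversation_id", "created_at", "date"]
SKIP_LANGS = ["bn", "ca", "ckbu", "id", "zh"]


def clean_tweets(root, suffix="_transformed"):
    """Clean every CSV under root (recursively) and write a sibling file
    named <original stem><suffix>.csv. Returns the list of written paths."""
    written = []
    # glob.glob returns a list up front, so files written below are not
    # picked up again during this same run.
    for f in glob.glob(os.path.join(root, "**", "*.csv"), recursive=True):
        stem = os.path.splitext(f)[0]
        if stem.endswith(suffix):
            continue  # output from an earlier run; leave it alone
        df = pd.read_csv(f)
        df = df.drop(columns=DROP_COLS)
        df = df[~df["language"].isin(SKIP_LANGS)]
        out = stem + suffix + ".csv"
        df.to_csv(out, index=False, encoding="utf-8")
        written.append(out)
    return written
```

Usage would be `clean_tweets(r"D:\tweets")`. Writing next to the input keeps the original name recognizable while leaving the raw data untouched; if you prefer a separate output folder, you could rebuild each path with `os.path.relpath(f, root)` under a new root instead.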