I am trying to update a CSV file in which each line contains multiple single-quoted strings, replacing each of those strings with a literal. But the output puts all the data in just the first row. Can someone suggest what the issue is in the code below:
import pandas as pd
import re
df=pd.read_csv("t1.csv");
col1=df['col1']
col2=re.sub(r'\'([^\']*)\'','const',str(col1))
col3 = pd.Series(col2)
df['col1']=col3
df.to_csv('t_u.csv')
exit()
The file t1.csv has data like this:
col1
This one has 'many' 'such' 'quotes' in it.
Now it does not.
But 'this' 'one' does 'have' it 'too'.
The generated output looks like this, which is wrong since all the data ends up in a single row:
col1
0 "0 This one has const const const in it.
1 Now it does not.
2 But const const does const it const.
Name: col1, dtype: object"
1
2
What happened here is that all 3 lines got combined into a single entry in the final output, whereas I want the resulting CSV to keep the same format: 3 lines, with the required changes applied.
Please share the output of df.to_dict('tight') after the import (df=pd.read_csv("t1.csv")) for reproducibility, i.e. print(df.to_dict('tight')).
The problem is str(col1), which converts the whole column into one string. Instead, see "pandas applying regex to replace values". See also the docs: 10 minutes to pandas § String Methods.
Most likely, you need df['col1'] = df['col1'].str.replace(r'\'([^\']*)\'', 'const', regex=True)
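A minimal sketch of the str.replace suggestion, for anyone who wants to try it without the file on disk. It assumes the sample rows from the question, fed in through io.StringIO as a stand-in for t1.csv, and it writes the result to an in-memory buffer instead of t_u.csv. Passing index=False is my assumption about what "same format" means here; the original code wrote the index as well.

```python
import io

import pandas as pd

# Inline stand-in for t1.csv (assumption: same three rows as in the question).
csv_text = (
    "col1\n"
    "This one has 'many' 'such' 'quotes' in it.\n"
    "Now it does not.\n"
    "But 'this' 'one' does 'have' it 'too'.\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# str(df["col1"]) would render the whole Series (index, values, dtype footer)
# as ONE string, so re.sub then produces a single value for the column.
# Series.str.replace applies the regex to each row separately instead.
pattern = r"'[^']*'"
df["col1"] = df["col1"].str.replace(pattern, "const", regex=True)

# Write to a buffer here; with a real file this would be df.to_csv("t_u.csv", index=False).
out = io.StringIO()
df.to_csv(out, index=False)
result = out.getvalue()
print(result)
```

Each row is processed on its own, so the output keeps one line per input line. If the index column is wanted in the output, dropping index=False restores the original behaviour of to_csv.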