I have a data frame in which each cell of the txt column contains a list. I want to clean the txt column using the function clean_text().
import pandas

data = {'value': ['abc.txt', 'cda.txt'],
        'txt': [['2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'],
                ['2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart']]}
df = pandas.DataFrame(data=data)
df
value    txt
abc.txt  ['2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart']
cda.txt  ['2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart']
import re

def clean_text(text):
    """
    :param text: it is the plain text
    :return: cleaned text
    """
    # Applied in order: the first 53 characters, tokens mixing letters
    # and digits, then punctuation.
    patterns = [r"^.{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~" ";+]"]
    for p in patterns:
        text = re.sub(p, '', text)
    return text
My Solution:
df['txt'] = df['txt'].apply(lambda x: clean_text(x))
But I am getting the errors below:
df['txt'] = df['txt'].apply(lambda x: clean_text(x))
AttributeError: 'list' object has no attribute 'apply'
clean_text(df['txt'][1])
TypeError: expected string or bytes-like object
I am not sure how to use numpy.where in this problem.
... np.where in my case?
... data, this runs fine for me and does not produce an attribute error. ... df ... at. Anyway, not sure what your end-goal is for the data, but this does seem to run and does perform replacements:
df['txt'].apply(lambda x: [clean_text(z) for z in x])
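A minimal sketch of that suggestion, assuming every cell of txt is a real Python list of strings (not a string that merely looks like a list). The TypeError in the question is consistent with that: re.sub needs a string, and clean_text(df['txt'][1]) passes it the whole list, so the function has to be applied to each element instead. The variable names items and s are only illustrative.

```python
import re

import pandas as pd


def clean_text(text):
    """Remove the first 53 characters, letter/digit tokens, and punctuation."""
    patterns = [r"^.{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~" ";+]"]
    for p in patterns:
        text = re.sub(p, '', text)
    return text


# Assumption: each 'txt' cell holds a list of strings, as in the question.
data = {
    'value': ['abc.txt', 'cda.txt'],
    'txt': [
        ['2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'],
        ['2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'],
    ],
}
df = pd.DataFrame(data=data)

# Series.apply hands each cell (a list) to the lambda; the list
# comprehension then cleans every string inside that list.
df['txt'] = df['txt'].apply(lambda items: [clean_text(s) for s in items])
print(df['txt'].tolist())
```

Each cell stays a list after cleaning. If the cells turn out to be strings such as "['2019/...']", they would need to be parsed into lists first (for example with ast.literal_eval) before this approach applies.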