I'm trying to remove certain strings from a DataFrame column and would like to know a better way to do it. One way is to chain multiple replace calls, but I want to avoid that.
Raw Data:
ctflex08 | SUCCESS | rc=0 | (stdout) server ntp-tichmond minpoll 4 maxpoll 10\nserver ntp-tichmond-b minpoll 4 maxpoll 10\nserver 127.127.1.0
ctfclx806 | SUCCESS | rc=0 | (stdout) server ntp-mary.example.com
ctfclx802 | SUCCESS | rc=0 | (stdout) server ntp-mary.example.com
ti-goyala | SUCCESS | rc=0 | (stdout) server ntp-tichmond minpoll 4 maxpoll 10\nserver ntp-tichmond-b minpoll 4 maxpoll 10
DataFrame Code:
import pandas as pd
matchObj = ['(stdout)', 'server', 'minpoll', 'maxpoll']
df = pd.read_csv('ntp_server.txt', sep="|", names=['Linux_Hosts', 'Host_Dist_version'])
df['Host_Dist_version'] = df['Host_Dist_version'].replace("server", '', regex=True).replace("minpoll", '', regex=True)
print(df)
Current Output:
Linux_Hosts Host_Dist_version
ctflex08 SUCCESS rc=0 (stdout) ntp-tichmond 4 maxpoll 10\n ntp-ti...
ctfclx806 SUCCESS rc=0 (stdout) ntp-mary.example.com
ctfclx802 SUCCESS rc=0 (stdout) ntp-mary.example.com
ti-goyala SUCCESS rc=0 (stdout) ntp-tichmond 4 maxpoll 10\n ntp-ti...
Expected Output:
Linux_Hosts Host_Dist_version
ctflex08 ntp-tichmond ntp-tichmond-b
ctfclx806 ntp-mary.example.com
ctfclx802 ntp-mary.example.com
ti-goyala ntp-tichmond ntp-tichmond-b
Is there an efficient way to pick just the required strings and remove or mask the rest? For example, given ['ntp-mary', 'ntp-tichmond', 'ntp-tichmond-b'], keep only these values and drop everything else.
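To make the "pick only what is needed" idea concrete, here is a minimal sketch rather than a definitive solution. It assumes the wanted names always follow the word server and start with ntp-, which is what excludes 127.127.1.0. It also names all four pipe-separated fields; the column names status and rc are hypothetical, and the sample rows are inlined instead of being read from ntp_server.txt.

```python
import io
import pandas as pd

# Sample rows copied from the question; the raw string keeps "\n" as two
# literal characters (backslash + n), as it appears in the file.
raw = r"""ctflex08 | SUCCESS | rc=0 | (stdout) server ntp-tichmond minpoll 4 maxpoll 10\nserver ntp-tichmond-b minpoll 4 maxpoll 10\nserver 127.127.1.0
ctfclx806 | SUCCESS | rc=0 | (stdout) server ntp-mary.example.com
ctfclx802 | SUCCESS | rc=0 | (stdout) server ntp-mary.example.com
ti-goyala | SUCCESS | rc=0 | (stdout) server ntp-tichmond minpoll 4 maxpoll 10\nserver ntp-tichmond-b minpoll 4 maxpoll 10
"""

# "status" and "rc" are hypothetical names for the two middle fields.
df = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    names=["Linux_Hosts", "status", "rc", "Host_Dist_version"],
)
df["Linux_Hosts"] = df["Linux_Hosts"].str.strip()

# Assumption: a wanted name follows "server" and starts with "ntp-".
pattern = r"server\s+(ntp-[\w.-]+)"

# findall returns a list of captured names per row; join turns it into one string.
df["Host_Dist_version"] = df["Host_Dist_version"].str.findall(pattern).str.join(" ")

result = df[["Linux_Hosts", "Host_Dist_version"]]
print(result)
```

Extracting what should stay tends to be less fragile than listing everything that should go, because leftovers such as the numbers after minpoll and maxpoll never need to be named.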
Replacing some special characters and strings also does not work as expected:
SUCCESS seems to be treated as a keyword, and \n is not being removed.
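For the special-character problem, a sketch of a single-pass replacement is shown here. It assumes the \n in the file is a literal backslash followed by n rather than a real newline, and it uses two hypothetical sample values instead of the real file. re.escape makes the parentheses in (stdout) and the backslash match literally, so the whole list can be joined into one pattern.

```python
import re
import pandas as pd

# Hypothetical sample values; r"..." keeps "\n" as a literal backslash + "n".
s = pd.Series([
    r"(stdout) server ntp-tichmond minpoll 4 maxpoll 10\nserver ntp-tichmond-b minpoll 4 maxpoll 10",
    "(stdout) server ntp-mary.example.com",
])

# Tokens to drop; re.escape neutralises regex metacharacters such as ( ) and \.
tokens = ["(stdout)", "server", "minpoll", "maxpoll", r"\n"]
pattern = "|".join(re.escape(t) for t in tokens)

cleaned = (
    s.str.replace(pattern, " ", regex=True)  # remove every token in one pass
     .str.replace(r"\s+", " ", regex=True)   # collapse leftover whitespace
     .str.strip()
)
print(cleaned.tolist())
```

Note that the numeric arguments (4 and 10) survive this approach, since they are not in the token list. As for SUCCESS, one possible cause (an assumption, not verified against the real file) is that only two names are passed for four pipe-separated fields, so the leading fields may end up in the index and never reach the column being replaced.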