Python: text file replace different strings in multiple lines HOW?

Question

Basic task: converted a URL request into text, and dumped it to a text file (almost a usable CSV).

Goal: A clean CSV. On multiple lines, I'm trying to replace multiple (different) characters:

brackets, tildes (~), extra commas at the end of each line.

I cannot find any relatively simple-to-follow examples to accomplish this. Looking for something that can cycle line by line and replace.

PLEASE NOTE: I expect this file to grow large over time, so the solution needs to be memory friendly.

Below is the code that created the file:

import urllib.request

# URL1 holds the address of the data source (not shown here)
with urllib.request.urlopen(URL1) as response:
    data = response.read()
decoded_data = data.decode(encoding='UTF-8')

# decode() already returns a str, so no extra str() call is needed
with open("test.txt", 'w') as saveFile:
    saveFile.write(decoded_data)

Here is a simplified sample from the file, the first line has the field names, 2nd and 3rd lines represent records.

[["F1","F2","F3","F4","F5","F6"],

["string11","string12","string13","s~ring14","string15","string16"],

["string21","string22","s~ring23","string24","string25","string26"]]

Highstaker · Accepted Answer · 2017-02-16 06:03:34Z

2

If you want to remove characters from the beginning or end of a string, use strip. If the character you want to remove can appear anywhere, use replace instead, like this: line.replace("~",""). Note that strip accepts a whole set of characters in one call, whereas replace handles one substring per call. You can chain the calls, though, like this: line.replace("~","").replace("[","").replace("]","")
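A small sketch of the difference, using one line from the sample in the question. The str.translate variant is an addition that is not part of the answer above: it deletes a whole set of characters in one pass, as an alternative to chaining replace calls.

```python
line = '["string11","string12","string13","s~ring14","string15","string16"],\n'

# strip() takes a set of characters and trims them from both ends only.
trimmed = line.strip(" [],\n\t\r")

# replace() works anywhere in the string, one substring per call.
cleaned = trimmed.replace("~", "")

# translate() can delete several different characters in a single pass.
delete_table = str.maketrans("", "", "~[]")
translated = line.translate(delete_table).rstrip(",\n")

print(cleaned)
print(translated)
```

Both print calls should show the same cleaned record, with the quotes and the separating commas left intact.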

Just a quick mockup of what might work for you:

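with open("text.txt", 'r') as f:
    with open("result.txt", 'w') as new_f:
        # iterating over the file object reads one line at a time
        for line in f:
            # trim brackets, commas and whitespace from both ends, then drop tildes
            new_line = line.strip(" [],\n\t\r").replace("~","")
            print(new_line)
            new_f.write(new_line+"\n")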

I used strip for the brackets and commas because they appear at the ends of each line, and replace for the tildes because they can be anywhere. I also added "\n", "\t", "\r" and a space to the strip call, because these characters may appear at the end of each line ("\n" certainly will).


2 Comments

marucho21 Over a year ago

Yes, this did it. Perfect! THANK YOU!! :-) wow. Handles both the tilde(s) and the brackets.

marucho21 Over a year ago

Found the original reason for the brackets entering the text file. The URL goes to a JSON that is meant to convey a table of data (so columns and rows). The issue was I could not find a solid example that would show it. Below I re-posted my code with corrections. Note the "scrubber" above is not in my corrected code.

Luis Michaelis · Accepted Answer · 2017-02-16 05:56:01Z

0

You could use a simple for-loop to iterate through the file and replace the characters in each line:

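# Read the file line by line and collect the cleaned lines.
# Note: the cleaned text is held in memory until it is written back.
clean_txt = ""
with open("text.txt", "r") as file:
    for line in file:
        line = line.replace("~", "").replace("[", "").replace("]", "")
        # Strings are immutable, so use rstrip to drop the trailing newline and comma.
        clean_txt += line.rstrip("\n,") + "\n"

# Write the cleaned text back to the same file.
with open("text.txt", "w") as w:
    w.write(clean_txt)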
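If the file has to be updated in place without holding it all in memory, one option is to write the cleaned lines to a temporary file and swap it in at the end. This is a minimal sketch, not a definitive implementation: it assumes the strip-and-replace rule from the other answer, and clean_file_in_place is a hypothetical helper name.

```python
import os
import tempfile


def clean_file_in_place(path):
    """Clean a file line by line, then swap the result in for the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so os.replace stays on one filesystem.
    with open(path, "r", encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as dst:
        for line in src:
            cleaned = line.strip(" [],\n\t\r").replace("~", "")
            if cleaned:  # skip lines that were blank or held only brackets/commas
                dst.write(cleaned + "\n")
        temp_name = dst.name
    # Replace the original only after the cleaned copy is fully written.
    os.replace(temp_name, path)
```

Calling clean_file_in_place("text.txt") would then rewrite the file one line at a time, and the original stays untouched until the cleaned copy is complete.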


1 Comment

marucho21 Over a year ago

Thanks for the input. It actually deletes all the contents of the file. I tried this approach before I posted. When I did get it to work, it would only "perform surgery" on the first line. Looking for something that will go through the whole file.

marucho21 · Accepted Answer · 2017-02-16 18:48:44Z

#!/usr/bin/env python3

# Note: I used the print function as a way to visually confirm the code worked.
# The URL call yields a bytes object holding serialized data for a basic table
# (columns and rows, where the first row holds the column names, just like Excel or SQL).

URL_call = "http://www.zzz.com/blabla.html"

# urllib module: the response has to be decoded from UTF-8 first
import urllib.request
with urllib.request.urlopen(URL_call) as response:
    URL_data = response.read()

URL_data_decoded = URL_data.decode(encoding='UTF-8')

# use json to convert decoded response into a python structure (from a JSON structure)
import json
URL_data_JSON = json.loads(URL_data_decoded)

# pandas will transition the python data structure from a "list-like" array to a table.
import pandas as pd
URL_data_panda = pd.DataFrame(URL_data_JSON)

# this will create the text (in this case a CSV) file
URL_data_panda.to_csv("test.csv")

# The file will need the first row removed (pandas writes its own integer column labels as the first row)

# determine line count
with open("test.csv") as f:
    num_lines = sum(1 for line in f)

print(num_lines)

# The first row of text is at index 0. Writing from the second row (index 1) onward removes it.
with open("test.csv") as f:
    lines = f.readlines()
with open("test2.csv", "w") as f:
    f.writelines(lines[1:])


# Changes the name of the first column from zero to a normalized name.

import fileinput

# Note: you could set up a backup file by adding an extra argument to the call: ("test2.csv", inplace=True, backup='.bak')
with fileinput.FileInput("test2.csv", inplace=True) as file:
    for line in file:
        # Only touch the leading "0," of the header row; otherwise any "0," in the data would be replaced too.
        if file.isfirstline() and line.startswith("0,"):
            line = line.replace("0,", "REC_NUM,", 1)
        print(line, end='')
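For comparison, the clean-up steps after to_csv can be avoided by writing the header and a record number directly. The sketch below swaps pandas for the standard csv module, so it is not the code above, only an illustration. It assumes the payload is a JSON list of lists whose first row holds the column names, as in the sample from the question; json_table_to_csv is a hypothetical helper name.

```python
import csv
import io
import json


def json_table_to_csv(json_text, out_file):
    """Write a JSON list of rows (first row = column names) as CSV with a REC_NUM column."""
    rows = json.loads(json_text)
    header, records = rows[0], rows[1:]
    writer = csv.writer(out_file)
    writer.writerow(["REC_NUM"] + header)
    # Number the records from 1, matching the row labels pandas produced for them.
    for rec_num, record in enumerate(records, start=1):
        writer.writerow([rec_num] + record)


# Usage with an in-memory buffer; the sample data is shortened from the question.
sample = '[["F1","F2","F3"],["string11","string12","string13"],["string21","string22","string23"]]'
buffer = io.StringIO()
json_table_to_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a buffer, open it with newline="" (for example open("test.csv", "w", newline="")) so the csv module controls the line endings.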
