
input.txt is tab-delimited.

I know a simple way to replace text:

import fileinput
for line in fileinput.FileInput("input.txt",inplace=1):
    line = line.replace("AA","0")
    print line,

However, I want to replace cells in only the 3rd column of input.txt (not the whole file). A cell should be replaced by 0 if it is any one of AA, AAA, BB or BBB, and by 1 if it is none of them.

Here, I am talking about "Match entire cell contents".

By "Match entire cell contents" I mean that a cell (such as the (2,3)-element of input.txt) counts as a match only when it is exactly AA, AAA, BB or BBB. A cell such as "AAs" does not count as a match.

By contrast, if "Match entire cell contents" is not applied, a replacement happens whenever a cell merely contains AA, AAA, BB or BBB. So a cell "AAhaha" would be replaced by "0haha".
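To make the distinction concrete, here is a minimal sketch in Python 3 syntax. The helper names `whole_cell` and `substring` are made up for illustration, and `whole_cell` assumes the "0 if matched, 1 otherwise" rule described above.

```python
TARGETS = {"AA", "AAA", "BB", "BBB"}

def whole_cell(cell):
    # "Match entire cell contents": the cell must equal one of the targets.
    return "0" if cell in TARGETS else "1"

def substring(cell):
    # Plain str.replace: fires wherever "AA" occurs inside the cell.
    return cell.replace("AA", "0")

print(whole_cell("AA"))     # exact match
print(whole_cell("AAs"))    # not an exact match
print(substring("AAhaha"))  # substring replacement
```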

Anyhow, to repeat: I want to replace cells in only the 3rd column of input.txt (not the whole file), replacing a cell by 0 if it is any one of AA, AAA, BB or BBB and by 1 if it is none of them, in a "Match entire cell contents" way.

  • @MartijnPieters: if it's a CSV file (well, TSV). I have sometimes encountered tab-delimited data that isn't TSV. Commented Nov 1, 2013 at 12:54
  • @MartijnPieters My input will be txt, tab-delimited, UTF8 without BOM. A txt file can be csv, too? Then how can I check if my input is csv? Commented Nov 1, 2013 at 13:04
  • @user2604484: CSV is a text format; it is any textual file that contains columns of data delimited by a delimiter, be that a comma, a pipe symbol, a tab or anything else. Commented Nov 1, 2013 at 13:12
  • @user2604484: The csv module lets you read and write your format, simply by setting the delimiter to \t. Commented Nov 1, 2013 at 13:12
  • Well, that's all there is to it if you set csv.QUOTE_NONE on the reader. Otherwise csv is not that simple. The questioner needs to find out what the intended meaning is of any " characters in the file, and parse the file accordingly. Commented Nov 1, 2013 at 13:13

2 Answers

import fileinput

for line in fileinput.FileInput("input.txt", inplace=1):
    cells = line.split('\t')
    # Whole-cell match on the third column (index 2).
    cells[2] = '0' if cells[2] in ('AA', 'AAA', 'BB', 'BBB') else '1'
    print '\t'.join(cells),

Beware, though, that I've taken a simplistic view of tab-delimited data. If your file makes use of the whole CSV/TSV format, with quoted cells containing tab characters and/or newlines, then you need csv, which is a proper CSV parser.

Conversely, if you want a cell in column 0 containing, for example, "a" (quote marks included) to be output as "a" with its quote marks intact, then you must not use csv, because it will remove the quote marks when reading and will not re-insert them on writing, since they aren't needed for that cell.
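A small sketch of that round trip, in Python 3 syntax and using an invented sample line. It also shows the `csv.QUOTE_NONE` reader setting mentioned in the comments on the question; joining the cells with `"\t"` on output is just one simple way to write them back unchanged.

```python
import csv
import io

# Invented sample line: the first cell is written with literal quote marks.
line = '"a"\tAA\tfoo'

# Default settings: the quote marks are read as CSV quoting and dropped.
default_row = next(csv.reader([line], delimiter="\t"))

# Writing the row back does not restore them, since a plain a needs no quoting.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerow(default_row)
default_out = buffer.getvalue()

# QUOTE_NONE on the reader: quote marks are kept as ordinary characters.
literal_row = next(csv.reader([line], delimiter="\t", quoting=csv.QUOTE_NONE))
literal_out = "\t".join(literal_row) + "\n"

print(repr(default_out))
print(repr(literal_out))
```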

So, first you must be sure how the file format is defined, then you can choose how to read and write it. Either way though, modifying it will be about the same.

One other niggle: I haven't done anything about the linebreak, so it will just sit in the last cell. Therefore, if the third cell is the last cell, it will never compare equal to 'AA' and friends (it still has the linebreak attached), and the linebreak will be lost when the cell is replaced by "0" or "1", which probably isn't what you want. And while we're talking about the number of cells, this code will of course throw an exception if any line has fewer than 3 cells. You should decide how you want to handle that; in particular, it's not that uncommon to find a blank line at the end of a text file.
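One way to deal with both niggles is sketched below in Python 3 syntax. The function name `recode_line` and its `column` parameter are made up for this example, and passing short lines through unchanged is an assumption about what you want, not the only reasonable choice.

```python
TARGETS = {"AA", "AAA", "BB", "BBB"}

def recode_line(line, column=2):
    """Return line with the cell at index `column` replaced by "0" or "1"."""
    # Split off the line ending so it cannot end up inside the last cell.
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    cells = body.split("\t")
    # Lines with too few cells (e.g. a trailing blank line) pass through unchanged.
    if len(cells) > column:
        cells[column] = "0" if cells[column] in TARGETS else "1"
    return "\t".join(cells) + ending

print(repr(recode_line("x\ty\tAA\n")))
```

Inside the fileinput loop this would replace the split, assignment and join, with the result printed without an extra newline (in Python 3, `print(recode_line(line), end="")`).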


9 Comments

@Steve_Jessop "if the third cell is the last cell it will get removed when the cell is replaced by "0" or "1", which probably isn't what you want." Oh, the 3rd column is indeed likely to be the last column. What should I do then?
My input will be txt, tab-delimited, UTF8 without BOM. A txt file can be csv, too? Then how can I check if my input is csv?
@user2604484: "What should I do then?" -- probably best to take the linebreak off before splitting on \t, then put it back on when printing.
"how can I check if my input is csv?" You don't check whether it's CSV (noting that "tab-separated values" is a variant of CSV that uses a tab instead of a comma as the delimiter, so it counts as CSV for these purposes). You need to agree with whoever supplies the file what format it will be in. Two identical files can have different meanings according to whether they are designated as TSV or as simple tab-delimited data with one record per line of the file.
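As a sketch of how the same text can mean different things under the two readings (Python 3 syntax; the sample line is invented for illustration):

```python
import csv

# Invented sample line containing a quoted cell with a tab inside it.
line = '"a\tb"\tc'

# Simple tab-delimited reading: every tab separates cells, quotes are data.
simple_cells = line.split("\t")

# TSV reading via csv: the quoted tab belongs to the first cell.
tsv_cells = next(csv.reader([line], delimiter="\t"))

print(len(simple_cells), simple_cells)
print(len(tsv_cells), tsv_cells)
```

Under the first reading the line has three cells, under the second it has two, so "the 3rd column" does not even refer to the same thing.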

You should be using the csv module for this:

import csv
with open("input.txt", "rb") as infile, open("output.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t")
    for row in reader:
        row[2] = "0" if row[2] in ("AAA", "AA", "BBB", "BB") else "1"
        writer.writerow(row)
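The code above uses Python 2 file modes ("rb"/"wb"). A rough Python 3 equivalent might look like the sketch below; the function name `recode_third_column` is made up, skipping rows with fewer than 3 cells is an assumption, and `lineterminator="\n"` is a choice made here rather than the csv default.

```python
import csv

TARGETS = {"AA", "AAA", "BB", "BBB"}

def recode_third_column(infile, outfile):
    """Copy tab-delimited rows, replacing the 3rd cell with "0" or "1"."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Rows with fewer than 3 cells (e.g. blank lines) are copied as-is.
        if len(row) > 2:
            row[2] = "0" if row[2] in TARGETS else "1"
        writer.writerow(row)

# Possible usage; in Python 3 the csv module wants text files opened with newline="":
# with open("input.txt", newline="", encoding="utf-8") as infile, \
#      open("output.txt", "w", newline="", encoding="utf-8") as outfile:
#     recode_third_column(infile, outfile)
```

Because input.txt is only opened for reading and the result goes to a separate output.txt, the input file is left as it was.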

2 Comments

I ran your program, and it seems the content of input.txt is erased after I run it. The output.txt seems correct, though. So if your program can keep input.txt just as it was, then it will be perfect :)
@user2604484: I can't imagine why this would happen since I'm opening input.txt for reading only. Can you recheck?
