
I'm trying to compare two files and extract the lines of the first file whose first column (VarID) also appears in the second file. For example:

File 1:

VarID GeneID TaxName PfamName
3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22
6557609 5442    Peridiniales    NULL
4723299 7370    Prorocentrum    PEPCK_ATP
3019317 10454   Dinophyceae     NULL
2821675 10965   Bacillariophyta PK;PK_C
5559318 12824   Dinophyceae     Cyt-b5&FA_desaturase

File 2:

VarID
3810359
6557609
4723299
5893435
4852156

The output I want is:

VarID GeneID TaxName PfamName
3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22
6557609 5442    Peridiniales    NULL
4723299 7370    Prorocentrum    PEPCK_ATP

I tried this code:

f1 = sys.argv[1]
f2 = sys.argv[2]

file1_rows = []
with open(f1, 'r') as file1:
    for row in file1:
        file1_rows.append(row.split())

# Read data from the second file
file2_rows = []
with open(f2, 'r') as file2:    
    for row in file2:
        file2_rows.append(row.split())

# Compare data and compute results
results = []
for row in file2_rows:
    if row[:1] in file1_rows:
        results.append(row[:4])
    else:
        results.append(row[:4])

# Print the results
for row in results:
    print(' '.join(row))

Can you please help me? Thank you!

  • Please supply output of your code. Commented Apr 5, 2018 at 15:30
  • Load the IDs from the second file, then read the first file line by line: if the ID on a line is in the loaded list, print the line, otherwise continue. Commented Apr 5, 2018 at 15:31
  • Replace the line: if row[:1] in file1_rows: with if row[0] in file1_rows:. Also delete the else. Commented Apr 5, 2018 at 15:33
  • @galfisher The output of my code is the entire first column of my first file Commented Apr 5, 2018 at 15:34

1 Answer

Your problem is here:

if row[:1] in file1_rows:

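row[:1] returns a one-element list holding the first column of the row (for example ['3810359']), while file1_rows is a list of full rows, so the membership test never matches. Switching to row[0] on its own is not enough either, because a string is never equal to one of those row lists. Compare the ID with the first column of each row of file 1 instead:

if any(row[0] == r[0] for r in file1_rows):

Also, remove the else attached to this if (I guess it was added by mistake while debugging): as written, both branches append the row, so nothing is filtered. Note too that the loop appends rows from file 2, which only contain the ID; to get the full lines you have to append the matching row from file 1.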
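To see why the original test never matches, here is a small check. It is only a sketch: the two sample rows are copied from the question, and first_column is a name introduced here for illustration.

```python
# Rows of file 1 after split(), as in the question's code (two sample rows only).
file1_rows = [
    "VarID GeneID TaxName PfamName".split(),
    "3810359 1327 Isochrysidaceae Methyltransf_21&Methyltransf_22".split(),
]
# One row of file 2 after split(): a list with a single field.
row = "3810359".split()

print(row[:1])                 # a one-element list
print(row[0])                  # the ID itself, as a string
print(row[:1] in file1_rows)   # False: no row of file 1 equals ['3810359']

# Comparing against the first column only gives the intended test.
first_column = {r[0] for r in file1_rows}
print(row[0] in first_column)  # True
```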

There are a few other improvements you can make; I have put them all together here:

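import sys

f1 = sys.argv[1]
f2 = sys.argv[2]

# Read data from the first file
with open(f1, 'r') as file1:
    file1_rows = file1.read().splitlines()

# Read data from the second file
with open(f2, 'r') as file2:
    file2_rows = file2.read().splitlines()

# Compare data and compute results
results = []
for row2 in file2_rows:
    key = row2.split()[:1]
    if not key:
        continue  # skip blank lines in the second file
    for row in file1_rows:
        # Compare the first column only: a substring test (row2 in row)
        # would also match an ID that appears inside another column.
        if row.split()[:1] == key:
            results.append(row)
            break

print('\n'.join(results))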
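If the files are large, the nested loop can get slow. A possible alternative, sketched below, is to put the IDs from the second file into a set and scan the first file once. The function name filter_by_first_column and the in-memory sample lists are assumptions made for this example; in a real script the lines would come from the two files.

```python
def filter_by_first_column(data_lines, id_lines):
    """Return the lines of data_lines whose first field appears in id_lines."""
    # First field of every non-blank ID line, stored in a set for fast lookups.
    wanted = {line.split()[0] for line in id_lines if line.strip()}
    # Keep the data lines, in their original order, whose first field is wanted.
    return [line for line in data_lines
            if line.strip() and line.split()[0] in wanted]


# Sample data copied from the question (hypothetical in-memory stand-ins for the files).
file1_lines = [
    "VarID GeneID TaxName PfamName",
    "3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22",
    "6557609 5442    Peridiniales    NULL",
    "4723299 7370    Prorocentrum    PEPCK_ATP",
    "3019317 10454   Dinophyceae     NULL",
    "2821675 10965   Bacillariophyta PK;PK_C",
    "5559318 12824   Dinophyceae     Cyt-b5&FA_desaturase",
]
file2_lines = ["VarID", "3810359", "6557609", "4723299", "5893435", "4852156"]

matched = filter_by_first_column(file1_lines, file2_lines)
print("\n".join(matched))
```

The header line is kept because "VarID" is also the first line of the second file. Note that the output follows the order of the first file, not the second.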

2 Comments

Thank you for your answer. I tried what you suggested, but it prints nothing.
@Erika, I've updated my post; see if it is what you meant.
