How to split a text file into a number of text files using Python

Question

I have a huge text file which contains a data set like this:

EOG61ZHH8   ENSRNOG00000004762  627
EOG61ZHH8   ENSRNOG00000004762  627
EOG61ZHH9   ENSG00000249709 1075
EOG61ZHH9   ENSG00000249709 230
EOG61ZHH9   ENSG00000249709 87
EOG61ZHHB   ENSG00000134030 2347
EOG61ZHHB   ENSG00000134030 3658
EOG61ZHHB   ENSRNOG00000018342  241
EOG61ZHHB   ENSRNOG00000018342  241
EOG61ZHHC   ENSBTAG00000006084  1159
EOG61ZHHC   ENSG00000158828 820
EOG61ZHHC   ENSMMUG00000000126  631

and I want to split it into separate files like this:

EOG61ZHH8.txt
ENSRNOG00000004762  627
ENSRNOG00000004762  627
EOG61ZHH9.txt
ENSG00000249709 1075
ENSG00000249709 230
ENSG00000249709 87

and so on. I have no idea how to start producing these new .txt files from the file above. I have done this before, but those entries had a '[' sign at the start of each entry; now I have many files without any special marker to split on. This is the code I wrote in Python for the earlier case:

out = None
with open("entry.txt") as f:
    for line in f:
        if line[0] == "[":
            if out: out.close()
            out = open(line.split()[1] + ".txt", "w")
        else: out.write(line)

I am using Windows. I already know about the Linux awk command, so please do not suggest Linux solutions.

Why do you have the line if line[0] == "[":? None of the lines you show start with a [. Also, is there supposed to be a bit of empty space at the start of each line?
– David Robinson, Commented Mar 21, 2013 at 21:39
As I mention in my question, I have done this before, but those entries had a '[' sign at the start of each entry; now I have many files without any special marker to split on. That is the code I wrote in Python.
– user1850156, Commented Mar 21, 2013 at 21:40
Then why not just take out the line and see if that works? (Also, is there meant to be empty space at the start of each line? And are those tab characters or multiple spaces?)
– David Robinson, Commented Mar 21, 2013 at 21:41

David Robinson · Accepted Answer · 2013-03-21 21:47:33Z

You need only a few adjustments to your script:

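out = None
oldfile = None
with open("entry.txt") as f:
    for line in f:
        # The first tab-separated column names the output file.
        newfile = line.split("\t")[0]
        if newfile != oldfile:
            # Key changed: close the previous file and start a new one.
            if out: out.close()
            out = open(newfile + ".txt", "w")
            oldfile = newfile
        # Write the remaining columns unchanged.
        out.write("\t".join(line.split("\t")[1:]))
# Close the last output file.
if out: out.close()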
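The script above assumes the columns are separated by tab characters and that rows with the same key are adjacent. If the file actually uses runs of spaces, or a key can reappear later in the file, a variant along the following lines may help. This is only a sketch: the function name split_by_first_column and the out_dir parameter are made up for illustration and are not part of the original answer.

```python
import os

def split_by_first_column(path, out_dir="."):
    """Write the remainder of each line to <first column>.txt in out_dir.

    Hypothetical helper: columns may be separated by tabs or spaces.
    Returns the distinct keys in the order they were first seen.
    """
    seen = {}       # keys already written during this run (insertion-ordered)
    out = None
    current = None
    with open(path) as f:
        for line in f:
            # Split once on any run of whitespace; leading whitespace is ignored.
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue  # skip blank lines and lines with only one column
            key, rest = parts
            if key != current:
                if out:
                    out.close()
                # Overwrite on first sight of a key, append if it comes back later.
                mode = "a" if key in seen else "w"
                out = open(os.path.join(out_dir, key + ".txt"), mode)
                seen[key] = True
                current = key
            out.write(rest)
    if out:
        out.close()
    return list(seen)
```

Calling split_by_first_column("entry.txt") would then create EOG61ZHH8.txt, EOG61ZHH9.txt and so on in the current directory, with the original spacing between the remaining columns left untouched.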

user1467267 · Accepted Answer · 2013-03-21 22:01:19Z

With regular expressions:

import re

string = '    EOG61ZHH8   ENSRNOG00000004762  627    EOG61ZHH8   ENSRNOG00000004762  627    EOG61ZHH9   ENSG00000249709 1075    EOG61ZHH9   ENSG00000249709 230    EOG61ZHH9   ENSG00000249709 87    EOG61ZHHB   ENSG00000134030 2347    EOG61ZHHB   ENSG00000134030 3658    EOG61ZHHB   ENSRNOG00000018342  241    EOG61ZHHB   ENSRNOG00000018342  241    EOG61ZHHC   ENSBTAG00000006084  1159    EOG61ZHHC   ENSG00000158828 820    EOG61ZHHC   ENSMMUG00000000126  631'

# Raw string so that "\s" and "\d" reach the regex engine unmodified.
result = re.findall(r'\s+(.*?)\s+(.*?)\s+(\d+)', string, re.S)

buffer = {}

for i in result:
    if not i[0] in buffer:
        buffer[i[0]] = ''

    buffer[i[0]] = buffer[i[0]] + i[1] + '  ' + i[2] + '\n'

for name, content in buffer.items():
    filename = name + '.txt'
    print(filename)

    # "content" ends with a trailing "\n"; use content.rstrip('\n') to drop it.
    # Create (or overwrite) the file "filename" and write "content" to it.
    with open(filename, 'w') as out:
        out.write(content)
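The same group-then-write idea could also be sketched without a regular expression, assuming the input has one record per line with exactly three whitespace-separated columns. The names group_rows and write_groups, the out_dir parameter, and the choice of a tab between the two output columns are assumptions made for this illustration.

```python
import os
from collections import defaultdict

def group_rows(text):
    """Group the 2nd and 3rd columns of each line under the 1st column."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()  # splits on tabs or runs of spaces
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        key, second, third = fields
        groups[key].append(second + "\t" + third)
    return dict(groups)

def write_groups(groups, out_dir="."):
    """Write one <key>.txt file per group, one row per line."""
    for key, rows in groups.items():
        with open(os.path.join(out_dir, key + ".txt"), "w") as out:
            out.write("\n".join(rows) + "\n")
```

Because everything is collected in memory before any file is written, this works even when rows for the same key are not adjacent, at the cost of holding the whole data set in memory, which may matter for a very large file.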
