I have a data file called text.txt (contents shown below), followed by the code I tried. I want to extract the file name and the count that follows each sequence line, and build a table from them. I would also like to know if there is a better way to do this. Thanks.
text.txt:
Counting********************File: bbduk_trimmed_Ago2_SsHV2L_1_CATGGC_L003_R1_001
Seq_132582_1: ATCCGAATTAGTGTAGGGGTTAACATAACTCT:
0
Seq_483974_49238: TCCGAATTAGTGTAGGGGTTAACATAACTC:
73764
Counting********************File: bbduk_trimmed_Ago2_SsHV2L_2_CATTTT_L003_R1_001
Seq_132582_1: ATCCGAATTAGTGTAGGGGTTAACATAACTCT:
0
Seq_483974_49238: TCCGAATTAGTGTAGGGGTTAACATAACTC:
78640
Counting********************File: bbduk_trimmed_Ago2_VF_1_CAACTA_L003_R1_001.fastq
Seq_132582_1: ATCCGAATTAGTGTAGGGGTTAACATAACTCT:
0
Seq_483974_49238: TCCGAATTAGTGTAGGGGTTAACATAACTC:
26267
Result I want:
   File Name                                         Seq_132582_1  Seq_483974_49238
0  bbduk_trimmed_Ago2_SsHV2L_1_CATGGC_L003_R1_001               0             73764
1  bbduk_trimmed_Ago2_SsHV2L_2_CATTTT_L003_R1_001               0             78640
2  bbduk_trimmed_Ago2_VF_1_CAACTA_L003_R1_001.fastq             0             26267
Code I tried:
import sys
if sys.version_info[0] < 3:
    raise Exception("Python 3 or a more recent version is required.")
import re
import pandas as pd
with open("text.txt", "r") as fh:
    text = fh.read()
# Capture the file name after "File: " (with or without a .fastq suffix) and the
# count on the line after each sequence line. \s* tolerates optional trailing
# spaces after the colon, and \d+ accepts counts with more than one digit.
results = re.findall(r'File: (\S+)\s*\nSeq_132582_1: \w+:\s*\n(\d+)\s*\nSeq_483974_49238: \w+:\s*\n(\d+)', text)
df = pd.DataFrame(results, columns=['File Name', 'Seq_132582_1', 'Seq_483974_49238'])
print(df)
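
One possible "better way", offered as a sketch rather than a definitive answer: walk the file line by line instead of matching everything with one large regex, so the sequence IDs do not have to be hard-coded. This assumes the layout shown in text.txt above, i.e. a "File: <name>" line starts each block and every "Seq_...:" line is followed by a line holding only its count. The names parse_counts and to_dataframe are made up for this illustration.

```python
import re


def parse_counts(text):
    """Return one dict per 'File:' block, mapping each sequence ID to its count.

    Assumes each 'Seq_...:' line is followed by a line containing only digits.
    """
    rows = []
    current = None   # dict for the block currently being filled
    pending = None   # sequence ID still waiting for its count line
    for raw_line in text.splitlines():
        line = raw_line.strip()
        file_match = re.search(r"File:\s*(\S+)", line)
        seq_match = re.match(r"(Seq_\w+):", line)
        if file_match:
            # A new block starts: remember the file name.
            current = {"File Name": file_match.group(1)}
            rows.append(current)
            pending = None
        elif seq_match and current is not None:
            pending = seq_match.group(1)
        elif line.isdigit() and pending is not None:
            current[pending] = int(line)
            pending = None
    return rows


def to_dataframe(rows):
    """Build the table; pandas is imported here so the parser itself only needs the standard library."""
    import pandas as pd
    return pd.DataFrame(rows)
```

Usage would look like `with open("text.txt") as fh: df = to_dataframe(parse_counts(fh.read()))` followed by `print(df)`. Because the column names come from the data, a block with extra or missing sequences still parses, and pandas fills any missing cells with NaN.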