Python regex matching problem

Question

I have been parsing a GraphViz file for a specific identifier using a regex. Here is typical content from this file:

node10 [label="second-messenger-mediated signaling\nGO:0019932", fontname=Courier, ...];

node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];

node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\

GO:0007200", fontname=Courier, ...];

node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];

node14 [label="activation of phospholipase C activity\nGO:0007202", fontname=Courier, ...];

node15 [label="elevation of cytosolic calcium ion concentration involved in G-protein signaling coupled to IP3 second messenger\nGO:0051482", fontname=Courier, pos="798,1162", width="9.56", height="0.50"];

Since I am only interested in the node ID, the label, and the GO identifier, I have used the following regex to match each line:

(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)

I know that it's neither terribly elegant nor very efficient but it got the job done except for the line with node12. I have tried using re.DOTALL and re.MULTILINE but to no avail.

Can anyone help me spot the missing piece of the puzzle to make the regex also work with node12?

EDIT:

Here [1] is a link to the file that contains one of those lines.

[1] http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?session_id=7924amigo1292519756&term=GO:0051482&format=dot

Katriel · Accepted Answer · 2010-12-20 22:57:18Z

3

Don't reinvent the wheel.

pydot is a library which parses dot files using pyparsing.

Ant · Accepted Answer · 2010-12-20 22:23:17Z

2

If you match line by line, node12 will be split across two lines. You should either read the whole file at once or iterate from one node to the next.
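A minimal sketch of the "read the whole file" approach, not the poster's own code. It assumes the continuation in node12 is a backslash followed directly by a newline, and that every label ends with a literal `\n` escape followed by the GO identifier. The names `NODE_RE`, `parse_nodes` and `DOT_TEXT` are made up for illustration, and the sample is trimmed from the question's data.

```python
import re

# Matches one node across physical lines. A negated class such as [^"]
# also matches newlines, so re.DOTALL is not needed.
NODE_RE = re.compile(
    r'(node\d+)\s*\[label="'  # node id, then the start of the label
    r'([^"]*?)'               # label text, as short as possible
    r'(?:\\n|\\\n|\s)*'       # literal "\n" escapes, backslash-newline, whitespace
    r'(GO:\d+)"'              # GO identifier just before the closing quote
)

def parse_nodes(text):
    """Return (node_id, label, go_id) tuples found anywhere in text."""
    return [m.groups() for m in NODE_RE.finditer(text)]

# Trimmed sample; with a real file this would be open(path).read().
DOT_TEXT = r'''node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier];
node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier];
'''

for node_id, label, go_id in parse_nodes(DOT_TEXT):
    print(node_id, go_id, label)
```

Because the pattern runs over the whole text with `finditer`, it does not matter where the physical line breaks fall inside a label.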

3 Comments

luuke Over a year ago

Sorry, I don't really understand what you are saying. Any way you can be a bit clearer?

Ant Over a year ago

My bad, sorry ;) I'm saying that the problem is the newline. Normally a line looks like 'node11 label[etc]\\n'; in node12 there is \\n\\n, so the regex doesn't work. You can do:

Ant Over a year ago

q = open('file').read().replace('\\n\\n', '\\n')...now your regex will work ;)

Mitro · Accepted Answer · 2010-12-20 22:22:05Z

1

Is the \ after the first line of node12 escaping the line ending?
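If so, that would match the DOT language, which lets a double-quoted string span several physical lines when a backslash comes immediately before the newline. A rough sketch of how that could be handled, assuming the backslash really is the last character on the line: strip each backslash-newline pair first, then apply the question's regex line by line, unchanged. `join_continuations`, `parse_lines` and `DOT_SAMPLE` are hypothetical names, and the sample is trimmed from the question.

```python
import re

# The pattern from the question, unchanged.
LINE_RE = re.compile(r'(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)')

def join_continuations(text):
    """Remove every backslash that is immediately followed by a newline."""
    return text.replace("\\\n", "")

def parse_lines(text):
    """Match the pattern against each logical line of text."""
    results = []
    for line in join_continuations(text).splitlines():
        match = LINE_RE.match(line)
        if match:
            results.append(match.groups())
    return results

# Trimmed sample; a real file would be read with open(path).read().
DOT_SAMPLE = r'''node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier];
node14 [label="activation of phospholipase C activity\nGO:0007202", fontname=Courier];
'''

for node_id, label, go_id in parse_lines(DOT_SAMPLE):
    print(node_id, go_id, label)
```

Once the continuation is joined, node12 is an ordinary single line, so neither re.DOTALL nor re.MULTILINE is needed.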

1 Comment

luuke Over a year ago

I have just added a link to the file so you can take a better look at it.
