
I have been parsing a GraphViz file for a specific identifier using a regex. Here is some typical content from this file:

node10 [label="second-messenger-mediated signaling\nGO:0019932", fontname=Courier, ...];

node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];

node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\

GO:0007200", fontname=Courier, ...];

node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];

node14 [label="activation of phospholipase C activity\nGO:0007202", fontname=Courier, ...];

node15 [label="elevation of cytosolic calcium ion concentration involved in G-protein signaling coupled to IP3 second messenger\nGO:0051482", fontname=Courier, pos="798,1162", width="9.56", height="0.50"];

Since I am only interested in the node ID, the label and the GO identifier, I have used the following regex to match each line:

(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)

I know that it's neither terribly elegant nor very efficient, but it got the job done except for the line with node12. I have tried using re.DOTALL and re.MULTILINE, but to no avail.

Can anyone help me spot the missing piece of the puzzle to make the regex also work with node12?

EDIT:

Here [1] is a link to the file that contains one of those lines.

[1] http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?session_id=7924amigo1292519756&term=GO:0051482&format=dot

3 Answers

3

Don't reinvent the wheel.

pydot is a library which parses dot files using pyparsing.


2

If you match line by line, node12 will be split across two lines. You should either read the whole file at once or iterate from one node to the next.
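A minimal sketch of the whole-file idea, assuming the file content is already in a string. The sample is abbreviated from the question, and `NODE_RE` and `parse_nodes` are hypothetical names. The pattern is run over the full text with `re.finditer` instead of line by line, and it allows an optional backslash plus whitespace between the literal `\n` and the GO identifier. It also assumes each label contains exactly one literal `\n`, directly before the GO identifier. Note that `re.DOTALL` only changes what `.` matches and `re.MULTILINE` only changes `^` and `$`, so neither flag helps when the regex never sees both halves of node12 at once.

```python
import re

# Abbreviated sample from the question; node12 continues on a second line.
SAMPLE = r'''node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier];
node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier];
'''

NODE_RE = re.compile(
    r'(node\d+)\s*\[label="'  # node id and start of the label attribute
    r'([^"\\]*)'              # label text, up to the first backslash
    r'\\n(?:\\\s*)?'          # literal \n, then an optional backslash + line break
    r'(GO:\d+)'               # GO identifier
)

def parse_nodes(text):
    """Return (node_id, label, go_id) tuples found anywhere in text."""
    return [m.groups() for m in NODE_RE.finditer(text)]

for node_id, label, go_id in parse_nodes(SAMPLE):
    print(node_id, go_id, label)
```

With a real file, `parse_nodes(open(path).read())` would replace the inline sample, where `path` is whatever file you saved the dot output to.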

3 Comments

Sorry, I don't really understand what you are saying. Is there any way you could be a bit clearer?
My bad, sorry ;) I'm saying that the problem is the newline. Normally a line looks like 'node11 [label=...]\\n'; in node12 there is '\\n\\n', so the regex doesn't work. You can do:
q = open('file').read().replace('\\n\\n', '\\n')
Now your regex will work ;)
1

Is the \ after the first line of node12 escaping the line ending?
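If so (the DOT language does let a double-quoted string continue on the next line after a trailing backslash), one option is to join those continuations before matching. Below is a sketch under that assumption, using the question's regex unchanged. `join_continuations` and `parse_lines` are hypothetical names, and the sample is abbreviated from the question.

```python
import re

# The regex from the question, unchanged.
LINE_RE = re.compile(r'(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)')

def join_continuations(text):
    """Drop each backslash-newline pair so every node sits on one line."""
    return text.replace("\\\n", "")

def parse_lines(text):
    """Apply the per-line regex after continuation lines are joined."""
    rows = []
    for line in join_continuations(text).splitlines():
        match = LINE_RE.search(line)
        if match:
            rows.append(match.groups())
    return rows

# Abbreviated sample; node12 ends its first line with a backslash.
sample = r'''node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier];
node14 [label="activation of phospholipase C activity\nGO:0007202", fontname=Courier];
'''

for row in parse_lines(sample):
    print(row)
```

This keeps the original line-by-line loop and only adds a normalisation step in front of it. If the file uses Windows line endings, the replacement would also need to cover a backslash followed by `\r\n`.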

1 Comment

I have just added a link to the file so you can take a better look at it.
