Here is my string that I created by parsing data from a file:

723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1

Ideally I would like this output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1

Since I was not successful in parsing the data and appending it dynamically (I am new to Python), I figure I can get the same desired output by transforming this string instead.

I researched, tested, and am stuck.

Essentially, I need to replace every 3rd instance of the delimiter with a newline (or maybe something better that anyone can suggest).

Any help is greatly appreciated!

Thanks

  • Can you give us an example of what the input file looks like? Commented Dec 23, 2017 at 22:19
  • Sure, it was an XML file and I was parsing a nested segment. Python did not natively understand that each nested segment was independent, so I just parsed it to a string, knowing that I can split out every third piece at the end, effectively creating a file I can load into a table. Commented Dec 23, 2017 at 22:21

3 Answers

Without regex, like this:

s = "723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1"

items = s.split("|")
print("\n".join(["|".join(items[i:i+3]) for i in range(0,len(items),3)] ))

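Note that the [] inside the outer join is on purpose: str.join builds a list from its argument anyway, so passing a list comprehension is slightly faster than passing a generator expression (see "List comprehension without [ ] in Python"), even if I agree that it's ugly :)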

result:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
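
The slicing approach above generalizes to any group size and delimiter. Here is a minimal sketch, assuming a hypothetical helper named group_fields (the name and its parameters are illustrative, not part of the original answer):

```python
def group_fields(text, n=3, sep="|"):
    """Split text on sep and put n fields on each output line.

    group_fields is a hypothetical helper name used for illustration.
    """
    items = text.split(sep)
    # Slicing past the end of a list is safe, so a short final group is kept.
    return "\n".join([sep.join(items[i:i + n]) for i in range(0, len(items), n)])


s = "723|NM|1|7201|QQ|1|72034|PP|1"
print(group_fields(s))
```

Because the last slice may simply be shorter, an input whose field count is not a multiple of n still produces a final, partial line rather than losing data.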

BTW with regex it's simple too:

import re
print(re.sub(r"(.*?\|.*?\|.*?)\|", r"\1\n", s))

but it doesn't work very well if the number of items isn't exactly divisible by 3 (this can be done, but in a more complex way)
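
If the remainder matters, one possible regex-only approach is to match the groups themselves rather than replace delimiters. This is only a sketch: GROUP_RX and regroup are illustrative names, and the pattern assumes single-line input with no empty fields.

```python
import re

# One match = one non-empty field followed by up to two more "|field" parts.
# Assumption: fields are never empty and the text contains no newlines.
GROUP_RX = re.compile(r"[^|]+(?:\|[^|]+){0,2}")


def regroup(text):
    """Return text with at most three '|'-separated fields per line."""
    # findall returns whole matches because the pattern has no capturing group.
    return "\n".join(GROUP_RX.findall(text))


print(regroup("723|NM|1|7201|QQ|1|123|NM"))
```

Since each match is greedy up to three fields, a trailing group of one or two fields is returned as its own match, with no post-processing needed.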

5 Comments

Yeah nice, you got an extra [] inside your print statement that is not needed though (outer join). And you could write it as this too: print('\n'.join('|'.join(i) for i in zip(items[::3], items[1::3], items[2::3])))
the [] is on purpose, for better performance: stackoverflow.com/questions/9060653/…
This worked perfectly...I think I was close, and now I should get things to work. Thanks!
@AntonvBR zip(items[::3], items[1::3], items[2::3]) would be better using itertools.islice to avoid creating actual lists. And what if you want to group by 10 elements? That would be tedious :)
I thought it was more readable in this particular case. You got a point again.
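
The islice idea raised in these comments can be sketched as a small generator that works for any group size without building intermediate slices. The name chunks is hypothetical and chosen only for this example:

```python
from itertools import islice


def chunks(iterable, n):
    """Yield successive lists of up to n items from iterable.

    chunks is a hypothetical helper name used for illustration.
    """
    it = iter(iterable)
    while True:
        # islice pulls at most n items from the shared iterator each time.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


items = "723|NM|1|7201|QQ|1|72034|PP".split("|")
print("\n".join(["|".join(c) for c in chunks(items, 3)]))
```

Changing the group size to 10 is then just a matter of passing n=10. On Python 3.12 and later, itertools.batched covers the same use case (it yields tuples instead of lists).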

Using a regex solution:

import re

string = """723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1
723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1|123|NM"""

rx = re.compile(r'(?:[^|]+\|?){1,3}')

for line in string.split("\n"):
    parts = "\n".join([part.group(0).rstrip("|") for part in rx.finditer(line)])
    print(parts)

This yields:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
123|NM

See a demo on regex101.com.

3 Comments

This drops the last line if the number of elements isn't a multiple of 3.
@Jean-FrançoisFabre: Updated the expression as well as the demo (note the second line is not divisible by three).
Hmm, that's using regex and fixing it afterwards with a lot of string processing. That means your regex101 demo doesn't hold anymore, BTW. I'm sure it can be done with a smart regex and no post-processing, but I'm too lazy to try.

You can use a regular expression; try this pattern:

import re

pattern=r'\d+\w\|\w+\|\d'
with open('file.txt','r') as f:
    for line in f:
        match=re.findall(pattern,line)
        for i in match:
            print(i)

output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1

Just for fun, in one line (note that the [0] means only the first line of the file is processed):

import re

pattern=r'\d+\w\|\w+\|\d'
for i in [re.findall(pattern,line) for line in open('file.txt','r')][0]:
    print(i)

output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
