I have the following csv file:

NAME   DETAILS
abc    type1: Y, Property: p1,p3 , type2:N
def    type1: Y, Property: p2,p3 , type2:N
ghi    type1: N, Property: p1,p2 , type2:Y
jkl    type1: N, Property: p1,p3 , type2:Y

I want the output file to look like this:

NAME type1 Property type2
abc  Y      p1,p3    N
def  Y      p2,p3    N
ghi  N      p1,p2    Y
jkl  N      p1,p3    Y

I am using Python and regular expressions. If I split the DETAILS column on ',', the Property values (for example p1,p3) are split into separate columns. Is there a way to handle this?

  • Will the csv module not be useful here? docs are here Commented Apr 6, 2017 at 10:30
  • Are there always 2 properties per line? Commented Apr 6, 2017 at 11:08
  • Anyway, it does not seem to be a csv, since the first two values per line are not comma-separated. Commented Apr 6, 2017 at 11:11

2 Answers

There are many ways to do this, but I would split each line on any punctuation or whitespace character and then reconstruct it manually in the layout you want:

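import re

t = """abc    type1: Y, Property: p1,p3 , type2:N
def    type1: Y, Property: p2,p3 , type2:N
ghi    type1: N, Property: p1,p2 , type2:Y
jkl    type1: N, Property: p1,p3 , type2:Y""".splitlines()

for x in t:
    # Extract every word-like token, e.g.
    # ['abc', 'type1', 'Y', 'Property', 'p1', 'p3', 'type2', 'N']
    y = re.findall(r"[\w']+", x)
    # The fixed indices assume exactly two properties per line
    print('\t'.join((y[0], y[2], y[4] + ',' + y[5], y[7])))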

> abc   Y   p1,p3   N
> def   Y   p2,p3   N
> ghi   N   p1,p2   Y
> jkl   N   p1,p3   Y

Another way without regex would be to replace all delimiting characters and then reconstruct automatically. Something like this:

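# The extra replace(', ', '\t') removes the comma that would otherwise stay attached to the type1 value ('Y,')
print([x.replace(':', '\t').replace(' , ', '\t').replace(', ', '\t').split() for x in t])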
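Both approaches above depend on the exact layout of each line. As a rough sketch of a more general variant, the DETAILS text can be read as key: value pairs, where a comma only ends a value if it is followed by another key and a colon. The names PAIR and parse_line are made up for this illustration, and the sketch assumes keys are single words and the name contains no whitespace:

```python
import re

# Hypothetical pattern: a key, a colon, then a value that runs until
# the next ", key:" or the end of the line.
PAIR = re.compile(r"(\w+)\s*:\s*(.*?)\s*(?=,\s*\w+\s*:|$)")

def parse_line(line):
    """Split 'NAME   key: value, key: value' into a dict (illustrative helper)."""
    # The first whitespace-separated token is the name; the rest is DETAILS.
    name, details = line.split(None, 1)
    row = {"NAME": name}
    for key, value in PAIR.findall(details):
        row[key] = value
    return row

row = parse_line("abc    type1: Y, Property: p1,p3 , type2:N")
print("\t".join(row[k] for k in ("NAME", "type1", "Property", "type2")))
```

Because the commas inside the Property value are not followed by a key and a colon, they stay part of the value, so this should also cope with lines that list more than two properties.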

A sample script that uses regex and group capture to extract data

script.py

#!/usr/bin/env python

import re
import sys

def main():
    # Raw string so that \s reaches the regex engine unchanged
    p = re.compile(r"([a-z]+).*type1:\s+([A-Z]),\s+Property:\s+?([a-z0-9,]+)\s+,\s+?type2:([A-Z])")

    for line in sys.stdin:
        m = p.match(line)
        if m:
            # Groups in order: name, type1, Property, type2
            print("\t".join(m.groups()))

if __name__ == "__main__":
    main()
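The question also asks for an output file with a header row. One possible way to get that, sketched here with the standard csv module and a tab delimiter, is shown below. The function name convert and the choice of tab-separated output are assumptions for illustration; the pattern is the same one used in the script, written as a raw string:

```python
import csv
import io
import re

# Same pattern as in the script, written as a raw string.
PATTERN = re.compile(
    r"([a-z]+).*type1:\s+([A-Z]),\s+Property:\s+?([a-z0-9,]+)\s+,\s+?type2:([A-Z])"
)

def convert(lines, out):
    """Write a header plus one tab-separated row per matching line (illustrative helper)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["NAME", "type1", "Property", "type2"])
    for line in lines:
        m = PATTERN.match(line)
        if m:  # lines that do not match, such as the input header, are skipped
            writer.writerow(m.groups())

# Usage with an in-memory buffer; a real file opened with newline="" works the same way.
buf = io.StringIO()
convert(["NAME   DETAILS\n", "abc    type1: Y, Property: p1,p3 , type2:N\n"], buf)
print(buf.getvalue())
```

With a tab as the delimiter, the comma inside the Property value is ordinary data, so the writer leaves it unquoted.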
