Text File data parsing lines and output as columns

Question

I'm trying to parse a text file. The file has a username, address and phone number for each user in the following format:

Name: John Doe1
address : somewhere
phone: 123-123-1234

Name: John Doe2
address : somewhere
phone: 123-123-1233

Name: John Doe3
address : somewhere
phone: 123-123-1232

There are almost 10k users like this. What I would like to do is convert those rows to columns, for example:

Name: John Doe1                address : somewhere          phone: 123-123-1234
Name: John Doe2                address : somewhere          phone: 123-123-1233
Name: John Doe3                address : somewhere          phone: 123-123-1232

I would prefer to do it in bash, but if you know how to do it in Python that would be great too. The file that has this information is /root/docs/information. Any tips or help would be much appreciated.

Good initial question, @tafiela. But in future questions, don't forget to mention what you have tried.
– Yamaneko, Commented Oct 11, 2012 at 3:25

Steve · Accepted Answer · 2012-10-11 03:00:00Z

5

One way with GNU awk:

awk 'BEGIN { FS="\n"; RS=""; OFS="\t\t" } { print $1, $2, $3 }' file.txt

Results:

Name: John Doe1     address : somewhere     phone: 123-123-1234
Name: John Doe2     address : somewhere     phone: 123-123-1233
Name: John Doe3     address : somewhere     phone: 123-123-1232

Note that I've set the output field separator (OFS) to two tab characters (\t\t). You can change this to whatever character or set of characters you please. HTH.

answered Oct 11, 2012 at 3:00

Steve

1 Comment

Steve Over a year ago

@VictorHugo: RS is short for record separator. By default RS is set to \n (newline), so awk processes the file line by line. When we set it to the empty string (""), awk switches to paragraph mode: records are separated by one or more blank lines instead of by single newlines. Since each of the records here is separated by an empty line, setting RS="" makes for an easy solution. HTH.
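
As a rough illustration of what RS="" together with FS="\n" does, here is a small Python sketch of the same idea: split the text into blank-line-separated records, treat each line as a field, and join the fields with an output separator. The function name `paragraph_records` and the sample text are made up for this example, and it only treats truly empty lines as separators.

```python
import re

def paragraph_records(text, ofs="\t\t"):
    """Mimic awk's RS="" / FS="\n": one record per blank-line-separated block."""
    # Runs of empty lines separate records; leading/trailing newlines are ignored.
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    # Each line of a block is one field; join the fields with the output separator.
    return [ofs.join(block.split("\n")) for block in blocks if block]

sample = (
    "Name: John Doe1\naddress : somewhere\nphone: 123-123-1234\n"
    "\n"
    "Name: John Doe2\naddress : somewhere\nphone: 123-123-1233\n"
)

for record in paragraph_records(sample):
    print(record)
```

Each printed row holds the three lines of one record, separated by two tabs, which is what the awk command produces with OFS="\t\t".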

Gilles Quénot · Accepted Answer · 2012-10-11 13:11:46Z

3

With a short Perl one-liner :

$ perl -ne 'END{print "\n"}chomp; /^$/ ? print "\n" : print "$_\t\t"' file.txt

OUTPUT

Name: John Doe1         address : somewhere             phone: 123-123-1234
Name: John Doe2         address : somewhere             phone: 123-123-1233
Name: John Doe3         address : somewhere             phone: 123-123-1232

edited Oct 11, 2012 at 13:11

answered Oct 11, 2012 at 3:02

Gilles Quénot

choroba · Accepted Answer · 2012-10-12 14:21:02Z

2

Using paste, we can join the lines in the file:

$ paste -s -d"\t\t\t\n" file
Name: John Doe1 address : somewhere     phone: 123-123-1234
Name: John Doe2 address : somewhere     phone: 123-123-1233
Name: John Doe3 address : somewhere     phone: 123-123-1232
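
The delimiter list is what makes this work: in serial mode, paste walks through the list in order and starts over when it runs out, so three tabs followed by a newline match the three data lines plus the blank line of each record. A minimal Python sketch of that cycling behaviour, as I understand it (the name `paste_serial` and the sample lines are assumptions for illustration):

```python
from itertools import cycle

def paste_serial(lines, delimiters):
    """Join lines roughly the way `paste -s -d LIST` does, cycling through LIST."""
    delims = cycle(delimiters)
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        # Every line except the last is followed by the next delimiter in the cycle.
        if i < len(lines) - 1:
            out.append(next(delims))
    return "".join(out) + "\n"  # the joined output ends with a single newline

lines = [
    "Name: John Doe1", "address : somewhere", "phone: 123-123-1234", "",
    "Name: John Doe2", "address : somewhere", "phone: 123-123-1233",
]
print(paste_serial(lines, ["\t", "\t", "\t", "\n"]), end="")
```

Note that the third tab lands after the phone field (just before the empty line), so every row except the last carries a trailing tab.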

edited Oct 12, 2012 at 14:21

choroba

answered Oct 11, 2012 at 3:04

Guru

2 Comments

luser droog Over a year ago

@sputnick True, but this does the hard part. There are myriad utilities to expand tabs.

Gilles Quénot Over a year ago

Yes, but in this case, you need 2 pipes ;)

martineau · Accepted Answer · 2012-10-11 03:02:57Z

1

This seems to do basically what you want:

information = 'information'  # file path

with open(information, 'rt') as infile:
    data = infile.read()

# records are separated by a blank line
groups = data.strip().split('\n\n')

for group in groups:
    # put each record's lines on one row
    print(group.replace('\n', '     '))

Output:

Name: John Doe1     address : somewhere     phone: 123-123-1234
Name: John Doe2     address : somewhere     phone: 123-123-1233
Name: John Doe3     address : somewhere     phone: 123-123-1232

answered Oct 11, 2012 at 3:02

martineau

Hai Vu · Accepted Answer · 2012-10-11 03:06:58Z

1

I know you did not mention awk, but it solves your problem nicely:

awk 'BEGIN {RS="";FS="\n"} {print $1,$2,$3}' data.txt

answered Oct 11, 2012 at 3:06

Hai Vu

score 1 · Accepted Answer · 2012-10-12 13:41:12Z

Most of the solutions here are just reformatting the data in the file that you are reading. Maybe that is all that you want.

If you actually want to parse the data, put it in a data structure.

This example in Python:

data="""\
Name: John Doe2
address : 123 Main St, Los Angeles, CA 95002
phone: 213-123-1234

Name: John Doe1
address : 145 Pearl St, La Jolla, CA 92013
phone: 858-123-1233

Name: Billy Bob Doe3
address : 454 Heartland St, Mobile, AL 00103
phone: 205-123-1232""".split('\n\n')      # just a fill-in for your file
                                          # you would use `with open(file) as data:`

addr={}
w0,w1,w2=0,0,0             # these keep track of the max width of the field 
for line in data:
    fields=[e.split(':')[1].strip() for e in [f for f in line.split('\n')]]
    nam=fields[0].split()
    name=nam[-1]+', '+' '.join(nam[0:-1])
    addr[(name,fields[2])]=fields
    w0,w1,w2=[max(t) for t in zip(map(len,fields),(w0,w1,w2))]

Now you have the freedom to sort, change the format, put in database, etc.

This prints your format with that data, sorted:

for add in sorted(addr.keys()):
    print('Name: {0:{w0}} Address: {1:{w1}} phone: {2:{w2}}'.format(*addr[add],w0=w0,w1=w1,w2=w2))

Prints:

Name: John Doe1      Address: 145 Pearl St, La Jolla, CA 92013   phone: 858-123-1233
Name: John Doe2      Address: 123 Main St, Los Angeles, CA 95002 phone: 213-123-1234
Name: Billy Bob Doe3 Address: 454 Heartland St, Mobile, AL 00103 phone: 205-123-1232

That is sorted by the last name, first name used in the dict key.

Now print it sorted by area code:

for add in sorted(addr.keys(),key=lambda x: addr[x][2] ):
    print('Name: {0:{w0}} Address: {1:{w1}} phone: {2:{w2}}'.format(*addr[add],w0=w0,w1=w1,w2=w2))

Prints:

Name: Billy Bob Doe3 Address: 454 Heartland St, Mobile, AL 00103 phone: 205-123-1232
Name: John Doe2      Address: 123 Main St, Los Angeles, CA 95002 phone: 213-123-1234
Name: John Doe1      Address: 145 Pearl St, La Jolla, CA 92013   phone: 858-123-1233

But since you have the data in an indexed dictionary, you can instead print it as a table sorted by zip code:

# print table header
print('|{0:^{w0}}|{1:^{w1}}|{2:^{w2}}|'.format('Name','Address','Phone',w0=w0+2,w1=w1+2,w2=w2+2))
print('|{0:^{w0}}|{1:^{w1}}|{2:^{w2}}|'.format('----','-------','-----',w0=w0+2,w1=w1+2,w2=w2+2))
# print data sorted by last field of the address - probably a zip code
for add in sorted(addr.keys(),key=lambda x: addr[x][1].split()[-1]):
    print('|{0:>{w0}}|{1:>{w1}}|{2:>{w2}}|'.format(*addr[add],w0=w0+2,w1=w1+2,w2=w2+2))

Prints:

|      Name      |              Address               |    Phone     |
|      ----      |              -------               |    -----     |
|  Billy Bob Doe3|  454 Heartland St, Mobile, AL 00103|  205-123-1232|
|       John Doe1|    145 Pearl St, La Jolla, CA 92013|  858-123-1233|
|       John Doe2|  123 Main St, Los Angeles, CA 95002|  213-123-1234|
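
The examples in this answer use an inline string as a stand-in for the file. As a hedged sketch of how the same idea could read from a real file object, the function below groups "key: value" lines into one dict per blank-line-separated block. The name `read_records` and the choice to lower-case the keys are assumptions of this sketch, and `io.StringIO` stands in for `open(path)` so the example is self-contained.

```python
import io

def read_records(lines):
    """Group 'key: value' lines into one dict per blank-line-separated block."""
    records, current = [], {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if one is open.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip().lower()] = value.strip()
    if current:  # the last record may have no blank line after it
        records.append(current)
    return records

# Stand-in for: with open(path) as f: records = read_records(f)
sample = io.StringIO(
    "Name: John Doe2\naddress : 123 Main St, Los Angeles, CA 95002\nphone: 213-123-1234\n"
    "\n"
    "Name: John Doe1\naddress : 145 Pearl St, La Jolla, CA 92013\nphone: 858-123-1233\n"
)
records = read_records(sample)

# Sort by the last word of the address, which is probably a zip code.
by_zip = sorted(records, key=lambda r: r["address"].split()[-1])
for r in by_zip:
    print(r["name"], r["address"], r["phone"], sep=" | ")
```

Because it reads line by line, this version does not need the whole file in memory before splitting, which may matter little at 10k users but keeps the parsing step reusable.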

Brendan Long · Accepted Answer · 2012-10-11 02:59:06Z

0

You should be able to parse this using the split() method on a string:

line = "Name: John Doe1"
key, value = line.split(":", 1)  # split on the first colon only
value = value.strip()            # drop the space after the colon
print(key) # Name
print(value) # John Doe1
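
One thing to watch for with this approach is a value that itself contains a colon (a URL or a time, say), since splitting on every colon would then return more than two pieces. A small sketch that splits on the first colon only and trims the whitespace around both parts; `parse_line` and the `website` line are hypothetical, not from the question's data:

```python
def parse_line(line):
    """Split a 'key: value' line on the first colon only."""
    key, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no colon in line: %r" % line)
    return key.strip(), value.strip()

print(parse_line("Name: John Doe1"))
print(parse_line("address : somewhere"))
print(parse_line("website: http://example.com"))
```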

answered Oct 11, 2012 at 2:59

Brendan Long

Abhishek Mishra · Accepted Answer · 2012-10-11 03:09:00Z

0

You can iterate over the lines and print them in columns like this:

with open("/path/to/data") as f:
    for line in f:
        if line.strip():
            # strip the trailing \n and suppress print's own newline
            # so the next field continues on the same row
            print("%s\t\t" % line.strip(), end="")
        else:
            # re-use the blank lines to end the current row
            print()

edited Oct 11, 2012 at 3:09

answered Oct 11, 2012 at 3:03

Abhishek Mishra

ChenQi · Accepted Answer · 2012-10-11 03:12:29Z

#!/usr/bin/env python

def parse(inputfile, outputfile):
    dictInfo = {'Name': None, 'address': None, 'phone': None}
    for line in inputfile:
        if line.startswith('Name'):
            dictInfo['Name'] = line.split(':', 1)[1].strip()
        elif line.startswith('address'):
            dictInfo['address'] = line.split(':', 1)[1].strip()
        elif line.startswith('phone'):
            dictInfo['phone'] = line.split(':', 1)[1].strip()
            # phone is the last field of a record, so write the row now
            s = 'Name: '+dictInfo['Name']+'\t'+'address: '+dictInfo['address'] \
                +'\t'+'phone: '+dictInfo['phone']+'\n'
            outputfile.write(s)

if __name__ == '__main__':
    with open('output.txt', 'w') as outputfile:
        with open('information.txt') as inputfile:
            parse(inputfile, outputfile)

Yamaneko · Accepted Answer · 2012-10-11 04:39:49Z

0

A solution using sed.

cat input.txt | sed '/^$/d' | sed 'N; s:\n:\t\t:; N; s:\n:\t\t:'

First pipe, sed '/^$/d', removes the blank lines.
Second pipe, sed 'N; s:\n:\t\t:; N; s:\n:\t\t:', combines the lines.

Name: John Doe1     address : somewhere     phone: 123-123-1234
Name: John Doe2     address : somewhere     phone: 123-123-1233
Name: John Doe3     address : somewhere     phone: 123-123-1232
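
For readers more comfortable with Python, the same two steps can be sketched as follows: first drop the empty lines, then join every run of three lines. This assumes, as the sed command does, that every record has exactly three lines; `join_in_threes` and the sample text are illustrative names only.

```python
def join_in_threes(text, sep="\t\t"):
    # Step 1 (like sed '/^$/d'): drop the empty lines.
    lines = [line for line in text.split("\n") if line != ""]
    # Step 2 (like the two N; s:\n:\t\t: pairs): join each run of three lines.
    return [sep.join(lines[i:i + 3]) for i in range(0, len(lines), 3)]

sample = (
    "Name: John Doe1\naddress : somewhere\nphone: 123-123-1234\n"
    "\n"
    "Name: John Doe2\naddress : somewhere\nphone: 123-123-1233\n"
)

for row in join_in_threes(sample):
    print(row)
```

If a record is missing a line, every following row shifts by one, which is the same weakness the fixed N; N pattern has.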

edited Oct 11, 2012 at 4:39

answered Oct 11, 2012 at 4:30

Yamaneko

Nathan Villaescusa · Accepted Answer · 2012-10-11 06:13:39Z

0

In Python:

results = []
cur_item = None

with open('/root/docs/information') as f:
    for line in f:
        if not line.strip():
            continue  # skip the blank lines between records
        key, value = line.split(':', 1)
        key = key.strip()
        value = value.strip()

        if key == "Name":
            cur_item = {}
            results.append(cur_item)
        cur_item[key] = value

for item in results:
    print(item)

edited Oct 11, 2012 at 6:13

answered Oct 11, 2012 at 2:57

Nathan Villaescusa

3 Comments

Gilles Quénot Over a year ago

You should specify the language ;)

Nathan Villaescusa Over a year ago

@sputnick I'm not quite sure I understand what you mean

Matthias Over a year ago

Just say the language: It's Python.
