Parsing a fixed-width file in Python with Big Decimals

Question

I have to parse the following file in python:

20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0

I need to end up with the following variables (using the first line as an example):

year = 2010
month = 03
day = 22
hour = 23
minute = 24
p1 = Decimal('1.355800')
p2 = Decimal('1.355900')
p3 = Decimal('1.355800')
p4 = Decimal('1.355900')

I have tried:

from decimal import Decimal

line = '20100322;232400;1.355800;1.355900;1.355800;1.355900;0'
year = line[:4]
month = line[4:6]
day = line[6:8]
hour = line[9:11]
minute = line[11:13]
p1 = Decimal(line[16:24])
p2 = Decimal(line[25:33])
p3 = Decimal(line[34:42])
p4 = Decimal(line[43:51])

print(year)
print(month)
print(day)
print(hour)
print(minute)
print(p1)
print(p2)
print(p3)
print(p4)

Which works fine, but I am wondering if there is an easier way to parse this (maybe using struct) to avoid having to count each position manually.
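Since the question mentions struct, here is a minimal sketch of that route. It assumes every record has exactly the layout of the sample lines (53 characters: eight-digit date, six-digit time, four eight-character prices, one trailing digit) and it works on bytes, because struct only unpacks bytes-like objects. The names LAYOUT and parse_fixed are hypothetical.

```python
import struct
from decimal import Decimal

# Assumed layout, taken from the sample lines:
#   "4s2s2s" = YYYY MM DD, "x" skips one ';' byte, "2s2s2s" = HH MM SS,
#   each "8s" is one price, "1s" is the trailing flag column.
# struct ignores whitespace between format codes, so the spaces are only for readability.
LAYOUT = struct.Struct("4s2s2s x 2s2s2s x 8s x 8s x 8s x 8s x 1s")

def parse_fixed(line):
    """Unpack one record; raises struct.error if the line is not exactly LAYOUT.size bytes."""
    raw = line.rstrip("\r\n").encode("ascii")  # drop a newline left over from file iteration
    fields = [f.decode("ascii") for f in LAYOUT.unpack(raw)]
    year, month, day, hour, minute, second = (int(f) for f in fields[:6])
    p1, p2, p3, p4 = (Decimal(f) for f in fields[6:10])
    return year, month, day, hour, minute, p1, p2, p3, p4

record = parse_fixed("20100322;232400;1.355800;1.355900;1.355800;1.355900;0")
print(record)
```

The widths are still spelled out, but in one format string instead of scattered slice indices. The approach breaks as soon as any field changes width, which is why splitting on the `;` delimiter is usually more robust for this data.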

You can use the csv package.

– Rob, Commented Aug 8, 2019 at 16:39
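A minimal sketch of the csv suggestion: the semicolon is passed as the delimiter, and each row comes back as a list of strings that can be converted field by field. The file name prices.txt and the helper read_prices are hypothetical; io.StringIO stands in for the file so the example runs on its own.

```python
import csv
import io
from decimal import Decimal

SAMPLE = """20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""

def read_prices(fh):
    """Yield (date, time, p1, p2, p3, p4) per row; the trailing flag column is ignored."""
    for date_str, time_str, *prices, _flag in csv.reader(fh, delimiter=";"):
        yield (date_str, time_str, *(Decimal(p) for p in prices))

# With a real (hypothetical) file:
#   with open("prices.txt", newline="") as fh:
#       rows = list(read_prices(fh))
rows = list(read_prices(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```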

Paul M. · Accepted Answer · 2019-08-08 17:48:37Z

2

from decimal import Decimal
from datetime import datetime

line = "20100322;232400;1.355800;1.355900;1.355800;1.355900;0"

tokens = line.split(";")

dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
decimals = [Decimal(string) for string in tokens[2:6]]

# datetime objects also have some useful attributes: dt.year, dt.month, etc.
print(dt, *decimals, sep="\n")

Output:

2010-03-22 23:24:00
1.355800
1.355900
1.355800
1.355900
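The same split-and-strptime idea can be applied to every line and used to recover exactly the variables the question asks for. This is a sketch assuming one record per line and no blank lines; the Record tuple and the parse_line helper are hypothetical names.

```python
from datetime import datetime
from decimal import Decimal
from typing import NamedTuple

class Record(NamedTuple):
    when: datetime
    p1: Decimal
    p2: Decimal
    p3: Decimal
    p4: Decimal

def parse_line(line):
    """Split on ';', join date+time for strptime, convert the four prices to Decimal."""
    tokens = line.strip().split(";")
    when = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
    return Record(when, *(Decimal(t) for t in tokens[2:6]))

lines = [
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0",
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0",
]
# With an open file object fh: records = [parse_line(ln) for ln in fh]
records = [parse_line(ln) for ln in lines]

first = records[0]
year, month, day = first.when.year, first.when.month, first.when.day
hour, minute = first.when.hour, first.when.minute
print(year, month, day, hour, minute, first.p1, first.p4)
```

Note that the datetime attributes are ints, so month is 3 rather than the string "03".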

edited Aug 8, 2019 at 17:48

answered Aug 8, 2019 at 17:17

Paul M.


Crivella · Accepted Answer · 2019-08-08 17:12:19Z

0

You could use regex:

import re
from pprint import pprint

to_parse = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""

stx = re.compile(
    r'(?P<date>(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}));'
    r'(?P<time>(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2}));'
    r'(?P<p1>[\.\-\d]*);(?P<p2>[\.\-\d]*);(?P<p3>[\.\-\d]*);(?P<p4>[\.\-\d]*)'
    )

# Price groups (p1..p4) become float; every other group (date, year, time, ...) becomes int
f = [{k: float(v) if 'p' in k else int(v) for k, v in a.groupdict().items()}
     for a in stx.finditer(to_parse)]

# pprint sorts the keys and wraps long lines, which produces the output shown below
pprint(f)

Output:

[{'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 24,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3559,
  'p3': 1.3558,
  'p4': 1.3559,
  'second': 0,
  'time': 232400,
  'year': 2010},
 {'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 25,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3559,
  'p3': 1.3558,
  'p4': 1.3559,
  'second': 0,
  'time': 232500,
  'year': 2010},
 {'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 26,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3558,
  'p3': 1.3558,
  'p4': 1.3558,
  'second': 0,
  'time': 232600,
  'year': 2010}]

Here I stored everything in a list, but you could iterate over the results of finditer one match at a time if you don't want to keep everything in memory.

You can also replace float and/or int with Decimal if needed.
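A sketch of that substitution, which also consumes finditer lazily through a generator instead of building the whole list. Prices stay Decimal and the date and time parts become int. The pattern is a simplified variant of the one above (only the fields the question needs, `+` instead of `*` so an empty price cannot reach Decimal), and the names PATTERN, PRICE_FIELDS and parse_records are hypothetical.

```python
import re
from decimal import Decimal

PATTERN = re.compile(
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2});"
    r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2});"
    r"(?P<p1>[-.\d]+);(?P<p2>[-.\d]+);(?P<p3>[-.\d]+);(?P<p4>[-.\d]+)"
)
# Explicit set of price group names, rather than testing whether 'p' appears in the key
PRICE_FIELDS = {"p1", "p2", "p3", "p4"}

def parse_records(text):
    """Yield one dict per match; price groups become Decimal, the rest int."""
    for match in PATTERN.finditer(text):
        yield {
            name: Decimal(value) if name in PRICE_FIELDS else int(value)
            for name, value in match.groupdict().items()
        }

text = """20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""
for rec in parse_records(text):
    print(rec)
```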

edited Aug 8, 2019 at 17:12

answered Aug 8, 2019 at 16:58

Crivella


1 Comment

M.E. Over a year ago

It might not be the case for others, but for me this is more complex and difficult to read than the original approach. I find the .split approach easier to follow and read.
