
I have to parse the following file in Python:

20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0

I need to end up with the following variables (the first line is parsed as an example):

year = 2010
month = 03
day = 22
hour = 23
minute = 24
p1 = Decimal('1.355800')
p2 = Decimal('1.355900')
p3 = Decimal('1.355800')
p4 = Decimal('1.355900')

I have tried:

from decimal import Decimal

line = '20100322;232400;1.355800;1.355900;1.355800;1.355900;0'
year = line[:4]
month = line[4:6]
day = line[6:8]
hour = line[9:11]
minute = line[11:13]
p1 = Decimal(line[16:24])
p2 = Decimal(line[25:33])
p3 = Decimal(line[34:42])
p4 = Decimal(line[43:51])

print(year)
print(month)
print(day)
print(hour)
print(minute)
print(p1)
print(p2)
print(p3)
print(p4)

This works fine, but I am wondering whether there is an easier way to parse this (maybe using struct) so that I don't have to count each position manually.

Comment: You can use the csv package. (Commented Aug 8, 2019 at 16:39)
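A minimal sketch of the csv approach suggested in the comment, assuming every line has exactly seven ';'-separated fields as in the sample. io.StringIO stands in for the real file; the filename data.txt mentioned in the code comment is hypothetical.

```python
import csv
import io
from decimal import Decimal

# Sample data from the question; with a real file you would use
# open("data.txt", newline="") instead (hypothetical filename).
data = io.StringIO(
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232600;1.355800;1.355800;1.355800;1.355800;0\n"
)

rows = []
# csv.reader splits each line on ';'; the last column is ignored
for date, time, p1, p2, p3, p4, _ in csv.reader(data, delimiter=";"):
    rows.append({
        "year": int(date[:4]),
        "month": int(date[4:6]),
        "day": int(date[6:8]),
        "hour": int(time[:2]),
        "minute": int(time[2:4]),
        "p1": Decimal(p1),
        "p2": Decimal(p2),
        "p3": Decimal(p3),
        "p4": Decimal(p4),
    })

print(rows[0])
```

The slicing is still positional, but only inside the short date and time fields; the delimiter handles the rest.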

2 Answers

from decimal import Decimal
from datetime import datetime

line = "20100322;232400;1.355800;1.355900;1.355800;1.355900;0"

tokens = line.split(";")

dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
decimals = [Decimal(string) for string in tokens[2:6]]

# datetime objects also have some useful attributes: dt.year, dt.month, etc.
print(dt, *decimals, sep="\n")

Output:

2010-03-22 23:24:00
1.355800
1.355900
1.355800
1.355900
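A sketch of the same split/strptime approach applied to every line, with the individual fields the question asks for read from the datetime attributes. The helper name parse_line is hypothetical, and the list of strings stands in for iterating over an open file.

```python
from datetime import datetime
from decimal import Decimal

lines = [
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0",
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0",
    "20100322;232600;1.355800;1.355800;1.355800;1.355800;0",
]


def parse_line(line):
    """Split one record into a datetime and the four price Decimals."""
    tokens = line.strip().split(";")  # strip() drops a trailing newline from file lines
    dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
    p1, p2, p3, p4 = (Decimal(t) for t in tokens[2:6])
    return dt, p1, p2, p3, p4


for line in lines:
    dt, p1, p2, p3, p4 = parse_line(line)
    # year, month, day, hour and minute are attributes of the datetime
    print(dt.year, dt.month, dt.day, dt.hour, dt.minute, p1, p2, p3, p4)
```

With a real file, the loop would be `for line in fh:` inside a `with open(...) as fh:` block.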

You could use a regular expression:

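import re
from pprint import pprint

to_parse = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""

# Named groups for the date/time parts and the four prices;
# the trailing ";0" column is not captured.
stx = re.compile(
    r'(?P<date>(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}));'
    r'(?P<time>(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2}));'
    r'(?P<p1>[\.\-\d]*);(?P<p2>[\.\-\d]*);(?P<p3>[\.\-\d]*);(?P<p4>[\.\-\d]*)'
)

# Price groups (p1..p4) become floats, all other groups become ints
f = [
    {k: float(v) if k.startswith('p') else int(v) for k, v in a.groupdict().items()}
    for a in stx.finditer(to_parse)
]

# pprint sorts the keys and wraps long lines, giving the output below
pprint(f)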

Output:

[{'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 24,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3559,
  'p3': 1.3558,
  'p4': 1.3559,
  'second': 0,
  'time': 232400,
  'year': 2010},
 {'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 25,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3559,
  'p3': 1.3558,
  'p4': 1.3559,
  'second': 0,
  'time': 232500,
  'year': 2010},
 {'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 26,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3558,
  'p3': 1.3558,
  'p4': 1.3558,
  'second': 0,
  'time': 232600,
  'year': 2010}]

Here I stored everything in a list, but if you don't want to keep all the results in memory, you can consume the finditer matches one at a time (for example in a for loop) instead of building a list.
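A sketch of the one-record-at-a-time idea, assuming the data comes from a file that is read line by line. The generator name iter_records is hypothetical, io.StringIO stands in for open("data.txt") (hypothetical filename), and the combined date/time groups are dropped for brevity.

```python
import io
import re

# Same idea as the pattern above, matched at the start of each line
stx = re.compile(
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2});'
    r'(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2});'
    r'(?P<p1>[.\-\d]*);(?P<p2>[.\-\d]*);(?P<p3>[.\-\d]*);(?P<p4>[.\-\d]*)'
)


def iter_records(fh):
    """Yield one dict per matching line without building a list first."""
    for line in fh:
        m = stx.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        yield {k: float(v) if k.startswith('p') else int(v)
               for k, v in m.groupdict().items()}


data = io.StringIO(
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0\n"
)

for record in iter_records(data):
    print(record["minute"], record["p2"])
```

Because the file object is itself read lazily, only the current line and the current record are held in memory.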

You can also replace float and/or int with Decimal if needed.
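A minimal sketch of the Decimal variant, assuming only the price groups should change type. Using Decimal keeps the values exact and preserves the trailing zeros from the file (for example 1.355800), which float does not.

```python
import re
from decimal import Decimal

to_parse = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
"""

stx = re.compile(
    r'(?P<date>(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}));'
    r'(?P<time>(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2}));'
    r'(?P<p1>[\.\-\d]*);(?P<p2>[\.\-\d]*);(?P<p3>[\.\-\d]*);(?P<p4>[\.\-\d]*)'
)

# Prices become Decimal, every other group stays int
f = [
    {k: Decimal(v) if k.startswith('p') else int(v)
     for k, v in a.groupdict().items()}
    for a in stx.finditer(to_parse)
]

print(f[0]['p1'])
```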

1 Comment

It might not be the case for others, but for me this is more complex and difficult to read than the original approach. I find the .split approach easier to follow and read.
