I have the following log file and I want to split it and put it into an ordered data structure (something like a list of lists) using Python 3.4.

The file follows this structure:

Month #1
1465465464555
345646546454
442343423433
724342342655
34324233454
24543534533
***Day # 1
5465465465465455
644654654654454
4435423534833
***Day #2
24876867655
74654654454
643876867433
***Day #3
445543534655
344876867854
64365465433
Month #2
7454353455
84756756454
64563453433
***Day # 1
44756756655
34453453454
243867867433
***Day #2
64465465455
74454353454
34878733
***Day #3
1449898955
643434354
843090909888433

The aim is to loop over the months and work on each day separately. I should be able to do something like this:

for month in months:
    for day in month:
        for number in day:
            print(number)

The solution I have adopted to extract the months from the file is the following, but it's not a smart solution. I need something better:

# Read the whole log into memory
with open("log.txt", "r") as in_file:
    righe = in_file.readlines()

# Indices of the lines that start a new month
lista = [i for i, riga in enumerate(righe) if "Month" in riga]

# Sentinel: one past the last line, so the final month keeps its last line
lista.append(len(righe))

# Write each month's lines to its own file: 1.txt, 2.txt, ...
counter = 1
for i in range(len(lista) - 1):
    with open(str(counter) + ".txt", "w") as out_file:
        for j in range(lista[i], lista[i + 1]):
            out_file.write(righe[j])
    counter += 1

# Read the per-month files back
for i in range(1, counter):
    print("Month: ", i)
    with open(str(i) + ".txt", "r") as mano:
        righe = mano.readlines()
    print(righe)
3 Answers

If you want to go down the nested dict route:

month, day = 0, 0
log = {}
with open("log.txt") as f:
    for line in f:
        if 'Month' in line:
            month += 1
            day = 0
            log[month] = {0:[]}
        elif 'Day' in line:
            day += 1
            log[month][day] = []
        else:
            log[month][day].append(line.strip())

Note that I assumed the entries immediately following a month line are day 0. The structure now looks like:

>>> from pprint import pprint
>>> pprint(log)
{1: {0: ['1465465464555',
         '345646546454',
         '442343423433',
         '724342342655',
         '34324233454',
         '24543534533'],
     1: ['5465465465465455', '644654654654454', '4435423534833'],
     2: ['24876867655', '74654654454', '643876867433'],
     3: ['445543534655', '344876867854', '64365465433']},
 2: {0: ['7454353455', '84756756454', '64563453433'],
     1: ['44756756655', '34453453454', '243867867433'],
     2: ['64465465455', '74454353454', '34878733'],
     3: ['1449898955', '643434354', '843090909888433']}}

And you can iterate over it with:

for month_index in sorted(log):
    month = log[month_index]
    for day_index in sorted(month):
        day = month[day_index]
        for number in day:
            print(number)
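
If a list of lists is preferred, as in the question, the nested dict can be converted afterwards. This is only a minimal sketch: the helper name to_nested_lists is hypothetical, and the sample dict is a shortened stand-in with the same shape as the structure above.

```python
def to_nested_lists(log):
    """Convert {month: {day: [numbers]}} into [[[numbers], ...], ...]."""
    return [
        [log[month][day] for day in sorted(log[month])]
        for month in sorted(log)
    ]

# Shortened sample with the same shape as the structure above
log = {
    1: {0: ['1465465464555'], 1: ['5465465465465455', '644654654654454']},
    2: {0: ['7454353455'], 1: ['44756756655']},
}

months = to_nested_lists(log)

# The loop from the question now works as written
for month in months:
    for day in month:
        for number in day:
            print(number)
```

The month and day indexes are lost in the conversion, so position 0 of each month holds the entries that appeared before the first day marker.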

There are already a few answers to this question. Here is my contribution: a recursive solution, as a different way of thinking about the problem:

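def loop(stopParam, startArr, resultArr=None):
    # Collect lines from startArr until one contains stopParam (or the input ends).
    # A None default avoids sharing one mutable list between calls.
    if resultArr is None:
        resultArr = []
    if not startArr or stopParam in startArr[0]:
        return (resultArr, startArr)
    return loop(stopParam, startArr[1:], resultArr + [startArr[0]])

def buildList(arr, testVal=None):
    # Map each 'Month' header line to the lines that follow it.
    if testVal is None:
        testVal = {}
    if arr and 'Month' in arr[0]:
        res = loop('Month', arr[1:])
        testVal[arr[0]] = res[0]
        return buildList(res[1], testVal)
    return testVal


# splitlines() drops the trailing newline from every line
with open("test.txt", "r") as in_file:
    righe = in_file.read().splitlines()

print(buildList(righe))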

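This is one possible solution. Note that it groups lines by month only, so the day marker lines stay inside each month's list, and it recurses once per line, so a long file can hit Python's recursion limit.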


itertools.groupby from the standard library is a powerful function for this kind of work. The code below groups lines by month, and then within each month by day, building up a nested data structure. Once done, you can iterate over that structure by month, and within each month by day.

data = """\
Month #1
1465465464555
345646546454
442343423433
724342342655
34324233454
24543534533
***Day # 1
5465465465465455
644654654654454
4435423534833
***Day #2
24876867655
74654654454
643876867433
***Day #3
445543534655
344876867854
64365465433
Month #2
7454353455
84756756454
64563453433
***Day # 1
44756756655
34453453454
243867867433
***Day #2
64465465455
74454353454
34878733
***Day #3
1449898955
643434354
843090909888433""".splitlines()
# or data = open(data_file).read().splitlines()

from itertools import groupby

# some simple boolean functions to detect Month and Day marker lines
is_month_line = lambda s: s.startswith("Month")
is_day_line = lambda s: s.startswith("***Day")

grouped_data = []
for is_month, month_lines in groupby(data, key=is_month_line):
    if is_month:
        # detected a 'Month' marker - save it and create placeholder in grouping structure
        current_month = list(month_lines)[0]
        current_month_data = []
        grouped_data.append([current_month, current_month_data])

        # set up blank day for month-level data lines
        current_day = ''
        current_day_data = []
        current_month_data.append([current_day, current_day_data])
    else:
        # found group of non-'Month' lines, group by days
        for is_day, day_lines in groupby(month_lines, key=is_day_line):
            if is_day:
                # day marker detected, save it for storing day values
                current_day = list(day_lines)[0][3:]
                current_day_data = []
                current_month_data.append([current_day, current_day_data])
            else:
                # all non-day lines, add to current day's data
                current_day_data.extend(day_lines)

Use pprint to dump out the nested lists:

from pprint import pprint
pprint(grouped_data, width=120)

gives:

[['Month #1',
  [['', ['1465465464555', '345646546454', '442343423433', '724342342655', '34324233454', '24543534533']],
   ['Day # 1', ['5465465465465455', '644654654654454', '4435423534833']],
   ['Day #2', ['24876867655', '74654654454', '643876867433']],
   ['Day #3', ['445543534655', '344876867854', '64365465433']]]],
 ['Month #2',
  [['', ['7454353455', '84756756454', '64563453433']],
   ['Day # 1', ['44756756655', '34453453454', '243867867433']],
   ['Day #2', ['64465465455', '74454353454', '34878733']],
   ['Day #3', ['1449898955', '643434354', '843090909888433']]]]]
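
One way to walk this structure is to unpack each [label, children] pair while looping. The following is only a sketch: the generator name iter_numbers is hypothetical, and the sample list is a shortened stand-in with the same shape as grouped_data above, in which Month #2 deliberately has no 'Day # 1' entry.

```python
# Shortened sample with the same [label, children] shape as grouped_data above
grouped_data = [
    ['Month #1', [['', ['1465465464555']],
                  ['Day # 1', ['5465465465465455', '644654654654454']]]],
    ['Month #2', [['', ['7454353455']],
                  ['Day #2', ['64465465455']]]],
]

def iter_numbers(grouped):
    """Yield (month_label, day_label, number) for every data line."""
    for month_label, days in grouped:
        for day_label, numbers in days:
            for number in numbers:
                yield month_label, day_label, number

for month_label, day_label, number in iter_numbers(grouped_data):
    # The empty day label marks lines that came before the first day marker
    print(month_label, day_label or '(no day)', number)
```

Because each month only lists the days that actually appear in the file, a missing day needs no special handling: the inner loop simply never sees it.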

2 Comments

Another option could be to use a ConfigParser and reformat your log file (if that is an available route).
Thanks for replying, Paul. It's almost exactly what I was looking for. The problem is that the nested structure confuses me a little. For example, how do I iterate over all the days in a month (considering that a day may be missing from a month)?
