Python script to extract data from txt to csv

Question

I'm trying to write a Python script to extract the Wi-Fi data from a txt file to a CSV file.

Here is the txt data:

Wed Oct  7 09:00:01 UTC 2020

BSS 02:ca:fe:ca:ca:40(on ap0_1)
freq: 2422
capability: IBSS (0x0012)
signal: -60.00 dBm
primary channel: 3
last seen: 30 ms ago
BSS ac:86:74:0a:73:a8(on ap0_1)
TSF: 229102338752 usec (2d, 15:38:22)
freq: 2422
capability: ESS (0x0421)
signal: -62.00 dBm
primary channel: 3

I need to write the txt data to a CSV file in this format:

 Time                        | BSS                       | freq |capability   |signal| primary channel |
 ----------------------------+---------------------------+------+-------------+------+-----------------+
 Wed Oct  7 09:00:01 UTC 2020|02:ca:fe:ca:ca:40(on ap0_1)| 2422 |IBSS (0x0012)|-60.00|             3   |
                             |ac:86:74:0a:73:a8(on ap0_1)| 2422 |ESS (0x0421) |-62.00|             3   |

This is my unfinished code:

import csv
import re

fieldnames = ['TIME', 'BSS', 'FREQ','CAPABILITY', 'SIGNAL', 'CHANNEL']

re_fields = re.compile(r'({})+:\s(.*)'.format('|'.join(fieldnames)), re.I)

with open('ap0_1.txt') as f_input, open('ap0_1.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames= fieldnames)
    csv_output.writeheader()
    start = False

    for line in f_input:
        line = line.strip()

        if len(line):
            if 'BSS' in line:
                if start:
                    start = False
                    block.append(line)
                    text_block = '\n'.join(block)

                    for field, value in re_fields.findall(text_block):
                        entry[field.upper()] = value

                    if line[0] == 'on ap0_1':
                        entry['BSS'] = block[0]

                    csv_output.writerow(entry)

                else:
                    start = True
                    entry = {}
                    block = [line]
            elif start:
                block.append(line)

When I run it, the data isn't placed correctly.

Please let me know how to fix this. I'm just a beginner in programming and would appreciate any advice. Thank you.

Please add the desired and observed output for the input samples to your question. – Klaus D., commented Oct 19, 2020 at 4:11

The question is confusing. You say "here is the data", and you also say "the data is in this format", and those two examples are wildly different. What does the input data actually look like? – John Gordon, commented Oct 19, 2020 at 4:40

Hi John Gordon, I'm sorry for confusing you. I've edited the question. – Henry Nguyen, commented Oct 19, 2020 at 5:00

Rakesh · Accepted Answer · 2020-10-19 13:35:15Z

1

Use str.startswith with the tuple of field names:

Ex:

import csv

filename = 'ap0_1.txt'  # input file name taken from the question
outfile = 'ap0_1.csv'   # output file name taken from the question

fieldnames = ('TIME', 'BSS', 'freq', 'capability', 'signal', 'primary channel')
with open(filename) as f_input, open(outfile, 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    result = {"TIME": next(f_input).strip()}   # Get time from the first line
    for line in f_input:
        line = line.strip()
        # str.startswith accepts a tuple: True if the line starts with any field name
        if line.startswith(fieldnames):
            if line.startswith('BSS'):
                key, value = line.split(" ", 1)
            else:
                key, value = line.split(": ", 1)  # maxsplit=1 keeps the whole value
            result[key] = value

    csv_output.writerow(result)
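The two string methods this answer relies on, str.startswith with a tuple of prefixes and str.split with maxsplit=1, can be checked on single lines from the question. Note that result above is a single dict, so only one row is ever written and every later BSS line overwrites the earlier values. parse_line is a hypothetical helper written only for this illustration:

```python
fieldnames = ('TIME', 'BSS', 'freq', 'capability', 'signal', 'primary channel')

def parse_line(line):
    """Return (key, value) for a wanted line, or None for lines such as TSF."""
    line = line.strip()
    if not line.startswith(fieldnames):  # tuple argument: True if any prefix matches
        return None
    if line.startswith('BSS'):
        # 'BSS <mac>(on ap0_1)' has no colon, so split on the first space
        key, value = line.split(' ', 1)
    else:
        # maxsplit=1 keeps everything after the first ': ' in the value
        key, value = line.split(': ', 1)
    return key, value

print(parse_line('BSS 02:ca:fe:ca:ca:40(on ap0_1)'))        # ('BSS', '02:ca:fe:ca:ca:40(on ap0_1)')
print(parse_line('primary channel: 3'))                     # ('primary channel', '3')
print(parse_line('TSF: 229102338752 usec (2d, 15:38:22)'))  # None
```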

EDIT as per comment:

If you have multiple blocks of the above text:

import re
import csv

filename = 'ap0_1.txt'  # input file name taken from the question
outfile = 'ap0_1.csv'   # output file name taken from the question

week_ptrn = re.compile(r"\b(" + "|".join(('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')) + r")\b")
fieldnames = ('TIME', 'BSS', 'freq', 'capability', 'signal', 'primary channel')

with open(filename) as f_input, open(outfile, 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    result = []    # one dict (CSV row) per timestamped block
    for line in f_input:
        line = line.strip()
        week = week_ptrn.match(line)
        if week:
            result.append({"TIME": line})   # a timestamp line starts a new row

        # "result and" avoids an IndexError if fields appear before any timestamp
        if result and line.startswith(fieldnames):
            if line.startswith('BSS'):
                key, value = line.split(" ", 1)
            else:
                key, value = line.split(": ", 1)
            # later BSS lines under the same timestamp overwrite earlier ones
            result[-1][key] = value

    csv_output.writerows(result)

edited Oct 19, 2020 at 13:35

answered Oct 19, 2020 at 4:49

Rakesh



3 Comments

Henry Nguyen Over a year ago

Hi Rakesh, the code only extracts the 1st BSS but not the 2nd BSS.

Rakesh Over a year ago

Sorry, I do not understand. Do you have multiple blocks of the above string in the text file?

Henry Nguyen Over a year ago

Yes, each block starts with BSS and ends with primary channel.
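Following that description (a block opens at a BSS line and closes at the primary channel line), here is a minimal sketch that writes one CSV row per BSS block. It assumes the timestamp line starts with an English weekday abbreviation, leaves TIME empty for later blocks under the same timestamp to match the desired table, and silently drops a block that never reaches a primary channel line. iter_blocks and block_to_row are hypothetical helper names, and the sample is read from a string instead of ap0_1.txt:

```python
import csv
import io

WEEKDAYS = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
FIELDNAMES = ['TIME', 'BSS', 'freq', 'capability', 'signal', 'primary channel']

def iter_blocks(lines):
    """Yield (time, block_lines); a block opens at 'BSS' and closes at 'primary channel'."""
    time, block = '', None
    for raw in lines:
        line = raw.strip()
        if line.startswith(WEEKDAYS):
            time = line                      # timestamp for the blocks that follow
        elif line.startswith('BSS'):
            block = [line]                   # a BSS line opens a new block
        elif block is not None and line:
            block.append(line)
            if line.startswith('primary channel'):
                yield time, block            # 'primary channel' closes the block
                time, block = '', None       # show the time only on the first row

def block_to_row(time, block):
    """Turn one block into a dict; keys outside FIELDNAMES (TSF, last seen) are skipped."""
    row = {'TIME': time, 'BSS': block[0].split(' ', 1)[1]}
    for line in block[1:]:
        key, sep, value = line.partition(': ')
        if sep and key in FIELDNAMES:
            row[key] = value
    return row

sample = """Wed Oct  7 09:00:01 UTC 2020

BSS 02:ca:fe:ca:ca:40(on ap0_1)
freq: 2422
capability: IBSS (0x0012)
signal: -60.00 dBm
primary channel: 3
last seen: 30 ms ago
BSS ac:86:74:0a:73:a8(on ap0_1)
TSF: 229102338752 usec (2d, 15:38:22)
freq: 2422
capability: ESS (0x0421)
signal: -62.00 dBm
primary channel: 3
"""

rows = [block_to_row(t, b) for t, b in iter_blocks(sample.splitlines())]

out = io.StringIO()  # use open('ap0_1.csv', 'w', newline='') to write a real file
writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Keeping the unit in the signal column (-60.00 dBm) is a choice; taking value.split()[0] for that key would keep only the number, as in the desired table.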

Canasta · Accepted Answer · 2020-10-19 05:03:53Z

0

You tried to match the time with the pattern "TIME", but there is no "TIME" label in the input data, so an empty time column in the output is to be expected.
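Since the timestamp line has no label, one way to recognise it is to try parsing each line with datetime.strptime and treat a ValueError as "not a timestamp". This is a swapped-in technique, not the answer's code; TIME_FORMAT is an assumption that every timestamp has the same layout as the sample line, with English day and month abbreviations and a literal UTC:

```python
from datetime import datetime

# Assumed layout of the timestamp line, based on 'Wed Oct  7 09:00:01 UTC 2020'
TIME_FORMAT = '%a %b %d %H:%M:%S UTC %Y'

def parse_time(line):
    """Return a datetime if the line is a timestamp, otherwise None."""
    try:
        # the double space in 'Oct  7' still matches the single space in the format
        return datetime.strptime(line.strip(), TIME_FORMAT)
    except ValueError:
        return None

print(parse_time('Wed Oct  7 09:00:01 UTC 2020'))  # 2020-10-07 09:00:01
print(parse_time('freq: 2422'))                    # None
```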

I think the following lines also have a problem:

            if line[0] == 'on ap0_1':
                entry['BSS'] = block[0]

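My guess is that you wanted to find "on ap0_1" in "BSS ac:86:74:0a:73:a8(on ap0_1)". But line is a string, so line[0] is only its first character, 'B'; even the first word, line.split()[0], would be 'BSS' (from ['BSS', 'ac:86:74:0a:73:a8(on', 'ap0_1)']). Neither ever equals 'on ap0_1'. It should be changed like this: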

            if 'on ap0_1' in block[0]:
                entry['BSS'] = block[0][4:].lstrip()
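The difference between indexing a character and comparing a word is easy to check on the sample BSS line. The partition call at the end is a suggested alternative to the [4:] slice that does not depend on the prefix being exactly four characters; it is not part of the answer:

```python
line = 'BSS ac:86:74:0a:73:a8(on ap0_1)'

first_char = line[0]            # indexing a str gives one character: 'B'
first_word = line.split()[0]    # splitting on whitespace gives words: 'BSS'
has_iface = 'on ap0_1' in line  # substring test used by the fix: True
mac_sliced = line[4:].lstrip()  # drop 'BSS ' -> 'ac:86:74:0a:73:a8(on ap0_1)'
mac_partitioned = line.partition(' ')[2]  # same result without a hard-coded length

print(first_char, first_word, has_iface)  # B BSS True
print(mac_sliced == mac_partitioned)      # True
```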

answered Oct 19, 2020 at 5:03

Canasta



Kate Melnykova · Accepted Answer · 2020-10-19 17:50:08Z

0

Here is my version of the code.

import csv

fieldnames = ['TIME', 'BSS', 'FREQ', 'CAPABILITY', 'SIGNAL', 'CHANNEL']

with open('ap0_1.txt') as f_input, open('ap0_1.csv', 'w', newline='') as f_output:
    # extrasaction='ignore' skips keys such as TSF or LAST that are not columns;
    # columns missing from a row are written as '' (the default restval)
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction='ignore')
    csv_output.writeheader()

    def time_condition(l):
        # str.startswith accepts a tuple, so one call checks every weekday
        return l.startswith(('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))

    row = {}
    for line in f_input:
        line = line.strip()
        if not line:
            continue
        elif time_condition(line):
            if 'BSS' in row:  # a new timestamp: write the previous block first
                csv_output.writerow(row)
                row = {}
            row['TIME'] = line
        else:
            # not sure how you define the start of a new block, say, it is by 'BSS' string
            key, _, value = line.partition(' ')  # split one time exactly
            key = key.rstrip(':').upper()
            if key == 'PRIMARY':  # 'primary channel: 3' belongs in the CHANNEL column
                key, value = 'CHANNEL', value.partition(':')[2].strip()
            if key == 'BSS' and 'BSS' in row:  # the previous block is complete
                csv_output.writerow(row)
                row = {}
            row[key] = value
    if row:
        csv_output.writerow(row)

It looks like the empty lines ('\n') in the input create blank rows, so the loop skips them.
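For reference, csv.DictWriter can fill in missing columns and handle unknown keys by itself: restval (default '') fills columns missing from a row, extrasaction='ignore' drops keys that are not in fieldnames, and the default extrasaction='raise' raises ValueError instead. A small check, writing to an in-memory io.StringIO buffer rather than a file. Separately, the csv documentation recommends opening the output file with newline='' so that extra \r characters, which show up as blank rows on Windows, are not written; the code above already does this.

```python
import csv
import io

fieldnames = ['TIME', 'BSS', 'FREQ', 'CAPABILITY', 'SIGNAL', 'CHANNEL']

out = io.StringIO()
# restval fills missing columns; extrasaction='ignore' drops the unknown TSF key
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='', extrasaction='ignore')
writer.writeheader()
writer.writerow({'BSS': 'ac:86:74:0a:73:a8(on ap0_1)', 'FREQ': '2422', 'TSF': '229102338752 usec'})
print(out.getvalue())

# With the default extrasaction='raise', the same unknown key is an error
strict = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
try:
    strict.writerow({'TSF': '229102338752 usec'})
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```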

edited Oct 19, 2020 at 17:50

answered Oct 19, 2020 at 4:51

Kate Melnykova


1 Comment

Henry Nguyen Over a year ago

Hi @kate-melnykova, when I tried running the code, it said block['TIME'] = line is not defined.
