
I have a CSV file structured like this:

# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27

Is there any way in Python 3.x to parse a file structured like this without having to go through it manually, line by line?

I'd like a function that automatically parses it according to the labels, like this:

>>> parseLabeledCSV(['# Samples 1', '# Samples 2'], fileName)
[{1:58, 2:995, 3:585, 4:89, 5:63, 6:27}, {15:87, 16:952, 17:256}]
  • What do you mean by parse, split into columns? There are many Python packages specialising in reading CSV data. Commented Jun 23, 2016 at 17:25
  • What did you mean by non-homogeneous? The rows look homogeneous to me: each has two integers. Please update your post with what the expected output is. Have you looked into the csv library module? Commented Jun 23, 2016 at 17:26
  • The edit significantly changes the meaning of the question. It was absolutely unclear these were key-value pairs initially. Commented Jun 23, 2016 at 17:31
  • @Kupiakos I'm sorry, I hope it's clearer now. Commented Jun 23, 2016 at 17:32
  • @Eenoku Considering this seems to be a custom format, I'd say the safest bet is to just go line-by-line. Commented Jun 23, 2016 at 17:33
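For reference, here is roughly what the standard csv module alone makes of this format. This sketch uses a shortened, in-memory copy of the sample (the name SAMPLE is just for illustration) instead of a real file. Header lines come back as one-field rows and blank lines as empty lists, so csv by itself has no notion of the labels and some extra grouping logic is still needed.

```python
import csv
import io

# Shortened copy of the question's sample, read as if it were a file
SAMPLE = """# Samples 1
1,58
2,995

# Samples 2
15,87
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
print(rows)
# → [['# Samples 1'], ['1', '58'], ['2', '995'], [], ['# Samples 2'], ['15', '87']]
```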

2 Answers


Something like this?

input="""# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27"""


def parse(input):
    parsed = {}
    key = "# Unknown"  # label for any data that appears before the first header
    for line in input.splitlines():
        if not line.strip():  # ignore empty lines
            continue
        if line.startswith("#"):
            # create the label's dict only the first time the label is seen
            parsed.setdefault(line, {})
            key = line
            continue
        left, right = line.split(",")
        parsed.setdefault(key, {})[left] = right
    return parsed


if __name__ == '__main__':
    output = parse(input)
    print(output)

will output (on Python 3.7+, where dicts keep insertion order):

{'# Samples 1': {'1': '58', '2': '995', '3': '585', '4': '89', '5': '63', '6': '27'}, '# Samples 2': {'15': '87', '16': '952', '17': '256'}}
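The output above keeps keys and values as strings and is keyed by label, whereas the question asks for integers and a list ordered by the labels passed in. A minimal sketch of that exact shape follows, reading from a file path. The name parse_labeled_csv is hypothetical (it only mirrors the question's parseLabeledCSV), and it assumes every data line is exactly two integer fields and every label line starts with "#".

```python
import tempfile
from pathlib import Path


def parse_labeled_csv(labels, path):
    """Hypothetical helper: one dict per label, in the order given.

    Assumes each data line is 'int,int' and each label line starts with '#'.
    Blocks that repeat a label are merged; labels not listed are skipped.
    """
    merged = {label: {} for label in labels}
    current = None  # no label seen yet
    with open(path, newline="") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue  # blank separator line
            if line.startswith("#"):
                current = line  # start (or resume) a labeled block
                continue
            if current in merged:
                key, value = line.split(",")
                merged[current][int(key)] = int(value)
    return [merged[label] for label in labels]


if __name__ == "__main__":
    sample = ("# Samples 1\n1,58\n2,995\n3,585\n\n"
              "# Samples 2\n15,87\n16,952\n17,256\n\n"
              "# Samples 1\n4,89\n5,63\n6,27\n")
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        tmp.write(sample)
    print(parse_labeled_csv(["# Samples 1", "# Samples 2"], tmp.name))
    # → [{1: 58, 2: 995, 3: 585, 4: 89, 5: 63, 6: 27}, {15: 87, 16: 952, 17: 256}]
    Path(tmp.name).unlink()
```

Pre-creating one dict per requested label means a label that never appears in the file still yields an empty dict in its slot, so the result always lines up with the labels argument.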

itertools.groupby will do all the iterating and grouping for you. Here you only care about the contiguous groups of lines that contain a ',' (or consist only of ',' and digits, or whatever other filter predicate you care to define):

input="""# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27""".splitlines()

from itertools import groupby
import csv

results = []
for has_comma, data_lines in groupby(input, key=lambda s: ',' in s):
    if has_comma:
        results.append(dict(csv.reader(data_lines)))

This can even be collapsed to a single Python list comprehension statement:

results = [dict(csv.reader(data_lines)) 
            for has_comma, data_lines in groupby(input, key=lambda s: ',' in s) 
                if has_comma]

In both cases, print the results using:

for dd in results:
    print(dd)

to get:

{'1': '58', '2': '995', '3': '585'}
{'15': '87', '16': '952', '17': '256'}
{'4': '89', '5': '63', '6': '27'}
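Note that this yields three dicts: the two "# Samples 1" blocks stay separate, because the grouping ignores the header text. If you want them merged by label, as the question asks, one possible variant (a sketch, with the hypothetical name group_by_label) groups on whether a line is a header instead, remembers the most recent header, and converts both fields to int. It assumes the same two-integer data lines and "#"-prefixed labels; any data before the first header is filed under None.

```python
import csv
from itertools import groupby


def group_by_label(lines):
    """Hypothetical helper: merge 'int,int' lines under the latest '#' header."""
    merged = {}
    label = None  # data before any header is filed under None
    non_blank = (s.strip() for s in lines if s.strip())
    for is_header, group in groupby(non_blank, key=lambda s: s.startswith("#")):
        if is_header:
            label = list(group)[-1]  # if headers are adjacent, the last one wins
        else:
            block = merged.setdefault(label, {})
            # consume the group now, before groupby advances to the next one
            block.update((int(k), int(v)) for k, v in csv.reader(group))
    return merged


if __name__ == "__main__":
    sample = """# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27"""
    for label, block in group_by_label(sample.splitlines()).items():
        print(label, block)
```

Since it only needs an iterable of strings, an open file object can be passed in directly in place of sample.splitlines().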
