2

I'm running a utility that parses the output of the df command. I capture the output and send it to my parser. Here's a sample:

Filesystem                512-blocks      Used  Available Capacity iused      ifree %iused  Mounted on
/dev/disk2                1996082176 430874208 1564695968    22% 2429281 4292537998    0%   /
devfs                            668       668          0   100%    1156          0  100%   /dev
map -hosts                         0         0          0   100%       0          0  100%   /net
map auto_home                      0         0          0   100%       0          0  100%   /home

Here's the function:

def parse_df(self, content):
    """Parse the `df` content output

    :param content: The command content output
    :return: (list) A list of objects of the type being parsed
    """
    entries = []
    if not content:
        return entries
    # Split the content by line and skip the header line
    for line in content.split("\n"):
        if line.startswith("Filesystem"):
            continue
        tokens = line.split()
        print(tokens)

However I'm getting the following output:

['/dev/disk2', '1996082176', '430876480', '1564693696', '22%', '2429288', '4292537991', '0%', '/']
['devfs', '668', '668', '0', '100%', '1156', '0', '100%', '/dev']
['map', '-hosts', '0', '0', '0', '100%', '0', '0', '100%', '/net']
['map', 'auto_home', '0', '0', '0', '100%', '0', '0', '100%', '/home']

The issue is that map -hosts is supposed to be a single element (the Filesystem column). I tried splitting with a regex instead, tokens = re.split(r'\s{2,}', line), but the result was still not correct:

['/dev/disk2', '1996082176 430869352 1564700824', '22% 2429289 4292537990', '0%', '/']

What would be the correct way to parse the output?

9
  • You need to use a different delimiter maybe like \t? Even multiple spaces should work. Commented Jan 9, 2017 at 5:17
  • Each column has a fixed width. You could try splitting based on that Commented Jan 9, 2017 at 5:18
  • @Nishant: Splitting by \t: ['/dev/disk2 1996082176 430874728 1564695448 22% 2429300 4292537979 0% /'] Commented Jan 9, 2017 at 5:20
  • 1
    Sounds like a job for regular expressions; or os.statvfs. Commented Jan 9, 2017 at 5:41
  • 1
    Unrelated, but there are system calls (e.g. statvfs) that will probably get what you want more directly. Commented Jan 9, 2017 at 6:10

3 Answers

2

Just split on one or more whitespace characters that are followed by a digit or a /:

>>> import re
>>> s = '''/dev/disk2                1996082176 430874208 1564695968    22% 2429281 4292537998    0%   /
devfs                            668       668          0   100%    1156          0  100%   /dev
map -hosts                         0         0          0   100%       0          0  100%   /net
map auto_home                      0         0          0   100%       0          0  100%   /home'''.splitlines()
>>> for line in s:
    print(re.split(r'\s+(?=[\d/])', line))


['/dev/disk2', '1996082176', '430874208', '1564695968', '22%', '2429281', '4292537998', '0%', '/']
['devfs', '668', '668', '0', '100%', '1156', '0', '100%', '/dev']
['map -hosts', '0', '0', '0', '100%', '0', '0', '100%', '/net']
['map auto_home', '0', '0', '0', '100%', '0', '0', '100%', '/home']
>>> 
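
Dropped into the function from the question, the same split could look like the sketch below. The field names and the list-of-dicts return value are my own assumptions for illustration, not something df defines. It also assumes the nine-column layout from the sample, and it will mis-split a filesystem name that contains whitespace followed by a digit or a /.

```python
import re

# Hypothetical names for the nine columns in the sample output.
FIELDS = ("filesystem", "blocks", "used", "available", "capacity",
          "iused", "ifree", "iused_pct", "mounted_on")

def parse_df(content):
    """Parse `df` output into a list of dicts, one per filesystem."""
    entries = []
    for line in content.splitlines():
        # Skip blank lines and the header row.
        if not line.strip() or line.startswith("Filesystem"):
            continue
        # Split on whitespace only where the next character is a digit or "/".
        tokens = re.split(r"\s+(?=[\d/])", line.strip())
        if len(tokens) != len(FIELDS):
            continue  # unexpected layout: skip the line rather than guess
        entries.append(dict(zip(FIELDS, tokens)))
    return entries

sample = """Filesystem                512-blocks      Used  Available Capacity iused      ifree %iused  Mounted on
/dev/disk2                1996082176 430874208 1564695968    22% 2429281 4292537998    0%   /
devfs                            668       668          0   100%    1156          0  100%   /dev
map -hosts                         0         0          0   100%       0          0  100%   /net
map auto_home                      0         0          0   100%       0          0  100%   /home
"""

entries = parse_df(sample)
for entry in entries:
    print(entry["filesystem"], entry["mounted_on"])
```

Skipping lines with an unexpected token count is a deliberate choice here: it keeps a bad row from silently shifting every column.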

1

If that is the behavior you want, the easiest way I can see is to keep joining the leading tokens into the first element until you reach a numeric token.

So something like this:

tokens = line.split()
n = 1
while n < len(tokens) and not tokens[n].isdigit():
    n += 1
tokens[0] = ' '.join(tokens[:n])
tokens = [ tokens[0] ] + tokens[n:]

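Alternatively, you could try @cricket_007's suggestion and slice the first column off at a fixed width:

# 15 is an assumed width for the Filesystem column; adjust it to match your output
first_token = line[:15].strip()
tokens = [first_token] + line[15:].split()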
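
A variation on the first idea that avoids the loop, offered as a sketch rather than a drop-in fix: the sample has exactly eight columns after the filesystem name, so splitting from the right at most eight times leaves the name as a single token. The helper name split_df_line and its trailing_columns parameter are hypothetical. It assumes the nine-column layout from the question and mount points that contain no whitespace.

```python
def split_df_line(line, trailing_columns=8):
    """Split one df row, keeping the filesystem name as a single token."""
    # rsplit works from the right, so whatever is left after
    # `trailing_columns` splits is the (possibly multi-word) first column.
    return line.strip().rsplit(None, trailing_columns)

row = "map -hosts                         0         0          0   100%       0          0  100%   /net"
tokens = split_df_line(row)
print(tokens)
```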


0

The filesystem column will probably be followed by multiple spaces. As long as you can rely on that, you can split it off with one delimiter, split the rest with another, and combine the two results.

import re

fs, rest = re.split(r'\s{2,}', line, maxsplit=1)
result = [fs] + rest.split()

But this won't work if the filesystem name is followed by only a single space, for example when the name is long enough to fill the whole column.
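
To make that failure mode concrete, here is a small sketch using a made-up device name that is followed by only one space. The split then happens at the first run of two or more spaces further along the line, so several numeric columns end up glued to the name.

```python
import re

# Hypothetical row: the name is followed by a single space only.
line = "/dev/averylongdevicename 1996082176 430874208 1564695968    22% 2429281 4292537998    0%   /"

fs, rest = re.split(r"\s{2,}", line, maxsplit=1)
print(repr(fs))    # the name plus three numeric columns
print(repr(rest))
```

If a line contains no run of two or more spaces at all, the two-name unpacking raises a ValueError instead.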

I agree with the comments that os.statvfs(path) is a better tool for this: it queries the filesystem directly, whereas df needs a subprocess call and then parsing of its text output.
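
A minimal sketch of the os.statvfs route, assuming a Unix-like system (the call is not available on Windows). The helper name disk_usage is hypothetical. Note that this reports on the one filesystem containing the given path; unlike df, it does not enumerate mount points for you.

```python
import os

def disk_usage(path="/"):
    """Return (total, used, available) in bytes for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize      # size of the filesystem
    free = st.f_bfree * st.f_frsize        # free space, including reserved blocks
    available = st.f_bavail * st.f_frsize  # free space for unprivileged users
    used = total - free
    return total, used, available

total, used, available = disk_usage("/")
print(total, used, available)
```

The same result object also exposes f_files and f_ffree, which correspond to the inode counts that df prints in the iused and ifree columns.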
