
Given a CSV file describing parent-child relationships, one child,parent pair per line, with each person written as first name and last name:

$ cat /var/tmp/hier
F2 L2,F1 L1
F3 L3,F1 L1
F4 L4,F2 L2
F5 L5,F2 L2
F6 L6,F3 L3

I want to print:

F1 L1
    F2 L2
        F4 L4
        F5 L5
    F3 L3
        F6 L6

I wrote the following script:

#!/bin/bash
print_node() {
        echo "awk -F, '\$2=="\"$@\"" {print \$1}' /var/tmp/hier"
        for node in `eval "awk -F, '\$2=="\"$@\"" {print \$1}'     /var/tmp/hier"`
        do
                echo -e "\t"$node
                print_node "$node"
        done
}
print_node "$1"

Running the script:

$ ./print_tree.sh "F1 L1"
awk -F, '$2=="F1 L1" {print $1}' /var/tmp/hier
awk: syntax error near line 1
awk: bailing out near line 1

It seems the awk command is malformed, but if I run the command shown in the debug output directly, it works:

$ awk -F, '$2=="F1 L1" {print $1}' /var/tmp/hier
F2 L2
F3 L3

What might be causing this error?

  • I'm not seeing the connection to Python here that would justify the tag; am I missing something? Commented Aug 15, 2015 at 0:31
  • Anything containing for node in `eval "awk ...` is sufficiently complex that you should be rethinking from the ground up. I for one simply decline to spend time working out what you're trying to do, and why your code isn't achieving it. You should be able to run awk once, and it should do all the processing. (Or Perl, or Python, or …) Commented Aug 15, 2015 at 0:44
  • Please either explain why perl and python are relevant tags, or stop adding irrelevant tags. Commented Aug 15, 2015 at 0:50
  • Whenever you get that specific error message, it means you are trying to use old, broken awk (/bin/awk on Solaris). On Solaris, use /usr/xpg4/bin/awk (or nawk as a second-best choice). Commented Aug 15, 2015 at 1:25
  • Replacing it with /usr/xpg4/bin/awk didn't work. Commented Aug 15, 2015 at 1:40
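Since switching awk binaries didn't help, a likely cause is the backticks rather than awk itself: inside backticks the shell strips the backslash from \$, so $2 and $1 in the double-quoted eval string expand to the function's own positional parameters (empty and F1 L1), and awk ends up with the program =="F1 L1" {print F1 L1}. The echo line sits outside backticks, which is why its output looks right. Separately, the unquoted for node in ... loop would also split F2 L2 into two words. For comparison, the recursive lookup the script is aiming for (find rows whose second field is the node, print each match indented, recurse) can be sketched in Python. The names read_rows, children_of and tree_lines and the inlined sample are illustrative assumptions; like the script, it rescans every row on each lookup.

```python
import csv
import io

# The question's sample data, inlined here instead of reading /var/tmp/hier.
SAMPLE = """F2 L2,F1 L1
F3 L3,F1 L1
F4 L4,F2 L2
F5 L5,F2 L2
F6 L6,F3 L3
"""


def read_rows(text):
    """Parse 'child,parent' lines into (child, parent) tuples, skipping blank lines."""
    return [(row[0], row[1]) for row in csv.reader(io.StringIO(text)) if row]


def children_of(rows, parent):
    """Same query as awk -F, '$2==parent {print $1}': a full scan on every call."""
    return [child for child, p in rows if p == parent]


def tree_lines(rows, node, depth=0):
    """Return node plus all of its descendants, indented four spaces per level."""
    lines = ["    " * depth + node]
    for child in children_of(rows, node):
        lines.extend(tree_lines(rows, child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(tree_lines(read_rows(SAMPLE), "F1 L1")))
```

Each name stays a single string here, so "F2 L2" is never split the way the unquoted shell loop would split it.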

3 Answers


With GNU awk, for its true multidimensional arrays (arrays of arrays):

$ cat tst.awk
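BEGIN {
    FS = ","
    # gawk extension: visit array indices in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
}
# "child" is an extra parameter so each recursive call gets its own copy
function descend(node,    child) {
    printf "%*s%s\n", indent, "", node
    if ( isarray(map[node]) ) {
        indent += 3
        for (child in map[node]) {
            descend(child)
        }
        indent -= 3
    }
}
NR==1 { root = $2 }    # assumes the first line's parent is the root
{ map[$2][$1] }        # map[parent][child]
END { descend(root) }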

$ awk -f tst.awk file
F1 L1
   F2 L2
      F4 L4
      F5 L5
   F3 L3
      F6 L6
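For readers without gawk, roughly the same logic can be sketched in Python, with a dict of sets standing in for the array of arrays map[parent][child]. This is a sketch, not a drop-in replacement: build_map, descend and LINES are illustrative names, children are visited in sorted order, and, as in the awk script, the root is assumed to be the parent on the first line.

```python
from collections import defaultdict

LINES = ["F2 L2,F1 L1", "F3 L3,F1 L1", "F4 L4,F2 L2", "F5 L5,F2 L2", "F6 L6,F3 L3"]


def build_map(lines):
    """Collect map[parent] -> set of children, like gawk's { map[$2][$1] }.

    As in the awk script, the root is taken to be the parent on the first line.
    """
    children = defaultdict(set)
    root = None
    for lineno, line in enumerate(lines, start=1):
        child, parent = line.split(",")
        if lineno == 1:
            root = parent
        children[parent].add(child)
    return children, root


def descend(children, node, indent=0):
    """Depth-first walk; each level adds 3 spaces, like indent += 3 in the awk."""
    lines = [" " * indent + node]   # printf "%*s%s\n", indent, "", node
    if node in children:            # isarray(map[node]): the node has children
        for child in sorted(children[node]):
            lines.extend(descend(children, child, indent + 3))
    return lines


if __name__ == "__main__":
    children, root = build_map(LINES)
    print("\n".join(descend(children, root)))
```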

1 Comment

Just brilliant.

I would personally reach for Perl here; you could also use Python (or any other language at a similar level that happens to be installed, like Ruby or Tcl, but Perl and Python are almost universally preinstalled). I would use one of them because they have built-in nested data structures, which make it easy to cache the tree in navigable form instead of re-parsing the parent links every time you want to fetch a node's children. (GNU awk has arrays of arrays, but BSD awk doesn't.)
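To illustrate "caching the tree in navigable form" in Python: a minimal sketch, assuming the links form a tree (no cycles), that parses each link once into nested dicts and afterwards only navigates them. build_tree, walk and PAIRS are hypothetical names chosen for this example.

```python
PAIRS = [
    ("F2 L2", "F1 L1"),
    ("F3 L3", "F1 L1"),
    ("F4 L4", "F2 L2"),
    ("F5 L5", "F2 L2"),
    ("F6 L6", "F3 L3"),
]


def build_tree(pairs):
    """Turn (child, parent) links into nested dicts, parsing each link only once.

    Every name gets exactly one dict of children, shared between its own entry
    and its parent's entry, so the links may arrive in any order.
    """
    nodes = {}          # name -> dict of that node's children
    has_parent = set()
    for child, parent in pairs:
        child_dict = nodes.setdefault(child, {})
        nodes.setdefault(parent, {})[child] = child_dict
        has_parent.add(child)
    # Roots are the names that never appear as a child.
    return {name: kids for name, kids in nodes.items() if name not in has_parent}


def walk(tree, depth=0):
    """Yield indented lines by navigating the cached tree; nothing is re-parsed."""
    for name in sorted(tree):
        yield "    " * depth + name
        yield from walk(tree[name], depth + 1)


if __name__ == "__main__":
    for line in walk(build_tree(PAIRS)):
        print(line)
```

Because each node's dict is shared, a child's subtree can be filled in before or after the child is attached to its parent.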

Anyway, here's one Perl solution:

#!/usr/bin/env perl
use strict;
use warnings;

my %parent;

while (<>) {
  chomp;
  my ($child, $parent) = split ',';
  $parent{$child} = $parent;
}

my (%children, %roots);

while (my ($child, $parent) = each %parent) {
  push @{$children{$parent} ||= []}, $child;
  $roots{$parent} = 1 unless $parent{$parent};
}

foreach my $root (sort keys %roots) {
  show($root);
}

sub show {
  my ($node, $indent) = (@_,'');
  print "$indent$node\n";
  foreach my $child (sort(@{$children{$node}||[]})) {
    show($child, "    $indent");
  }
}

I saved the above as print_tree.pl and ran it like this on your data:

$ perl print_tree.pl *csv

You could also make it executable with chmod +x print_tree.pl and run it without explicitly calling perl:

$ ./print_tree.pl *csv

Anyway, on your sample data, it produces this output:

F1 L1
    F2 L2
        F4 L4
        F5 L5
    F3 L3
        F6 L6
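Since Python was mentioned as an alternative, here is a rough Python port of the Perl script, a sketch rather than a tested replacement: parse builds the child-to-parent map like the while (<>) loop, and show_forest derives the children lists and the roots, then prints every tree, so input with several unrelated roots also works. The function names and the inlined SAMPLE are assumptions.

```python
def parse(lines):
    """Like the Perl while (<>) loop: map child -> parent (a later line wins)."""
    parent = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        child, par = line.split(",")[:2]
        parent[child] = par
    return parent


def show_forest(parent):
    """Return the indented lines for every tree, with roots and children sorted."""
    children = {}
    roots = set()
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
        if par not in parent:          # a parent with no parent of its own is a root
            roots.add(par)

    lines = []

    def show(node, indent=""):
        lines.append(indent + node)
        for kid in sorted(children.get(node, [])):
            show(kid, "    " + indent)

    for root in sorted(roots):
        show(root)
    return lines


SAMPLE = ["F2 L2,F1 L1\n", "F3 L3,F1 L1\n", "F4 L4,F2 L2\n", "F5 L5,F2 L2\n", "F6 L6,F3 L3\n"]

if __name__ == "__main__":
    print("\n".join(show_forest(parse(SAMPLE))))
```

To read the files named on the command line (or standard input), as Perl's <> does, the SAMPLE list could be replaced by fileinput.input() from the standard library.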



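An alternative solution that doesn't need multidimensional awk arrays. It works for this hierarchy, which is exactly three levels deep and in which every child of the root has children of its own; join also expects both inputs sorted on the join field, which this sample happens to be: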

join -t, -1 1 -2 2 inputfile{,} | awk -F, -f tree.awk

and the awk script is as follows:

$ cat tree.awk 
    {
            s=$1;$1=$2;$2=s;
            t=""
            for (i=1;i<=NF;i++) {
                    if (! ($i in n)) {
                            print t $i
                            n[$i]
                    }
                    t=t "\t"
            }
    }
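To see what the pipeline does, here is a Python simulation of it, a sketch with assumed names (self_join, print_paths, ROWS). self_join mimics join -t, -1 1 -2 2 inputfile inputfile, pairing each node,parent row with the rows that list that node as a parent; unlike join, it does not need sorted input. print_paths mimics tree.awk by swapping the first two fields into a root-first path and printing each name the first time it appears, one tab per level.

```python
ROWS = [
    ("F2 L2", "F1 L1"),
    ("F3 L3", "F1 L1"),
    ("F4 L4", "F2 L2"),
    ("F5 L5", "F2 L2"),
    ("F6 L6", "F3 L3"),
]


def self_join(rows):
    """Mimic `join -t, -1 1 -2 2 inputfile inputfile`.

    Field 1 (child) of the first copy is matched against field 2 (parent) of the
    second copy, and each match is emitted as (node, node's parent, node's child).
    Unmatched rows are dropped, as join does without -a.
    """
    return [(node, parent, child)
            for node, parent in rows
            for child, p in rows
            if p == node]


def print_paths(joined):
    """Mimic tree.awk: swap the first two fields to get a root-first path, then
    emit each name the first time it is seen, one tab per position in the path."""
    seen = set()
    lines = []
    for node, parent, child in joined:
        for depth, name in enumerate((parent, node, child)):   # s=$1;$1=$2;$2=s
            if name not in seen:
                seen.add(name)
                lines.append("\t" * depth + name)
    return lines


if __name__ == "__main__":
    print("\n".join(print_paths(self_join(ROWS))))
```

With links b,a and c,a and d,b, for example, c has no children, so it never appears in the joined output and is silently left out of the tree.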

1 Comment

Correct, you have to keep joining to create the full path for each node.
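Taking that comment literally, a minimal Python sketch that keeps "joining" until the root is reached: full_path walks the parent links in a loop, so any depth works and childless nodes are not lost. Sorting the paths keeps each subtree together, but it orders siblings by name rather than by input order. full_path, print_tree and PARENT are hypothetical names, and a cycle in the data raises ValueError instead of looping forever.

```python
def full_path(parent, node):
    """Follow parent links up to the root; the loop replaces repeated joins."""
    path = [node]
    while path[-1] in parent:
        up = parent[path[-1]]
        if up in path:
            raise ValueError(f"cycle through {up!r}")
        path.append(up)
    return path[::-1]          # root first


def print_tree(parent):
    """Emit every name once, indented one tab per level, whatever the depth."""
    paths = sorted(full_path(parent, node) for node in parent)
    seen = set()
    lines = []
    for path in paths:
        for depth, name in enumerate(path):
            if name not in seen:
                seen.add(name)
                lines.append("\t" * depth + name)
    return lines


PARENT = {"F2 L2": "F1 L1", "F3 L3": "F1 L1", "F4 L4": "F2 L2",
          "F5 L5": "F2 L2", "F6 L6": "F3 L3"}

if __name__ == "__main__":
    print("\n".join(print_tree(PARENT)))
```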
