1
\$\begingroup\$

I am testing an application that I wrote and want to compare the solution my algorithm produces against a Monte Carlo solution. My scripts write to the hard disk a lot, and I was wondering whether there is an approach that writes data to files much less often, since the disk I/O is really slowing the process down.

The solutions are computed on the nodes of a cluster and examined using this script (which runs on a node). Parameter $1 is an output file that the program wrote.

file=$1
script=/home/hefke/ov_paper/scripts
mv $file.out $file.out.old
grep "Overlapscore:" $file.monte > $file.grepped
awk '/./{print $2}' $file.grepped > $file.overlap
print "$script/std_dev.sh $file.overlap > $file.out"
$script/std_dev.sh $file.overlap > $file.out
cat $file.analy >> $file.out
echo "DONE" >> $file.out

Here is the script that collects the data on the main node. The .analy and .monte files are my output files.

echo "Processing outputfiles for the mc_stdev_of_ov"
script=/home/hefke/ov_paper/scripts
curdir=`pwd`
folder=filedata
for file in `ls -1 $curdir/temp_output/$folder/*.analy| sed 's/\(.*\)\..*/\1/'|uniq`
do 
    echo $file
    $script/submitter.sh $curdir "processonefile.sh $file.out"
done
echo "$file.out now contains what stdtev spat out."
cat $curdir/temp_output/$folder/*.out >> $curdir/temp_output/tmp.out 
awk -f keys.awk $curdir/temp_output/tmp.out >> table.out
cat table.out

How can I optimize this procedure for speed?

\$\endgroup\$

2 Answers

1
\$\begingroup\$

You don't need to store intermediate results in files between commands. Instead, feed the output of one command directly into the next:

$script/std_dev.sh < <(grep "Overlapscore:" $file.monte | awk '/./{print $2}') > $file.out

The Bash Guide has an excellent article about I/O.
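The same idea can also be written with a plain pipe, which does not rely on Bash's process substitution. The following is only a sketch: `std_dev` is a hypothetical stand-in for `std_dev.sh` and is assumed to read one number per line from standard input (the original script takes a file name instead), and the sample `.monte` content is made up.

```shell
#!/bin/sh
# Hypothetical stand-in for std_dev.sh: reads one number per line from
# standard input and prints the mean and the population standard deviation.
std_dev() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n > 0) {
                 mean = sum / n
                 printf "mean=%.2f stddev=%.2f\n", mean, sqrt(sumsq / n - mean * mean)
             }
         }'
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
file="$workdir/run1"

# Made-up sample input in the shape the question implies.
printf '%s\n' 'Overlapscore: 2' 'Other: 99' 'Overlapscore: 4' 'Overlapscore: 6' > "$file.monte"

# A single awk call does the work of grep + awk, and the pipe replaces
# the intermediate .grepped and .overlap files.
awk '/Overlapscore:/ { print $2 }' "$file.monte" | std_dev > "$file.out"

cat "$file.out"
```

Only the final result is written to disk here; the intermediate data never leaves the pipe.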

There's only one place where you write to tmp.out, and awk can take more than one file, so you can simplify those lines similarly:

awk -f keys.awk $curdir/temp_output/$folder/*.out

There's no need to redirect to table.out and then cat it afterwards.
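If the table should still be saved as well as displayed, `tee` can do both in one step. This is a minimal sketch under assumptions: the inline awk program is a hypothetical placeholder for `keys.awk`, whose contents are not shown in the question, and the input files are made up.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/results"

# Made-up per-file results standing in for the real *.out files.
printf 'score 1\nDONE\n' > "$workdir/results/a.out"
printf 'score 2\nDONE\n' > "$workdir/results/b.out"

# awk reads every file matched by the glob in turn, so no tmp.out is needed.
# tee prints the table and saves it at the same time. The table is written
# outside the results directory so that it can never match the *.out glob.
awk '$1 == "score" { print $2 }' "$workdir"/results/*.out | tee "$workdir/table.out"
```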

You shouldn't use ls in scripts; you can simply loop over a glob:

for file in "$curdir/temp_output/$folder"/*.analy
do
    file="${file%.*}" # Remove extension
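A complete loop built on this pattern could look like the sketch below. The directory and file names are made up for illustration; the `[ -e ... ]` guard is there because an unmatched glob is left as a literal string in sh and Bash by default.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Made-up input files; one name contains a space to show that the glob
# handles it, which parsing the output of ls would not.
touch "$workdir/run1.analy" "$workdir/run 2.analy"

count=0
for file in "$workdir"/*.analy
do
    [ -e "$file" ] || continue    # skip the literal pattern if nothing matched
    base="${file%.*}"             # strip the extension
    echo "would process: ${base##*/}.out"
    count=$((count + 1))
done
echo "$count file(s) found"
```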
\$\endgroup\$
  • \$\begingroup\$ When I use script/std_dev.sh < <(grep "Overlapscore:" $file.monte | awk '/./{print $2}') > $file.out, it tells me: Missing name for redirect. \$\endgroup\$ Commented Mar 14, 2012 at 13:28
  • \$\begingroup\$ Are you sure you're actually running Bash? \$\endgroup\$ Commented Mar 14, 2012 at 13:50
  • \$\begingroup\$ l0b0, you, sir, are a genius. As a matter of fact, I am not :(. I am running csh. Thank you very much for your answer anyway :) \$\endgroup\$ Commented Mar 15, 2012 at 6:44
  • \$\begingroup\$ Not related to the question any more, but is there a way to group commands with () in csh, as in Bash? \$\endgroup\$ Commented Mar 15, 2012 at 8:03
  • \$\begingroup\$ Sorry @tarrasch, csh is one beast I've never had to handle, so I really don't know. Maybe food for a separate question on USE? \$\endgroup\$ Commented Mar 15, 2012 at 8:53
2
\$\begingroup\$

This isn't related to the question, but please excuse me for using an "answer" just to comment: it seems I can't comment yet, probably because I don't have enough points.

Tarrasch, if you still use csh as your shell, please do not write scripts in it.

Please read: http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/

Use sh or bash (or even ksh) instead. It is better still to stick to plain sh, because that is what all Unix systems rely on (rc scripts, for example, are based on it).
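Following this advice does not require changing the login shell: a script that is executed directly runs under the interpreter named on its first line, so it can be an sh script even when it is launched from a csh prompt. A minimal sketch, in which the default value `example` is hypothetical and only there so the script runs without arguments:

```shell
#!/bin/sh
# The line above selects sh as the interpreter when the script is executed
# directly (for example ./processonefile.sh), whatever the login shell is.
file=${1:-example}    # hypothetical default so the sketch runs without arguments
echo "processing $file"
```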

\$\endgroup\$
