This question may or may not be truly Haskell-specific, but it concerns a slight annoyance that I am facing with a certain programming task.
I have written a program in Haskell which is mostly general for the type of problem I am trying to solve, but includes two problem-specific components: a run-time estimation function, derived from trial runs on a particular benchmark machine, and a file-name conversion function tailored to the naming scheme of the files I am working with. Naturally, if I want to run the script on hardware whose performance differs from the benchmark machine, or I find that the estimates are too conservative, I would like to change the run-time estimation function; likewise, I would like to be able to modify the file-name conversion function whenever I need to work with files that follow a different naming scheme.
However, the (remote) computer that I am running my scripts on does not have GHC or runhaskell installed, so I am having to modify, compile, and re-upload the code from my local machine, which is a bit of a hassle. My question is: is there an easy way to change some components of my code, and have the changes take effect at call-time, without recompiling?
I apologize if my description is vague; the gory details are included below, since I did not want to clutter the question from the outset with specifics that may prove unnecessary.
I am writing this code in Haskell mainly because that is the language I best know how to implement the methods in; while I understand that other languages might be more portable, I am not familiar enough with any other language to implement this without reading a lot of documentation and going through multiple revisions to get it to work. If achieving the flexibility I want in Haskell is impractical, I can appreciate that, but I would rather know that Haskell cannot do it than receive suggestions of other languages that can.
Specific Details
I am writing code to run independent jobs on a load-sharing cluster, and I therefore want to estimate the time required for a particular job as closely as possible: under-shooting causes the job to be terminated, while over-shooting lowers the priority of my jobs. I am basing my time estimate on the size of the inputs to the job program, and I have gathered enough real-world data to derive an approximate quadratic relation between size and time.
I currently assign time estimates (and thereby establish a job order) by parsing the output of du with a Haskell script, performing the size-to-time computation, and writing the results to a new file, which is then read in a loop by the job-assignment script.
The job is being run for paired files, which share a common name up to a certain point, where the last common element I wish to retain is an 's', with no further 's' characters in either name from then on. Therefore, I am traversing the names backwards and dropping until I reach an 's'. My code is included below. It is liberal with comments, which might help or might confuse. Some of them are highly specific to the task I am working with.
-- size2time.hs
-- A Haskell script to convert file sizes into job-times, based on observed job-times for
-- various file sizes
--
--
-- This file may be compiled via the following command:
-- > ghc size2time.hs
--
-- Should any edits be made, ensure that the compiled executable is updated accordingly
--
-- The executable is to be run with the following usage
--
-- > ./size2time inputfile outputfile
--
-- where inputfile is the name of a file whose first column contains the sizes, in MB, of each fq.gz
-- (including both paired-end reads), and whose second column contains the corresponding file names, as
-- generated by
--
-- > du -m $( ls DIR/*.fq.gz ) >inputfile
--
-- where DIR is the directory containing the fq.gz files
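-- For example (the numbers here are made up), a line of inputfile might look like
--
--   1024    DIR/foo--Hs--R1.fq.gz
--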
--
-- outputfile is the name of a file that will be created by the execution of this script, whose first
-- column will contain the run-time, in minutes, of the corresponding job (the times are based on
-- jobs run on Intel CPUs with 12 cores and 2GB of RAM, and therefore will potentially be
-- inapplicable to jobs run on CPUs of different manufacturers, with different numbers of cores,
-- and/or with different allocated RAM), and whose second column contains the scrubbed names of
-- the jobs to be run. The greater time-value for any given pair is used, with only one member of
-- each pair retained, as the file-names of each member of a pair are identical after scrubbing
--
-- import modules for command line arguments, list operations, map operations
import System.Environment
import Data.List
import qualified Data.Map as Map
main :: IO ()
main = do
    args <- getArgs                                       -- parse command line arguments: inputfile, outputfile, <ignored>
    let infile  = head args
        outfile = head . tail $ args
    contents <- readFile infile                           -- read the inputfile
    let sf     = lines contents                           -- split into lines
        tf     = map size2time sf                         -- perform size2time mapping
        st     = map sample tf                            -- scrub filenames
        stu    = Map.toList . Map.fromListWith max $ st   -- take only the longer of the two times of the paired reads
        tsu    = map flip2 stu                            -- put time first
        stsu   = sort tsu                                 -- sort by time, ascending
        tsustr = map (\(t,f) -> unwords [show t, f]) stsu -- convert back to strings
        tsulns = unlines tsustr                           -- join individual lines
    writeFile outfile tsulns                              -- write to the outputfile
{- given a string, with the size of a file and the name of the file,
- returns a tuple with the estimated job-time and the unmodified name
- of the file.
-
- The size-time conversion is extrapolated from experimental data,
- with only the upper extremes considered in order to prevent timeout,
- rounding in the quadratic term, and a linear-degree time padding added
- to allow for upper extremes. If modifications are to be made to any
- coefficients, it is recommended that only linear and constant terms be increased,
- and decreases should only be made after performing sufficient alignments to collect
- enough (file size)--(actual computation time) pairs to verify that the padding is excessive,
- and to determine coefficients that more closely follow the trend of the actual data, with
- the conditions that no data point must exceed the approximation curve, and that sufficient padding
- must be provided to allow for potential inconsistency in the time required for any given size of alignment.
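-
- As a worked example, a 1000 MB input yields
- floor (0.000025 * 1000^2 + 0.03 * 1000 + 10) = floor 65.0 = 65 minutes.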
-}
size2time :: String -> (Int,String)
size2time sfstring = let [size, file] = words sfstring                            -- parse out size and filename
                         x    = fromIntegral (read size :: Int)                   -- floating point from numeric string
                         time = floor $ 0.000025 * x ^ 2 + 0.03 * x + 10         -- apply floored conversion
                     in (time, file)
{-
- removes all characters in the file-name after 's', which properly scrubs files of the format
- *--Hs--R?.fq.gz, where the ? is either 1 or 2. For filenames formatted in different ways,
- or for alternative naming of the BAM file to be generated, this function must be modified
- to suit the scenario.
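-
- For example, both foo--Hs--R1.fq.gz and foo--Hs--R2.fq.gz scrub to foo--Hs,
- so the two members of a pair collapse to the same key.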
-}
sample :: (a,String) -> (String,a)
sample (x,f) = let s = reverse . dropWhile (/= 's') . reverse $ f
               in (s,x)
{-
- Reverses the order of a tuple, e.g. so that a Map may be made with a key to be found in the
- current second position of the tuple.
-}
flip2 :: (a,b) -> (b,a)
flip2 (x,y) = (y,x)
flip2 (x,y) = (y,x)
(As a side note, flip2 is just Data.Tuple.swap from the standard library, so you could import that instead of defining it yourself.)
Answers
One suggestion is to add the hint library to your project, then use it to load Haskell modules at runtime and interpret them as scripts; the modules you load that way can be edited without recompiling the main executable.
Another observation is that the only part that really needs to change is the expression 0.000025 * x ^ 2 + 0.03 * x + 10. There are a lot of expression-parsing tutorials out there, and probably a well-built existing one, so you could pass the name of a file containing such an expression as a command-line argument, parse it at runtime, and apply it to the data. You would only need to recompile if something other than this function were changed.
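As an illustration of the hint approach, here is a minimal sketch; the file Estimate.hs, the module name Estimate, and the binding estimate :: Double -> Int are all placeholders for whatever you put in the module you edit on the remote machine, not part of the original script:
-- hint-sketch.hs  (requires the hint package)
import Language.Haskell.Interpreter
main :: IO ()
main = do
  result <- runInterpreter $ do
    loadModules ["Estimate.hs"]         -- interpret the source file found on disk
    setTopLevelModules ["Estimate"]     -- bring its top-level bindings into scope
    interpret "estimate" (as :: Double -> Int)
  case result of
    Left err -> print err               -- report interpretation/compilation errors
    Right f  -> print (f 1000)          -- apply the freshly loaded function
One caveat: hint links the GHC API into your executable and still resolves the interpreted module's imports at runtime, so whether this works on the remote machine depends on what Estimate.hs imports.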
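Short of a full expression parser, a minimal version of the same idea is to read just the three quadratic coefficients from a file at runtime; the coefficients file and its one-line "a b c" format are assumptions for illustration:
-- coeffs-sketch.hs
import System.Environment (getArgs)
-- Read quadratic coefficients a, b, c from a one-line file, so the
-- size-to-time curve can be retuned without recompiling.
readCoeffs :: FilePath -> IO (Double, Double, Double)
readCoeffs path = do
  [a, b, c] <- map read . words <$> readFile path
  return (a, b, c)
-- size2time, parameterised by the coefficients instead of hard-coding them
size2time' :: (Double, Double, Double) -> String -> (Int, String)
size2time' (a, b, c) sfstring =
  let [size, file] = words sfstring
      x = fromIntegral (read size :: Int)
  in (floor $ a * x ^ 2 + b * x + c, file)
main :: IO ()
main = do
  coeffFile : _ <- getArgs
  coeffs <- readCoeffs coeffFile
  print (size2time' coeffs "1000 foo--Hs--R1.fq.gz")
  -- with a file containing "0.000025 0.03 10" this prints (65,"foo--Hs--R1.fq.gz")
With this approach the coefficients file can be edited directly on the remote machine, and only structural changes to the script require a recompile on the local machine.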