
I'm trying to scrape data from a Word document with Node.js.

My current problem is that the console.log inside the juice block below prints the value of newMargin as expected, but if I move it outside the juice block, the value is completely lost. I tried putting return

function getMargin(id, content){

    var newMargin = content.css("margin-left");

    if(newMargin === undefined){
        var htmlOfTarget = content.toString(),
            whereToCut = theRaw.indexOf("<div class=WordSection1>");

        fs.writeFile("bin/temp/temp_" + id + ".htm", theRaw.slice(0, whereToCut) + htmlOfTarget + "</body> </html>", function (err){
            if (err) {
                throw err;
            }
        });

        juice("bin/temp/temp_" + id + ".htm", function (err, html) {
            if (err) {
                throw err;
            }
            var innerLoad = cheerio.load(html);
            newMargin = innerLoad("p").css("margin-left");
            console.log(newMargin); // THIS newMargin HAS A VALUE
        });
    }
    console.log(newMargin); // THIS LOGS undefined
    return newMargin;
}

I think the problem lies with fs.writeFile and juice being asynchronous functions. I just have no idea how to get around it. I have to be able to call getMargin at certain points, in sequential order.
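That diagnosis is right. The following minimal sketch (not the original code) shows why the second console.log sees undefined. setImmediate stands in for fs.writeFile and juice, and getMarginBroken is a hypothetical function: it returns before the callback has had a chance to assign the value.

```javascript
// Hypothetical stand-in for getMargin: setImmediate plays the role of
// fs.writeFile/juice, i.e. any API that calls back later.
const order = [];

function getMarginBroken() {
  let newMargin; // undefined until the callback runs

  setImmediate(function () {
    newMargin = "0.5in"; // assigned later, after getMarginBroken has returned
    order.push("callback ran: " + newMargin);
  });

  order.push("returning: " + newMargin); // runs first, value still undefined
  return newMargin;
}

const result = getMarginBroken();
order.push("caller got: " + result);
console.log(order); // only the two synchronous entries exist at this point
```

The callback is queued, not run, so every line after the asynchronous call executes first, including the return statement.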

    The way to get around it is for the flow of execution to continue from within the callback. In other words, any code that relies on the response must be inside the callback, or in a function that is executed from inside the callback. Commented Feb 5, 2014 at 20:10
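A minimal sketch of the second option in that comment, assuming a hypothetical fetchValue function in place of the real asynchronous work: the dependent code lives in a named function that is called from inside the callback, which keeps the nesting shallow.

```javascript
// Hypothetical async operation standing in for fs.writeFile + juice.
function fetchValue(callback) {
  setImmediate(function () {
    callback(null, "12pt");
  });
}

const log = [];

// Everything that depends on the value lives here...
function useValue(value) {
  log.push("using " + value);
}

fetchValue(function (err, value) {
  if (err) {
    log.push("error: " + err.message);
    return;
  }
  // ...and is only invoked once the value actually exists.
  useValue(value);
});

log.push("fetchValue called"); // this still runs before useValue
```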

1 Answer


As mentioned in the comment, change your program flow so that it continues in callbacks, after the async code has completed...

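var fs = require("fs");
var cheerio = require("cheerio");
var juice = require("juice");

// accept a callback as a parameter, and run it after the async methods complete...
// (theRaw, the raw HTML of the Word document, is assumed to be defined elsewhere, as in the question)
function getMargin(id, content, callback){

    var newMargin = content.css("margin-left");

    if(newMargin === undefined){
        var htmlOfTarget = content.toString(),
            whereToCut = theRaw.indexOf("<div class=WordSection1>"),
            tempFile = "bin/temp/temp_" + id + ".htm";

        fs.writeFile(tempFile, theRaw.slice(0, whereToCut) + htmlOfTarget + "</body> </html>", function (err){
            if (err) {
                // a throw here could not be caught by the caller, so hand the error to the callback
                return callback(err);
            }

            // move the juice call inside the callback of the file write operation,
            // so it only runs once the temp file has been written
            juice(tempFile, function (err, html) {
                if (err) {
                    return callback(err);
                }
                var innerLoad = cheerio.load(html);
                newMargin = innerLoad("p").css("margin-left");

                // now run the callback passed in at the beginning, handing it the result...
                callback(null, newMargin);
            });
        });
    } else {
        // the margin was already available; still report it through the callback
        // (deferred, so the callback is always called asynchronously)
        process.nextTick(function () {
            callback(null, newMargin);
        });
    }
}


// call getMargin with a callback to run once it is complete...
// (myContent must be a cheerio selection, since getMargin calls .css() on it)
getMargin("myId", myContent, function(err, margin){
    if (err) {
        throw err;
    }
    // continue program execution in here, using margin...
});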

6 Comments

Doesn't this make getMargin async now? The thing is, I have a function running on each p tag found in the Word document, with global counters that need to be updated for each one. I have to keep track of those outside these functions. So is there no way to get the value back without doing it this way?
@Dirly: If Billy doesn't mind me cutting in here, you could call the synchronous version of writeFile (fs.writeFileSync), but it blocks execution of any other code. Sometimes that is acceptable in a Node app, but it sounds as though this is called repeatedly, so I'd advise against it. Not to mention that you'd also need to make juice() synchronous. There's almost always a solution; often it just requires different coding patterns. If you're stuck on this issue, you may want to research async patterns, and then ask another question if you can't figure out a way to do what you need.
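One such pattern, which covers the "sequential order" requirement from the question, is to start the next call only from inside the previous call's callback. This is a minimal sketch: getMarginAsync is a hypothetical stub (setImmediate in place of the file write and juice), and eachInSeries is a hand-rolled helper. Libraries such as async offer the same idea as eachSeries.

```javascript
let inFlight = 0;    // how many calls are currently running
let maxInFlight = 0; // highest concurrency observed

// Hypothetical stand-in for getMargin(id, content, callback).
function getMarginAsync(id, callback) {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  setImmediate(function () {
    inFlight -= 1;
    callback(null, "margin for " + id);
  });
}

// Run getMarginAsync on each id one after another, never in parallel,
// then call done(err, results) once all of them have finished.
function eachInSeries(ids, done) {
  const results = [];

  function next(i) {
    if (i === ids.length) {
      return done(null, results); // all items processed
    }
    getMarginAsync(ids[i], function (err, margin) {
      if (err) {
        return done(err); // stop at the first failure
      }
      results.push(margin);
      next(i + 1); // start the next item only after this one finished
    });
  }

  next(0);
}

let finalResults = null;
eachInSeries(["p1", "p2", "p3"], function (err, results) {
  if (err) throw err;
  finalResults = results;
  console.log(results);
});
```

Because each call starts only after the previous one has finished, at most one is running at any time, and results come back in input order.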
...and don't forget that JavaScript functions are closures, so there shouldn't be any reason the callback you pass wouldn't be able to update your counter variable.
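For example, with a hypothetical processParagraph function standing in for the per-p work: the counter is declared in an outer scope, and each callback closes over it, so every completed call can update it without any return value.

```javascript
// Counter shared by all callbacks through closure; no global needed.
let processed = 0;
const total = 3;

// Hypothetical stand-in for getMargin: reports a margin asynchronously.
function processParagraph(index, callback) {
  setImmediate(function () {
    callback(null, index * 10 + "px");
  });
}

for (let i = 0; i < total; i++) {
  processParagraph(i, function (err, margin) {
    if (err) throw err;
    processed += 1; // updates the outer variable captured by this closure
    console.log("paragraph " + i + " margin " + margin + " (" + processed + "/" + total + ")");
    if (processed === total) {
      console.log("all paragraphs done");
    }
  });
}
```

Here the calls overlap, so the counter, not the loop, is what tells you when all of them have finished.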
This is my first time running into anything async, so it threw a wrench in what I was doing. So to pass something through the callback, it would basically be callback(newMargin); and then, where I want to use it, getMargin("myID", "myContent", function(updatedMargin){ /* whatever I want to pass newMargin to */ });
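That is essentially right. A minimal runnable version, with a hypothetical getMarginStub in place of the real function, also follows Node's usual error-first convention, callback(err, value), so failures reach the caller the same way:

```javascript
// Hypothetical stub with the same shape as getMargin(id, content, callback).
function getMarginStub(id, content, callback) {
  setImmediate(function () {
    if (!content) {
      return callback(new Error("no content for " + id)); // report failure
    }
    callback(null, "0.25in"); // pass the computed margin to the caller
  });
}

let received = null;

getMarginStub("myID", "myContent", function (err, updatedMargin) {
  if (err) {
    console.error(err.message);
    return;
  }
  // whatever needs newMargin goes here
  received = updatedMargin;
  console.log(updatedMargin);
});
```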
@Dirly Although it seems odd at first, async is the way JavaScript works, and fighting against it would be harder than learning how to use it. An async function cannot return its result directly: the result only exists once the callback runs, so the code is split into event-based units. This is necessary for operations that take an unknown amount of time, such as a request sent to a server. Without an async approach, JavaScript would be blocked until the server responds, which in a browser would freeze the UI and create a bad experience for the user. The async approach is to fire off a function call and then register a callback or listen for an event. Godspeed :)
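The "listen for an event" style can be sketched with Node's built-in EventEmitter. The scraper emitter and the "margin" event name are illustrative assumptions, not part of the original code.

```javascript
const EventEmitter = require("events");

const scraper = new EventEmitter();
const seen = [];

// Register a listener first; it runs every time a "margin" event fires.
scraper.on("margin", function (id, margin) {
  seen.push(id + "=" + margin);
});

// Fire the events later, as an async operation would when it completes.
setImmediate(function () {
  scraper.emit("margin", "p1", "0.5in");
  scraper.emit("margin", "p2", "1in");
});

console.log("listener registered, waiting for events"); // prints first
```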