
I'm trying to run the query below in BigQuery using standard SQL and a JavaScript UDF. The query takes forever to run, so I can't even verify whether the function works. Is there anything wrong with the query that makes it run forever?

When I changed the function call from IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR to IRRCalc(array(select cash_flow from input),array(select date_delta from input)) as IRR, the issue went away. However, I don't understand what's wrong with IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR. Can someone please have a look and shed some light? Many thanks.

Here's the query:

CREATE TEMPORARY FUNCTION IRRCalc(cash_flow ARRAY<FLOAT64>, date_delta ARRAY<INT64>)
RETURNS FLOAT64
LANGUAGE js AS """
  var min = 0.0;
  var max = 1.0;
  var guess, NPV;
  do {
    guess = (min + max) / 2;
    NPV = 0.0;
    for (var j=0; j<cash_flow.length; j++){
      NPV += cash_flow[j]/Math.pow((1+guess),date_delta[j]/365);
    }
    if (NPV > 0){
      min = guess;
    }
    else {
      max = guess;
    }
  } while (Math.abs(NPV) > 0.00000001);
  return guess * 100;

""";

WITH Input AS
(
select
  cash_flow_date,
  date_diff(cash_flow_date, min(cash_flow_date) over (),day) as date_delta,
  cash_flow as cash_flow
from cash_flow_table
)

SELECT 
  cash_flow,
  date_delta,
  IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR
FROM Input;

And here's the table containing the raw data:

Row  cash_flow_date  date_delta  cash_flow
1    2017-09-08      0           -159951.78265102694
2    2017-09-08      0           -9.272567110204461
3    2017-09-08      0           -1000.0
4    2017-09-08      0           -159951.78265102694
5    2017-09-27      19          3552.8711640094157
6    2017-09-27      19          -544.122218768042
7    2018-03-28      201         -576.4290755116443
8    2018-03-28      201         3763.8202775817454
9    2018-04-02      206         437225.5536144294

Answer


"Can someone please have a look and shed some light?"

To see the difference, just run your SELECT without the UDF:

SELECT 
  cash_flow,
  date_delta,
  ARRAY<FLOAT64> [cash_flow], 
  ARRAY<INT64> [date_delta]
FROM Input

As you can see, for each row you create arrays with just one element in them: two arrays, each holding the single value that belongs to that same row.

When you use ARRAY(SELECT cash_flow FROM input), ARRAY(SELECT date_delta FROM input) instead, you create arrays that hold the respective values from all rows.
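To make the two shapes concrete, here is a small plain-JavaScript sketch (not BigQuery code) that mimics what each call style hands to the UDF, using the nine sample rows from the question. The names rows, perRowCalls and wholeColumnCall are made up for this illustration.

```javascript
// Sample rows from the question's table.
const rows = [
  { dateDelta: 0, cashFlow: -159951.78265102694 },
  { dateDelta: 0, cashFlow: -9.272567110204461 },
  { dateDelta: 0, cashFlow: -1000.0 },
  { dateDelta: 0, cashFlow: -159951.78265102694 },
  { dateDelta: 19, cashFlow: 3552.8711640094157 },
  { dateDelta: 19, cashFlow: -544.122218768042 },
  { dateDelta: 201, cashFlow: -576.4290755116443 },
  { dateDelta: 201, cashFlow: 3763.8202775817454 },
  { dateDelta: 206, cashFlow: 437225.5536144294 },
];

// Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]:
// built once per row, so every UDF call gets two one-element arrays.
const perRowCalls = rows.map((r) => [[r.cashFlow], [r.dateDelta]]);

// ARRAY(SELECT cash_flow FROM input), ARRAY(SELECT date_delta FROM input):
// each subquery collects the whole column into a single array.
const wholeColumnCall = [rows.map((r) => r.cashFlow), rows.map((r) => r.dateDelta)];

console.log(perRowCalls.length, perRowCalls[0]);
console.log(wholeColumnCall[0].length, wholeColumnCall[1].length);
```

In the SQL, those ARRAY(SELECT ...) subqueries are uncorrelated, so every output row receives the same two arrays and therefore the same IRR. Because the two subqueries also run independently, it is probably worth adding ORDER BY cash_flow_date to both; without an ORDER BY, BigQuery does not promise an element order, so nothing guarantees that each cash flow lines up with its own date_delta.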

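Finally, when you pass arrays with just one element, the NPV computed inside the loop reduces to cash_flow[0]/Math.pow(1+guess, date_delta[0]/365). For a nonzero cash flow this keeps the sign of that cash flow for every guess in [0, 1] and never gets close to zero, so the bisection has no root to find, while (Math.abs(NPV) > 0.00000001) is always true, and the loop runs forever.

That is my best reading of what is happening.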
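One way to check this outside BigQuery is to evaluate the same NPV expression at both ends of the bisection interval, guess = 0 and guess = 1; bisection can only converge if NPV has opposite signs there. The sketch below is plain JavaScript, and npv and hasSignChange are hypothetical helper names. It compares the whole-column arrays from the question's sample data with each one-element array.

```javascript
// NPV exactly as the UDF's inner loop computes it for one guess.
function npv(cashFlow, dateDelta, guess) {
  let total = 0.0;
  for (let j = 0; j < cashFlow.length; j++) {
    total += cashFlow[j] / Math.pow(1 + guess, dateDelta[j] / 365);
  }
  return total;
}

// The UDF bisects on [0, 1]; a root is only bracketed if NPV has opposite signs at the ends.
function hasSignChange(cashFlow, dateDelta) {
  return npv(cashFlow, dateDelta, 0) * npv(cashFlow, dateDelta, 1) < 0;
}

// Sample data from the question.
const cashFlow = [
  -159951.78265102694, -9.272567110204461, -1000.0, -159951.78265102694,
  3552.8711640094157, -544.122218768042, -576.4290755116443,
  3763.8202775817454, 437225.5536144294,
];
const dateDelta = [0, 0, 0, 0, 19, 19, 201, 201, 206];

// Whole-column arrays: what ARRAY(SELECT ...) passes in.
console.log("all rows:", hasSignChange(cashFlow, dateDelta));

// One-element arrays: what ARRAY<...> [value] passes in, one call per row.
cashFlow.forEach((c, j) => {
  console.log("row", j + 1, hasSignChange([c], [dateDelta[j]]));
});
```

With all nine values, NPV is positive at 0 and negative at 1, so a root lies in between; each one-element array keeps the sign of its single cash flow at both ends, which matches the never-ending loop.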

Note: the above answers your exact question, but you most likely still have an issue with the logic. If so, please ask a new, specific question.


Comments

Your answer makes a lot of sense, Mikhail. Thanks! Regarding the loop, I think all I need to do is define an iter_cnt and add it to the do...while condition, so the loop is forced to stop after a certain number of iterations. Something like while (Math.abs(NPV) > 0.00000001 && iter_cnt<10000);. Does that suffice? If not, I will open a new question so we can discuss it in more detail.
Sorry, I added a follow-up after posting my comment.
I think this should be a new question, as the one you initially asked is answered (at least from my perspective).
On the other hand, introducing a counter is simple and should work for you, so you might want to just try it; then there will be no need to ask :o)
Thanks. The counter worked, though I wasn't sure how it would behave with massive data. I will post a new question if I run into further issues. Thanks again.
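For reference, here is a sketch of the UDF body with the iteration counter from the comments, wrapped in a plain JavaScript function so it can be tried locally; the parameter names match the UDF, so the function body could in principle be pasted between the """ delimiters. The cap of 10000 comes from the comment above. Returning null when the cap is hit is an assumption of this sketch, not something settled in the thread: without it, a call that never converges quietly returns whatever bound the bisection drifted to (close to 0 or close to 100 for the one-element arrays). A JavaScript null returned from a BigQuery UDF should surface as SQL NULL.

```javascript
// Local stand-in for the IRRCalc UDF: the question's bisection loop
// plus the iter_cnt counter suggested in the comments.
function IRRCalc(cash_flow, date_delta) {
  var min = 0.0;
  var max = 1.0;
  var guess, NPV;
  var iter_cnt = 0;
  do {
    guess = (min + max) / 2;
    NPV = 0.0;
    for (var j = 0; j < cash_flow.length; j++) {
      NPV += cash_flow[j] / Math.pow(1 + guess, date_delta[j] / 365);
    }
    if (NPV > 0) {
      min = guess; // positive NPV: move the lower bound up
    } else {
      max = guess; // zero or negative NPV: move the upper bound down
    }
    iter_cnt++;
  } while (Math.abs(NPV) > 0.00000001 && iter_cnt < 10000);

  // Assumption: treat "stopped by the counter" as "no IRR found"
  // rather than returning the bound the bisection drifted to.
  if (Math.abs(NPV) > 0.00000001) {
    return null;
  }
  return guess * 100;
}

// Sample data from the question (all nine rows).
const cashFlows = [
  -159951.78265102694, -9.272567110204461, -1000.0, -159951.78265102694,
  3552.8711640094157, -544.122218768042, -576.4290755116443,
  3763.8202775817454, 437225.5536144294,
];
const dateDeltas = [0, 0, 0, 0, 19, 19, 201, 201, 206];

console.log(IRRCalc(cashFlows, dateDeltas)); // a rate between 0 and 100
console.log(IRRCalc([cashFlows[0]], [dateDeltas[0]])); // null: the counter stopped the loop
```

The counter bounds the work per call at 10000 NPV evaluations, so even a bad input can no longer hang the query, but it does not by itself tell you whether the result is meaningful; that is what the final check is for.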
