I am trying to rewrite a linear regression script (that I found in a thread here) as a function, and I get the following errors when I run the script:

Msg 156, Level 15, State 1, Procedure fn_LinearRegression, Line 9
Incorrect syntax near the keyword 'WITH'.
Msg 319, Level 15, State 1, Procedure fn_LinearRegression, Line 9
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Msg 156, Level 15, State 1, Procedure fn_LinearRegression, Line 12
Incorrect syntax near the keyword 'AS'.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 18
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 28
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 36
Incorrect syntax near ','.

Here is the function:

    CREATE Function dbo.fn_LinearRegression 
(@groupID varchar(50), @x int, @y float)
RETURNS @regtable TABLE(a FLOAT, b FLOAT)
AS 
--
WITH some_table as (
select @groupID, @x, @y from TABLENAME -- replace table),

/*WITH*/ mean_estimates AS
(   SELECT GroupID
          ,AVG(x)                                                  AS xmean
          ,AVG(y)                                                  AS ymean
    FROM some_table pd
    GROUP BY GroupID
),
stdev_estimates AS
(   SELECT pd.GroupID
          -- T-SQL STDEV() implementation is not numerically stable
          ,CASE      SUM(SQUARE(x - xmean)) WHEN 0 THEN 1 
           ELSE SQRT(SUM(SQUARE(x - xmean)) / (COUNT(*) - 1)) END AS xstdev
          ,     SQRT(SUM(SQUARE(y - ymean)) / (COUNT(*) - 1))     AS ystdev
    FROM some_table pd
    INNER JOIN mean_estimates  pm ON pm.GroupID = pd.GroupID
    GROUP BY pd.GroupID, pm.xmean, pm.ymean
),
standardized_data AS                   -- increases numerical stability
(   SELECT pd.GroupID
          ,(x - xmean) / xstdev                                    AS xstd
          ,CASE ystdev WHEN 0 THEN 0 ELSE (y - ymean) / ystdev END AS ystd
    FROM some_table pd
    INNER JOIN stdev_estimates ps ON ps.GroupID = pd.GroupID
    INNER JOIN mean_estimates  pm ON pm.GroupID = pd.GroupID
),
standardized_beta_estimates AS
(   SELECT GroupID
          ,CASE WHEN SUM(xstd * xstd) = 0 THEN 0
                ELSE SUM(xstd * ystd) / (COUNT(*) - 1) END         AS betastd
    FROM standardized_data
    GROUP BY GroupID
)
SELECT pb.GroupID
      ,ymean - xmean * betastd * ystdev / xstdev                   AS Alpha
      ,betastd * ystdev / xstdev                                   AS Beta
      ,CASE ystdev WHEN 0 THEN 1 ELSE betastd * betastd END        AS R2
      ,betastd                                                     AS Correl
      ,betastd * xstdev * ystdev                                   AS Covar

into TT_Auto_Temp_LM -- REPLACE TABLE
FROM standardized_beta_estimates pb
INNER JOIN stdev_estimates ps ON ps.GroupID = pb.GroupID
INNER JOIN mean_estimates  pm ON pm.GroupID = pb.GroupID;

--
Insert into @regtable ([A],[B]) VALUES (Alpha, Beta)

RETURN

I only have two outputs, as I only need Alpha and Beta.

  • The ), is missing after the first CTE, and as far as I know DML statements are not allowed in a function if you are using SQL Server. Commented Sep 22, 2016 at 8:24
  • I don't see a missing ), and I have a function in use that has an INSERT INTO statement. Commented Sep 22, 2016 at 8:38

1 Answer

First and foremost, you have syntax errors because the closing bracket and comma at the end of the following line come after the -- comment marker and are therefore commented out. They need to be on a new line:

select @groupID, @x, @y from TABLENAME -- replace table),
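A minimal sketch of that fix, assuming a table variable @SourceData with columns GroupID, x and y as a hypothetical stand-in for TABLENAME. It also selects columns rather than the function parameters, so that the later CTEs have named columns to work with:

```sql
-- Hypothetical sample data standing in for TABLENAME.
DECLARE @SourceData TABLE (GroupID varchar(50), x float, y float);
INSERT INTO @SourceData (GroupID, x, y)
VALUES ('A', 1, 2), ('A', 2, 4), ('B', 1, 1), ('B', 3, 2);

WITH some_table AS (
    SELECT GroupID, x, y
    FROM @SourceData -- replace table
),  -- the closing bracket and comma are now outside the comment
mean_estimates AS (
    SELECT GroupID
          ,AVG(x) AS xmean
          ,AVG(y) AS ymean
    FROM some_table
    GROUP BY GroupID
)
SELECT GroupID, xmean, ymean
FROM mean_estimates;
```

Note that the statement before WITH ends with a semicolon, which is what the Msg 319 error text asks for.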

More importantly, though, this needs to be a stored procedure. You are inserting into a table (the SELECT ... INTO TT_Auto_Temp_LM) and then apparently trying to select data from it (this isn't actually clear from your code), which you can't do in a function.

Per the documentation: https://technet.microsoft.com/en-us/library/ms191320.aspx

User-defined functions cannot be used to perform actions that modify the database state.

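Essentially, in a function you can only select data. The one exception is objects local to the function, such as the table variable it returns, which is why an INSERT INTO a return table works while SELECT ... INTO a real table does not.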
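As a rough sketch of what a function can do within that limit: a multi-statement table-valued function wraps its body in BEGIN ... END, reads from a table, and fills only its own return table variable. The names dbo.SourceData and fn_LinearRegressionSketch are hypothetical, and the body uses the plain least-squares formulas rather than the standardized calculation from the question:

```sql
-- Sketch only: dbo.SourceData(GroupID, x, y) is a hypothetical source table.
CREATE FUNCTION dbo.fn_LinearRegressionSketch (@groupID varchar(50))
RETURNS @regtable TABLE (a float, b float)
AS
BEGIN
    WITH some_table AS (
        -- Cast x to float so AVG does not do integer arithmetic.
        SELECT CAST(x AS float) AS x, y
        FROM dbo.SourceData
        WHERE GroupID = @groupID
    ),
    means AS (
        SELECT AVG(x) AS xmean, AVG(y) AS ymean
        FROM some_table
    ),
    slope AS (
        SELECT m.xmean
              ,m.ymean
              -- NULLIF yields NULL instead of a divide-by-zero error when x is constant.
              ,SUM((t.x - m.xmean) * (t.y - m.ymean))
                   / NULLIF(SUM(SQUARE(t.x - m.xmean)), 0) AS beta
        FROM some_table t
        CROSS JOIN means m
        GROUP BY m.xmean, m.ymean
    )
    -- Writing to the return table variable is allowed; it is local to the function.
    INSERT INTO @regtable (a, b)
    SELECT ymean - beta * xmean, beta
    FROM slope;

    RETURN;
END;
```

It would be called like any other table source, for example SELECT a, b FROM dbo.fn_LinearRegressionSketch('A'). Persisting the result in a permanent table would still have to happen outside the function, for instance in a stored procedure.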

1 Comment

Thanks iamdave. You are right. I actually made the script work but then I found out that it calculates wrong :(.
