Researching Linear Regression with MySQL (SQL Query)

Something I've never had to do before is put a trend line into a Cognos chart (actually it's going to be a future trend line). The Cognos chart is constructed from custom data in the OnCommand Insight DWH (Data WareHouse) MySQL database.

This is probably the most useful article I came across:

How to calculate the slope in SQL
https://stackoverflow.com/questions/9084761/how-to-calculate-the-slope-in-sql

Image: Markus Jarderots clean SQL linear regression solution

This looked like a very good article. Just a bit over my head (I'll try and expand my mind to accommodate it.)

Statistics in SQL: Simple Linear Regressions
https://www.red-gate.com/simple-talk/blogs/statistics-sql-simple-linear-regressions/


Random Other Things I Thought Interesting

MySQL insert only if a condition is true
https://stackoverflow.com/questions/39758422/mysql-insert-only-if-a-condition-is-true

Declare variable syntax invalid in MySQL Workbench?
https://stackoverflow.com/questions/21464786/declare-variable-syntax-invalid-in-mysql-workbench

The Winner: A Linear Regression SQL Query that Served My Purposes!

This came via a colleague. It works with the NetApp OnCommand Insight Data Warehouse (DWH) dwh_capacity schema and data_dimension table. I've sanitized (and highlighted) the name of the custom table we're pulling values from, and value name.

SELECT
  DATE(d.`fullDate`) AS `date`,
  COALESCE( UNIX_TIMESTAMP( d.`fullDate`) * t2.`slope` + t2.`intercept`, 0) AS 'SOMEVALUE'
FROM
  dwh_capacity.date_dimension AS d,
(
  SELECT
    (COUNT(*) * SUM(`x` * `y`) - SUM(`x`) * SUM(`y`)) / (COUNT(*) * SUM(POWER(`x`, 2)) - POWER(SUM(`x`),2)) AS 'slope',
    (SUM(`y`) * SUM(POWER(`x`, 2)) - SUM(`x`) * SUM(`x` * `y`)) / (COUNT(*) * SUM(POWER(`x`, 2)) - POWER(SUM(`x`), 2)) AS 'intercept'
  FROM
  (
    SELECT
      UNIX_TIMESTAMP(d.`fullDate`) AS 'x',
      SUM(dss.`SOMEVALUE`) AS 'y'
    FROM
      dwh_custom.SOMETABLE AS dss
    JOIN dwh_capacity.date_dimension AS d ON dss.dateTk = d.tk
    WHERE
      d.fullDate > DATE_SUB(NOW(), INTERVAL 12 MONTH)
      AND d.repWeek = 1
      AND dss.`COLOR` = 'SOMECOLOR'
    GROUP BY 1
  ) AS t1
) AS t2
WHERE
  d.fullDate > DATE_SUB(NOW(), INTERVAL 12 MONTH)
  AND d.`fullDate` <= '2020-12-01'
  AND d.repWeek = 1
GROUP BY 1

Comments