I've been asked to add a future trend line into some custom reporting I'm doing with NetApp OnCommand Insight (OCI). I've not dug around much so far, but the keyword appears to be "Linear Regression."
According to this post:
https://news.ycombinator.com/item?id=16194163
"MySQL is perfectly capable of calculating a linear regression..."
The SQL is a bit beyond me at the moment though:
SELECT
@a_count := avg(count) as mean_count,
@a_weeks := avg(`week`) as mean_weeks,
@covariance := (sum(`week` * `count`) - sum(`week`) * sum(`count`) / count(`week`)) / count(`week`) as covariance,
@stddev_count := stddev(`count`) as stddev_count,
@stddev_week := stddev(`week`) as stddev_week,
@r := @covariance / (@stddev_count * @stddev_week) as r,
@slope := @r * @stddev_count / @stddev_week as slope,
@y_int := @a_count - (@slope * @a_weeks) as y_int,
@this_week_no := timestampdiff(WEEK, (select min(`date`) from dataset), curdate()) as this_week_no,
@predicted := round(greatest(1, @y_int + (@slope * @this_week_no))) as predicted
FROM (SELECT timestampdiff(WEEK, (select min(`date`) from dataset), `date`) as week, count(date) as count FROM dataset group by WEEK(date)) series;
One other option might be pre-processing data before it is injested to the OCI DWH (I already use a bit of PowerShell to process the data). This looks promising/interesting:
Calculate linear trend from an number array in Powershell
https://gallery.technet.microsoft.com/scriptcenter/Calculate-trend-from-an-ca09a0ca
It's not a future trend line but I might be able to turn it into that. I didn't think the code looked that scary either (mostly comment):
function Get-Trendline
{
# SYNOPSIS
# Calculate the linear trend line from a series of numbers
# DESCRIPTION
# Assume you have an array of numbers
# $inarr = @(15,16,12,11,14,8,10)
# and the trendline is represented as
# y = a + bx
# where:
# y is the element in the array
# x is the index of y in the array.
# a is the intercept of trendline at y axis
# b is the slope of the trendline
#
# Calling the function with
# PS> Get-Trendline -data $inarr
# will return an array of a and b.
# PARAMETER data
# A one dimensional array containing the series of numbers
# EXAMPLE
# Get-Trendline -data @(15,16,12,11,14,8,10)
param ($data)
$n = $data.count
$sumX=0
$sumX2=0
$sumXY=0
$sumY=0
for ($i=1; $i -le $n; $i++) {
$sumX+=$i
$sumX2+=([Math]::Pow($i,2))
$sumXY+=($i)*($data[$i-1])
$sumY+=$data[$i-1]
}
$b = ($sumXY - $sumX*$sumY/$n)/($sumX2 - $sumX*$sumX/$n)
$a = $sumY / $n - $b * ($sumX / $n)
return @($a,$b)
}
To be continued...
Image: Linear Trend
According to this post:
https://news.ycombinator.com/item?id=16194163
"MySQL is perfectly capable of calculating a linear regression..."
The SQL is a bit beyond me at the moment though:
SELECT
@a_count := avg(count) as mean_count,
@a_weeks := avg(`week`) as mean_weeks,
@covariance := (sum(`week` * `count`) - sum(`week`) * sum(`count`) / count(`week`)) / count(`week`) as covariance,
@stddev_count := stddev(`count`) as stddev_count,
@stddev_week := stddev(`week`) as stddev_week,
@r := @covariance / (@stddev_count * @stddev_week) as r,
@slope := @r * @stddev_count / @stddev_week as slope,
@y_int := @a_count - (@slope * @a_weeks) as y_int,
@this_week_no := timestampdiff(WEEK, (select min(`date`) from dataset), curdate()) as this_week_no,
@predicted := round(greatest(1, @y_int + (@slope * @this_week_no))) as predicted
FROM (SELECT timestampdiff(WEEK, (select min(`date`) from dataset), `date`) as week, count(date) as count FROM dataset group by WEEK(date)) series;
Calculate linear trend from an number array in Powershell
https://gallery.technet.microsoft.com/scriptcenter/Calculate-trend-from-an-ca09a0ca
It's not a future trend line but I might be able to turn it into that. I didn't think the code looked that scary either (mostly comment):
function Get-Trendline
{
# SYNOPSIS
# Calculate the linear trend line from a series of numbers
# DESCRIPTION
# Assume you have an array of numbers
# $inarr = @(15,16,12,11,14,8,10)
# and the trendline is represented as
# y = a + bx
# where:
# y is the element in the array
# x is the index of y in the array.
# a is the intercept of trendline at y axis
# b is the slope of the trendline
#
# Calling the function with
# PS> Get-Trendline -data $inarr
# will return an array of a and b.
# PARAMETER data
# A one dimensional array containing the series of numbers
# EXAMPLE
# Get-Trendline -data @(15,16,12,11,14,8,10)
param ($data)
$n = $data.count
$sumX=0
$sumX2=0
$sumXY=0
$sumY=0
for ($i=1; $i -le $n; $i++) {
$sumX+=$i
$sumX2+=([Math]::Pow($i,2))
$sumXY+=($i)*($data[$i-1])
$sumY+=$data[$i-1]
}
$b = ($sumXY - $sumX*$sumY/$n)/($sumX2 - $sumX*$sumX/$n)
$a = $sumY / $n - $b * ($sumX / $n)
return @($a,$b)
}
To be continued...
Image: Linear Trend
Comments
Post a Comment