
With scipy and numpy, confidence intervals for linear regression can be found in several ways. Whatever method you choose, to calculate a confidence interval you need the parameter estimate itself (e.g. `a`) and its standard error (call it `a_err`). The confidence interval with significance level `alpha` is then computed from Student's t-distribution:

```python
conf_int = scipy.stats.t.interval(1-alpha, df=n-2, loc=a, scale=a_err)
```
If only the half-width of the interval is needed (the value that goes after the ± sign), it can be calculated as follows:

```python
plus_minus = abs(sps.t.ppf(alpha/2, n-2)) * a_err
```
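The two forms are equivalent: the interval returned by `t.interval` is exactly the estimate ± the half-width. A quick check, using hypothetical values for the estimate and its standard error:

```python
import numpy as np
import scipy.stats as sps

# Hypothetical example values: a slope estimate and its standard error
a, a_err = 0.5291, 0.0161
n = 100        # number of data points
alpha = 0.05   # significance level, i.e. a 95% confidence interval

# Full interval from Student's t-distribution
lo, hi = sps.t.interval(1 - alpha, df=n - 2, loc=a, scale=a_err)

# Half-width of the interval (the value after the ± sign)
plus_minus = abs(sps.t.ppf(alpha / 2, n - 2)) * a_err

# Both approaches agree: [a - plus_minus, a + plus_minus] == [lo, hi]
assert np.isclose(lo, a - plus_minus)
assert np.isclose(hi, a + plus_minus)
```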
Now, how to find the parameters and their errors.

**Method `linregress`**

[linregress](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html#scipy.stats.linregress) is a specialized method for computing linear regression:

```python
import scipy.stats as sps

n = len(x)
lin_model = sps.linregress(x, y)
a, b = lin_model.slope, lin_model.intercept
# estimates of the standard errors of a and b
a_err, b_err = lin_model.stderr, lin_model.intercept_stderr
# confidence intervals for alpha = 5%
a_conf = sps.t.interval(0.95, df=n-2, loc=a, scale=a_err)
b_conf = sps.t.interval(0.95, df=n-2, loc=b, scale=b_err)
print(f"a = {a:0.4f}, α=5% [{a_conf[0]:0.4f} - {a_conf[1]:0.4f}]")
print(f"b = {b:0.4f}, α=5% [{b_conf[0]:0.4f} - {b_conf[1]:0.4f}]")
```
Result for 100 points on the line y = 0.5x + 2 with random noise sigma = 0.5:

```
a = 0.5291, α=5% [0.4971 - 0.5610]
b = 1.8568, α=5% [1.6718 - 2.0418]
```
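For reference, a synthetic data set matching that description (100 points, y = 0.5x + 2, sigma = 0.5) can be generated as follows; the x-grid and the random seed are assumptions, so the exact numbers will differ from those above:

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(42)              # seed is an arbitrary assumption
n = 100
x = np.linspace(0, 10, n)                    # assumed x-grid
y = 0.5*x + 2 + rng.normal(0, 0.5, size=n)   # y = 0.5x + 2 with noise sigma = 0.5

lin_model = sps.linregress(x, y)
a, b = lin_model.slope, lin_model.intercept
a_err = lin_model.stderr
a_conf = sps.t.interval(0.95, df=n-2, loc=a, scale=a_err)
```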
**Universal tool `curve_fit`**

scipy has a [universal tool](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) for fitting a set of points with a given model. This function finds the best set of parameters by least squares for any type of model, not just linear. Besides the optimal parameter values, it returns the covariance matrix, whose diagonal elements estimate the variances of the parameters:

```python
import numpy as np
import scipy.stats as sps
import scipy.optimize as spo

def linear(x, a, b):
    return a*x + b

((a, b), cov) = spo.curve_fit(linear, xdata=x, ydata=y)
# standard errors are the square roots of the diagonal of the covariance matrix
a_err, b_err = np.sqrt(np.diag(cov))
a_conf = sps.t.interval(0.95, df=n-2, loc=a, scale=a_err)
b_conf = sps.t.interval(0.95, df=n-2, loc=b, scale=b_err)
```
Output for the same data set:

```
a = 0.5291, α=5% [0.4971 - 0.5610]
b = 1.8568, α=5% [1.6718 - 2.0418]
```
As you can see, the result is the same as for the specialized method.

**Direct calculation**

The linear regression parameters can also be computed directly from the formulas:

```python
# computing the model parameters
sum_x = x.sum()
sum_y = y.sum()
sum_xy = (x*y).sum()
sum_x_sq = (x*x).sum()
a = (n*sum_xy - sum_x*sum_y)/(n*sum_x_sq - sum_x*sum_x)
b = (sum_y*sum_x_sq - sum_x*sum_xy)/(n*sum_x_sq - sum_x*sum_x)

# computing the parameter errors
u = y - (a*x + b)
u_avg = np.mean(u)
sigma_square = 1.0/(n-2)*np.sum((u - u_avg)**2)
x_mean = np.mean(x)
dx_square = np.sum((x - x_mean)**2)
a_err = np.sqrt(sigma_square/dx_square)
b_err = np.sqrt(sigma_square*(1.0/n + np.mean(x)**2/dx_square))
```
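A quick way to sanity-check these formulas is to compare them against `linregress` on the same data; the synthetic data generation below is an assumption, the comparison holds for any data:

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)   # assumed test data
n = 100
x = np.linspace(0, 10, n)
y = 0.5*x + 2 + rng.normal(0, 0.5, size=n)

# direct formulas
sum_x, sum_y = x.sum(), y.sum()
sum_xy, sum_x_sq = (x*y).sum(), (x*x).sum()
a = (n*sum_xy - sum_x*sum_y)/(n*sum_x_sq - sum_x*sum_x)
b = (sum_y*sum_x_sq - sum_x*sum_xy)/(n*sum_x_sq - sum_x*sum_x)

u = y - (a*x + b)
sigma_square = np.sum((u - u.mean())**2)/(n - 2)
dx_square = np.sum((x - x.mean())**2)
a_err = np.sqrt(sigma_square/dx_square)
b_err = np.sqrt(sigma_square*(1.0/n + x.mean()**2/dx_square))

# the specialized method gives the same estimates and errors
lm = sps.linregress(x, y)
assert np.isclose(a, lm.slope) and np.isclose(b, lm.intercept)
assert np.isclose(a_err, lm.stderr) and np.isclose(b_err, lm.intercept_stderr)
```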
**Performance measurements**

Linear regression for 100 points:

- Direct calculation: 51.6 μs ± 2.47 μs
- `linregress`: 207 μs ± 2.67 μs
- `curve_fit`: 237 μs ± 8.29 μs

A full example as a Jupyter notebook: https://github.com/pakuula/StackOverflow/blob/main/python/1283898/linear_reg.ipynb
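The absolute numbers will vary by machine; a rough sketch of how such measurements can be reproduced with the standard `timeit` module (the synthetic data here is an assumption):

```python
import timeit
import numpy as np
import scipy.stats as sps
import scipy.optimize as spo

rng = np.random.default_rng(0)   # assumed test data
n = 100
x = np.linspace(0, 10, n)
y = 0.5*x + 2 + rng.normal(0, 0.5, size=n)

def direct():
    # same direct formulas as above
    sum_x, sum_y = x.sum(), y.sum()
    sum_xy, sum_x_sq = (x*y).sum(), (x*x).sum()
    a = (n*sum_xy - sum_x*sum_y)/(n*sum_x_sq - sum_x*sum_x)
    b = (sum_y*sum_x_sq - sum_x*sum_xy)/(n*sum_x_sq - sum_x*sum_x)
    return a, b

# total time for 1000 runs of each method, in seconds
t_direct = timeit.timeit(direct, number=1000)
t_linregress = timeit.timeit(lambda: sps.linregress(x, y), number=1000)
t_curvefit = timeit.timeit(lambda: spo.curve_fit(lambda x, a, b: a*x + b, x, y),
                           number=1000)
print(f"direct: {t_direct:.4f} s, linregress: {t_linregress:.4f} s, "
      f"curve_fit: {t_curvefit:.4f} s")
```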