function [w, R2] = sgd(y,X,numtoselect,finalstepsize,convergencecriterion,checkerror,nonneg,alpha,lambda)
% Least-squares stochastic gradient-descent fit.
%
% sgd(y,X,numtoselect,finalstepsize,convergencecriterion,checkerror,nonneg)
%
% <y> is p x 1 with the data
%
% <X> is p x q with the regressors
%
% <numtoselect> is the number of data points to randomly select on each
%   iteration
%
% <finalstepsize> is the gradient-descent step size (e.g., 0.05)
%
% <convergencecriterion> is [A B C] where A is in (0,1), B is a positive
%   integer, and C is a percentage.
%
%   We stop if we see a series of max(B,round(A*[current total-error-check
%   number])) total-error-check iterations that do not improve performance
%   on the estimation set, where an improvement must reduce the error by at
%   least C percent of the best error found so far.
%
% <checkerror> is the number of iterations between total-error checks
%
% <nonneg> if set to 1 constrains the solution to be non-negative
%
% For reference see: Kay et al. (2008), supplemental material.
%
% <alpha, lambda> ElasticNet parameters (optional; both default to 0,
%   i.e., ordinary least squares). The ElasticNet is a regularization and
%   variable-selection algorithm. The penalized error implemented below is:
%
%     sum((y - X*w).^2) + lambda * sum(alpha * abs(w) + (1-alpha) * w.^2)
%
%   lambda sets the overall strength of the regularization and alpha
%   balances the L1 and L2 constraints. When alpha is 1, the penalty
%   reduces to the Lasso (L1); when alpha is 0, it reduces to ridge
%   regression (L2).
%
% Reference: Zou & Hastie (2005). Regularization and variable selection via
%   the elastic net. Journal of the Royal Statistical Society B, 67(2), 301-320.
% See also: Hastie, Tibshirani & Friedman (2008). The Elements of
%   Statistical Learning, chapter 3 (equation 3.54).
%
% Copyright Franco Pestilli and Kendrick Kay (2013) Vistasoft Stanford University.
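%
% Example (an illustrative sketch with synthetic data, not from the original):
%   p = 100; q = 10;
%   X = rand(p,q);
%   w_true = rand(q,1);
%   y = X*w_true + 0.01*randn(p,1);
%   [w, R2] = sgd(y,X,10,0.05,[.15 3 5],40,1);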

% Set the defaults for input params:
if notDefined('convergencecriterion'), convergencecriterion = [.15 3 5]; end
if notDefined('numtoselect'), numtoselect = round(0.1 * size(y,1)); end
if notDefined('checkerror'), checkerror = 40; end
if notDefined('finalstepsize'), finalstepsize = 0.05; end
if notDefined('nonneg'), nonneg = false; end
% Internal flag: when true, update only the coordinate with the steepest gradient.
if notDefined('coordDescent'), coordDescent = false; end

% Set default values for the ElasticNet (defaults to regular OLS):
if notDefined('alpha'), alpha = 0; end
if notDefined('lambda'), lambda = 0; end

p = size(y,1); % number of data points
q = size(X,2); % number of parameters
orig_ssq = full(sum((y).^2)); % sum of squares of the data

% initialize various variables used in the fitting:
w = zeros(q,1);     % the set of weights, initialized to zero
w_best = w;         % the best set of weights found so far
est_ssq_best = inf; % minimum estimation error found so far
estbadcnt = 0;      % number of times the estimation error has gone up
iter = 1;           % the iteration number
cnt = 1;            % the total-error-check number

% report
fprintf('[%s] Performing fit | %d measurements | %d parameters | ',mfilename,p,q);

% Start computing the fit.
while 1
  % Indices to the selected signal and model
  ix = randi(p,1,numtoselect);
  ix2 = false(p,1);
  ix2(ix) = true;

  % Select the subset of the signal and the model to use for fitting
  y0 = y(ix2);
  X0 = X(ix2,:);

  % If not the first iteration, adjust the parameters
  if iter ~= 1
    % Calculate the gradient (change in error over change in parameters):
    grad = -((y0 - X0*w)' * X0)' + lambda * (alpha + 2*(1 - alpha)*w);

    % Optionally use coordinate descent instead of gradient descent.
    if coordDescent
      % Coordinate descent: keep only the steepest (most negative) component
      m = min(grad);
      grad = (grad==m)*m;
    end

    % Unit-length normalize the gradient
    grad = unitlengthfast(grad);

    % Perform the gradient-descent step
    w = w - finalstepsize*grad;

    % Non-negativity constraint: set negative weights to zero
    if (nonneg), w(w<0) = 0; end
  end

  % Check the total error every so often
  if mod(iter,checkerror) == 1
    % Current estimated sum of squares of the residuals (SSQ), plus the
    % elastic-net penalty
    est_ssq = sum((y - X*w).^2) + lambda*(alpha*sum(w) + (1-alpha)*sum(w.^2));

    % Check whether the SSQ improved
    isimprove = est_ssq < est_ssq_best;

    % We keep fitting if the best SSQ is not yet defined OR the current SSQ
    % is at least some percent smaller than the best SSQ obtained so far
    keepfitting = isinf(est_ssq_best) || (est_ssq < est_ssq_best * (1-convergencecriterion(3)/100));

    % Do we consider this iteration to be the best yet?
    if isimprove
      % The SSQ was smaller, so the fit improved.
      w_best = w;             % set the current weights to be the best so far
      est_ssq_best = est_ssq; % the minimum error so far

      % We improved, but check whether the improvement is too small to be
      % considered useful.
      if keepfitting
        % The fit improved by more than the minimum acceptable amount.
        % Reset the counter for bad fits, so that we start over checking
        % for stopping.
        estbadcnt = 0;
      else
        estbadcnt = estbadcnt + 1;
      end
    else
      % The fit got worse (the SSQ increased); count how many bad fits we
      % have had and stop after a certain number.
      estbadcnt = estbadcnt + 1;
    end

    % Stop if we have not improved in a while
    if estbadcnt >= max(convergencecriterion(2),round(convergencecriterion(1)*cnt))
      R2 = 100*(1-(est_ssq_best/orig_ssq));
      fprintf(' DONE fitting | SSQ=%2.3f (Original SSQ=%2.3f) | Rzero-squared %2.3f%%.\n',...
              est_ssq_best,orig_ssq,R2);
      break;
    end

    % Update the total-error-check counter
    cnt = cnt + 1;
  end
  iter = iter + 1;
end

% prepare output
w = w_best;
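
% The subfunction below is an illustrative sketch (it is not called by sgd,
% and the name elasticNetPenalty is hypothetical). It shows the penalized
% error used in the loop above and its exact gradient, assuming non-negative
% weights so that sum(w) stands in for the L1 norm. Note that the loop above
% drops the factor of 2 on the data term, which is harmless because the
% gradient is unit-length normalized before the step.
function [f,g] = elasticNetPenalty(y,X,w,alpha,lambda)
% Penalized sum-of-squares error and its gradient with respect to w.
f = sum((y - X*w).^2) + lambda*(alpha*sum(w) + (1-alpha)*sum(w.^2));
g = -2*((y - X*w)' * X)' + lambda*(alpha + 2*(1-alpha)*w);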

function [v,len] = unitlengthfast(v,dim)
% function [v,len] = unitlengthfast(v,dim)
%
% <v> is a vector (row or column) or a 2D matrix
% <dim> (optional) is the dimension along which the vectors are oriented.
%   If not supplied, assume that <v> is a row or column vector.
%
% Unit-length normalize <v>. Aside from input flexibility, the difference
% between this function and unitlength.m is that we do not deal with NaNs
% (i.e., we assume <v> does not have NaNs), and if a vector has zero length,
% it becomes all NaNs.
%
% We also return <len>, which is the original vector length of <v>.
% When <dim> is not supplied, <len> is a scalar. When <dim> is supplied,
% <len> has the same dimensions as <v> except collapsed along <dim>.
%
% Note some weird cases:
%   unitlengthfast([]) is [].
%   unitlengthfast([0 0]) is [NaN NaN].
%
% Example:
%   a = [3 0];
%   isequalwithequalnans(unitlengthfast(a),[1 0])

if nargin==1
  len = sqrt(v(:).'*v(:));
  v = v / len;
else
  if dim==1
    len = sqrt(sum(v.^2,1));
    v = v ./ repmat(len,[size(v,1) 1]); % like this for speed; maybe use the indexing trick to speed up even more?
  else
    len = sqrt(sum(v.^2,2));
    v = v ./ repmat(len,[1 size(v,2)]);
  end
end
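
% A further illustrative example (not from the original), normalizing the
% rows of a matrix along dim = 2:
%   [v,len] = unitlengthfast([3 0; 0 4],2)
%   % v is [1 0; 0 1] and len is [3; 4]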