Simple Statistical Functions - Mean, S.D., Covariance, Difference

A brief discussion of what's being done follows the code listing...

% File: data_for_stat.dat
1 1 1
1 2 1
1 2 2
1 2 5
2 1 4

% Hint: use ; at end of line to prevent debug'ish output from being printed
load data_for_stat.dat

% Number of rows and columns
[n, p] = size(data_for_stat);
t = 1:n;   % Where you might use such information

% Really simple stat functions
% Max/Min -- Returns a row of max/min elements
X = max(data_for_stat)
X = min(data_for_stat)

% Max/Min -- One value from entire dataset
% data_for_stat(:) rearranges the m x n matrix into a mn x 1 column vector.
X = max(data_for_stat(:))

% Mean and Standard Deviation
mu = mean(data_for_stat)
sigma = std(data_for_stat)

% Further???
% For example if you wanted (x-mu) for each element, say to compute a normal distr
unity = ones(n,1)   %% <---- Note Use of ones( , )
x = data_for_stat - unity * mu

% Correlation and Covariance
covar = cov(data_for_stat(:,1))
corr = corrcoef(data_for_stat)

As mentioned earlier, we load some data from a file called ``data_for_stat.dat''. Once a .dat file is loaded, its contents are stored in a variable named after the file without the extension (.dat), so from then on we refer to ``data_for_stat'' just like any other variable.
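To make the loading step concrete, here is a minimal sketch that first writes the sample data to disk and then loads it in two ways. The variable names A, fid and data are illustrative assumptions and are not part of the original listing.

```matlab
% Write the sample data to a file so the example is self-contained
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];
fid = fopen('data_for_stat.dat', 'w');
fprintf(fid, '%d %d %d\n', A');   % fprintf walks the data column by column, so transpose
fclose(fid);

% Command form: creates a variable named after the file, minus the extension
load data_for_stat.dat

% Function form: returns the matrix so you can choose the variable name
data = load('data_for_stat.dat');

% Both variables hold the same 5 x 3 matrix
same = isequal(data, data_for_stat)
```

The function form is handy when the file name is not a valid variable name or when you want several files loaded under names of your own choosing.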

The size() function returns the number of rows and columns in the data, stored here in n and p. (Think of the data points as forming the rows and columns of a matrix.)
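A small sketch of the common ways to query matrix dimensions, using the sample data typed in directly. The names nrows, ncols and total are illustrative assumptions.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];

[n, p] = size(A);     % n = number of rows, p = number of columns
nrows = size(A, 1);   % rows only
ncols = size(A, 2);   % columns only
total = numel(A);     % total number of elements, n * p

t = 1:n;              % one index per row, e.g. for plotting each observation
```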

Then we show some basic functions, min() and max(), and variants of their usage. Passing the whole matrix (the data file name, yeah, without the extension) returns a row holding the min/max of each column. If we need the min/max over the entire data set, we use data_for_stat(:), which stacks all elements into a single column vector. In general, indexing takes the form (rows, columns); a bare colon in either position means ``all'' in that dimension, so data_for_stat(:,1) is the entire first column.
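The following sketch illustrates the indexing forms and the column-wise versus whole-matrix behaviour of max() and min() on the sample data. All variable names are illustrative assumptions.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];

colMax = max(A);           % max of each column, returned as a row
rowMax = max(A, [], 2);    % max of each row, returned as a column
allMax = max(A(:));        % single largest value in the matrix
allMin = min(A(:));        % single smallest value in the matrix

firstCol  = A(:, 1);       % all rows, column 1
secondRow = A(2, :);       % row 2, all columns
block     = A(2:3, 1:2);   % rows 2 to 3, columns 1 to 2
```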

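After that we show how the functions mean(), std(), cov() and corrcoef() can be used to compute the mean, standard deviation, covariance and correlation coefficients respectively. mean() and std() work column by column, so each returns one value per column. Note that cov() is applied here to the first column only, so it returns that column's variance.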
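To show how these quantities relate to one another, here is a rough sketch that recomputes the covariance and correlation matrices by hand and compares them with the built-in functions. It assumes the default normalisation by n - 1; the names D, C_manual, sigma_manual and R_manual are illustrative.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];
n = size(A, 1);

mu    = mean(A);                  % column means
sigma = std(A);                   % column standard deviations (n - 1 normalisation)

D = A - ones(n, 1) * mu;          % deviations from the column means
C_manual = (D' * D) / (n - 1);    % covariance matrix, p x p
C = cov(A);                       % built-in covariance matrix

sigma_manual = sqrt(diag(C_manual))';     % std is the root of the diagonal of C
R_manual = C_manual ./ (sigma' * sigma);  % divide each covariance by both stds
R = corrcoef(A);                          % built-in correlation matrix

errC = max(abs(C(:) - C_manual(:)))
errR = max(abs(R(:) - R_manual(:)))
```

Both error values should be zero up to floating-point rounding. The entry C(1,1) is the variance of the first column, which is what cov(data_for_stat(:,1)) returns in the listing.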

A couple of points to be noted:

- Colon indexing, as in data_for_stat(:) and data_for_stat(:,1), is again shown in a couple of places
- A function ones(rows, columns) has been used - it generates a matrix filled with ones. Just in case you are wondering how this differs from an identity matrix: an identity matrix has ones only on its diagonal, whereas ones() fills every entry. :)
- I have also shown how multiplying the n x 1 column of ones by the 1 x p row vector mu generates an n x p matrix in which every row is a copy of mu. Subtracting it from the data centers each column on its mean
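The points above can be sketched as follows: the difference between ones() and eye(), the outer product that replicates mu, and two alternative ways to do the same centering. The alternatives using repmat() and bsxfun() are not in the original listing and are offered only as assumptions about what you might prefer.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];
[n, p] = size(A);
mu = mean(A);

U = ones(3, 3);    % every entry is 1
I = eye(3);        % ones on the diagonal only, zeros elsewhere

unity = ones(n, 1);
M  = unity * mu;   % (n x 1) * (1 x p) gives n x p, each row equal to mu
x1 = A - M;        % the approach used in the listing

x2 = A - repmat(mu, n, 1);     % replicate mu explicitly
x3 = bsxfun(@minus, A, mu);    % let bsxfun expand mu along the rows

centered = max(abs(mean(x1)))  % column means of centered data, close to zero
```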

X =
     2     2     5

X =
     1     1     1

X =
     5

mu =
    1.2000    1.6000    2.6000

sigma =
    0.4472    0.5477    1.8166

unity =
     1
     1
     1
     1
     1

x =
   -0.2000   -0.6000   -1.6000
   -0.2000    0.4000   -1.6000
   -0.2000    0.4000   -0.6000
   -0.2000    0.4000    2.4000
    0.8000   -0.6000    1.4000

covar =
    0.2000

corr =
    1.0000   -0.6124    0.4308
   -0.6124    1.0000    0.0503
    0.4308    0.0503    1.0000