

Simple Statistical Functions - Mean, S.D., Covariance, Difference

One of the first things to do when working with a large number of data points is to keep the data elsewhere, i.e., in a separate file. This makes the code cleaner and more readable, and it also modularizes the whole process. Here we show how to store tabular data to be used as a matrix (first code listing) and how to load that kind of data (first line of the second code listing).

A brief discussion of what is being done follows the code listings.

% File: data_for_stat.dat
1 1 1
1 2 1
1 2 2 
1 2 5
2 1 4
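You may prefer to generate the data file from MATLAB instead of typing it by hand. The minimal sketch below does this with `save` and the `-ascii` option. The variable names A and B are illustrative only. Running it overwrites any data_for_stat.dat in the current folder.

```matlab
% Build the same 5 x 3 table shown above.
A = [1 1 1;
     1 2 1;
     1 2 2;
     1 2 5;
     2 1 4];

% '-ascii' writes only the numbers (no variable name or header), one matrix
% row per line; in MATLAB the default is 8 significant digits ('-double' gives 16).
save('data_for_stat.dat', 'A', '-ascii');

% Read the file back to check the round trip.
B = load('data_for_stat.dat');
disp(isequal(A, B))   % prints 1 when the reloaded matrix matches the original
```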


% Hint: end a line with ; to keep MATLAB from printing its result

load data_for_stat.dat

% Number of rows and columns
[n, p] = size(data_for_stat);
t = 1:n;                % e.g., an index vector with one entry per data point


% Really simple stat functions

% Max/Min -- Return a row vector with the max/min of each column
X = max(data_for_stat)
X = min(data_for_stat)

% Max/Min -- One value from entire dataset
%   data_for_stat(:) rearranges the n x p matrix into an np x 1 column vector.
X = max(data_for_stat(:))

% Mean and Standard Deviation
mu = mean(data_for_stat)
sigma = std(data_for_stat)

% Centering the data
% For example, to get (x - mu) for every element, say to evaluate a normal distribution
unity = ones(n,1)          %% <---- Note use of ones(n,1): an n x 1 column of ones
x = data_for_stat - unity * mu    % unity * mu repeats the row mu on all n rows


% Correlation and Covariance

covar = cov(data_for_stat(:,1))   % cov of a single column is that column's variance
corr  = corrcoef(data_for_stat)

As mentioned earlier, we load the data from a file called "data_for_stat.dat". When an ASCII .dat file is loaded this way, its contents are stored in a variable named after the file, minus the extension. In later references we therefore use data_for_stat just like any other variable.
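The command form `load data_for_stat.dat` always names the variable after the file. The function form `load(filename)` instead returns the matrix, so you choose the variable name yourself. The minimal sketch below writes its own small file so it runs on its own. The file name example_table.dat and the variable name stats are hypothetical.

```matlab
% Write a small ASCII table (same values as data_for_stat.dat).
% fprintf consumes the transposed matrix column by column, i.e. one original row per line.
fid = fopen('example_table.dat', 'w');
fprintf(fid, '%d %d %d\n', [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4]');
fclose(fid);

% Function form: the returned matrix goes into a variable of your choosing.
stats = load('example_table.dat');

% Command form for comparison: creates a variable named example_table.
load example_table.dat

disp(isequal(stats, example_table))   % both hold the same 5 x 3 matrix
```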

The size() function returns the number of rows and columns in the data, with the data points treated as the rows and columns of a matrix. Here n is the number of rows (data points) and p is the number of columns.
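size() can also return a single dimension. The short sketch below shows this. It uses a hypothetical matrix A that holds the same values as data_for_stat.dat, so it runs without the file.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];   % same values as data_for_stat.dat

[n, p] = size(A);     % n = 5 rows (data points), p = 3 columns (variables)
nRows  = size(A, 1);  % ask for one dimension only: rows
nCols  = size(A, 2);  % ... or columns
total  = numel(A);    % total number of elements, n * p

t = 1:n;              % e.g. an index vector with one entry per data point
fprintf('%d rows, %d columns, %d elements\n', n, p, total);
```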

Next we show the basic functions min() and max() and some variants of their usage. Applied to the whole matrix (the variable data_for_stat, again without the extension), they return a row vector holding the min/max of each column. To get the single min/max value of the entire data set, use data_for_stat(:), which stacks all rows and columns into one column vector. In general, A(i,j) refers to row i and column j of a matrix A. A colon in either position means "all" in that dimension, so A(:,1) is the first column and A(2,:) is the second row.
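The sketch below illustrates the indexing rules and min()/max() variants described above. It again uses a hypothetical matrix A with the same values as data_for_stat.dat. Two forms here are standard MATLAB but are not used in the listing: `max(A, [], 2)`, which works along rows, and the two-output `[m, idx] = max(A)`.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];   % same values as data_for_stat.dat

colMax = max(A);        % one value per column: [2 2 5]
rowMax = max(A, [], 2); % one value per row (dimension 2): [1; 2; 2; 5; 4]
allMin = min(A(:));     % A(:) stacks all 15 elements into one column: 1

firstCol = A(:, 1);     % all rows, column 1
thirdRow = A(3, :);     % row 3, all columns
element  = A(4, 3);     % single element at row 4, column 3: 5

% The second output gives the row where each column's maximum first occurs.
[m, idx] = max(A);      % m = [2 2 5], idx = [5 2 4]
```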

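After that we show how the functions mean(), std(), cov() and corrcoef() compute the mean, standard deviation, covariance and correlation coefficients, respectively. mean() and std() work column by column. By default, std() and cov() normalize by n-1, giving the sample estimates. Because cov() is given only the first column here, it returns that column's variance. corrcoef() returns the full p x p matrix of correlation coefficients between the columns.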
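The sketch below shows what mean(), std(), cov() and corrcoef() compute. It rebuilds each one from its definition and compares the result with the built-in. It assumes the default normalization by n - 1, which is what produces the sigma values in the output below. Centering uses bsxfun because it works in both MATLAB and Octave; MATLAB R2016b and later also accept A - mu directly. The names A, D, C and R are illustrative.

```matlab
A = [1 1 1; 1 2 1; 1 2 2; 1 2 5; 2 1 4];   % same values as data_for_stat.dat
n = size(A, 1);

mu = sum(A) / n;                          % column means, same as mean(A)

% Center each column: bsxfun expands the 1 x p row mu across all n rows,
% giving the same result as A - ones(n,1) * mu.
D = bsxfun(@minus, A, mu);

% Sample statistics divide by n - 1, as std and cov do by default.
sigma = sqrt(sum(D .^ 2) / (n - 1));      % same as std(A)
C     = (D' * D) / (n - 1);               % p x p covariance matrix, same as cov(A)

% Correlation: each covariance divided by the product of the two standard deviations.
R = C ./ (sigma' * sigma);                % same as corrcoef(A)

% Compare with the built-ins (differences should be at rounding level).
fprintf('max differences: %g %g %g\n', max(abs(sigma - std(A))), ...
        max(max(abs(C - cov(A)))), max(max(abs(R - corrcoef(A)))));
```

C(1,1) is the variance of the first column, which matches the covar value printed by cov(data_for_stat(:,1)).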

For reference, running the second code listing produces the following output:

X =
     2     2     5

X =
     1     1     1

X =
     5

mu =
    1.2000    1.6000    2.6000

sigma =
    0.4472    0.5477    1.8166

unity =
     1
     1
     1
     1
     1

x =
   -0.2000   -0.6000   -1.6000
   -0.2000    0.4000   -1.6000
   -0.2000    0.4000   -0.6000
   -0.2000    0.4000    2.4000
    0.8000   -0.6000    1.4000

covar =
    0.2000

corr =

    1.0000   -0.6124    0.4308
   -0.6124    1.0000    0.0503
    0.4308    0.0503    1.0000


Arvind Gopu 2006-03-24