Keith Briggs

pipemath is a package for processing very large data sets via shell pipelines. The programs do not store the data: they answer the challenge of whether one can perform some of the standard computations of statistical data analysis (the autocorrelation of a scalar time series, the covariance matrix of a set of vectors, and least-squares polynomial fits) when the data points arrive one at a time and each must be processed and discarded before the next arrives, all while preserving numerical stability. The three C programs provided here achieve these aims for the three specific problems mentioned. The ideas could be relevant more generally to stream computing and distributed data analysis.
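
The heart of each program is an update rule that refines running statistics as each point arrives, so nothing need be stored. As a minimal sketch of the idea (not the actual pipemath code, whose internals may differ), here is a single-pass, numerically stable mean and variance computation using Welford-style updates:

   #include <stdio.h>

   int main(void) {
     char line[1024];
     double x, mean = 0.0, m2 = 0.0;  /* m2: running sum of squared deviations */
     long n = 0;
     while (fgets(line, sizeof line, stdin)) {
       if (line[0] == '#') continue;                /* comment lines are ignored */
       if (sscanf(line, "%lf", &x) != 1) continue;  /* skip unparsable lines */
       n++;
       double delta = x - mean;
       mean += delta / n;                           /* update running mean */
       m2 += delta * (x - mean);                    /* stable variance accumulator */
     }
     if (n > 1)
       printf("n=%ld mean=%g variance=%g\n", n, mean, m2 / (n - 1));
     return 0;
   }

The same pattern generalizes: a covariance matrix needs one such accumulator per pair of components, and least-squares fitting can accumulate the sums for the normal equations in the same streaming fashion (with due care for conditioning).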

Version 1.2 is 64-bit clean. A new feature is that the covariance program takes no arguments.

quick start

tar zvxf pipemath-1.2.tgz; cd pipemath-1.2; make


Lines in the data file starting with # are ignored.

  • autocorrelation:
    Computes the autocorrelation function of a scalar time series.
    Usage: cat datafile | autocorrelation [maxlag=20 [stride=1 [dt=1]]]
  • covariance:
    Computes the covariance matrix of a set of n-vectors.
    Usage: cat datafile | covariance   or:   covariance < datafile
    Each line of datafile holds one n-vector; the value of n is determined by the number of items on the first line, and all subsequent lines must have the same number of items.
  • lsqpoly:
    Fits a least-squares polynomial.
    Usage: cat datafile | lsqpoly [degree=1]
    Each line of datafile holds an x,y pair and an optional weight. (Example invocations for all three programs follow this list.)
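
For concreteness, here are example invocations with tiny inline data sets; the numbers are invented for illustration, and the optional arguments are read here as positional, with the defaults shown in the usage lines above.

   # autocorrelation of a scalar series, up to lag 5
   seq 1 100 | autocorrelation 5
   # covariance matrix of 2-vectors (n=2 is inferred from the first line)
   printf '1 2\n2 4\n3 5\n' | covariance
   # weighted quadratic fit: each line is x y [weight]
   printf '0 1 1\n1 2 1\n2 5 2\n' | lsqpoly 2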




To test the build and install:

   make test
   sudo make install