Mathematical Methods:
pfh id @ synapse9.com  10/1998  05/08

This is about using time series data, shapes of change, as an aid to identifying individual underlying systems and their behaviors.   It's math to help you see what you're looking at, lenses of different kinds.   The two basic kinds of processes, developmental and cyclic, have different shapes with multiple kinds of internal and extraneous fluctuation.  Those tend to be poorly represented in any series of measures, and to be superimposed with the effects of various other distracting processes.   Most time series data will display a variety of all kinds, on various scales, with different timing, like a junk yard of shapes.  It's all 'evidence', though, and none of it 'noise' from any extraneous source.   Separating the useful information from the useless calls for different kinds of treatments to help reveal what behavior made the shapes of interest.   That's sort of the 'null hypothesis', giving the analysis something like a forensic approach.   The usual scientific method, where the object is producing a 'formula' to use in place of the data, is a step you might or might not take toward the very end of the process.   The main job in studying individual events is to identify what combination of behaviors you're looking at, and to find a way to tell their story.
 
Often the most important 'analytical' step is to play with the data, trying one thing after another until, say, a particular scale or type of smoothing or type of median curve suggests an underlying process that makes sense in the particular environment.   Then you start building a case, knowing very well that most curve shapes could be produced by combinations of lots of different things.   It's very common, for example, that what you start with is mis-aggregated data.  You may have data for a city that merges all the neighborhoods, each of which may have significantly different behavior.   Maybe you see it makes no sense, and then go get the individual neighborhood data and combine the ones that look a bit alike, giving you larger community curves that seem to have explanatory power for the perceived events.   Often I'm not that smart, and my best discoveries come about by accident.   With my gamma ray burst data the files were so large that I reduced the data by extracting every 6th point.   When I found a remarkably complex combination of developmental and cyclic fluctuation, I realized I had an excellent ready way to see which shapes were real: overlay the filtering of each of the six subsets and see which shapes turned up in all of them.   Whatever helps you tease the story out of the history is the main purpose of the curve study methods.

Sometimes you'll find data that is clearly well localized to a single developmental process, and then careful local smoothing that does not alter the shape is a great help in making the curve differentiable and precisely locating the first and second derivative inflection points.

(note: these notes are mostly from a 1998 edit and need work. I removed the faulty 'DAR' test mentioned in some of the application studies.)

Methods:
  1. Introduction to the statistical method
  2. Dynamic Mean: underlying process curves
  3. Defining Derivatives for functions and sequences
  4. Proportional Walks: differentiable curves without formulas
  5. Basic Technique of Derivative Reconstruction (DR)
  6. Tests of Confidence
  7. Notes on the Craft of Modeling

Introduction to DR statistical & analysis methods
Derivative Reconstruction is presently limited to the investigation of single-valued sequences, lists of number pairs, treated as solution sets of implied continuous curves. These are usually time series. Most mathematical work on sequences treats them as representing statistical behaviors, using equations with stochastic variables. Here, after passing the tests, sequences are treated as a sampling of measures of a continuous behavior, and are represented mathematically as having derivative continuity with a structure called a proportional walk.

The principal benefit of representing a data sequence with a proportional walk is that less of the information in the data is lost in translation. A large amount of information is lost in statistical curve fitting because only the constant features the researcher thinks of including are present in the structure of the formula. Proportional walks include all kinds of dynamics, of known and unknown origin, including transients and behavioral transitions on multiple scales. They represent them in a form that is more readily comprehended by direct inspection, though an experienced observer will then be able to see many of the same features by direct inspection of the data itself.

By including the transients and behavioral transitions, proportional walks generated from a sequence serve to identify behavioral changes that would require new equations to describe. This provides an efficient kind of hypothesis generator regarding the structures of the natural phenomena being observed.

DR functions take a numerical sequence as input and produce a corresponding sequence of values as output, drawing a curve from a curve. The analytical package is available as a collection of AutoLISP routines called CURVE for use in the graphical database AutoCAD. The principal difference from other curve fitting techniques, such as least squares autoregression, is that DR fits the curve according only to the smoothness of the path, and ignores entirely its distance from some preconceived mathematical curve. Thus it produces curves approximating both the scale and the dynamics of the data, not just getting to similar points, but also getting there in similar ways.

To do this one needs to learn how a derivative is defined for functions and how to adapt that definition to sequences. In the absence of other reasons to believe that a sequence reflects the derivative continuities of a physically continuous process, one needs statistical measures to determine whether a sequence displays that pattern, and so whether the presence of a physically continuous process is implied. Once a sequence is represented by a proportional walk, various tests can be used to measure how well it represents the data, or to determine its dynamic and scalar similarity to other results.


  • Dynamic Mean (Underlying Process Curves) Experiments in progress
  • It is often helpful to strip away distracting fluctuations, rather than suppress them by smoothing.    Sometimes the fluctuations are the process, but often they're processes of entirely different kinds superimposed on, and quite separate from, the process you want to expose and study.    If the fluctuations are not part of the process, just a distraction, then it really distorts the subject of study if you just merge them into it.     Typical mathematical regression curve fitting does accomplish some of this intent too, of course, though you lose entirely any of the local developmental or individualistic shapes in the data that correspond directly to the processes that produce them.  Three main approaches, using the c:tlin routine for AutoLISP, worked well in various instances.

    1. tracing the peaks or troughs for upper and lower bound curves or to strip one sided fluctuations
    2. tracing the inflection points of a given scale of symmetric fluctuation.  That threads through the center of shapes that represent waves in, or superimposed on, a developmental process, for example.   Derivative smoothing was often used beforehand to locate the inflection points of the fluctuations more accurately, when the shapes to be stripped clearly had a flowing shape.    An example of this is in the reverse test example.
    3. manually guiding either of the above to skip over outliers, or to skip over false peaks, troughs, or inflection points that the simple rules of the routine would pick.

    Typically after stripping fluctuations the number of points in the sequence is greatly reduced.   Depending on the subject, points were often interpolated and derivative smoothing then done to produce a curve that visually passed through the original fluctuations on their apparent neutral path.   This is subjective, of course, and a little painstaking, but seems to be a more accurate method of removing distracting shapes that no mathematical routine could define.
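
The first two tracing rules above can be sketched in a few lines. This is a minimal Python illustration for evenly spaced data, not the c:tlin routine itself; the function names `turning_points` and `inflection_points`, and the use of a sign change in the second difference to locate inflections, are assumptions made here.

```python
import math

def turning_points(y):
    """Return (peaks, troughs): indices of strict local maxima and minima."""
    peaks, troughs = [], []
    for i in range(1, len(y) - 1):
        if y[i] > y[i - 1] and y[i] > y[i + 1]:
            peaks.append(i)
        elif y[i] < y[i - 1] and y[i] < y[i + 1]:
            troughs.append(i)
    return peaks, troughs

def inflection_points(y):
    """Return (x, y) pairs where the second difference changes sign.

    x is fractional: the crossing is placed by linear interpolation
    between the two neighboring points, and y is interpolated to match.
    """
    d2 = [y[i + 1] - 2 * y[i] + y[i - 1] for i in range(1, len(y) - 1)]
    points = []
    for j in range(len(d2) - 1):
        a, b = d2[j], d2[j + 1]
        if a * b < 0:
            t = a / (a - b)      # fraction of the way from point i to i+1
            i = j + 1            # d2[j] is centered on y[j + 1]
            points.append((i + t, y[i] + t * (y[i + 1] - y[i])))
    return points

# Synthetic example: a wave of period 12 riding on a straight rising trend.
wave = [math.sin(2 * math.pi * (i + 0.25) / 12) for i in range(24)]
series = [0.1 * i + w for i, w in enumerate(wave)]

peaks, troughs = turning_points(wave)   # bound curves pass through these
trend = inflection_points(series)       # threads through the wave centers
```

In this synthetic case the inflection points fall back on the straight trend that the wave was added to, which is the sense in which the curve "threads through the center" of the fluctuations. Real data would usually need smoothing first, and the manual override described in item 3 has no counterpart in this sketch.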



  • Defining derivatives for functions and sequences Experiments in progress
    The key for reconstructing an image of events used here is derivative continuity, the relationship of neighboring values required for functions of real numbers that satisfy the fundamental theorem of calculus.

    For mathematical functions, having a derivative is well defined, based on whether the rates of change of a function approach the same value at points successively closer to a given point from both sides. This is also known as the test for derivative continuity. The criterion for derivative continuity is much more restrictive than for simple continuity. The latter only requires that a function be defined for all values of the variables (not having gaps in the coordinates of the variable) and that the values of the function approach each other when its variables do (not having gaps in the coordinates of the function). Derivative continuity in a function also means not having abrupt changes in the rates of change (not having gaps in the accelerations), i.e. following a smooth curve. These things have been very well worked out for a long time (Courant & Robbins 1941).
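
In symbols, the two-sided test described above can be written as follows. The symbols f, a and h are notation chosen here for illustration, not taken from the original notes. The first line says the rate of change approaches the same value from both sides of the point a; the second says the derivative itself has no jump there.

```latex
f'(a) \;=\; \lim_{h \to 0^{+}} \frac{f(a+h) - f(a)}{h}
      \;=\; \lim_{h \to 0^{-}} \frac{f(a+h) - f(a)}{h},
\qquad
\lim_{x \to a} f'(x) \;=\; f'(a).
```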

    The problem with extending this concept to either physical processes or sequential measurements of them is with the gaps in nature and in measurement. Both data and physical processes are completely fragmented. Every measurement is an isolated value with no ultimate near-by values approaching from any direction. Physical processes are much the same. Surfaces are mostly composed of holes, lines of spaces, and regular behaviors of intermittent smaller scale processes. Nature and all our information about it is largely composed of gaps, broken chains presenting the regularities of the world as completely discontinuous.

    There are also lots of sequences that appear to flow so smoothly it's hard to see them as anything else, like a movie. There is also the marvel of classical physics, that nature's apparent fragmentation can be considered as if following perfectly continuous differentiable functions. It is even possible to derive from the conservation laws a principle that all physical processes must, at root, satisfy differential continuity (Henshaw 1995). Even for quantum mechanics, discounting that the principal concerns of QM are probabilistic events beyond the realm of physical process, it now seems that the quantum mechanical events that do materialize may still conform to classical mechanics (Lindley 1997). This suggests that not only is QM perhaps consistent with the continuous world, but might also require it, and the differential continuity of physical properties that classical mechanics implies.



    Defining derivatives for sequences
    There are a limited variety of ways to define rules for adding points to a sequence that allow it to satisfy the same requirements for derivatives as functions.   One requirement would be to not add to the complexity of the shape.   In a sense smooth curves are the opposite of 'random walks', in that within a small enough neighborhood successive points have proportionally progressing step differences rather than randomly changing step differences.    Using the term 'proportional walk' for such curves suggests this important property, but is not ideal.   The mathematics has to do with finding points on a smooth path joining sequences of other points, and has nothing to do with constructing a 'walk' from the end of the curve, as the term 'random walk' refers to.   For 'proportional walks' it's the way the physical system picks a 'next point' of change while 'standing' at the end of the historic curve that constitutes the 'walk', generating shapes that either are or are not proportionally related to those of the past and future.    'Proportional walk', then, really refers to what continuous physical systems seem to do, by means unknown, not the mathematical construction used here to make improved approximations of their shapes.    Perhaps they could be called parsimonious derivative interpolations (PDI's) to be more accurate.    See: Methods of Derivative & Integral Interpolation.



    Implementation Procedure & Technique
    The implementation of the concepts described above was developed using AutoLISP programming in AutoCAD.  From within AutoCAD, a data sequence is read from a text file and plotted as a curve, creating an entity type called a 'polyline'. As with all DR operators, reference information from the source is attached to the output curve as 'xdata', so that all operations on the source information are recorded. After visual inspection and consideration of the subject, an experimental strategy is planned, tested and then refined.

    The principal strategic task is to correctly identify where the rates of change of the underlying behavior reverse. If the behavior is expected to have been smoothly changing, but there are few data points, most of the inflection points in the behavior will have occurred somewhere in between. The first curve to construct would then be one keeping the original data points and adding new points where they would be predicted, given the assumption of a regular progression of derivative rates. The function that does this is called DIN, for derivative interpolation.

    If, on the other hand, there is an abundance of data containing fairly clear trends, but small scale erratic variation hides all the larger scale inflection points, then one or another kind of local averaging might be used as the first step. The least distorting kind of local averaging is double derivative smoothing, DDSM. Both DIN and DDSM work by comparing the third derivative calculated from the first four of five adjacent points with that calculated from the last four of the same five points, and adjusting the middle point to make the two third derivatives equal.
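
The five-point rule described above can be worked out explicitly when the points are evenly spaced. The following is a minimal Python sketch under that assumption, not the AutoLISP DDSM routine itself; the function name `ddsm_pass`, the `alpha` blending parameter, and the choice to leave the two points at each end untouched are all made here for illustration.

```python
def ddsm_pass(y, alpha=1.0):
    """One pass of a five-point 'equal third differences' adjustment.

    For evenly spaced values p0..p4, the third difference of the first
    four points, p3 - 3*p2 + 3*p1 - p0, equals that of the last four,
    p4 - 3*p3 + 3*p2 - p1, exactly when
        p2 = (4*(p1 + p3) - (p0 + p4)) / 6.
    Each interior point is moved toward that target; alpha=1.0 replaces
    it outright, smaller values move it only part of the way.
    The two points at each end are copied unchanged (an assumption).
    """
    out = list(y)
    for i in range(2, len(y) - 2):
        target = (4 * (y[i - 1] + y[i + 1]) - (y[i - 2] + y[i + 2])) / 6
        out[i] = y[i] + alpha * (target - y[i])
    return out

# A cubic already has equal third differences, so it is left alone.
cubic = [i ** 3 - 2 * i for i in range(8)]
same = ddsm_pass(cubic)

# An isolated spike in the middle of flat data is pulled back to the line.
flattened = ddsm_pass([0, 0, 1, 0, 0])
```

Because any cubic passes through unchanged, the adjustment does not flatten smooth bends the way a moving average does, which fits the description of it as the least distorting kind of local averaging. For an evenly spaced series away from the ends, full replacement overshoots a point-to-point sawtooth, so repeated passes may behave better with `alpha` below about 0.75.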

    Once the best possible representation of the smallest scale of regular fluctuation is constructed the next larger scale of fluctuations in the data is isolated by using TLIN to draw a curve through the inflection points of the small scale fluctuations, constructing a dynamic trend line. This might be followed by subsequent use of DIN and DDSM, and then repeated, until the resultant is a smooth monotonic centroid, a curve without fluctuations that closely approximates both the scale and dynamics of the original data, its central dynamic trend. That completes the first major step. The derivatives of this curve will display a number of definite predictions about the nature of the physical behavior being studied.

    More information on the individual command operators is found in drtools.pdf, and a selection of DR commands in AutoLISP is available in Curve.zip (for AutoCAD 13 or earlier).



    Statistical Tests & Confidence Measures
    There are three specific kinds of scientific confidence required to accept analytical results of this kind: 1) confidence in associating curve shapes with physical processes, 2) confidence in the accuracy of the shape, and 3) confidence that observations of a single occurrence are relevant elsewhere.  There is also a matter of human comfort with new ideas, which is harder to address, because this approach departs from long standing methods and tends to suggest entirely unexpected physical structures and relationships.  Consequently it is quite natural to feel uncertain about the validity of the interpretation, whether all the known requirements for validity are satisfied or not.  I'm reminded of the leading authority on time series modeling, who usually accepts results with a standard 95% confidence level, and who said "I can’t see any way to be convinced by such a demonstration (of continuity in evolution) unless an outcome of astonishing improbability has happened."

    Experience with the method can certainly help, beginning with a study of the available examples. Two techniques for gaining confidence in the statistical accuracy of the results are below.



    Step Variance Test: A Statistical Indicator of Continuity
    (showing how much the variance in step size increases for larger step sizes)

    In brief, if variation in a sequence is symmetric about a norm, then adjacent changes will cancel each other out, and the variance of differences between widely spaced points will tend to be the same as for more closely spaced points.  In a random walk the step variance will tend to increase as fast as the step length.    If k steps of the sequence are aggregated and the variance of single steps is v, then for a random walk the variance of the aggregated steps will average v*k. For a sequence with symmetric random noise, the variance of the aggregated steps will tend to remain constant at v. For homeostatic variance it may decline at various rates.

    For example, a clear difference is seen between the log/log plot of step variance to step length for random walks and that for the Malmgren data on plankton size.  The numerical tests indicate that about 95% of random walks will have a slope of step variance to step length between 0.65 and 1.25, and the Malmgren data has a value of 0.3.  This indicates that, in this case, the tripling of plankton size which the data records in all likelihood progressed by non-random steps.  This test was developed in the JMP statistical package (Jr. SAS) and the set of functions is available in JMP format from StepVar.zip
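
The slope described above can be sketched outside JMP as follows. This is a minimal Python illustration, not the StepVar functions; the names `step_variance` and `step_variance_slope`, the particular lags, and the synthetic test series are all assumptions made here.

```python
import math
import random
import statistics

def step_variance(y, k):
    """Variance of the differences between points k positions apart."""
    return statistics.pvariance([y[i + k] - y[i] for i in range(len(y) - k)])

def step_variance_slope(y, lags=(1, 2, 4, 8, 16)):
    """Least-squares slope of log(step variance) against log(step length)."""
    xs = [math.log(k) for k in lags]
    ys = [math.log(step_variance(y, k)) for k in lags]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (v - my) for x, v in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Two synthetic series built from the same random steps.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(2000)]   # symmetric noise about a norm
walk, total = [], 0.0
for step in noise:                               # random walk: running sum
    total += step
    walk.append(total)

walk_slope = step_variance_slope(walk)     # step variance grows with step length
noise_slope = step_variance_slope(noise)   # step variance stays about constant
```

A slope near 1 is what the v*k rule predicts for a random walk, and a slope near 0 is what constant step variance predicts for symmetric noise. Judging a measured value such as the 0.3 reported above would still need a reference range from many simulated walks of the same length as the data, which this sketch does not produce.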



    Comparison of subseries: For the 95 points of highly irregular plankton profile area data, the data was divided into five subsets of points 5 points apart.   This reduced the number of data points for analysis from 95 to 23, a relatively severe reduction.   If the shapes found depended on random features, they should not be repeated across the subsets.    The results of applying the identical procedure to each do show considerable variation in detail, but seem to show very little variation in kind compared to the reconstruction for the full data set.  Only one of the five did not reflect the apparent single evolutionary event which is clearly evident in the others. Close examination suggests that this was because it was the subset that started at point 5 in the sequence. With only two points preceding the acceleration of growth there was insufficient data to define a baseline steady state from which an acceleration of growth would be implied.  (full figure Paleo3.gif )


    A second example of using subsets to validate results is available from another study, done to see the effect of the imperfect treatment of end points in the sequence. DR routines usually retain end points on a curve, with lower confidence, by making assumptions about imaginary data points beyond them. In the modeling of the history of economic growth presented in "Reconstructing the Physical Continuity of Events" (GNP), about 10 data points from the end of each curve segment were shown to have low significance (figure sE.5 GNP10.gif ), but these end condition effects had no impact whatever on the central portions of the curve.

    Another example is provided by the comparison of different subsets for the gamma ray burst data.
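
The subset comparison used in these examples can be sketched as follows. This is a minimal Python illustration on synthetic data; the function names `interleaved_subsets` and `peak_position` are made up here, and locating a single peak stands in for whatever reconstruction procedure is actually being checked.

```python
import math

def interleaved_subsets(y, k):
    """Split y into k subsets, each taking every k-th point from a different start.

    Returns (indices, values) pairs so that features found in a subset
    can be mapped back to positions in the original sequence.
    """
    return [(list(range(s, len(y), k)), y[s::k]) for s in range(k)]

def peak_position(indices, values):
    """Original-sequence index of the largest value in one subset."""
    best = max(range(len(values)), key=values.__getitem__)
    return indices[best]

# Synthetic example: one smooth bump centered at index 30.
series = [math.exp(-((i - 30) / 8) ** 2) for i in range(60)]
subsets = interleaved_subsets(series, 6)

# Apply the same procedure to every subset and compare where the feature lands.
peaks = [peak_position(idx, vals) for idx, vals in subsets]
```

Since the subsets share no data points, a feature that turns up at about the same place in all of them is unlikely to be an artifact of individual points. A feature that appears in only one subset deserves a closer look at how that subset samples the sequence, as with the subset noted above that had too few points before the acceleration.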



    Some Hints about the Craft
    DR is a craft, using carefully justified assumptions and specific technique. As for the practice of mathematical modeling in general, it also relies on "a personal touch" involving "an element of risk" requiring "judgment, dexterity and care" to shape "several parts of his work and fit them together" (Rutherford Aris, "The Mere Notion of a Model", Mathematical Modeling, V1, 1980).
    Some examples: 
    In the study of Gamma Ray bursts the preliminary results appeared to display a failure of the analytical tools, displaying a jumble of forms with no clear pattern. After carefully filtering the data for the components with the least noise and beginning the analysis on a much shorter time period than initially expected to be useful, it was found that the regular dynamic events were of very short duration, and that the composite, looking like closely spaced separate bursts, going off like popcorn, was the real finding. The results were verified by performing the same steps on six independent sub-sets of the data with very closely matching results.
     


