This is about using time series data, shapes of change, as an aid to identifying individual underlying systems and their behaviors. It's math to help you see what you're looking at, lenses of different kinds. The two basic kinds of processes, developmental and cyclic, have different shapes with multiple kinds of internal and extraneous fluctuation. Those shapes tend to be poorly represented, and to be superimposed with the effects of various other distracting processes, in any series of measures. Most time series data will display a variety of all kinds, on various scales, with different timing, like a junk yard of shapes. It's all 'evidence', though, and none of it 'noise' from any extraneous source. Separating the useful information from the useless calls for different kinds of treatments to help reveal what behavior made the shapes of interest. That's sort of the 'null hypothesis', giving the analysis something like a forensic approach. The usual scientific method, where the object is producing a 'formula' to use in place of the data, is a step you might or might not take toward the very end of the process. The main job in studying individual events is to identify what combination of behaviors you're looking at, and a way to tell their story.
Sometimes you'll find data that is clearly well localized to a single developmental process, and then careful local smoothing that does not alter the shape is a great help in making the curve differentiable and precisely locating the first and second derivative inflection points.
(note: these notes are mostly from a 1998 edit and need work; I removed the faulty 'DAR' test mentioned in some of the application studies)
The principal benefit of representing a data sequence with a proportional walk is that less of the information in the data is lost in translation. A large amount of information is lost in statistical curve fitting because only the constant features the researcher thinks of including are present in the structure of the formula. Proportional walks include all kinds of dynamics, of known and unknown origin, including transients and behavioral transitions on multiple scales. They represent them in a form that is more readily comprehended by direct inspection, though an experienced observer will then be able to see many of the same features by direct inspection of the data itself.
By including the transients and behavioral transitions, proportional walks generated from a sequence serve to identify behavioral changes that would require new equations to describe. This provides an efficient kind of hypothesis generator regarding the structures of the natural phenomena being observed.
DR functions take a numerical sequence as input and produce a corresponding sequence of values as output, drawing a curve from a curve. The analytical package is available as a collection of AutoLISP routines called CURVE for use in the graphical database AutoCAD. The principal difference from other curve fitting techniques, such as the least squares autoregressions, is that DR fits the curve according only to the smoothness of the path, and ignores entirely its distance from some preconceived mathematical curve. Thus it produces curves approximating both the scale and the dynamics of the data, not just getting to similar points, but also getting there in similar ways.
To do this one needs to learn how a derivative is defined for functions and how to adapt that definition to sequences. In the absence of other reasons to believe that a sequence reflects the derivative continuities of a physically continuous process, one needs statistical measures to determine whether a sequence displays that pattern, and thus whether the presence of a physically continuous process is implied. Once a sequence is represented by a proportional walk, various tests can be used to measure how well it represents the data or to determine its dynamic and scalar similarity to other results.
It is often helpful to strip away distracting fluctuations, rather than suppress them by smoothing. Sometimes the fluctuations are the process, but often they're processes of entirely different kinds superimposed on, and quite separate from, the process you want to expose and study. If the fluctuations are not part of the process, just a distraction, then it really distorts the subject of study if you just merge them into it. Typical mathematical regression curve fitting does accomplish some of this intent too, of course, though you lose entirely any of the local developmental or individualistic shapes in the data that correspond directly to the processes that produce them. Two main routines in c:tlin for AutoLISP worked well in various instances.
Typically, after stripping fluctuations, the number of points in the sequence is greatly reduced. Depending on the subject, points were often interpolated and derivative smoothing then done to produce a curve that visually passed through the original fluctuations on their apparent neutral path. This is subjective, of course, and a little painstaking, but seems to be a more accurate method of removing distracting shapes that no mathematical routine could define.
For mathematical functions, having a derivative is well defined, based on whether the rates of change of a function approach the same value at points successively closer to a given point from both sides. This is also known as the test for derivative continuity. The criterion for derivative continuity is much more restrictive than for simple continuity. The latter only requires that a function be defined for all values of the variables (not having gaps in the coordinates of the variable) and that the values of the function approach each other when its variables do (not having gaps in the coordinates of the function). Derivative continuity in a function also means not having abrupt changes in its rates of change (not having gaps in the accelerations), i.e. following a smooth curve. These things have been very well worked out for a long time (Courant & Robbins 1941).
The problem with extending this concept to either physical processes or sequential measurements of them is with the gaps in nature and in measurement. Both data and physical processes are completely fragmented. Every measurement is an isolated value with no ultimate nearby values approaching from any direction. Physical processes are much the same. Surfaces are mostly composed of holes, lines of spaces, and regular behaviors of intermittent smaller scale processes. Nature and all our information about it are largely composed of gaps, broken chains presenting the regularities of the world as completely discontinuous.
There are also lots of sequences that appear to flow so smoothly it's hard to see them as anything else, like a movie. There is also the marvel of classical physics, that nature's apparent fragmentation can be considered as if following perfectly continuous differentiable functions. It is even possible to derive from the conservation laws a principle that all physical processes must, at root, satisfy differential continuity (Henshaw 1995). Even for quantum mechanics, discounting that the principal concerns of QM are probabilistic events beyond the realm of physical process, it now seems that the quantum mechanical events that do materialize may still conform to classical mechanics (Lindley 1997). This suggests that not only is QM perhaps consistent with the continuous world, but might also require it, and the differential continuity of physical properties that classical mechanics implies.
Intro, DynMeans, Derivatives, P-Walks, Basics, Tests, StepVar, Sub-Series, Craft,.... TOP
The principal strategic task is to correctly identify where the rates of change of the underlying behavior reverse. If the behavior is expected to have been smoothly changing, but there are few data points, most of the inflection points in the behavior will have occurred somewhere in between. The first curve to construct would then be one keeping the original data points and adding new points where they would be predicted, given the assumption of there being a regular progression of derivative rates. The function that does this is called DIN, for derivative interpolation.
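The interpolation idea can be illustrated with a minimal Python sketch. It is not the CURVE package's DIN routine: it assumes evenly spaced data and places each new midpoint on the cubic through its four nearest original points, which is one simple way to express "a regular progression of derivative rates" (a cubic has a constant third derivative). The name `din_midpoints` is hypothetical.

```python
def din_midpoints(values):
    """Insert one interpolated point between each adjacent pair of values.

    Assumption: values are evenly spaced. Each new point is placed on the
    cubic through the four nearest original points, so the original points
    are kept unchanged and the local third derivative stays constant.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least four points")
    out = []
    for i in range(n - 1):
        out.append(values[i])
        if i == 0:
            # First gap: cubic through points 0..3 evaluated at position 0.5.
            p0, p1, p2, p3 = values[0:4]
            mid = (5 * p0 + 15 * p1 - 5 * p2 + p3) / 16.0
        elif i == n - 2:
            # Last gap: mirror image of the first-gap formula.
            p0, p1, p2, p3 = values[n - 4:n]
            mid = (p0 - 5 * p1 + 15 * p2 + 5 * p3) / 16.0
        else:
            # Interior gap: cubic through the two points on each side.
            p0, p1, p2, p3 = values[i - 1:i + 3]
            mid = (-p0 + 9 * p1 + 9 * p2 - p3) / 16.0
        out.append(mid)
    out.append(values[-1])
    return out
```

Applied repeatedly, a scheme like this doubles the point density each time while leaving every earlier point where it was. The end gaps rely on one-sided information, so the points added there deserve less confidence than the interior ones.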
If, on the other hand, there is an abundance of data containing fairly clear trends, but small scale erratic variation hides all the larger scale inflection points, then one or another kind of local averaging might be used as the first step. The least distorting kind of local averaging is double derivative smoothing, DDSM. Both DIN and DDSM work by comparing the third derivative calculated from the first four of five adjacent points with that calculated from the last four of the same five points, and adjusting the middle point to make the two third derivatives equal.
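The five-point rule just described can be worked out for evenly spaced points p0..p4, using finite differences in place of derivatives (an assumption of this sketch). The third difference of the first four points is p3 - 3*p2 + 3*p1 - p0, and that of the last four is p4 - 3*p3 + 3*p2 - p1. Setting them equal and solving for the middle point gives p2 = (4*(p1 + p3) - (p0 + p4)) / 6. The Python sketch below applies that adjustment across a sequence; the name `ddsm_pass` and the `strength` parameter are hypothetical and are not taken from the CURVE routines.

```python
def ddsm_pass(values, strength=0.375):
    """One smoothing pass over evenly spaced values.

    Each interior point is moved toward the value that equalizes the third
    differences of the first four and last four of its five-point window.
    All targets are computed from the unadjusted input, and the two points
    at each end are left unchanged.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        p0, p1, p2, p3, p4 = values[i - 2:i + 3]
        # Middle value at which both third differences agree.
        target = (4.0 * (p1 + p3) - (p0 + p4)) / 6.0
        # strength = 1.0 applies the full adjustment; smaller is gentler.
        out[i] = p2 + strength * (target - p2)
    return out
```

Data that already follows a cubic is left untouched, since its third differences are already equal everywhere. The partial default step is a design choice of this sketch: when every target is computed from unadjusted neighbors, a full step overshoots on rapidly alternating values, while a step of 0.375 cancels a pure point-to-point alternation in one pass.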
Once the best possible representation of the smallest scale of regular fluctuation is constructed, the next larger scale of fluctuations in the data is isolated by using TLIN to draw a curve through the inflection points of the small scale fluctuations, constructing a dynamic trend line. This might be followed by subsequent use of DIN and DDSM, and then repeated, until the result is a smooth monotonic centroid, a curve without fluctuations that closely approximates both the scale and dynamics of the original data, its central dynamic trend. That completes the first major step. The derivatives of this curve will display a number of definite predictions about the nature of the physical behavior being studied.
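As an illustration of a trend line drawn through inflection points, here is a minimal Python sketch. How TLIN itself locates and connects the points is not described here, so the method below is an assumption: inflection points are taken to be where the second difference of an evenly spaced, already smoothed sequence changes sign, with the crossing located by linear interpolation. The name `inflection_points` is hypothetical.

```python
def inflection_points(xs, ys):
    """Return (x, y) points where the second difference of ys changes sign.

    Assumption: xs are evenly spaced and ys are smooth enough that sign
    changes of the second difference mark real inflections, not jitter.
    """
    # d2[j] is the second difference centered on index j + 1.
    d2 = [ys[i - 1] - 2.0 * ys[i] + ys[i + 1] for i in range(1, len(ys) - 1)]
    points = []
    for j in range(len(d2) - 1):
        a, b = d2[j], d2[j + 1]
        if a * b < 0:
            # Fraction of the way from sample i to i + 1 where d2 crosses zero.
            t = a / (a - b)
            i = j + 1
            x = xs[i] + t * (xs[i + 1] - xs[i])
            y = ys[i] + t * (ys[i + 1] - ys[i])
            points.append((x, y))
    return points
```

For a regular oscillation riding on a steady trend, the points returned fall on the trend itself, which is why a curve through them can serve as a trend line for the next larger scale. On unsmoothed data nearly every sample would register as an inflection, so a step like this only makes sense after the smallest scale has been cleaned up.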
More information on the individual command operators is found in drtools.pdf, and a selection of DR commands in AutoLISP is available in Curve.zip (for AutoCAD 13 or earlier).
Experience with the method can certainly help, beginning with a study of the available examples. Two techniques for gaining confidence in the statistical accuracy of the results are described below.
For example, a clear difference is seen between the log/log plot of step variance to step length for random walks and for the Malmgren data on plankton size. The numerical tests indicate that about 95% of random walks will have a slope of step variance to step length between 0.65 and 1.25, and the Malmgren data has a value of 0.3. This indicates that, in this case, the tripling of plankton size which the data records, in all likelihood, progressed by non-random steps. This test was developed in the JMP statistical package (Jr. SAS) and the set of functions is available in JMP format from StepVar.zip
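A minimal Python sketch of a test of this kind follows. It is not the JMP implementation; it assumes that "step variance" means the variance of the differences between values a given number of positions apart, and that the slope is fitted by ordinary least squares on the log/log values. The names `step_variance`, `stepvar_slope` and `random_walk` are hypothetical.

```python
import math
import random


def step_variance(series, lag):
    """Variance of the differences between values `lag` positions apart."""
    steps = [series[i + lag] - series[i] for i in range(len(series) - lag)]
    mean = sum(steps) / len(steps)
    return sum((s - mean) ** 2 for s in steps) / len(steps)


def stepvar_slope(series, lags):
    """Least-squares slope of log(step variance) against log(step length)."""
    xs = [math.log(k) for k in lags]
    ys = [math.log(step_variance(series, k)) for k in lags]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def random_walk(n, seed):
    """Cumulative sum of n independent Gaussian steps, for comparison."""
    rng = random.Random(seed)
    walk, position = [], 0.0
    for _ in range(n):
        position += rng.gauss(0.0, 1.0)
        walk.append(position)
    return walk
```

A comparison would call `stepvar_slope(data, [1, 2, 4, 8, 16])` on the measured sequence and on a large batch of random walks of the same length. For a random walk the variance of a k-step difference grows in proportion to k, so its slope sits near 1, and the spread across the batch gives the kind of 95% range quoted above.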

A second example of using sub-sets to validate results is available from another study, to see the effect of the imperfect treatment of end points in the sequence. DR routines usually retain end points on a curve, with lower confidence, by making assumptions about imaginary data points beyond. In the modeling of the history of economic growth presented in "Reconstructing the Physical Continuity of Events" (GNP), about 10 data points from the end of each curve segment were shown to have low significance (figure sE.5, GNP10.gif), but these end condition effects had no impact whatever on the central portions of the curve.
Another example is provided by the comparison of different subsets for the gamma ray burst data.