CompGeoJS-ADV and Data Visualization
The CompGeoJS package is intended to be an open-source resource for web developers to learn and apply principles from numerical analysis and basic computational geometry. Clients sometimes do not wish to have applications developed using open-source libraries and often desire capabilities well beyond the intent of the basic CompGeoJS library.
I forked the source code to create CompGeoJS-ADV, which is a more advanced version of the library that will include considerable new capability over time and be a closed-source product that can be licensed to clients. CompGeoJS-ADV was recently used to create a data visualization comp. that is a prototype of a dashboard component.
Many companies have metrics that can be interpreted as efficiency or effectiveness measures. These metrics are often computed from disparate data sources and synthesized into a time series with intervals that are relevant to the particular measure. A common time interval is days, since the business exists on a day-to-day basis, even though the raw inputs to compute the effectiveness measure are rarely available except at sparse times during the window of analysis.
Once computed, the effectiveness metric is normalized to the range [-1,1], although it could exceed those bounds. A measure of 1 means that the business is as effective at the measurement point as it was during the historical interval for comparison (i.e. the past year). A measure greater than 1 indicates moving into new highs in terms of relative effectiveness. A similar interpretation in terms of historic lows is applied to measures at or below -1.
Although the entire set of raw inputs to compute the metric is available only at a small number of unevenly spaced time steps, the business believes that the general flow of effectiveness (in the area being measured) is continuous and would like that continuity displayed in a dashboard component. Furthermore, they wish to identify areas of the effectiveness curve above and below zero with different colors. The area of each section (above or below zero) is considered important and should be computed and compared to both the total area and the sum of the areas of similar sections.
The desire for a smooth graph between the unevenly spaced data points is an interpolation problem and practically screams out for a cubic spline. In a precise sense, a natural cubic spline is the smoothest curve that can be fit through the points: among all twice continuously differentiable interpolants, it minimizes the integral of the squared second derivative. Since we have no insight into any functional definition of this effectiveness measure (only the raw data from which a set of interpolation points are computed), the business desires only a smooth fit curve between the points.
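For readers who want something concrete, here is a minimal sketch of a natural cubic spline through unevenly spaced points. It is not the CompGeoJS-ADV code; the function name naturalCubicSpline and the choice of natural end conditions (zero second derivative at both ends) are assumptions made for illustration.

```javascript
// Hypothetical helper, not part of CompGeoJS: natural cubic spline through
// the points (x[i], y[i]). Assumes x is strictly increasing and n >= 2.
function naturalCubicSpline(x, y) {
  const n = x.length;
  const h = [];
  for (let i = 0; i < n - 1; i++) h.push(x[i + 1] - x[i]);

  // Second derivatives m[i] at the knots; m[0] = m[n-1] = 0 (natural ends).
  const m = new Array(n).fill(0);
  const diag = new Array(n).fill(0);
  const rhs = new Array(n).fill(0);
  for (let i = 1; i < n - 1; i++) {
    diag[i] = 2 * (h[i - 1] + h[i]);
    rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1]);
  }
  // Thomas algorithm: forward elimination on the tridiagonal system ...
  for (let i = 2; i < n - 1; i++) {
    const w = h[i - 1] / diag[i - 1];
    diag[i] -= w * h[i - 1];
    rhs[i] -= w * rhs[i - 1];
  }
  // ... followed by back substitution.
  for (let i = n - 2; i >= 1; i--) {
    m[i] = (rhs[i] - h[i] * m[i + 1]) / diag[i];
  }

  // Evaluate the spline at t, for t in [x[0], x[n-1]].
  return function evaluate(t) {
    let i = 0;
    while (i < n - 2 && t > x[i + 1]) i++;
    const a = (x[i + 1] - t) / h[i];
    const b = (t - x[i]) / h[i];
    return a * y[i] + b * y[i + 1] +
      ((a * a * a - a) * m[i] + (b * b * b - b) * m[i + 1]) * h[i] * h[i] / 6;
  };
}

// Usage: days with data on the x-axis, normalized effectiveness on the y-axis.
const ef = naturalCubicSpline([0, 3, 10, 14], [0.2, -0.4, 0.7, 0.1]);
const valueOnDay5 = ef(5);
```

A linear scan for the containing interval is fine for a few dozen points; a binary search would be the obvious swap for larger data sets.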
A cubic spline solves the interpolation (and hence smooth-fit) criteria but increases the complexity of identifying the areas above and below zero. To solve this, I dusted off the old polynomial curve -> quad Bezier approximation code I wrote for Degrafa and took the time to implement some improvements I’ve been wanting to add in for some time. This utility currently works only for Cartesian curves (that satisfy a prescribed interface) and converts the curve to a sequence of quadratic Bezier segments. Since it is often necessary to identify the beginning and end of each interval in the original data set, the quad. sequence always begins and ends at an interpolation point. This is less optimal from a fit perspective but easier to work with in terms of actual applications.
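The converter itself is not reproduced here. Purely as an illustration of the general idea, one common way to replace a short piece of a Cartesian curve y = f(x) with a single quadratic Bezier is to keep both endpoints and put the control point where the two endpoint tangent lines intersect. This is a stand-in technique, not necessarily what the Degrafa-derived code does, and the name quadFromTangents and the {x0, y0, cx, cy, x1, y1} layout are assumptions.

```javascript
// Hypothetical helper: one quadratic Bezier for a piece of y = f(x) on
// [x0, x1], given endpoint values (y0, y1) and endpoint slopes (m0, m1).
function quadFromTangents(x0, y0, m0, x1, y1, m1) {
  // Default: midpoint of the chord (used when the tangents are parallel).
  let cx = 0.5 * (x0 + x1);
  let cy = 0.5 * (y0 + y1);
  const denom = m0 - m1;
  if (Math.abs(denom) > 1e-12) {
    // Intersection of y0 + m0 (x - x0) and y1 + m1 (x - x1).
    const ix = (y1 - y0 + m0 * x0 - m1 * x1) / denom;
    // Accept it only if it lies strictly inside the interval; otherwise the
    // piece probably contains an inflection and should be split further.
    if (ix > x0 && ix < x1) {
      cx = ix;
      cy = y0 + m0 * (ix - x0);
    }
  }
  return { x0, y0, cx, cy, x1, y1 };
}

// Usage: y = x^2 on [0, 2] has slope 0 at x = 0 and slope 4 at x = 2.
const parabolaQuad = quadFromTangents(0, 0, 0, 2, 4, 4);
```

A parabola is reproduced exactly by this construction; for a cubic spline segment it is only an approximation, which is why each interval generally needs more than one quad.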
This process reduces drawing the spline to a sequence of move-to and curve-to commands that can be easily implemented in any of the Canvas engines (or directly if you so prefer). I wrote a renderer on top of EaselJS. While the original intervals may be easily identified and even rendered with different colors and rollover effects, this does not solve the problem of identifying areas above and below zero. There is no guarantee that any particular quadratic curve will initiate or terminate at y=0. Generally, the quads will cross the x-axis.
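As a small sketch of what "a sequence of move-to and curve-to commands" looks like in practice, the function below walks a list of contiguous quads and issues one moveTo followed by one quadraticCurveTo per segment. It assumes a drawing target that exposes moveTo(x, y) and quadraticCurveTo(cx, cy, x, y), as the Canvas 2D context does; the function name traceQuads and the quad object layout are hypothetical, and this is not the EaselJS renderer described above.

```javascript
// Hypothetical helper: trace contiguous quadratic Bezier segments onto any
// target with Canvas-style moveTo / quadraticCurveTo methods.
// Each quad is { x0, y0, cx, cy, x1, y1 } and starts where the previous ends.
function traceQuads(target, quads) {
  if (quads.length === 0) return;
  target.moveTo(quads[0].x0, quads[0].y0);
  for (const q of quads) {
    target.quadraticCurveTo(q.cx, q.cy, q.x1, q.y1);
  }
}

// Usage with a plain 2D context (browser only):
//   const ctx = canvas.getContext("2d");
//   ctx.beginPath();
//   traceQuads(ctx, quads);
//   ctx.stroke();
```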
So, I have to scan each quad to determine if it crosses the x-axis. If so, the natural parameter at the crossing is computed by solving the t-at-y problem for y = 0 (with the CompGeoJS-ADV BezierUtils class). That quad is subdivided into two quads and some bookkeeping is applied to dynamically create the areas above/below zero and populate them with the sequence of quad. Beziers that approximates the EF-curve for that area.
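To make the crossing-and-split step concrete, here is a minimal sketch. The y-coordinate of a quadratic Bezier is a quadratic polynomial in t, so t-at-y reduces to the quadratic formula, and the split is a de Casteljau subdivision. The names tAtY and splitQuad are hypothetical and are not the BezierUtils API.

```javascript
// Hypothetical helper: parameter values t in (0, 1) where the quad's
// y-coordinate equals yTarget. Quad layout: { x0, y0, cx, cy, x1, y1 }.
function tAtY(q, yTarget) {
  // y(t) = a t^2 + b t + y0, shifted by yTarget.
  const a = q.y0 - 2 * q.cy + q.y1;
  const b = 2 * (q.cy - q.y0);
  const c = q.y0 - yTarget;
  const roots = [];
  if (Math.abs(a) < 1e-12) {
    if (Math.abs(b) > 1e-12) roots.push(-c / b); // y(t) is linear in t
  } else {
    const disc = b * b - 4 * a * c;
    if (disc >= 0) {
      const s = Math.sqrt(disc);
      roots.push((-b - s) / (2 * a), (-b + s) / (2 * a));
    }
  }
  // Keep interior roots only; endpoints need no subdivision.
  return roots.filter((t) => t > 0 && t < 1).sort((p, r) => p - r);
}

// Hypothetical helper: de Casteljau subdivision of a quad at parameter t.
function splitQuad(q, t) {
  const lerp = (p, r) => p + (r - p) * t;
  const ax = lerp(q.x0, q.cx), ay = lerp(q.y0, q.cy); // on P0 -> C
  const bx = lerp(q.cx, q.x1), by = lerp(q.cy, q.y1); // on C -> P2
  const mx = lerp(ax, bx), my = lerp(ay, by);         // point on the curve
  return [
    { x0: q.x0, y0: q.y0, cx: ax, cy: ay, x1: mx, y1: my },
    { x0: mx, y0: my, cx: bx, cy: by, x1: q.x1, y1: q.y1 },
  ];
}

// Usage: split a quad that dips below the axis at its first zero crossing.
const dipping = { x0: 0, y0: 1, cx: 1, cy: -2, x1: 2, y1: 1 };
const crossings = tAtY(dipping, 0);
const [above, rest] = splitQuad(dipping, crossings[0]);
```

A quad can cross the axis twice, as the one in the usage example does, in which case the right-hand piece needs a second split.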
The area under each section can be computed by numerical integration and the standard CompGeoJS library already has a Gauss-Legendre class. However, I’m still not finished. The client also wants to animate the curve sweeping from left-to-right. This requires two items: an area renderer that renders any portion of a total area in [0,1] and a means to parametrize the entire curve. Since areas under each section are important to the client, an area-parameterization was applied. The sequence of quads in a single area is rendered left-to-right until the t-parameter is reached, at which point only a partial quad is rendered. That render is accomplished with yet another subdivision. Mapping from the global parameterization to a natural parameter inside each area is used along with requestAnimationFrame to render the display over the entire time sequence in any desired number of seconds.
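Here is a rough sketch of the area computation for a single section. The signed area between one quad and the x-axis is the integral of y(t) x'(t) over [0, 1]; that integrand is a cubic polynomial in t, so a two-point Gauss-Legendre rule already integrates it exactly up to roundoff. The function names are hypothetical and this does not use the CompGeoJS Gauss-Legendre class.

```javascript
// Hypothetical helper: signed area between one quad and the x-axis,
// i.e. the integral of y dx. Quad layout: { x0, y0, cx, cy, x1, y1 }.
function quadSignedArea(q) {
  const y = (t) => (1 - t) * (1 - t) * q.y0 + 2 * t * (1 - t) * q.cy + t * t * q.y1;
  const dx = (t) => 2 * (1 - t) * (q.cx - q.x0) + 2 * t * (q.x1 - q.cx);
  // Two-point Gauss-Legendre nodes mapped from [-1, 1] to [0, 1];
  // each node carries weight 1/2 on the unit interval.
  const offset = 0.5 / Math.sqrt(3);
  const t1 = 0.5 - offset;
  const t2 = 0.5 + offset;
  return 0.5 * (y(t1) * dx(t1) + y(t2) * dx(t2));
}

// Hypothetical helper: signed area of one colored section (a list of quads).
function sectionSignedArea(quads) {
  return quads.reduce((sum, q) => sum + quadSignedArea(q), 0);
}

// Usage: the quad below traces y = x^2 on [0, 2].
const areaUnderParabola = quadSignedArea({ x0: 0, y0: 0, cx: 1, cy: 0, x1: 2, y1: 4 });
```

Sections below zero come out negative, so comparisons against the total area would typically use absolute values.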
And, I’m still not finished. The x- and y-bounds inside each area are important, which means I need bounding-box information. This is accomplished with some new methods in the BezierUtils class that compute an axis-aligned quad. Bezier bounding-box. This information is aggregated across each area above and below zero.
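For illustration, an axis-aligned bounding box for a quadratic Bezier can be found by checking the two endpoints plus, on each axis, the single interior point where that coordinate's derivative vanishes. The sketch below uses hypothetical names (quadBounds, sectionBounds) and is not the BezierUtils implementation.

```javascript
// Hypothetical helper: axis-aligned bounding box of one quad.
// Quad layout: { x0, y0, cx, cy, x1, y1 }.
function quadBounds(q) {
  // Range of one coordinate: endpoints plus the interior extremum, if any.
  const range = (p0, c, p1) => {
    let lo = Math.min(p0, p1);
    let hi = Math.max(p0, p1);
    const denom = p0 - 2 * c + p1;
    if (Math.abs(denom) > 1e-12) {
      const t = (p0 - c) / denom; // where the derivative is zero
      if (t > 0 && t < 1) {
        const u = 1 - t;
        const v = u * u * p0 + 2 * u * t * c + t * t * p1;
        lo = Math.min(lo, v);
        hi = Math.max(hi, v);
      }
    }
    return [lo, hi];
  };
  const [left, right] = range(q.x0, q.cx, q.x1);
  const [bottom, top] = range(q.y0, q.cy, q.y1);
  return { left, right, bottom, top };
}

// Hypothetical helper: aggregate the boxes of all quads in one section.
function sectionBounds(quads) {
  return quads.map(quadBounds).reduce((acc, b) => ({
    left: Math.min(acc.left, b.left),
    right: Math.max(acc.right, b.right),
    bottom: Math.min(acc.bottom, b.bottom),
    top: Math.max(acc.top, b.top),
  }));
}

// Usage: an arch whose control point sits well above the curve itself.
const archBox = quadBounds({ x0: 0, y0: 0, cx: 1, cy: 2, x1: 2, y1: 0 });
```

Note that "bottom" and "top" here are in mathematical y-up terms; on a canvas with y pointing down they swap roles.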
And yet, I’m still not finished. After the animation, the client wishes to be able to click on each colored area to display information about that area in a tooltip-style display. I added a CanvasTooltip class to CompGeoJS-ADV that is written on top of EaselJS. To make everything smooth and efficient, the colored areas are drawn directly into a container in order to minimize the total number of objects in the display list. So, they cannot be moused over or clicked on directly.
Since I’m taking a game-style approach to the rendering, a game-style approach to the click problem was used to identify a particular area. The control points or geometric constraints for each quad. Bezier happen to be a pretty tight fit to the curve itself, diverging noticeably only at areas of high curvature. For purposes of basic click capability, the polygon formed by the axes and each control point of each quad is taken as a mathematical proxy for the colored area itself. The point-in-polygon algorithm (already implemented in CompGeoJS) can be used to identify whether the mouse point on click is inside a particular area. The necessary information can be extracted from the data structure used to store the colored areas and rendered into the tooltip.
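A minimal sketch of this hit test follows: build the proxy polygon from the endpoints and control points of the quads in one section, close it along the x-axis, and run a standard ray-casting point-in-polygon check. The names sectionPolygon and pointInPolygon are hypothetical, and the ray-casting routine is a generic one rather than the CompGeoJS implementation.

```javascript
// Hypothetical helper: proxy polygon for one colored section, as an array
// of [x, y] vertices. Quad layout: { x0, y0, cx, cy, x1, y1 }.
function sectionPolygon(quads) {
  const first = quads[0];
  const last = quads[quads.length - 1];
  const poly = [[first.x0, 0], [first.x0, first.y0]];
  for (const q of quads) {
    poly.push([q.cx, q.cy], [q.x1, q.y1]);
  }
  poly.push([last.x1, 0]); // the closing edge runs back along the x-axis
  return poly;
}

// Generic ray casting: count edge crossings of a horizontal ray from (px, py).
function pointInPolygon(px, py, poly) {
  let inside = false;
  for (let i = 0, j = poly.length - 1; i < poly.length; j = i++) {
    const [xi, yi] = poly[i];
    const [xj, yj] = poly[j];
    const straddles = (yi > py) !== (yj > py);
    if (straddles && px < ((xj - xi) * (py - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

// Usage: one arch-shaped section above zero, tested against a click point
// that has already been converted from canvas to graph coordinates.
const sectionPoly = sectionPolygon([{ x0: 0, y0: 0, cx: 1, cy: 2, x1: 2, y1: 0 }]);
const clickedInside = pointInPolygon(1, 0.5, sectionPoly);
```

Because the control polygon lies outside the curve on convex stretches, clicks slightly beyond the colored area can still register, which is the trade-off accepted above for a basic click capability.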
Am I finished? Hell yes! I’m not taking any more specs on this one.
So, here is roughly how it looks on some synthetic data that represents pipeline shipping effectiveness over an 80-day test period, compared to the prior year.
The cubic spline is fit to the synthetic data and graphed point-to-point in yellow. The CompGeoJS-ADV quadratic Bezier approximation to this curve is plotted in red.
After the long process described above, I can display green sections above zero and red sections below zero.
The graph slice tool can be used to drag horizontally and quickly inspect EF-curve values at any time offset. The original data points may be displayed and then rolled over to read the exact raw data from which the curve was derived.
Here is a look at how the control points bound the original curve.
Notice that the polygon formed by the control points and axes is a pretty tight approximation to each colored area, or at least close enough for proverbial government work. It’s also quite fast to compute and check the point-in-polygon and there are a minimal number of objects on the display list. This is important since in longer-term applications, there is a desire to make this component pannable in the horizontal axis. This is already supported and can be seen from the CompGeoJS NumberLine demo.
Here is a screen shot of the animation in progress, about halfway through a two-second interval. It is possible to animate the display over any arbitrary number of seconds.
And, finally, here is a screen shot of the prototype display for clicking on any colored area (in this case, the second green area from the left). The CanvasTooltip is rendered according to user-supplied offsets from the current mouse position.
If you would like to discuss custom dashboard components, please email me at theAlgorithmist [at] gmail [dot] com .
Thanks for your time.