## 2017-02-24

### A Scatter Metric for the RBN

I recently posted a series of maps that showed the highly non-uniform geographic distribution of posting stations on the RBN as a function of time and frequency. While informative, these maps are very inconvenient, and do not lend themselves to any kind of quantitative analysis of the RBN's posting stations. What is needed is some kind of scatter metric, S, that in some way describes the non-uniform distribution of posting stations.

At least three basic approaches to creating such a metric suggest themselves (to me, anyway; there are probably others):
1. Metrics based directly on a distance metric;
2. Metrics based on grid occupancy;
3. Metrics based indirectly on a distance metric.

### Metrics based Directly on a Distance Metric

Several scatter metrics based directly on a distance metric suggest themselves. Perhaps the most obvious would be some number that compares the actual geographic dispersion of stations reporting at a particular frequency and over some period of time (such as a year or a month) to an "ideal" dispersion of the same number of stations equally spread around the world. This would allow calculation of an efficiency metric that could then be used to compare the geographic efficiency of the RBN as a function of time and frequency.

#### Definition of a Distance Metric

But before we go too far down this path, we need to consider what distance metric to use. For two points on the surface of a sphere (we'll assume that the Earth is a sphere -- and hence its surface is a 2-sphere -- which keeps things simple at little cost in accuracy), there are two obvious reasonable ways of defining the distance between them: the ordinary 3-space cartesian distance, and the length of the shortest path on the 2-sphere surface. We will denote these metrics by ℓC and ℓS respectively.

If we denote the radius of the Earth by RE, then it is easy to derive the following (equivalent) relationships between ℓC and ℓS in the domain 0 ≤ ℓC ≤ 2RE:

$$\ell_C^2 = 2 R_E^2 \left(1 - \cos\left({\ell_S \over R_E}\right)\right)$$

$$\ell_S = R_E \arccos\left(1 - {\ell_C^2 \over 2 R_E^2}\right)$$

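We can plot this relationship with a trivial gnuplot program:

```gnuplot
# mean radius of the Earth, in km
R = 6371

# 3-space distance as a function of 2-sphere distance
lc(ls) = sqrt(2) * R * sqrt(1 - cos(ls / R))

set xlabel "ℓ_S 2-sphere distance (km)"
set ylabel "ℓ_C 3-space distance (km)"
set title "ℓ_C as a Function of ℓ_S for Spherical Earth"

# plot from zero separation out to the antipodal point
plot [ls=0:pi*R] lc(ls) notitle
```

which produces: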

No surprise there, of course: ℓC is a strictly monotonically increasing non-linear function of ℓS. (Well, to be more precise, the slope is positive throughout the domain except at the point ℓS = π × RE, where it falls to zero.)

Anyway, this tells us that if a scatter metric is to be based on distances between posting stations, it makes no fundamental difference whether we measure distances by ℓC or ℓS: the details of the calculations might (and in general would) change because of the non-linear relationship between the two ways of measuring distance, but there is no intrinsic reason to prefer one measurement over the other since there is a one-to-one mapping between the two measures.
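As a quick numerical check of that one-to-one mapping, here is a minimal Python sketch. The function names `chord_from_arc` and `arc_from_chord` are my own labels for the two relationships given earlier, and the radius is the same 6371 km used in the gnuplot program; nothing here is part of the RBN tooling.

```python
import math

R_E = 6371.0  # assumed mean Earth radius in km, as in the gnuplot program


def chord_from_arc(l_s: float, radius: float = R_E) -> float:
    """3-space (chord) distance corresponding to a 2-sphere distance l_s."""
    return math.sqrt(2.0 * radius**2 * (1.0 - math.cos(l_s / radius)))


def arc_from_chord(l_c: float, radius: float = R_E) -> float:
    """2-sphere distance corresponding to a 3-space (chord) distance l_c."""
    x = 1.0 - l_c**2 / (2.0 * radius**2)
    # Clamp in case rounding pushes the argument just outside [-1, 1].
    return radius * math.acos(max(-1.0, min(1.0, x)))


# Sample 2-sphere distances from zero out to the antipodal point.
arcs = [i * math.pi * R_E / 1000 for i in range(1001)]
chords = [chord_from_arc(a) for a in arcs]

# Each chord is longer than the one before it, so the mapping is one-to-one ...
is_increasing = all(b > a for a, b in zip(chords, chords[1:]))

# ... and converting back recovers the original 2-sphere distance.
max_round_trip_error = max(
    abs(arc_from_chord(c) - a) for a, c in zip(arcs, chords)
)
```

The round trip only shows that the two formulas are inverses of each other over the sampled domain; it says nothing about which measure is more convenient for a particular calculation.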

#### Efficiency

Once a distance metric is defined, one can define some function based on that metric to express the amount of scatter of RBN posting stations across the globe. Ideally, as mentioned above, one could then compare that scatter to the scatter of the same number of stations uniformly spread over the surface of the Earth.

The most obvious scatter function defined along these lines is the mean separation between posting stations, using one of the two distance metrics described above. Since it makes no substantive difference which distance metric we choose, we will use ℓS, simply because that corresponds to the most common meaning of "distance" as used in amateur radio.

For N points (i.e., posting stations) P1, P2, ... PN we can define a plausible scatter function S by:

$$S = {2 \over {N \times (N - 1) }} \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} {_i}{\Delta}_j$$
where  iΔj is the value of the distance metric ℓS for the points Pi and Pj.
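For concreteness, a minimal sketch of how S might be computed in Python follows. It assumes each station is given as a (latitude, longitude) pair in degrees and uses the haversine formula for ℓS; the names `great_circle_km` and `scatter_metric`, and the three example stations, are hypothetical and are not taken from the code that produced the results in this post.

```python
import math
from itertools import combinations

R_E = 6371.0  # assumed mean Earth radius in km


def great_circle_km(p, q, radius=R_E):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    # Haversine form, which behaves well numerically for small separations.
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))


def scatter_metric(points, radius=R_E):
    """Mean great-circle separation over all unordered pairs of points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # combinations() visits each pair (i, j) with i < j exactly once,
    # matching the double sum in the definition of S.
    total = sum(great_circle_km(p, q, radius) for p, q in combinations(points, 2))
    return 2 * total / (n * (n - 1))


# Three hypothetical stations: two on the equator 90 degrees apart, one at the
# north pole. Every pair is a quarter of a circumference apart, so S should
# equal pi * R_E / 2.
example = [(0.0, 0.0), (0.0, 90.0), (90.0, 0.0)]
s_example = scatter_metric(example)
```

The double loop is quadratic in N, which should be unproblematic for a network of a few hundred posting stations.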

We can then compare this to the ideal value obtained from points uniformly spread across the surface of the Earth.

Unfortunately, there is no known algebraic method for determining such an ideal "uniformly spread" dispersion. Finding it is equivalent to solving what is known as the Tammes problem, a problem in the theory of spherical codes that remains unsolved in general. There are various ways of computing solutions that are likely to be equal to, or very close to, the ideal dispersion (for example: place N points on a sphere, each the source of a repulsive force that decays with distance; add some friction; then determine the locations of the points once they have all ceased to move). However, one could hardly regard this as a clean process, especially as it is not guaranteed to provide the true "most efficient" distribution.

Instead of a "uniformly spread" dispersion (which would maximise the minimum distance calculated across the set of points), though, we can compare the value of the metric S for the RBN to the value of the same metric for a random distribution of points. (By "a random distribution of points" here I mean that the probability of finding some number P of points within any particular area A on the 2-sphere is independent of the position and shape of A.)

Symmetry arguments lead us to the (perhaps surprising) conclusion that the expectation value of the mean separation of such a random distribution of points must be independent of the number of points in the network, and will have a value equal to half the maximal value of the distance metric on the 2-sphere. (To put it in terms of the Earth: this means that the expectation value will be one quarter the length of the planet's circumference, or almost exactly 10,000 km.) Thus, we can immediately compare the scatter metric S of the RBN defined above to the "ideal" value of 10,000 km -- or, since the ideal value is independent of the number of points, we can simply use the value of the scatter metric as-is, and mentally divide by 10,000 km to obtain an "efficiency" expressed as a fraction of the ideal value.
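The symmetry argument can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it draws points uniformly on the sphere by normalising three independent Gaussian variates (a standard technique), uses a radius of 6371 km, and the function names, seed and trial counts are arbitrary choices of mine.

```python
import math
import random

R_E = 6371.0  # assumed mean Earth radius in km


def random_point_on_sphere(rng):
    """Uniformly random unit vector: normalise three independent Gaussians."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # reject the (vanishingly rare) near-zero vector
            return (x / norm, y / norm, z / norm)


def mean_separation_km(n, rng, radius=R_E):
    """Scatter metric S for n random points, using great-circle distances."""
    pts = [random_point_on_sphere(rng) for _ in range(n)]
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(pts[i], pts[j]))
            # The angle between two unit vectors gives the great-circle distance.
            total += radius * math.acos(max(-1.0, min(1.0, dot)))
    return 2 * total / (n * (n - 1))


def expected_mean_separation_km(n, trials, seed=1):
    """Average S over many independent random networks of n points."""
    rng = random.Random(seed)
    return sum(mean_separation_km(n, rng) for _ in range(trials)) / trials


ideal = math.pi * R_E / 2  # one quarter of the circumference
estimates = {n: expected_mean_separation_km(n, trials=20000 // n)
             for n in (2, 10, 50)}
```

With these settings, the estimates for 2, 10 and 50 points should all land close to `ideal`, consistent with the expectation value being independent of the number of points. Being a Monte Carlo estimate, the agreement is statistical rather than exact.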

Mensal values of the metric S show a gradual but sustained increase (the Pearson correlation coefficient is 0.78):
The red line is the best-fit linear regression to the data, which suggests that this scatter metric is increasing at the rate of about 145 km per year.

Instead of plotting the metric as a function of time, it probably makes more sense to plot it as a function of the number of posting stations (excluding those posting stations for which location information is unavailable from the RBN):
The correlation coefficient of the values on this plot is 0.75 -- almost identical to that of the plot based on time (the slope is about 4.8 km per poster).

The story of both these plots is that the scatter metric has really changed very little since the RBN's inception: it has increased by only about 30%, even though the number of posters has increased by several hundred percent. This suggests that while the organisers of the RBN have been successful in persuading many additional stations to join the network, there has been but little improvement in the geographical diversity of those stations (as indeed is apparent if one looks at the maps; this in turn suggests that this metric is a not-unreasonable quantitative reflection of the amount of geographic scatter in the underlying network).

This post is now long enough... I'll move on to metrics based on grid occupancy in another post.