D. R. Evans (N7DR): A Scatter Metric for the RBN

I recently posted a series of maps that showed the highly non-uniform geographic distribution of posting stations on the RBN as a function of time and frequency. While informative, these maps are inconvenient to work with and do not lend themselves to any kind of quantitative analysis of the RBN's posting stations. What is needed is some kind of scatter metric, S, that in some way describes the non-uniform distribution of posting stations.

At least three basic approaches to creating such a metric suggest themselves (to me, anyway; there are probably others):

Metrics based directly on a distance metric;
Metrics based on grid occupancy;
Metrics based indirectly on a distance metric.

Metrics Based Directly on a Distance Metric

Several scatter metrics based directly on a distance metric suggest themselves. Perhaps the most obvious would be a number that compares the actual geographic dispersion of stations reporting at a particular frequency over some period of time (such as a year or a month) to an "ideal" dispersion of the same number of stations spread uniformly around the world. This would allow calculation of an efficiency metric that could then be used to compare the geographic efficiency of the RBN as a function of time and frequency.

Definition of a Distance Metric

But before we go too far down this path, we need to consider what distance metric to use. For two points on the surface of a sphere (we'll assume that the Earth is a sphere, and hence that its surface is a 2-sphere; this keeps things simple at little cost in accuracy), there are two obvious, reasonable ways of defining the distance between them: the ordinary Cartesian distance in 3-space, and the length of the shortest path on the surface of the 2-sphere. We will denote these metrics by ℓ_C and ℓ_S respectively.

If we denote the radius of the Earth by R_E, then it is easy to derive the following (equivalent) relationships between ℓ_C and ℓ_S in the domain 0 ≤ ℓ_C ≤ 2R_E:

$$ \ell_C^2 = 2R_E^2 \left(1 - \cos\frac{\ell_S}{R_E}\right) $$
$$ \ell_S = R_E \arccos\left(1 - \frac{\ell_C^2}{2R_E^2}\right) $$
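As a quick sanity check of these two relationships, here is a minimal Python sketch that converts in each direction. The function names and the radius of 6,371 km (the same value used in the gnuplot program) are illustrative choices of mine, not anything taken from the RBN; converting a distance one way and back should return the starting value anywhere in 0 ≤ ℓ_S ≤ π × R_E.

```python
import math

R_E = 6371.0  # assumed mean Earth radius (km), as in the gnuplot example


def chord_from_arc(l_s, r=R_E):
    """ℓ_C: straight-line (3-space) distance for a great-circle distance ℓ_S."""
    return math.sqrt(2.0 * r * r * (1.0 - math.cos(l_s / r)))


def arc_from_chord(l_c, r=R_E):
    """ℓ_S: great-circle distance for a straight-line distance 0 <= ℓ_C <= 2r."""
    # Clamp so that rounding error cannot push the argument outside [-1, 1].
    x = max(-1.0, min(1.0, 1.0 - l_c * l_c / (2.0 * r * r)))
    return r * math.acos(x)
```

For example, a quarter of the way round the world (ℓ_S = π × R_E / 2 ≈ 10,007 km) corresponds to ℓ_C = √2 × R_E ≈ 9,010 km. The clamp only matters for chords within rounding error of 0 or 2R_E.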

We can plot this relationship with a trivial gnuplot program:

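# Mean radius of the Earth (km)
R = 6371
# 3-space (chord) distance as a function of the 2-sphere (great-circle) distance
lc(ls) = sqrt(2) * R * sqrt(1 - cos(ls / R))

# The labels contain non-ASCII characters
set encoding utf8
set xlabel "ℓ_S 2-sphere distance (km)"
set ylabel "ℓ_C 3-space distance (km)"
set title "ℓ_C as a Function of ℓ_S for Spherical Earth"

# ls runs from 0 to the antipodal distance, pi * R
plot [ls=0:pi*R] lc(ls) notitle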

which produces:

No surprise there, of course: ℓ_C is a strictly monotonically increasing non-linear function of ℓ_S. (To be more precise, its slope is positive throughout the domain except at the antipodal point ℓ_S = π × R_E, where it falls to zero.)
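Using the identity 1 − cos x = 2 sin²(x/2), the first relationship above can be rewritten in a form that makes the shape of the curve explicit:

$$ \ell_C = 2R_E \sin\left(\frac{\ell_S}{2R_E}\right), \qquad \frac{d\ell_C}{d\ell_S} = \cos\left(\frac{\ell_S}{2R_E}\right) $$

So the slope is 1 for short distances (where the two measures agree closely), decreases steadily with distance, and reaches zero only at the antipodal point ℓ_S = π × R_E, where ℓ_C = 2R_E.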

Anyway, this tells us that if a scatter metric is to be based on distances between posting stations, it makes no fundamental difference whether we measure distances by ℓ_C or ℓ_S: the details of the calculations might (and in general would) change because of the non-linear relationship between the two ways of measuring distance, but there is no intrinsic reason to prefer one measurement over the other since there is a one-to-one mapping between the two measures.

Efficiency

Once a distance metric is defined, one can define some function based on that metric to express the amount of scatter of RBN posting stations across the globe. Ideally, as mentioned above, one could then compare that scatter to the scatter of the same number of stations uniformly spread over the surface of the Earth.

The most obvious scatter function defined along these lines is the mean separation between posting stations, using one of the two distance metrics described above. Since it makes no substantive difference which distance metric we choose, we will use ℓ_S, simply because it corresponds to the most common meaning of "distance" in amateur radio.

For N points (i.e., posting stations) P₁, P₂, ... P_N we can define a plausible scatter function S by:

$$ S = {2 \over {N \times (N - 1) }} \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} {_i}{\Delta}_j$$
where _iΔ_j is the value of the distance metric ℓ_S for the points P_i and P_j.
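A minimal sketch of how S might be computed, assuming (hypothetically) that each posting station's location is available as a latitude/longitude pair in degrees. The post does not say how ℓ_S was calculated; here it is obtained with the standard haversine formula on a sphere of radius 6,371 km, and the names `great_circle_km` and `scatter_metric` are illustrative only. Visiting each unordered pair once is the same as the double sum over j > i, which is why the total is multiplied by 2/(N(N − 1)).

```python
import math
from itertools import combinations

R_E = 6371.0  # assumed mean Earth radius, km


def great_circle_km(lat1, lon1, lat2, lon2, r=R_E):
    """ℓ_S between two points given in degrees, via the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, h)))  # min() guards rounding


def scatter_metric(stations):
    """Mean pairwise separation S (km) of a list of (lat, lon) pairs in degrees."""
    n = len(stations)
    if n < 2:
        raise ValueError("need at least two stations")
    # combinations() yields each unordered pair exactly once: N(N-1)/2 terms
    total = sum(great_circle_km(*a, *b) for a, b in combinations(stations, 2))
    return 2.0 * total / (n * (n - 1))
```

For example, three stations on the equator at longitudes 0°, 90° and 180° are separated by a quarter, a quarter and a half of the circumference, so S = 2π × R_E / 3 ≈ 13,343 km.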

We can then compare this to the ideal value obtained from points uniformly spread across the surface of the Earth.

Unfortunately, there is no known algebraic method for determining such an ideal "uniformly spread" dispersion: doing so is equivalent to solving what is known as the Tammes problem, a problem in the theory of spherical codes that remains unsolved for general N. There are various ways of computing configurations that are likely to be equal or very close to the ideal dispersion (for example: place N points on a sphere, each the source of a repulsive force that decays with distance; add some friction; then determine the locations of the points once they have all ceased to move). However, one could hardly regard this as a clean process, especially as it is not guaranteed to find the true "most efficient" distribution.
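The repulsion-with-friction recipe can be sketched in a few lines of Python. This is only an illustration under my own assumptions: an inverse-square (Coulomb-like) force; overdamped motion, in which each point moves a small, capped distance along its net force and is then projected back onto a unit sphere; and a fixed number of steps instead of a test for "ceased to move". With an inverse-square force this is really the Thomson problem rather than the Tammes problem, which is one reason the result can only be expected to be close to the ideal. For four points both problems are solved by the regular tetrahedron, whose points are all separated by arccos(−1/3) ≈ 109.47°. The names `relax_points` and `min_separation_deg` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length (i.e. project it onto the unit sphere)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def relax_points(n, steps=2000, step_size=0.05, max_move=0.1, seed=1):
    """Spread n points over the unit sphere by mutual inverse-square repulsion.

    Overdamped ("frictional") dynamics: at each step every point moves a small
    distance along its net force, with no momentum carried over, and is then
    projected back onto the sphere.
    """
    rng = random.Random(seed)
    pts = [normalize([rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(n)]
    for _ in range(steps):
        new_pts = []
        for i, p in enumerate(pts):
            force = [0.0, 0.0, 0.0]
            for j, q in enumerate(pts):
                if i != j:
                    d = [a - b for a, b in zip(p, q)]  # points away from q
                    r3 = sum(c * c for c in d) ** 1.5
                    force = [f + c / r3 for f, c in zip(force, d)]  # |F| = 1/r^2
            move = [step_size * f for f in force]
            size = math.sqrt(sum(c * c for c in move))
            if size > max_move:  # cap huge moves when two points start very close
                move = [c * max_move / size for c in move]
            new_pts.append(normalize([a + b for a, b in zip(p, move)]))
        pts = new_pts  # update all points simultaneously
    return pts


def min_separation_deg(pts):
    """Smallest angular separation, in degrees, between any two of the points."""
    best = math.pi
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dot = sum(a * b for a, b in zip(pts[i], pts[j]))
            best = min(best, math.acos(max(-1.0, min(1.0, dot))))
    return math.degrees(best)
```

For larger N the same code runs unchanged, but nothing guarantees that the configuration it settles into maximises the minimum separation, which is exactly the objection raised above.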

Instead of a "uniformly spread" dispersion (which would maximise the minimum distance calculated across the set of points), though, we can compare the value of the metric S for the RBN to the value of the same metric for a random distribution of points. (By "a random distribution of points" here I mean one in which the probability of finding any given number P of points within a region A of the 2-sphere depends only on the area of A, not on its position or shape.)

Symmetry arguments lead us to the (perhaps surprising) conclusion that the expectation value of the mean separation of such a random distribution of points must be independent of the number of points in the network, and will have a value equal to half the maximal value of the distance metric on the 2-sphere. (If one point of a pair is held fixed, the other is just as likely to lie at any given location as at its antipode, and replacing a point by its antipode turns a separation ℓ_S into π × R_E − ℓ_S; the distribution of separations is therefore symmetric about π × R_E / 2. Note that this step relies on measuring distance by ℓ_S.) In terms of the Earth, this means that the expectation value will be one quarter of the planet's circumference, or almost exactly 10,000 km. Thus, we can immediately compare the scatter metric S of the RBN defined above to the "ideal" value of 10,000 km; or, since the ideal value is independent of the number of points, we can simply use the value of the scatter metric as-is and mentally divide by 10,000 km to obtain an efficiency (so that, for example, S = 7,000 km corresponds to an efficiency of 70%).
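The value quoted above can also be derived directly. If one point of a pair is fixed, the other, being uniformly distributed, lies at angular separation θ with probability density (sin θ)/2 on [0, π], so

$$ \langle \ell_S \rangle = R_E \int_0^{\pi} \theta \, \frac{\sin\theta}{2} \, d\theta = \frac{\pi R_E}{2} \approx 10{,}007\ \text{km} $$

Because every one of the N(N − 1)/2 pairwise separations in S has this same expectation, so does S itself, whatever N is. The Monte Carlo sketch below checks this numerically. It places points uniformly on the sphere using Archimedes' hat-box result (z uniform on [−1, 1], longitude uniform); the function names, radius and trial counts are illustrative assumptions.

```python
import math
import random

R_E = 6371.0  # assumed mean Earth radius, km


def random_unit_vector(rng):
    """A point uniformly distributed on the unit sphere.

    Archimedes' hat-box theorem: if z is uniform on [-1, 1] and the longitude
    is uniform on [0, 2*pi), the resulting point is uniform over the surface.
    """
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def random_mean_separation(n_points, trials=200, seed=0, r=R_E):
    """Average over `trials` random networks (n_points >= 2) of the mean pairwise ℓ_S (km)."""
    rng = random.Random(seed)
    per_network = []
    for _ in range(trials):
        pts = [random_unit_vector(rng) for _ in range(n_points)]
        total, pairs = 0.0, 0
        for i in range(n_points):
            for j in range(i + 1, n_points):
                dot = sum(a * b for a, b in zip(pts[i], pts[j]))
                total += r * math.acos(max(-1.0, min(1.0, dot)))
                pairs += 1
        per_network.append(total / pairs)
    return sum(per_network) / len(per_network)
```

With enough trials the result sits close to π × 6,371 / 2 ≈ 10,007.5 km for N = 2, 10 or 50 alike.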

Monthly values of the metric S show a gradual but sustained increase (the Pearson correlation coefficient is 0.78):

The red line is the best-fit linear regression to the data, which suggests that this scatter metric is increasing at a rate of about 145 km per year.
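For anyone wanting to reproduce this kind of fit on their own data, the two statistics quoted here (the Pearson correlation coefficient and the slope of the least-squares line) can be computed as follows. This is a generic sketch: the post does not state the exact fitting procedure, so I am assuming an ordinary least-squares fit, and the test values are toy numbers, not RBN data. If x is time in years and y is S in km, the slope comes out in km per year; with x as the number of posters, it is in km per poster. (Python 3.10 and later also provide `statistics.correlation` and `statistics.linear_regression`.)

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx
```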

Instead of plotting the metric as a function of time, it probably makes more sense to plot it as a function of the number of posting stations (excluding those posting stations for which location information is unavailable from the RBN):

The correlation coefficient of the values on this plot is 0.75, almost identical to that of the plot based on time (the slope is about 4.8 km per poster).

The story of both these plots is that the scatter metric has really changed very little since the RBN's inception: it has increased by only about 30%, even though the number of posters has increased by several hundred percent. This suggests that while the organisers of the RBN have been successful in persuading many additional stations to join the network, there has been but little improvement in the geographical diversity of those stations (as indeed is apparent if one looks at the maps; this in turn suggests that this metric is a not-unreasonable quantitative reflection of the amount of geographic scatter in the underlying network).

This post is now long enough... I'll move on to metrics based on grid occupancy in another post.

D. R. Evans (N7DR)

2017-02-24
