2020-03-16

RBN-based Proxy Metric for Relative CW Activity

The Reverse Beacon Network (RBN) makes available an Internet-based historical record of all CW stations on the HF amateur bands detected, decoded and posted by the stations that participate in the RBN. This suggests the possibility of using those data in some form to provide a more or less reliable indicator of the number of CW-active stations since early 2009, when the RBN came into being.

Performing such an analysis, though, is not a trivial matter because of several characteristics of the RBN. In particular, one cannot simply count the callsigns detected over one period and assume that the count may reasonably be compared with the count from any other period of similar duration. The following characteristics (at least) argue against such a naïve comparison:
  1. The RBN comprises, at any one moment, a number of receiving stations that post detected CW callsigns to the Internet. However, no record is kept of the number of stations actively recording and posting at any given moment; that number varies widely over time.
  2. The geographical distribution of posting stations varies from moment to moment.
  3. Even at the best of times, large swathes of the Earth's surface contain no posting stations.
  4. Different posting stations listen to different sets of HF bands, and a single posting station may monitor different bands at different times.
  5. The software that drives the RBN is tuned with the intent of eliminating callsigns of stations that are calling other stations. That is, the RBN is intended to post only the callsigns of stations seeking contacts (generally by calling CQ).
  6. The software may decode callsigns incorrectly, and thus post a callsign that is not in fact present on the air.
  7. Hardly the RBN's fault, but some stations send so poorly that several variants of their call may be decoded and posted.
The effect of these characteristics is to pollute any list of posted callsigns, and also to cause some callsigns that were actually used on the air not to appear in any such list. The question is, then: is it possible still to produce a useful metric from the RBN data?

First, let's look at issue number 5 above: namely that callsigns only of CQing stations are posted. In the past this might have been regarded as a serious shortcoming; it is less clear that the same is true today, especially if we are attempting to measure (in some sense) those stations whose operators actively prefer and enjoy CW. With the modern proliferation of huge CW pile-ups on major DXpeditions there are now many stations whose goal is simply to work the DXpedition (perhaps for a "slot"; perhaps for a new one) and operators of such stations may well have next to no knowledge of CW at all, using pre-recorded macros for transmission, and either code readers or simple aural recognition of the pattern of their callsign to decode the high-speed CW from the DX station. It would not seem right to include such stations in any listing of CW activity, since their use of CW is mere happenstance. So issue number 5 merely serves to cause us to define "CW activity" in a particular manner that might not have been appropriate in the past but nowadays, in the current milieu, seems quite defensible, and perhaps even preferable to merely listing all callsigns heard on the air. Thus, "CW activity" herein means "CW activity by stations calling CQ".

Having finessed issue #5, the others remain; to some extent they must all affect the calls posted by the RBN. Any one of them might be amenable to reasonable analysis or at least intelligent guesswork as to its effect; taken together, it seems prudent to simply say that in any group of calls posted over a particular time, some calls are likely to be in error, and some genuine calls that were active may well have been missed. That much is obvious, but it does not necessarily mean that a reasonable metric cannot be derived with pragmatic simplicity from the actual postings made by the RBN.

We can begin by restricting the domain of the problem, for now, to a single band. The obvious choice is 20m, as it has a reasonably consistent high level of activity throughout a solar cycle. So let's look at some numbers and graphs created from the raw RBN postings over the period from its inception in early 2009 to the end of 2019, using the calendar year as the basic element of time (thus, the values for 2009 will be affected by the shorter period for which samples are available, in addition to any effects from the considerations listed above).

To get some feel for the data, we begin by defining a value $V(n)$ that is the number of callsigns that appear exactly $n$ times within a particular year. We can then plot the values of $V(1)$ to $V(100)$ for each of the years for which we have data:


Despite the lack of detail at this scale, a couple of interesting things are immediately apparent:
  1. The plots of $V(n)$ are remarkably consistent from year to year;
  2. The transition between large negative gradient and shallow negative gradient occurs over a relatively short range of $n$.
It is perhaps useful to put into words the most obvious feature of the plot: a large percentage of the calls posted by the RBN occur just a very few times. If a call appears only once (or just a small number of times) in the course of a year, it is overwhelmingly likely to be a bust of some kind (that is, an incorrectly decoded call): it is hard to imagine circumstances in which a valid call would be correctly decoded and posted so rarely in an entire year. Conversely, if a call appears a relatively large number of times, it is highly unlikely to be a bust. We shall return to this simple fact below.
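The definition of $V(n)$ used above can be sketched in a few lines of Python. This is only an illustration: the callsigns are made up, and the function name `occurrence_histogram` and the flat list of postings are assumptions, not the format of the actual RBN data files.

```python
from collections import Counter

def occurrence_histogram(posted_calls):
    """Return a mapping n -> V(n): how many distinct calls were posted exactly n times."""
    posts_per_call = Counter(posted_calls)        # callsign -> number of postings
    return Counter(posts_per_call.values())       # n -> number of callsigns posted n times

# Hypothetical postings for one band in one year (illustrative only).
posts = ["K1ABC", "K1ABC", "K1ABC", "W9XYZ", "W9XYZ", "K1ABD", "K1AB"]
v = occurrence_histogram(posts)
print(v[1], v[2], v[3])   # 2 1 1
```

Here the two calls that appear only once look like plausible busts of the call that appears three times, which is the pattern the plots suggest dominates $V(n)$ at small $n$.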

By switching to a logarithmic scale, we can see more detail in the data:


Apart from 2009, when, as mentioned above, there are fewer data, the plots for each year essentially lie on top of each other; further, there are no obvious characteristics other than a more or less monotonic decrease in $V(n)$ as $n$ increases. This is good, as it suggests that there is some robustness to this kind of analysis, and therefore it may be possible to use it as a basis for calculation of an activity metric.

What happens if we look at a band with great variation over the course of a solar cycle; 10m, for instance?

No new features appear, further suggesting that this is a robust approach.

The graphs suggest that an activity metric $M_{\rm CW}$ may be defined over an interval of time by:
$$
M_{\rm CW} = \sum_{n=1}^{\infty} \alpha(n) V(n)
$$
where the form of $\alpha(n)$ is to be determined.

For small values of $n$, all or almost all calls will be busts -- that is, $\alpha(\mathrm{small}) \approx 0$; for large values of $n$, no or almost no calls will be busts -- that is, $\alpha(\mathrm{large}) \approx 1$.

We can reasonably define "large" to be the lowest value $n$ for which $V(n)$ is statistically indistinguishable from $V(n+1)$ and (to be safe) for which $V(n+1)$ is statistically indistinguishable from $V(n+2)$. And we can reasonably define "statistically indistinguishable" as meaning that $V(m+1)$ lies in the range:
$$
V(m) \pm 2\sqrt{V(m)}
$$
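
The procedure for finding the smallest "large" value can be sketched as follows. The function names, the dictionary representation of $V(n)$ and the example counts are all hypothetical, and treating the $\pm 2\sqrt{V(m)}$ range as inclusive is an assumption.

```python
import math

def indistinguishable(a, b):
    """True if b lies within a +/- 2*sqrt(a) (range assumed to be inclusive)."""
    return abs(b - a) <= 2 * math.sqrt(a)

def smallest_large(v, n_max=100):
    """Lowest n for which V(n) ~ V(n+1) and V(n+1) ~ V(n+2); None if there is none.

    v maps n -> V(n); any n missing from v is taken to have V(n) = 0.
    """
    for n in range(1, n_max - 1):
        a, b, c = (v.get(k, 0) for k in (n, n + 1, n + 2))
        if indistinguishable(a, b) and indistinguishable(b, c):
            return n
    return None

# Made-up counts with a steep initial fall followed by a shallow tail.
v_example = {1: 10000, 2: 4000, 3: 2000, 4: 1200, 5: 1000, 6: 950, 7: 930, 8: 900}
large = smallest_large(v_example)
print(large)   # 5
```

In this example $V(5) = 1000$ is the first value whose two successors (950, then 930) each fall within the tolerance of their predecessor, so "large" is 5.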

We can then define "small" simply as unity, and $\alpha(n)$ as a linear function with the value zero for $n=1$ and unity for $n \ge L$, where $L$ is the smallest "large" number defined by the procedure described above. [NB there is nothing magic about these precise definitions; reasonable variations on this theme provide essentially identical relative results from year to year -- another indication of robustness. One simply needs to be careful to maintain the same definition of $\alpha(n)$ for all the periods under examination. (Actually, it works even with the same meta-definition, but let's not bother with that.)] By this definition, "large" will vary from year to year and from band to band, meaning that $\alpha(n)$ will similarly vary. That is one way to proceed, but it means calculating the value of "large" for each year and each band. A simpler way to analyse the data is to choose a universal value for "large" that is big enough to encompass all years and bands being analysed.
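
Written out, the linear form of $\alpha(n)$ described above (zero at $n=1$, unity for $n \ge L$) is:
$$
\alpha(n) = \begin{cases} \dfrac{n-1}{L-1} & 1 \le n < L \\ 1 & n \ge L \end{cases}
$$
As a purely illustrative check, taking $L = 21$ gives $\alpha(1) = 0$, $\alpha(11) = 0.5$ and $\alpha(21) = 1$.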

The actual values of "large" as defined above, for all the years and bands for which we have annual data, are given in the following table:

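| Band | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|------|------|------|------|------|------|------|------|------|------|------|------|
| 10m  | 7    | 11   | 12   | 13   | 13   | 14   | 14   | 14   | 9    | 9    | 9    |
| 12m  | 4    | 8    | 16   | 10   | 12   | 10   | 12   | 8    | 8    | 7    | 10   |
| 15m  | 10   | 11   | 16   | 17   | 17   | 17   | 12   | 13   | 16   | 10   | 13   |
| 17m  | 7    | 13   | 9    | 11   | 21   | 10   | 12   | 9    | 13   | 18   | 11   |
| 20m  | 21   | 16   | 16   | 18   | 20   | 21   | 19   | 27   | 24   | 21   | 20   |
| 30m  | 12   | 17   | 14   | 15   | 18   | 13   | 13   | 18   | 12   | 15   | 14   |
| 40m  | 14   | 18   | 17   | 22   | 24   | 21   | 25   | 23   | 20   | 20   | 15   |
| 80m  | 16   | 22   | 12   | 18   | 15   | 13   | 14   | 14   | 19   | 22   | 22   |
| 160m | 9    | 12   | 11   | 10   | 11   | 11   | 12   | 18   | 12   | 14   | 14   |
| HF   | 18   | 19   | 27   | 23   | 24   | 23   | 32   | 23   | 20   | 22   | 22   |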

The last line, marked "HF", is derived from all the postings for the individual bands for a given year. [Note that this bears no simple relationship to the other lines; if this isn't obvious, just think of a call that is posted once on 20m and once on 160m in the course of a year: this will contribute to $V(1)$ on 20m and 160m, but to $V(2)$ on HF.]
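
The point about the HF line can be checked with a toy example. The callsign, the `(band, call)` tuples and the helper name `histogram` are hypothetical and chosen only to mirror the case described in the note above.

```python
from collections import Counter

def histogram(calls):
    """Return a mapping n -> V(n) for an iterable of posted calls."""
    return Counter(Counter(calls).values())

# One made-up call, posted once on 20m and once on 160m during the year.
posts = [("20m", "K1ABC"), ("160m", "K1ABC")]

# Group the postings by band, then build V(n) for each band separately.
by_band = {}
for band, call in posts:
    by_band.setdefault(band, []).append(call)
per_band_v = {band: histogram(calls) for band, calls in by_band.items()}

# For HF as a whole, the band is ignored and all postings are pooled.
hf_v = histogram(call for _, call in posts)

print(per_band_v["20m"][1], per_band_v["160m"][1])   # 1 1
print(hf_v[1], hf_v[2])                              # 0 1
```

The call counts towards $V(1)$ on each band but towards $V(2)$ for HF, so the HF histogram cannot be obtained by adding the per-band histograms.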

The largest value in the above table is 32 (HF, 2015). Therefore one plausible $\alpha(n)$ function, using a round value safely above 32, would go linearly from a value of zero at $n=1$ to a value of 1 at $n=35$, and have the value 1 for all $n > 35$.
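
Putting the pieces together, the metric with a universal "large" value of 35 can be sketched as below. The function names and the example counts are hypothetical; only the linear shape of $\alpha(n)$ and the value 35 come from the text.

```python
def alpha(n, large=35):
    """Linear weight: 0 at n = 1, rising to 1 at n = large, and 1 for all n beyond."""
    if n >= large:
        return 1.0
    return (n - 1) / (large - 1)

def cw_activity_metric(v, large=35):
    """Sum of alpha(n) * V(n) over every n present in v (a mapping n -> V(n), n >= 1)."""
    return sum(alpha(n, large) * count for n, count in v.items())

# Made-up V(n) values: many calls seen once, few calls seen often.
v_example = {1: 1000, 18: 34, 35: 10, 200: 5}
metric = cw_activity_metric(v_example)
print(metric)   # 32.0
```

The 1000 calls seen only once contribute nothing, the 34 calls seen 18 times count at half weight (17), and the 15 calls seen 35 or more times count in full, giving 32.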

Applying this procedure, we finally arrive at the following figure:


The CW Activity Metric as defined above is arguably a poor choice for showing small changes in activity over time (although it is not obvious how to create a better metric for such purposes given the limitations of the RBN listed above); but if the metric shows little variation over a period, it is hard to see how the underlying CW activity can have varied much either, absent a rather precisely anti-correlated change in the factors that lead to busts in the RBN postings.

The effect of the solar cycle is readily apparent in the figure (see the lines for the higher frequency bands, where activity does indeed change as a function of the cycle, and the metric reflects this change); however, taking HF activity as a whole, it is remarkable how consistent activity has been for many years. There is certainly no evidence from this analysis of the RBN data that CW activity taken as a whole has shown any consistent wane over the past decade. 
