D. R. Evans (N7DR): 2021

2021-12-16

CQ WW SSB logs for 2021

CQ WW SSB logs for 2021 have been added to this repository of public CQ WW logs.

2021-12-02

Debian Bullseye, Samba and Windows XP

Following an upgrade here from Debian Buster to Bullseye, I discovered that a Windows XP computer on my home network could no longer access a samba share on the upgraded system, even though (or possibly because, although I doubt it) I did not allow the upgrade to change the /etc/samba/smb.conf file. After a bit of messing around, I discovered that the solution is easy, although apparently undocumented as yet, at least officially. I added the line:

server min protocol = NT1

to the [global] portion of the file, and then restarted samba:

systemctl restart smbd.service

From that point on, I found that the XP machine was once more able to reach the samba shares correctly.

I note that the inability to browse Bullseye samba shares from XP machines that previously could browse Buster shares on the same server machine has been reported as Debian bug 948671, with the above workaround now documented in an associated submission by me.

2021-10-12

CQ WPX SSB and CW logs for 2021

CQ WPX SSB and CW logs for 2021 have been added to this repository of public CQ WPX logs.

2021-08-26

Unofficial Station Reports, ARRL DX SSB and CW, 2018 to 2021

Using the public logs, it is rather easy to generate unofficial station-by-station reports for the entrants in the ARRL DX contests.

The ARRL generates official reports and generally makes these reports available individually to each entrant. But these are not made public. The unofficial reports, while not necessarily identical to the official ones, may therefore hold some interest.

The unofficial reports may differ from the official ones because the contest committee has access to checklogs, which are not made public. Also, there are various pathological occurrences in logs that require a decision to be made as to how to classify one or more QSOs; the rules by which such decisions are made are not public, so the decisions that I made when constructing the unofficial reports may well be different from those made by the ARRL. Nevertheless, pathological logs (or pathological QSOs within a log) are relatively rare, so these decisions should affect a relatively small percentage of logs and QSOs. (Typical examples [there are many more] of circumstances in which decisions must be made are: by how much may clocks be skewed and a QSO still be considered valid? what to do if the transmitted callsign changes for some number of QSOs in the contest? what to do if more than one entrant claims to have used the same transmitted callsign?)

The complete set of unofficial reports for the CW and SSB versions of the ARRL DX contest for the years 2018 to 2021 may be found in appropriately named files in this directory.

I note that despite explicitly informing me that they would do so (in 2017), the ARRL have never made public the logs that they hold for the ARRL DX contests for years prior to 2018.

One note regarding interpretation of the information in these unofficial reports: all the fields should be self-explanatory, except that in the listing for EXCHANGE BUSTS, some values are enclosed in parentheses: this indicates that the worked station did not submit a log, and the value of the exchange sent by that station was deduced from QSO: lines in the logs of other entrants.

For example, the report for HL2ZN in the 2021 ARRL DX CW contest contains the line (the line below may be wrapped on your display):

QSO: 14000 CW 2021-02-21 2245 HL2ZN 599 0500 N7DR 599 KY [ (CO) ]

This indicates that we can deduce that HL2ZN probably bust N7DR's exchange, even though N7DR did not send in a log: HL2ZN recorded N7DR's state as KY, even though N7DR probably sent CO (indeed, I did send CO).
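A minimal sketch of how such a line might be picked apart in Python. The field layout is inferred only from the single example line above, and the assumption that a value taken from the worked station's own log appears in the brackets without parentheses is mine; the function name parse_exchange_bust is purely illustrative.

```python
import re

# Layout assumed from the example line: ten whitespace-separated Cabrillo
# fields followed by the correct exchange in square brackets.
BUST_LINE = re.compile(
    r"^QSO:\s+(?P<freq>\d+)\s+(?P<mode>\S+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{4})\s+"
    r"(?P<tcall>\S+)\s+(?P<trst>\S+)\s+(?P<texch>\S+)\s+"
    r"(?P<rcall>\S+)\s+(?P<rrst>\S+)\s+(?P<rexch>\S+)\s+"
    r"\[\s*(?P<correct>\(?[^\s()\]]+\)?)\s*\]\s*$"
)


def parse_exchange_bust(line):
    """Return a dict describing an EXCHANGE BUSTS line, or None if it does not match."""
    m = BUST_LINE.match(line)
    if m is None:
        return None
    correct = m.group("correct")
    # Parentheses mean the worked station sent no log, so the value was deduced.
    deduced = correct.startswith("(") and correct.endswith(")")
    return {
        "logged_by": m.group("tcall"),
        "worked": m.group("rcall"),
        "logged_exchange": m.group("rexch"),
        "correct_exchange": correct.strip("()"),
        "deduced_from_other_logs": deduced,
    }


example = "QSO: 14000 CW 2021-02-21 2245 HL2ZN 599 0500 N7DR 599 KY [ (CO) ]"
result = parse_exchange_bust(example)
print(result)
```

For the example line, the logged exchange is KY, the correct exchange is CO, and the value is flagged as deduced from other logs.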

2021-08-17

Cleaned and Augmented Logs for ARRL DX CW and SSB contests, 2018 to 2021

Raw logs

The raw logs for the ARRL DX CW and SSB contests are now available for 2018, 2019, 2020 and 2021 in this directory.

Cleaned Logs

Cleaned logs for the same years may also be downloaded from the directory. The cleaned logs are combined into a single file; but data for individual stations and years may trivially be extracted from the combined file.

The cleaned logs are the result of processing the QSO: lines from the entrants' submitted Cabrillo files (as [gratuitously] modified by the ARRL) to ensure that all fields contain valid values and all the data match the column-specific standard format for this contest.

Any line containing illegal data in a field has simply been removed. Also, only the QSO: lines are retained, so that each line in the file can be processed easily. All QTH multipliers are rendered as two letters, and the power is rendered as four digits, regardless of how the submitted log recorded these two fields; this should simplify processing the logs by scripts or programs, as should the use of fixed-length records in these cleaned files.
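As an illustration of how data for one station and one year might be extracted from the combined file, here is a minimal Python sketch. It assumes the usual Cabrillo ordering of fields on a QSO: line (frequency, mode, date, time, sent call, and so on) and splits on whitespace rather than relying on the fixed column positions, which are not described here. The function name extract_qsos and the second sample line are invented for the example.

```python
def extract_qsos(lines, call, year):
    """Yield the QSO: lines logged by `call` during `year`.

    Assumed field order: QSO: freq mode date time sent-call ...
    """
    wanted_call = call.upper()
    wanted_year = f"{year}-"
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "QSO:":
            continue
        if fields[5].upper() == wanted_call and fields[3].startswith(wanted_year):
            yield line.rstrip("\n")


# Sample input: the second line is invented purely for illustration.
sample = [
    "QSO: 14000 CW 2021-02-21 2245 HL2ZN 599 0500 N7DR 599 KY\n",
    "QSO: 14000 CW 2020-02-16 2245 HL2ZN 599 0500 N7DR 599 CO\n",
]
selected = list(extract_qsos(sample, "HL2ZN", 2021))
print(selected)
```

With a real file, the same generator could be fed an open file object in place of the list.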

Augmented Logs

The augmented logs for the same years may likewise be downloaded from the directory. The augmented logs are combined into a single file; but data for individual stations and years may trivially be extracted from the combined file.

The augmented logs for the ARRL DX contests contain the same information as the cleaned logs, but with the addition of some useful (derived) information on each line. The information added to each line comprises:

A sequence of four characters that are the same for each entry in a particular log:

a. letter "A" or "U" indicating "assisted" or "unassisted"
b. letter "Q", "L", "H" or "U", indicating respectively QRP, low power, high power or unknown power level
c. letter "S", "M", "C" or "U", indicating respectively a single-operator, multi-operator, checklog or unknown operator category
d. character "1", "2", "+" or "U", indicating respectively that the number of transmitters is one, two, unlimited or unknown

A four-digit number representing the time of the contact in minutes measured from the start of the contest. (I realise that this can be calculated from the other information on the line, but it saves subsequent processors of the file considerable time to have the number readily available in the file without having to calculate it each time.)
Band
A set of fourteen flags, each -- apart from column k and column n -- encoded as T/F:
- a. QSO is confirmed by a log from the second party
- b. QSO is a reverse bust (i.e., the second party appears to have bust the call of the first party)
- c. QSO is an ordinary bust (i.e., the first party appears to have bust the call of the second party)
- d. the call of the second party is unique
- e. QSO appears to be a NIL
- f. QSO is with a station that did not send in a log, but who did make 20 or more QSOs in the contest
- g. QSO appears to be a country mult (may be T for W/VE stations only)
- h. QSO appears to be a state/province mult (may be T for DX stations only)
- i. QSO is an exchange bust (i.e., the received exchange appears to be a bust)
- j. QSO is a reverse exchange bust (i.e. the second party appears to have bust the exchange of the first party)
- k. This entry has three possible values rather than just T/F:
  - T: QSO appears to be made during a run by the first party
  - F: QSO appears not to be made during a run by the first party
  - U: the run status is unknown because insufficient frequency information is available in the first party's log
- l. QSO is a dupe
- m. QSO is a dupe in the second party's log
- n. RBN information (see below)
If the QSO is a reverse bust, the call logged by the second party; otherwise, the placeholder "-"
If the QSO is an ordinary bust, the correct call that should have been logged by the first party; otherwise, the placeholder "-"
If the QSO is a reverse exchange bust, the exchange logged by the second party; otherwise, the placeholder "-"
If the QSO is an ordinary exchange bust, the correct exchange that should have been logged by the first party; otherwise, the placeholder "-"
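The four-digit time field described above can be reproduced from the date and time on a QSO: line. A minimal Python sketch follows; it assumes that the contest starts at 0000 UTC on the Saturday (2021-02-20 for the 2021 CW event), and the function name minutes_from_start is illustrative rather than taken from the code that built the augmented files.

```python
from datetime import datetime, timezone


def minutes_from_start(date_str, time_str, contest_start):
    """Return the minutes elapsed since contest_start as a four-digit string."""
    qso_time = datetime.strptime(
        f"{date_str} {time_str}", "%Y-%m-%d %H%M"
    ).replace(tzinfo=timezone.utc)
    elapsed = qso_time - contest_start
    return f"{int(elapsed.total_seconds() // 60):04d}"


# Assumed start of the 2021 ARRL DX CW contest.
start = datetime(2021, 2, 20, 0, 0, tzinfo=timezone.utc)
minutes = minutes_from_start("2021-02-21", "2245", start)
print(minutes)  # 24 h * 60 + 22 h * 60 + 45 = 2805
```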

RBN Information

In CW contests from 2009 onwards, the RBN has been active, automatically spotting the frequency at which any station calling CQ was transmitting. To reflect possible use of RBN information, the augmented files include a fourteenth column. For the sake of uniformity, this column is present in all the augmented files, regardless of whether the RBN actually contributed useful information to a particular contest.

Each QSO has one of several characters in the fourteenth column of flags. These characters should be interpreted as follows:

'-'
No useful RBN-derived information is available for this QSO.

'0'
The worked station (i.e., the second call on the log line) appears to have begun to CQ on this frequency within (roughly) 60 seconds prior to the QSO.

'A' to 'Z'
For the nth letter of the alphabet: the worked station appears to have been CQing on this frequency for (roughly) n minutes prior to the QSO.

'+'
The worked station appears to have been CQing for more than 26 minutes on this frequency.

'<'
Because the RBN is distributed, and because each contest entrant station has its own clock, there is generally a skew between the reading of the clock of the station making the QSO and the timestamp from the RBN at which it believes a posting was made (indeed, it's unclear from the RBN's [lack of] documentation exactly how the timestamp on an individual RBN posting is to be interpreted). If the character '<' appears in the RBN column, it indicates that the raw values of the clocks suggest that the QSO took place up to two minutes before the RBN reported the worked station commencing to CQ at this frequency. When this occurs, the most likely interpretation is that there is non-negligible skew between the two clocks, and the station was actually worked almost as soon as a CQ was posted by the RBN. But it might also mean that the entrant was simply lucky and found the CQing station just as it fired up on a new frequency.
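The encoding of the RBN column can be summarised in a few lines of Python. This is only a sketch of the mapping described above; the function name and the wording of the returned strings are illustrative.

```python
def describe_rbn_flag(ch):
    """Translate the character in the RBN column into a short description."""
    if ch == "-":
        return "no useful RBN-derived information"
    if ch == "0":
        return "worked station began to CQ within roughly 60 seconds before the QSO"
    if len(ch) == 1 and "A" <= ch <= "Z":
        n = ord(ch) - ord("A") + 1  # 'A' is 1 minute, 'Z' is 26 minutes
        return f"worked station had been CQing for roughly {n} minutes"
    if ch == "+":
        return "worked station had been CQing for more than 26 minutes"
    if ch == "<":
        return "QSO logged up to two minutes before the RBN reported the CQ"
    raise ValueError(f"unexpected RBN flag: {ch!r}")


for flag in "-0CZ+<":
    print(flag, describe_rbn_flag(flag))
```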

Notes:

The encoding of some of the flags requires subjective decisions to be made as to whether the flag should be true or false; consequently, and because the ARRL has yet to understand the importance of making the scoring code public, the value of a flag for a specific QSO line in some circumstances might not match the value that the ARRL has assigned. (Also, the ARRL has additional, non-public, data available.)
I made no attempt to deduce or infer the run status of a QSO in the second party's log (if such exists), regardless of the status in the first party's log. This allows one cleanly to perform correct statistical analyses anent the number of QSOs made by running stations merely by excluding QSOs marked with a U in column k.
No attempt is made to detect the case in which both participants of a QSO bust the other station's call. This is a problematic situation because of the relatively high probability of a false positive unless both stations log the frequency as opposed to the band. (Also, on bands on which split-frequency QSOs are common, the absence of both transmit and receive frequency is a problem.) Because of the likelihood of false positives, it seems better, given the presumed rarity of double-bust QSOs, that no attempt be made to mark them.
The entries for the exchanges in the case of exchange or reverse exchange busts are normalised to two-letter or four-digit values in the same manner as described above for the exchanges in the cleaned logs.
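As a sketch of the kind of analysis mentioned in the notes above, the following Python decodes the fourteen flags and computes the fraction of QSOs made while running, excluding those whose run status is U. It assumes that the flags have already been isolated from each line as a single contiguous 14-character string, which may not match the actual layout of the augmented files; the flag names and the sample strings are invented for the example.

```python
FLAG_NAMES = [
    "confirmed", "reverse_bust", "bust", "unique", "nil", "no_log_20_plus",
    "country_mult", "state_province_mult", "exchange_bust",
    "reverse_exchange_bust", "run", "dupe", "dupe_in_second_log", "rbn",
]


def decode_flags(flags):
    """Map a 14-character flag string (columns a to n) to a dict."""
    if len(flags) != len(FLAG_NAMES):
        raise ValueError("expected exactly 14 flag characters")
    decoded = {}
    for name, ch in zip(FLAG_NAMES, flags):
        if name == "rbn":
            decoded[name] = ch  # column n: kept as the raw RBN character
        elif name == "run":
            decoded[name] = {"T": True, "F": False, "U": None}[ch]  # column k
        else:
            decoded[name] = {"T": True, "F": False}[ch]
    return decoded


def run_fraction(flag_strings):
    """Fraction of QSOs made while running, ignoring unknown run status."""
    known = [d["run"] for d in map(decode_flags, flag_strings) if d["run"] is not None]
    return sum(known) / len(known) if known else None


# Invented samples: confirmed QSOs that differ only in column k.
samples = ["T" + "F" * 9 + run + "FF" + "-" for run in "TFU"]
print(run_fraction(samples))
```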

2021-06-07

Creating Local FCC Database From Version 6 Data

This is a followup to this post, required because the FCC has changed the format of their files from version 5 to version 6.

The contents of the eight FCC files are now (as described in the version 6 document):

Amateur
[AM] -- unchanged from version 5
1   Record Type [AM]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   EBF Number                  varchar(30)
5   Call Sign                   char(10)
6   Operator Class              char(1)
7   Group Code                  char(1)
8   Region Code                 tinyint
9   Trustee Call Sign           char(10)
10 Trustee Indicator           char(1)
11 Physician Certification     char(1)
12 VE Signature                char(1)
13 Systematic Call Sign Change char(1)
14 Vanity Call Sign Change     char(1)
15 Vanity Relationship         char(12)
16 Previous Call Sign          char(10)
17 Previous Operator Class     char(1)
18 Trustee Name                varchar(50)

Comments
[CO] -- unchanged from version 5
1   Record Type [CO]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   Call Sign                   char(10)
5   Comment Date                mm/dd/yyyy
6   Description                 varchar(255)
7   Status Code                 char(1)
8   Status Date                 mm/dd/yyyy

Entity
[EN] -- unchanged from version 5
1   Record Type [EN]                char(2)
2   Unique System Identifier        numeric(9,0)
3   ULS File Number                 char(14)
4   EBF Number                      varchar(30)
5   Call Sign                       char(10)
6   Entity Type                     char(2)
7   Licensee ID                     char(9)
8   Entity Name                     varchar(200)
9   First Name                      varchar(20)
10 MI                              char(1)
11 Last Name                       varchar(20)
12 Suffix                          char(3)
13 Phone                           char(10)
14 Fax                             char(10)
15 Email                           varchar(50)
16 Street Address                  varchar(60)
17 City                            varchar(20)
18 State                           char(2)
19 Zip Code                        char(9)
20 PO Box                          varchar(20)
21 Attention Line                  varchar(35)
22 SGIN                            char(3)
23 FCC Registration Number (FRN)   char(10)
24 Applicant Type Code             char(1)
25 Applicant Type Code Other       char(40)
26 Status Code                     char(1)
27 Status Date                     mm/dd/yyyy
28 3.7 GHz License Type            char(1)
29 Linked Unique System Identifier numeric(9,0)
30 Linked Call Sign                 char(10)

Application/License Header
[HD]
1   Record Type [HD]                            char(2)
2   Unique System Identifier                    numeric(9,0)
3   ULS File Number                             char(14)
4   EBF Number                                  varchar(30)
5   Call Sign                                   char(10)
6   License Status                              char(1)
7   Radio Service Code                          char(2)
8   Grant Date                                  mm/dd/yyyy
9   Expired Date                                mm/dd/yyyy
10 Cancellation Date                           mm/dd/yyyy
11 Eligibility Rule Num                        char(10)
12 Reserved                                    char(1)
13 Alien                                       char(1)
14 Alien Government                            char(1)
15 Alien Corporation                           char(1)
16 Alien Officer                               char(1)
17 Alien Control                               char(1)
18 Revoked                                     char(1)
19 Convicted                                   char(1)
20 Adjudged                                    char(1)
21 Reserved                                    char(1)
22 Common Carrier                              char(1)
23 Non Common Carrier                          char(1)
24 Private Comm                                char(1)
25 Fixed                                       char(1)
26 Mobile                                      char(1)
27 Radiolocation                               char(1)
28 Satellite                                   char(1)
29 Developmental or STA or Demonstration       char(1)
30 InterconnectedService                       char(1)
31 Certifier First Name                        varchar(20)
32 Certifier MI                                char(1)
33 Certifier Last Name                         varchar(20)
34 Certifier Suffix                            char(3)
35 Certifier Title                             char(40)
36 Female                                      char(1)
37 Black or African-American                   char(1)
38 Native American                             char(1)
39 Hawaiian                                    char(1)
40 Asian                                       char(1)
41 White                                       char(1)
42 Hispanic                                    char(1)
43 Effective Date                              mm/dd/yyyy
44 Last Action Date                            mm/dd/yyyy
45 Auction ID                                  integer
46 Broadcast Services - Regulatory Status      char(1)
47 Band Manager - Regulatory Status            char(1)
48 Broadcast Services - Type of Radio Service char(1)
49 Alien Ruling                                char(1)
50 Licensee Name Change                        char(1)
51 Whitespace Indicator                        char(1)
52 Operation/Performance Requirement Choice    char(1)
53 Operation/Performance Requirement Answer    char(1)
54 Discontinuation of Service                  char(1)
55 Regulatory Compliance                       char(1)
56 900 MHz Eligibility Certification           char(1)
57 900 MHz Transition Plan Certification       char(1)
58 900 MHz Return Spectrum Certification       char(1)
59 900 MHz Payment Certification               char(1)

History
[HS] -- unchanged from version 5
1   Record Type [HS]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   Call Sign                   char(10)
5   Log Date                    mm/dd/yyyy
6   Code                        char(6)

License Attachment
[LA] -- unchanged from version 5
1   Record Type [LA]            char(2)
2   Unique System Identifier    numeric(9,0)
3   Call Sign                   char(10)
4   Attachment Code             char(1)
5   Attachment Description      varchar(60)
6   Attachment Date             mm/dd/yyyy
7   Attachment File Name        varchar(60)
8   Action Performed            char(1)

Special Condition
[SC] -- unchanged from version 5
1   Record Type [SC]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   EBF Number                  varchar(30)
5   Call Sign                   char(10)
6   Special Condition Type      char(1)
7   Special Condition Code      int
8   Status Code                 char(1)
9   Status Date                 mm/dd/yyyy

License Free Form Special Condition
[SF] -- unchanged from version 5
1   Record Type [SF]                    char(2)
2   Unique System Identifier            numeric(9,0)
3   ULS File Number                     char(14)
4   EBF Number                          varchar(30)
5   Call Sign                           char(10)
6   License Free Form Type              char(1)
7   Unique License Free Form Identifier numeric(9,0)
8   Sequence Number                     integer
9   License Free Form Condition         varchar(255)
10 Status Code                         char(1)
11 Status Date                         mm/dd/yyyy

The following extract from the code that creates the output database maps these on a one-to-one basis to internal identifiers (the four new fields in the [HD] records don't seem to be important -- at least for now -- so there is no change in the output as compared to the processing of version 5 files, although the new fields are processed):

[AM]

RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
OPERATOR_CLASS,
GROUP_CODE,
REGION_CODE,
TRUSTEE_CALLSIGN,
TRUSTEE_INDICATOR,
PHYSICIAN_CERTIFICATION,
VE_SIGNATURE,
SYSTEMATIC_CALLSIGN_CHANGE,
VANITY_CALLSIGN_CHANGE,
VANITY_RELATIONSHIP,
PREVIOUS_CALLSIGN,
PREVIOUS_OPERATOR_CLASS,
TRUSTEE_NAME

[CO]

RECORD_TYPE,
ID,
ULS_NUMBER,
CALLSIGN,
COMMENT_DATE,
DESCRIPTION,
STATUS_CODE,
STATUS_DATE

[EN]

RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
ENTITY_TYPE,
LICENSE_ID,
ENTITY_NAME,
FIRST_NAME,
MIDDLE_INITIAL,
LAST_NAME,
SUFFIX,
PHONE,
FAX,
EMAIL,
STREET_ADDRESS,
CITY,
STATE,
ZIP_CODE,
PO_BOX,
ATTENTION_LINE,
SGIN,
FRN,
APPLICANT_TYPE_CODE,
APPLICANT_TYPE_CODE_OTHER,
STATUS_CODE,
STATUS_DATE,
LICENSE_TYPE_37,
LINKED_ID,
LINKED_CALLSIGN

[HD]

RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
LICENSE_STATUS,
RADIO_SERVICE_CODE,
GRANT_DATE,
EXPIRED_DATE,
CANCELLATION_DATE,
ELIGIBILITY_RULE_NUM,
RESERVED_1,
ALIEN,
ALIEN_GOVERNMENT,
ALIEN_CORPORATION,
ALIEN_OFFICER,
ALIEN_CONTROL,
REVOKED,
CONVICTED,
ADJUDGED,
RESERVED_2,
COMMON_CARRIER,
NON_COMMON_CARRIER,
PRIVATE_COMM,
FIXED,
MOBILE,
RADIOLOCATION,
SATELLITE,
DEVELOPMENTAL_STA_DEMONSTRATION,
INTERCONNECTED_SERVICE,
CERTIFIER_FIRST_NAME,
CERTIFIER_MIDDLE_INITIAL,
CERTIFIER_LAST_NAME,
CERTIFIER_SUFFIX,
CERTIFIER_TITLE,
FEMALE,
BLACK_AFRICAN_AMERICAN,
NATIVE_AMERICAN,
HAWAIIAN,
ASIAN,
WHITE,
HISPANIC,
EFFECTIVE_DATE,
LAST_ACTION_DATE,
AUCTION_ID,
BROADCAST_SERVICES_REGULATORY_STATUS,
BAND_MANAGER_REGULATORY_STATUS,
BROADCAST_SERVICES_SERVICE_TYPE,
ALIEN_RULING,
LICENSEE_NAME_CHANGE,
WHITESPACE_INDICATOR,
REQUIREMENT_CHOICE,
REQUIREMENT_ANSWER,
DISCONTINUED_SERVICE,
REGULATORY_COMPLIANCE,
ELIGIBILITY_900_MHZ,
TRANSITION_PLAN_900_MHZ,
RETURN_SPRCTRUM_900_MHZ,
PAYMENT_900_MHZ

[HS]

RECORD_TYPE,
ID,
ULS_NUMBER,
CALLSIGN,
LOG_DATE,
CODE

[LA]

RECORD_TYPE,
ID,
CALLSIGN,
ATTACHMENT_CODE,
ATTACHMENT_DESCRIPTION,
ATTACHMENT_DATE,
ATTACHMENT_FILENAME,
ACTION_PERFORMED

[SC]

RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
SPECIAL_CONDITION_TYPE,
SPECIAL_CONDITION_CODE,
STATUS_CODE,
STATUS_DATE

[SF]

RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
LICENSE_FREEFORM_TYPE,
UNIQUE_LICENSE_FREEFORM_ID,
SEQUENCE_NUMBER,
LICENSE_FREEFORM_CONDITION,
STATUS_CODE,
STATUS_DATE

The 50 output fields selected from the above lists are (arranged in groups of ten for easy counting):

ID,
CALLSIGN,
OPERATOR_CLASS,
GROUP_CODE,
REGION_CODE,
TRUSTEE_CALLSIGN,
TRUSTEE_INDICATOR,
SYSTEMATIC_CALLSIGN_CHANGE,
VANITY_CALLSIGN_CHANGE,
VANITY_RELATIONSHIP,

PREVIOUS_CALLSIGN,
PREVIOUS_OPERATOR_CLASS,
TRUSTEE_NAME,
COMMENT_DATE,
DESCRIPTION,
CO_STATUS_CODE, (i.e., STATUS_CODE from [CO])
CO_STATUS_DATE, (i.e., STATUS_DATE from [CO])
ENTITY_NAME,
FIRST_NAME,
MIDDLE_INITIAL,

LAST_NAME,
SUFFIX,
PHONE,
FAX,
EMAIL,
STREET_ADDRESS,
CITY,
STATE,
ZIP_CODE,
PO_BOX,

ATTENTION_LINE,
FRN,
APPLICANT_TYPE_CODE,
APPLICANT_TYPE_CODE_OTHER,
EN_STATUS_CODE, (i.e., STATUS_CODE from [EN])
EN_STATUS_DATE, (i.e., STATUS_DATE from [EN])
LICENSE_STATUS,
RADIO_SERVICE_CODE,
GRANT_DATE,
EXPIRED_DATE,

CANCELLATION_DATE,
ELIGIBILITY_RULE_NUM,
REVOKED,
CONVICTED,
ADJUDGED,
EFFECTIVE_DATE,
LAST_ACTION_DATE,
LICENSEE_NAME_CHANGE,
LINKED_ID,
LINKED_CALLSIGN

The contents of these fields are based on the original equivalent entries in the original data files. The entries for the fields are subject to the following transformations before being written to the output file:

The entry is converted to upper case;
Any line feeds (yes, the FCC allows line feeds within a field) are converted to the four-character sequence: <LF>;
Leading and trailing spaces are removed;
If the field is a date, it is converted from FCC format (mm/dd/yyyy) to ISO 8601 extended format: YYYY-MM-DD.
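The four transformations can be sketched in Python as follows. This is not the code from the source distribution, merely an illustration of the rules listed above; the function name transform_field is hypothetical, and leaving empty date fields empty is my assumption.

```python
from datetime import datetime


def transform_field(value, is_date=False):
    """Apply the listed transformations, in the listed order, to one field."""
    value = value.upper()
    value = value.replace("\n", "<LF>")  # embedded line feeds become <LF>
    value = value.strip(" ")
    if is_date and value:  # assumption: an empty date stays empty
        value = datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")
    return value


print(transform_field("  Smith\nJones "))
print(transform_field("06/07/2021", is_date=True))
```

The first call yields SMITH<LF>JONES and the second yields 2021-06-07.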

The latest output file created in this manner (and its MD5 checksum) may be downloaded from this directory.

The full source code to generate the output file may be downloaded here.

To create the binary from the source code, go to the directory that contains the makefile and type:

make fcc-db

This should generate the executable program as: bin/fcc-db. The program may be executed from within the bin directory as:

fcc-db [directory]

where [directory] is the name of the directory that contains the input FCC AM.dat, CO.dat, EN.dat and HD.dat files. Those files should be processed and the output written to stdout.

For what it's worth, it takes somewhat less than 15 seconds for the program to execute to completion on my desktop computer if stdout is redirected to an output file.

2021-04-26

More on CW Activity

I have generalised the code to plot the CW Activity Metric in the relevant github repository so that it can generate plots of data in bins that are less than a year in duration. For example:

This allows one to see the peaks corresponding to the CQ WW and ARRL DX contests each year, as well as the annual cycles of activity as propagation changes through the year. (And the change in behaviour in 2020, in which the early months of the year saw increased activity, corresponding -- one assumes -- to widespread stay-at-home lockdowns because of the COVID-19 pandemic, followed by a decrease as restrictions were eased over the following months.)

Of course, it's trivial to filter the RBN data before generating the CW metric. So, for example, we can create plots of annual data by continent (ignoring AN, for which there are too few data for meaningful analysis). The graphs are presented without further comment, as I think that they generally speak for themselves:

We can repeat this process, but binning the data into twelve bins of equal duration each year:

2021-04-19

Code for Plotting CW Activity

In prior posts (here and here) I describe a CW activity metric derived from the RBN, and plot the value of that metric on an annual basis. I thought that it might be useful to share the code that I used; although the earlier of the posts above describes the algorithm, the posts do not include any code.

Accordingly, I have added the code to a github repository. The new code is in the cw-activity directory.

Although it would have been much more efficient to code a monolithic program in C++, in the interests of making things more portable I used only scripts that should run on a wide range of machines. The only fundamental change needed to run the scripts on other machines is probably to change the variable FILENAME in the rbncat script. That variable should point to a mounted copy of the RBN database, as may be generated by the other tools in that github repository. Note that byte offsets are hard-coded into rbncat, so you should check that that script works as expected on your system before attempting to generate the activity plots. You might need to change the values of the byte offsets if, for example, for some reason you are using a line separator in your copy of the RBN database that is more than one character in length.

2021-04-15

Minor change to Reverse Beacon Network data files

While working on a script to process the RBN data files last week, I discovered that the raw historical data maintained by the RBN itself have not been properly filtered to remove duplicated lines. I have therefore replaced the data files I use with new versions in which the duplicated lines have been removed. This is a minor change (on a recent day, there were a total of 1,194,001,674 lines in the database, of which 1,188,655,665 were unique, implying that 5,346,009 additional lines were present, for a duplication rate of about 0.45%).
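The arithmetic, together with one simple way of dropping repeated lines, can be sketched in Python. The helper unique_lines is hypothetical and is not the method actually used to rebuild the data files; holding every line in a set would also need a great deal of memory for a file of this size.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


total_lines = 1_194_001_674
distinct_lines = 1_188_655_665
duplicated = total_lines - distinct_lines
rate = duplicated / total_lines
print(f"{duplicated:,} duplicated lines, {100 * rate:.2f}% of the total")
```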

2021-04-06

Estimating CW Activity from RBN data, 2009 to 2020

It is possible to generate a metric of CW activity from RBN data. Extending the analysis in the linked post to 2020, we first check that the data have not changed in a way that would vitiate the result:

As in prior years, the overall shape of the data seems to be robust.

So, extending the prior analysis to include 2020, we obtain this graph:

Unsurprisingly, we see that a marked increase in CW activity (as measured by the defined metric) occurred in 2020. Because activity on all HF bands seems to have increased, it seems fair to ascribe the overall increase substantially to the COVID-19 pandemic rather than merely the improvement in propagation above 14 MHz that characterised the last quarter of 2020. It will be interesting to see if this increase is sustained in 2021.

Perhaps the most important result from this exercise, though, is the continued lack of evidence of any substantive long-term decline in the number of calls active on CW.

2021-03-29

Evaluating Station Contributions to the Reverse Beacon Network: 2020

Applying the algorithm described here to the Reverse Beacon Network data for 2020 (1.1GB; MD5: 966cc0317bc0e94d70d3b9a553215702) we obtain the following tables for the stations that, on the basis of that algorithm, made the highest-valued contributions to the RBN in 2020:

Band	Position	Call	Value
ALL	1	OE9GHV	396,913
ALL	2	OK2EW	337,335
ALL	3	OH6BG	329,298
ALL	4	DL3DTH	316,445
ALL	5	WZ7I	315,386
ALL	6	KM3T	304,869
ALL	7	DO4DXA	289,372
ALL	8	DL9GTB	285,668
ALL	9	HA1VHF	278,712
ALL	10	W1NT-6	265,115

Band	Position	Call	Value
10m	1	G0LUJ	38,448
10m	2	DL3DTH	35,759
10m	3	KU7T	29,824
10m	4	OH6BG	25,361
10m	5	DO4DXA	23,999
10m	6	WZ7I	19,598
10m	7	KM3T	14,227
10m	8	DJ9IE	13,544
10m	9	EA8BFK	12,229
10m	10	G4ZFE	12,090

Band	Position	Call	Value
12m	1	CX6VM	11,101
12m	2	DL3DTH	5,763
12m	3	OH6BG	4,571
12m	4	EA8/DF4UE	3,642
12m	5	LZ4UX	3,494
12m	6	G0LUJ	2,956
12m	7	EA5WU	2,423
12m	8	JH7CSU1	2,292
12m	9	CT7ANO	2,187
12m	10	DJ9IE	2,101

Band	Position	Call	Value
15m	1	CX6VM	42,323
15m	2	VU3KAZ	28,040
15m	3	EA8/DF4UE	22,155
15m	4	OH6BG	21,016
15m	5	EA8BFK	18,996
15m	6	DL3DTH	13,416
15m	7	JH7CSU1	13,302
15m	8	W3UA	13,042
15m	9	DO4DXA	12,982
15m	10	KO7SS	12,850

Band	Position	Call	Value
17m	1	CX6VM	25,584
17m	2	OH6BG	23,968
17m	3	EA8/DF4UE	18,929
17m	4	WZ7I	17,580
17m	5	DL3DTH	17,170
17m	6	KM3T	15,229
17m	7	W1NT-6	14,918
17m	8	EA5WU	11,954
17m	9	OE9GHV	11,417
17m	10	KO7SS	10,529

Band	Position	Call	Value
20m	1	DL9GTB	124,911
20m	2	OE9GHV	109,617
20m	3	WZ7I	108,177
20m	4	KM3T	106,772
20m	5	OH6BG	103,200
20m	6	K1TTT	94,867
20m	7	VE2WU	94,197
20m	8	W1NT-6	86,002
20m	9	W3UA	84,003
20m	10	LZ7AA	83,671

Band	Position	Call	Value
30m	1	OH6BG	38,419
30m	2	SE5E	31,793
30m	3	DL3DTH	30,380
30m	4	OE9GHV	26,680
30m	5	W1NT-6	26,291
30m	6	OL7M	25,808
30m	7	EA5WU	24,246
30m	8	OK2EW	23,308
30m	9	UA4M	21,772
30m	10	F6IIT	21,634

Band	Position	Call	Value
40m	1	OE9GHV	126,261
40m	2	LZ7AA	98,606
40m	3	OL7M	93,747
40m	4	WZ7I	91,648
40m	5	DO4DXA	87,750
40m	6	KM3T	87,222
40m	7	W1NT-6	79,517
40m	8	N5RZ	77,767
40m	9	W3UA	74,512
40m	10	SE5E	73,488

Band	Position	Call	Value
80m	1	OE9GHV	84,977
80m	2	OK2EW	67,364
80m	3	DO4DXA	63,885
80m	4	DL3DTH	54,162
80m	5	DR4W	51,626
80m	6	SM6FMB	48,184
80m	7	LZ7AA	45,539
80m	8	DE1LON	42,187
80m	9	SM7IUN	41,695
80m	10	HB9BXE	41,349

Band	Position	Call	Value
160m	1	OK2EW	105,478
160m	2	HA1VHF	100,279
160m	3	DL3DTH	38,666
160m	4	DO4DXA	28,590
160m	5	AC0C	27,544
160m	6	OH6BG	20,963
160m	7	OE9GHV	19,965
160m	8	VE6JY	19,006
160m	9	KM3T	18,945
160m	10	UA4M	18,240

2021-03-22

HF Beacons and the Reverse Beacon Network, 2020

Here is a table of the twenty fixed-frequency stations most often posted by the RBN in 2020:

Position	Station	Frequency (kHz)	Number of Posts
1	CS3B	14100	113,851
2	YV5B	14100	98,980
3	YU7QF	14017	85,288
4	I1MMR	7026	83,120
5	AA1K	1821	81,490
6	OH2B	14100	63,044
7	W6WX	14100	62,000
8	OP5K	7017	55,551
9	HB4FV/B	10134	48,303
10	UA3KW	14006	47,479
11	SP3CW	3565	45,430
12	4X6TU	18110	45,235
13	DK5JPL	3541	43,544
14	EW7LO	7008	42,240
15	OP5K	7018	40,034
16	CS3B	18110	39,162
17	4X6TU	14100	38,866
18	DK4AN	3569	37,837
19	SQ6JAN	3565	37,424
20	DJ6UX	7039	35,894

Notes:

Frequencies are rounded to the nearest kHz;
I am unsure how the U.S. stations in the list can be legal, since the FCC's regulations appear to limit [unattended] HF beacons to a portion of 10m;
It is my memory that the original HF beacons were all located on 28 MHz, so that listeners could be made aware of an opening. It is noticeable that not a single one of the stations on the list above is on 10m: the vast majority are on bands that can reasonably be expected to support some kind of non-local propagation at almost all times (which is probably the very reason that they are posted by the RBN so often -- but one does wonder what the putative purpose of such a beacon is);
The two entries for OP5K probably reflect errors in the reporting of frequency by the RBN stations; the actual frequency was probably at or close to 7017.5 kHz.
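To illustrate the last note: if a station transmits close to 7017.5 kHz and the frequencies reported for it scatter slightly around the true value, then rounding to the nearest kHz splits the posts between two adjacent table entries. The following is only a sketch; the reported frequencies are synthetic, and the amount of scatter (a standard deviation of 0.1 kHz) is an assumption chosen purely for the demonstration.

```r
# Sketch: synthetic reports of a transmitter near 7017.5 kHz, rounded to the nearest kHz.
# The 0.1 kHz scatter is an assumed value, not a measured property of the RBN.

set.seed(42)

TRUE_FREQUENCY_KHZ <- 7017.5
reported_khz <- TRUE_FREQUENCY_KHZ + rnorm(1000, mean = 0, sd = 0.1)

# count the posts that land on each rounded frequency
counts <- table(round(reported_khz))
print(counts)
```

With these assumptions the posts divide between 7017 and 7018, which is the pattern seen for OP5K in the table.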

Below are figures showing, for each of the stations in the table above, the signal strength as reported by the ten RBN stations that most frequently posted each individual beacon station.

In the following figures:

The ordinate for each of the strip charts ranges between 0 dB and the value shown as FSD (i.e., full scale deflection) near the bottom right-hand corner; in this case, the maximum value of each strip is therefore 80 dB.
The value plotted in this manner is the value denoted SNR by the RBN. Remember that the RBN has an odd definition of SNR.
The abscissa is divided into a number of bins of equal duration. On each plot there are 100 such bins; because the duration covered by each plot is one year, each bin therefore covers about 3½ days.
At the bottom of each strip chart is a coloured bar. Each bin in these bars is coloured so as to represent the total number of times that the RBN station spotted the beacon in the period covered by the bin. The colour legend for each figure is to the right of the figure.
For the period covered by each 3½-day bin, the lower quartile of SNR readings is coloured grey, the upper quartile is coloured white, and the middle two quartiles are coloured blue.
The vertical order of the various RBN stations is determined solely by the chronological order in which each station first spotted the beacon.
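The binning described in this list can be sketched in R as follows. This is not the code that produced the figures: the function bin_quartiles, the data frame obs and its columns day and snr are hypothetical names, and the SNR values are synthetic.

```r
# Sketch: divide one year into 100 equal bins and compute, for each bin,
# the number of posts and the lower and upper quartile boundaries of SNR.
# All names and data here are illustrative assumptions.

N_BINS <- 100
DAYS_IN_YEAR <- 366            # 2020 was a leap year

bin_quartiles <- function(obs, n_bins = N_BINS, n_days = DAYS_IN_YEAR)
{ bin_width <- n_days / n_bins                     # a little more than 3.5 days

  # bin number, in the range 1 to n_bins, for each observation
  obs[["bin"]] <- pmin(n_bins, as.integer(floor(obs[["day"]] / bin_width)) + 1L)

  per_bin <- split(obs[["snr"]], obs[["bin"]])     # SNR readings grouped by bin

  result <- data.frame(bin = as.integer(names(per_bin)),
                       n   = sapply(per_bin, length),
                       q1  = sapply(per_bin, quantile, probs = 0.25, names = FALSE),
                       q3  = sapply(per_bin, quantile, probs = 0.75, names = FALSE))
  rownames(result) <- NULL
  return (result)
}

set.seed(1)
obs <- data.frame(day = runif(5000, min = 0, max = DAYS_IN_YEAR),   # time of each post, in days from the start of the year
                  snr = pmax(0, rnorm(5000, mean = 20, sd = 8)))    # synthetic SNR, in dB

summary_by_bin <- bin_quartiles(obs)
print(head(summary_by_bin))
```

In terms of the colouring described above, n would drive the coloured bar at the bottom of a strip, while q1 and q3 would mark the lower and upper edges of the blue region.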

2021-03-15

Summary File for RBN data, 2009 to 2020

The complete set of RBN data for 2009 to the end of 2020, after uncompression, exceeds 100GB in size. As not all analyses need the complete dataset, I have constructed a summary file (rbn-summary-data.xz) that contains an overview of the data and which is sufficient for many kinds of analysis that do not depend on the details of individual posts to the RBN. (The basic script used to generate this summary file may be found here; the actual summary file is created by running this basic script for each individual year from 2009 to 2020 and concatenating the results after removing the header line from all except the first year.)
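The concatenation step mentioned in the parenthesis might look something like the following sketch. It is not the script referred to above; the function name concatenate_summaries and the file names are hypothetical, and the two tiny input files hold truncated, dummy rows created only for the demonstration.

```r
# Sketch: concatenate per-year summary tables into a single file,
# keeping the header line from the first file only.
# Function name and file names are hypothetical.

concatenate_summaries <- function(yearly_files, out_file)
{ out <- file(out_file, open = "w")
  on.exit(close(out))

  for (i in seq_along(yearly_files))
  { lines <- readLines(yearly_files[i])

    if (i > 1)
      lines <- lines[-1]          # remove the header line from all except the first year

    writeLines(lines, out)
  }

  invisible(out_file)
}

# demonstration with two small temporary files (dummy content)
tmp_dir <- tempdir()
f1 <- file.path(tmp_dir, "summary-2009")
f2 <- file.path(tmp_dir, "summary-2010")

writeLines(c("band mode type year posts", "NA NA A 2009 1"), f1)
writeLines(c("band mode type year posts", "NA NA A 2010 2"), f2)

out_path <- file.path(tmp_dir, "summary-all")
concatenate_summaries(c(f1, f2), out_path)

combined <- read.table(out_path, header = TRUE)
print(combined)
```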

The summary file, after being uncompressed, comprises a single large table of values separated by white space. The name of each column (there are twelve columns in all) is on the first row. The columns are:

band: a string that identifies the band pertaining to this row. Typical values are "15m" or "160m"; if a row contains data that are not distinguished by band, then the characters "NA" are used.
mode: a string that identifies the mode pertaining to this row. Typical values are "CW" or "RTTY"; if a row contains data that are not distinguished by mode, then the characters "NA" are used.
type: a single character that identifies whether the data on this row are for a period of a year ("A"), a month ("M") or a day ("D").
year: the numeric four-digit value of the year to which the current row pertains.
month: the numeric value of the month (January = 1, etc.) of the data in this row. If the data are of type A or D, then this element has the value "NA".
doy: the numeric value of the day number of the year (January 1st = 1, etc.). The maximum value in each year is 366 (even if the year is not a leap year). In the event that the year is not a leap year, the data in columns 7, 8 and 9 will be set to 0 when doy is 366. If the data are of type A or M, then this element has the value "NA".
posts: the total number of posts recorded by the RBN for the band, mode and period identified by the first six columns.
calls: the total number of distinguishable calls recorded by the RBN for the band, mode and period identified by the first six columns.
posters: the total number of distinguishable posters recorded by the RBN for the band, mode and period identified by the first six columns.
scatter: the value of a scatter metric that characterises the geography of the RBN for the band, mode and period identified by the first six columns. The scatter metric is the sum of the distances (measured in km) between all possible pairs of good posters, divided by the number of such pairs; that is, it is the mean distance between pairs of good posters.
good_posters: the total number of distinguishable posters recorded by the RBN for the band, mode and period identified by the first six columns, and for which location data are available from the RBN.
grid_metric: the total number of G(15, 100) grid cells that contain good posters.
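The definition of the scatter metric can be made concrete with a short sketch. The definition does not say how the distance between two posters is calculated, so the use of the haversine great-circle formula with a spherical Earth of radius 6371 km is an assumption here, as are the function names and the three made-up poster locations.

```r
# Sketch of the scatter metric: mean distance over all pairs of posters.
# Assumptions: haversine great-circle distance; spherical Earth of radius 6371 km.

EARTH_RADIUS_KM <- 6371

# great-circle distance in km; all arguments in degrees
haversine_km <- function(lat1, lon1, lat2, lon2)
{ to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad

  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2

  return (2 * EARTH_RADIUS_KM * asin(pmin(1, sqrt(a))))
}

# sum of all pairwise distances, divided by the number of pairs
scatter_metric <- function(lat, lon)
{ n <- length(lat)

  if (n < 2)
    return (NA_real_)             # no pairs, so the metric is undefined

  pairs <- combn(n, 2)            # each column holds the indices of one pair
  d <- haversine_km(lat[pairs[1, ]], lon[pairs[1, ]], lat[pairs[2, ]], lon[pairs[2, ]])

  return (sum(d) / length(d))
}

# three made-up posters, each a quarter of a great circle from the other two
lat <- c(0, 0, 90)
lon <- c(0, 90, 0)

scatter <- scatter_metric(lat, lon)
print(scatter)
```

Because every pair in this example is separated by a quarter of a great circle, the result is simply a quarter of the circumference of the assumed sphere.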

For example, the first two lines of the summary file are (presented here as a table, in order to make it easier to view on mobile devices):

band	NA
mode	NA
type	A
year	2009
month	NA
doy	NA
posts	5007040
calls	143724
posters	151
scatter	5541
good_posters	150
grid_metric	22

This tells us that the first line of actual data in the file comprises annual data for the year 2009, with no separation by band or mode. In 2009, we see that there were 5,007,040 posts of 143,724 callsigns by 151 posters; the scatter metric, which is a measure of the geographic dispersion of the posters on the RBN, was 5,541; location data were available for 150 of the posters, and these good posters were spread across 22 distinct G(15, 100) grid cells.
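As a small sketch of how such a row can be pulled out in R, the two lines quoted above can be parsed directly from a string in place of the real file. The variable names are arbitrary. Note that read.table() treats the characters "NA" as missing values by default, so rows that are not separated by band or mode can be selected with is.na().

```r
# Sketch: parse the first two lines of the summary file (as quoted above)
# and select the annual row that is not separated by band or mode.

summary_text <- "band mode type year month doy posts calls posters scatter good_posters grid_metric
NA NA A 2009 NA NA 5007040 143724 151 5541 150 22"

rbn <- read.table(text = summary_text, header = TRUE)

annual_all <- subset(rbn, type == "A" & is.na(band) & is.na(mode))

cat(format(annual_all[["posts"]], big.mark = ","), "posts by",
    annual_all[["posters"]], "posters in", annual_all[["year"]], "\n")
```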

The summary file allows rather rapid analysis of many RBN overview statistics. For example, a plot of the daily number of posts covering the period from the inception of the RBN to the end of 2020 --

-- can be generated on an ordinary desktop PC in a few seconds. From this plot, for example, we can immediately see that the largest number of daily posts occurred during the 2020 running of the CQ WW CW contest in late November (the second-highest cluster of peaks is for the CQ WPX contest, and the third is for the ARRL DX CW contest); also, the burst of activity that coincides with weekends is unmistakable.

For what it's worth, this is the code I used to generate the above plot (I apologise for the awful layout caused by the wrapping of long lines as they are forced into the narrow format used by blogger.com):

#!/usr/bin/Rscript

# generate a plot of the diurnal number of posts by the RBN, stacked by year

MIN_YEAR <- 2009
MAX_YEAR <- 2020

filename <- "/zd1/rbn/rbn-summary-data" # the local location of the RBN summary data file

# first two lines of the file:
#band          mode          type          year         month           doy         posts         calls       posters       scatter good_posters   grid_metric
#NA            NA             A          2009            NA            NA       5007040        143724           151          5541           150            22

# rounding function
round_n <- function(x, n) { return ( ( as.integer( (x - 1) / n) +1 ) * n ) }    # function to return next higher integral multiple of n, unless value is already such a multiple

data <- read.table(filename, header=TRUE)

# select diurnal data
diurnal_data <- subset(data, type=='D')

# drop the per-band and per-mode data
diurnal_all_bands_and_modes_data <- subset(diurnal_data, is.na(band) & is.na(mode))

# drop a bunch of columns that we don't want from the summary file
diurnal_all_bands_and_modes_data$band <- NULL
diurnal_all_bands_and_modes_data$mode <- NULL
diurnal_all_bands_and_modes_data$type <- NULL
diurnal_all_bands_and_modes_data$month <- NULL
diurnal_all_bands_and_modes_data$calls <- NULL
diurnal_all_bands_and_modes_data$posters <- NULL
diurnal_all_bands_and_modes_data$scatter <- NULL
diurnal_all_bands_and_modes_data$good_posters <- NULL
diurnal_all_bands_and_modes_data$grid_metric <- NULL

# get ready to start to plot
graphics.off()

png(filename=paste(sep="", "/tmp/rbn-posts-from-summary.png"), width=800, height=600)

x_lab <- 'DOY'

#            2009   2010                2011      2012     2013    2014      2015               2016      2017             2018     2019   2020
clrs <- c("black", "red", rgb(0.1, 0.1, 0.5), "yellow", "green", "blue", "violet", rgb(0.6, 0.2, 0.2), "white", "cornflowerblue", "gold1", "darkorange")

# create a frame to map between year and days in the year
days_in_year <- data.frame(seq(MIN_YEAR, MAX_YEAR), 365)
names(days_in_year) <- c("year", "days")

days_in_year$days[days_in_year$year %% 4 == 0] <- 366    # leap years; a simple test that suffices for 2009 to 2020

# set boundaries
plot(0, 0, xlim = c(0.5, 366.5), ylim = c(0, round_n(max(diurnal_all_bands_and_modes_data$posts), 1000000)), xaxt = "n", yaxt = "n", xlab = x_lab, ylab = "", type = 'n', yaxs="i")         # define the plotting region, but don't actually plot anything
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = 'grey')

# now generate the plot, superimposing each year
for (this_year in seq(MIN_YEAR, MAX_YEAR))
{ this_years_data <- subset(diurnal_all_bands_and_modes_data, year==this_year)
max_element <- days_in_year$days[this_year - MIN_YEAR + 1]

# set up so that this_years_data$id_365[365] = this_years_data$id_366[366] = 366, so either column can be used as a vector of abscissæ,
# depending on whether there are 365 or 366 ordinate values
this_years_data$id_366<-seq.int(nrow(this_years_data))
this_years_data$id_365<-((this_years_data$id_366 - 1) * 365.0 / 364.0) + 1

# remove a couple of columns that we no longer need
this_years_data$doy <- NULL
this_years_data$year <- NULL

# move the two new columns of days to the left of the frame: 365, then 366
this_years_data <- (this_years_data[ c(ncol(this_years_data), ncol(this_years_data) - 1, 1:(ncol(this_years_data)-2))])

lines(this_years_data[,(max_element-364)][1:max_element], this_years_data$posts[1:max_element], type = 'l', col = clrs[this_year - MIN_YEAR + 1], lwd = 1)
}

title_str <- paste(sep="", 'RBN POSTS PER DAY')
title(title_str)

title(ylab = '# OF POSTS (m)', line = 2.1, cex.lab = 1.0)

x_ticks_at <- c(1, 31, 61, 91, 121, 151, 181, 211, 241, 271, 301, 331, 361)
x_labels_at <- x_ticks_at
x_tick_labels <- x_ticks_at

axis(side = 1, at = x_ticks_at, labels = FALSE )    # ticks on x axis
axis(side = 1, at = x_labels_at, labels = x_tick_labels, tick = FALSE )

y_ticks_at <- seq(0, round_n(max(diurnal_all_bands_and_modes_data$posts), 1000000), 100000)
y_labels_at <- seq(0, round_n(max(diurnal_all_bands_and_modes_data$posts), 1000000), 1000000)
y_tick_labels <- seq(0, round_n(max(diurnal_all_bands_and_modes_data$posts), 1000000) / 1000000, 1)

axis(side = 2, at = y_ticks_at, labels = FALSE )
axis(side = 2, at = y_labels_at, labels = y_tick_labels, tick = FALSE )

minx <- par("usr")[1]
maxx <- par("usr")[2]
miny <- par("usr")[3]
maxy <- par("usr")[4]

xrange <- maxx - minx
yrange <- maxy - miny

xpos <- minx + 0.025 * xrange
ypos <- miny + 0.975 * yrange

par(xpd=TRUE, mar=c(0,0,4,0))

legend(x = xpos, y = ypos, legend = seq(MIN_YEAR, MAX_YEAR),
       lty=c(1, 1), lwd=c(2,2), col = clrs,
       bty = 'n', text.col = 'black')

graphics.off()

Of course, many other insights may be gleaned rather rapidly from the summary file.
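For instance, the day with the largest number of posts can be identified without drawing a plot at all. The sketch below shows the idea using a tiny data frame whose post counts are made up for the demonstration; in practice the rows would come from the type "D" rows of the summary file that are not separated by band or mode. The helper doy_to_date is a hypothetical name.

```r
# Sketch: find the day with the most posts and convert its day-of-year to a date.
# The post counts below are made up; only the year and doy columns matter here.

# convert a year and a day number (January 1st = 1) to a Date
doy_to_date <- function(year, doy)
{ return (as.Date(doy - 1, origin = paste0(year, "-01-01"))) }

daily <- data.frame(year  = c(2020, 2020, 2020),
                    doy   = c(60, 333, 334),
                    posts = c(100, 900, 800))      # made-up values

peak <- daily[which.max(daily[["posts"]]), ]
peak_date <- doy_to_date(peak[["year"]], peak[["doy"]])

cat("Largest number of daily posts on", format(peak_date), "\n")
```

Because 2020 was a leap year, day number 333 corresponds to 28 November, which fell within the weekend of the CQ WW CW contest.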