2021-02-07

Statistics from 2020 CQ WW SSB and CQ WW CW logs

A huge number of analyses can be performed with the various public CQ WW logs (cq-ww-2005--2020-augmented.xz; see here for details of the augmented format) for the period from 2005 to 2020.

As usual, there follow a few basic analyses that interest me. There is, of course, plenty of scope to use the files for further analyses, some of which are suggested by the figures below.

The 2020 versions of the contests were, of course, run under the unique circumstance of a world-wide pandemic, so we can expect the data for 2020 to be unlike those for any other year. Whether 2020 changes any long-term trends will take a year or two to become clear.

 

Number of Logs


Until 2020, the raw number of submitted logs for SSB had been relatively flat for several years; the logs submitted for CW showed a fairly steady annual increase. In 2020, unsurprisingly, the number of logs in both modes increased:



One not infrequently reads statements to the effect that the popularity of contests such as CQ WW has long been increasing. This plot suggests that this has not been true for a number of years prior to 2020 (and even when it was true, there are alternative explanations for the year-on-year increase, such as increasing ease of electronic log submission). This year, because of the unique circumstances, one would reasonably expect that there really were more people sitting at home and spending at least a portion of the weekend(s) on the air. But, as we see in the next section, that doesn't really seem to have been the case.

 

Popularity


By definition, popularity requires some measure of people (or, in our case, the simple proxy of callsigns) -- there is no reason to believe, a priori, that the number of received logs as shown above is related in any particular way to the popularity of a contest.

So we look at the number of calls in the logs as a function of time, rather than positing any kind of well-defined positively correlated relationship between log submission and popularity (actually, the posts I have seen don't even bother to posit such a relationship: they are silent on the matter, thereby simply seeming to presume that the reader will assume one). 

However, the situation isn't as simple as it might be, because of the presence of busted calls in logs. If a call appears in the logs just once (or some small number of times), it is more likely to be a bust than an actual participant. Where to set a cut-off a priori in order to discriminate between busts and actual calls is unclear; but we can plot the results of choosing several such values. 
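As a rough illustration of the counting just described, the following sketch counts the calls that appear in at least k different logs, for several values of k. It assumes that the logs have already been reduced to (log owner, worked call) pairs; the function name and the sample calls are hypothetical, and this is not the code used to produce the plots.

```python
from collections import defaultdict

def calls_by_threshold(qsos, thresholds=(1, 2, 5, 10)):
    """Count worked calls that appear in at least k distinct logs, for each k.

    qsos: iterable of (log_owner, worked_call) pairs (assumed already extracted).
    """
    logs_containing = defaultdict(set)
    for owner, worked in qsos:
        logs_containing[worked].add(owner)   # one entry per log, however many QSOs
    return {k: sum(1 for owners in logs_containing.values() if len(owners) >= k)
            for k in thresholds}

# Made-up data: B1BB appears in only one log, so it might well be a bust of B1B.
sample = [("A1A", "B1B"), ("C1C", "B1B"), ("A1A", "B1BB"), ("C1C", "A1A")]
print(calls_by_threshold(sample, thresholds=(1, 2)))   # {1: 3, 2: 1}
```

Raising the threshold discards more busts but also discards genuine casual participants, which is why it is useful to look at several thresholds at once.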

First, for SSB:

Regardless of how many logs a call has to appear in before we regard it as a legitimate callsign, the popularity of CQ WW SSB in the past few years has fallen to a level rarely (if ever) seen in the public logs. It is true that 2020 saw an uptick, but the increase is slight (surprisingly so in light of the number of people around the world who were confined to home and might be expected to escape boredom by getting on the air for the largest contest of the year). It is certainly difficult to argue, on the basis of the above plot, that this contest is now more popular than it was at a similar point in the last solar cycle -- indeed, it appears, on its face, that the opposite is true.


[I note that a reasonable argument can be made that the number of uniques will be more or less proportional to the number of QSOs made (I have not tested that hypothesis; I leave it as an exercise for the interested reader to determine whether it is true), but there is no obvious reason why the same would be true for, for example, callsigns that appear in, say, ten or more logs.]

Moving to CW:

we see a similar story to SSB, except that any decrease in participation since the same point in the last cycle appears to be very small: participation in the CW event in the current inter-cycle doldrums seems to be more or less the same as at the corresponding point in the last cycle. The uptick in 2020 was considerably higher than for the SSB weekend, which on the face of it seems rather surprising: I expected that if a substantial number of people returned to the contest because of the pandemic, they would prefer the SSB weekend; the data show, however, that that expectation was not borne out.

 

Geographical Participation


How has the geographical distribution of entries changed over time?

Again looking at SSB first: 

Zone 28 has resumed its increase in the number of logs submitted, and the absolute number is now considerably higher than it was five or more years ago. The number of logs from zones outside EU or the US (and, to a lesser extent, JA) is minuscule. This can be seen more clearly if we plot the percentage of logs received from each zone as a function of time:

2020 shows a clear increase in western Europe -- the place that already dominated the submissions -- presumably because of the pandemic, and a continuation in Indonesia of the increase that has been ongoing for a number of years now. Of course, this comes at the cost of a decrease in other areas, particularly, it seems, Japan.

On CW, most zones evidence a long-term increase:


But the relative increase seems to be spread more or less evenly across all zones, with the percentages of logs from each zone barely changing over the years 2005 to 2020 (although again what increase there is seems to be most pronounced in western Europe):


Activity


Total activity in a contest depends both on the number of people who participate and on how many QSOs each of those people makes. We can use the public logs to count the total number of distinct QSOs in the logs (that is, each QSO is counted only once, even if both participants have submitted a log).
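One way to count each QSO only once can be sketched as follows. The matching rule here (same pair of calls, same band, logged times within a few minutes of each other) is an assumption made for illustration; the criteria actually used for the plots are not stated above, and the tuple layout and tolerance are hypothetical.

```python
from collections import defaultdict

def count_distinct_qsos(qsos, tolerance=5):
    """Count QSOs, treating entries from both parties' logs as one contact.

    qsos: iterable of (call_a, call_b, band, minute) tuples, where minute is
    the time in minutes from the start of the contest.
    """
    times = defaultdict(list)
    for a, b, band, minute in qsos:
        # The unordered pair makes A-works-B and B-works-A share a key.
        times[(frozenset((a, b)), band)].append(minute)
    total = 0
    for minutes in times.values():
        minutes.sort()
        cluster_start = None
        for m in minutes:
            # Start a new QSO only when outside the tolerance of the current one.
            if cluster_start is None or m - cluster_start > tolerance:
                total += 1
                cluster_start = m
    return total

sample = [("A1A", "B1B", "20m", 10), ("B1B", "A1A", "20m", 11),
          ("A1A", "C1C", "20m", 12), ("A1A", "B1B", "40m", 300)]
print(count_distinct_qsos(sample))   # 3
```

Note that under this rule a dupe logged within the tolerance window is also merged into the original contact.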

For SSB: 

The total number of distinct QSOs in the current inter-cycle doldrums is essentially the same as at the same point in the last solar cycle.

And for CW:


On this mode there appears to be a long-lived underlying upward trend (on which the effect of the solar cycle is superimposed), perhaps augmented somewhat by the pandemic in 2020. Despite the claims I see that CW is an obsolete technology in serious decline, the actual evidence, at least from this, the largest contest of the year, is quite the opposite. (This is a good reminder that when someone makes a claim whose truth is not self-evident, one should examine the underlying data for oneself. I have found that all too often it transpires that no defensible evidence has been put forward for the conclusion being drawn.) The evidence certainly seems to indicate that CW activity is faring rather better than activity on SSB, at least insofar as CQ WW is concerned.

 

Running and Calling


On SSB, the ongoing gradual shift towards stations strongly favouring either running or calling, rather than splitting their effort between the two types of operation, finally appears to have reached some kind of equilibrium, with essentially no change between 2018 and 2019, and even a slight reversal of the trend in 2020:




I have not investigated the cause of the decrease in the percentage of stations strongly favouring running, although the public logs could readily be used to distinguish possibilities that spring to mind, such as more SO2R operation, more multi-operator stations, and/or a reluctance of stations to forgo the perceived advantages of spots from cluster networks.

On CW, the split between callers and runners continues to be much less bimodal than on SSB (on SSB, fully 30% of entrants have no run QSOs; on CW, the equivalent number is below 10%). Indeed, the difference in call/run behaviour on the two modes (and the difference in the way that the behaviour has changed over time) is profound, and probably worthy of further investigation. CW continues to appear to have what would seem to be a much healthier split between the two operating styles:




Assisted and Unassisted


We can see how the relative popularity of the assisted and unassisted categories has changed since they were introduced:

On CW, there are essentially equal numbers of assisted and unassisted logs, while on SSB the number of unassisted logs handily exceeds the number of assisted logs. My guess, for what it's worth, is that CW assistance is more widespread partly because it (partially) absolves stations from actually being able to copy at high speed, and partly because the RBN is so effective that essentially all CQing stations are spotted.

I find it particularly interesting that the number of CWU logs has remained essentially unchanged ever since the unassisted category was created.

Looking at the number of QSOs appearing in the unassisted and assisted logs:


(The lines are for the median number of QSOs; the vertical bars run from 10% to 90%, 20% to 80%, 30% to 70%, and 40% to 60%, with opacity increasing in that order.)

A long-term downward trend in the numbers of QSOs in the assisted logs ceased in 2016, and since then the median number of QSOs in the assisted logs has remained essentially unchanged. The more or less constant difference of roughly one hundred QSOs between CW and SSB logs (in favour of CW) continues.

Inter-Zone QSOs


We can show the number of inter-zone QSOs, both band-by-band and in total. In these plots, the number of QSOs is accumulated every ten minutes, so there are six points per hour.
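The ten-minute accumulation can be sketched in a few lines. This is only an illustration: it assumes each QSO has been reduced to a (band, minute, sent zone, received zone) tuple, with minute measured from the start of the contest, and the function name is hypothetical.

```python
from collections import Counter

def interzone_counts(qsos, bin_minutes=10):
    """Count inter-zone QSOs per (band, time bin).

    qsos: iterable of (band, minute, zone_sent, zone_received) tuples.
    """
    counts = Counter()
    for band, minute, zone_sent, zone_received in qsos:
        if zone_sent != zone_received:            # keep inter-zone QSOs only
            counts[(band, minute // bin_minutes)] += 1
    return counts

sample = [("20m", 3, 4, 14), ("20m", 9, 4, 4), ("20m", 12, 5, 14), ("40m", 5, 3, 25)]
print(interzone_counts(sample))
```

With the default bin of ten minutes, bin 0 covers minutes 0 to 9, bin 1 covers minutes 10 to 19, and so on, giving six bins per hour.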

As expected at this point in the cycle, there were a negligible number of QSOs on 10m, in either the SSB or the CW events.


In 2018, I wrote:
In 2018, activity decreased substantially on 15m as compared to 2017. We certainly seem to be very close to the bottom of the cycle. Perhaps by next year there will be a slight improvement in conditions.

That slight improvement did in fact occur in 2019 and continued into 2020, so it does seem that the new cycle is making itself felt, which bodes well for 2021. (In fact, by this metric, 15m was about as good as it ever gets in this late-November contest on CW.)

20m was a victim of the propagation on 15m, but still saw plenty of activity, especially on CW.

As usual, CW dominates on 40m (and the other low bands), and the bulk of CW DX activity was in the first few hours (unlike 2018, which was exceptional for the activity in the last few hours of the contest).

80m was also dominated by CW, with, as usual, the bulk of DX activity in the first six hours.

160m tells a similar story to 80m, although the raw QSO counts are much lower. DX QSOs in 2020 were well down on the numbers in the prior couple of years.

The overall picture at last shows increases in the numbers of DX QSOs, especially on CW.

2021-01-29

Most-Logged Stations in CQ WW CW and SSB Contests: 2020, and the decade from 2011 to 2020

The public CQ WW CW and SSB logs allow us easily to tabulate the stations that appear in the largest number of entrants' logs. For 2020, the ten stations with the largest number of appearances in CQ WW SSB logs were:

Callsign Appearances % logs
LZ9W 9,916 53
YT5A 9,459 55
DF0HQ 9,248 53
II2S 7,000 46
ES9C 6,951 45
RL3A 6,890 46
EA8RM 6,795 43
LZ5R 6,591 44
UB7K 6,565 43
E7DX 6,459 41

The first column in the table is the callsign. The second column is the total number of times that the call appears in logs. That is, for example, if a station worked LZ9W on six bands, that will increment the value in the second column of the LZ9W row by six. The third column is the percentage of logs that contain the callsign at least once.
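The two numeric columns can be computed along the following lines. This is a sketch under stated assumptions: the logs are taken to have been reduced to (log owner, worked call) pairs, every appearance (including any dupes) is counted, and the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def most_logged(qsos, top=10):
    """Return (call, appearances, percentage of logs) for the most-logged calls.

    qsos: iterable of (log_owner, worked_call) pairs.
    """
    appearances = Counter()
    logs_with_call = defaultdict(set)
    all_logs = set()
    for owner, worked in qsos:
        all_logs.add(owner)
        appearances[worked] += 1            # one per QSO line, so six bands count six times
        logs_with_call[worked].add(owner)   # one per log, however many bands
    return [(call, n, round(100 * len(logs_with_call[call]) / len(all_logs)))
            for call, n in appearances.most_common(top)]

sample = [("A1A", "LZ9W"), ("A1A", "LZ9W"), ("B1B", "LZ9W"),
          ("C1C", "YT5A"), ("A1A", "YT5A")]
print(most_logged(sample, top=2))   # [('LZ9W', 3, 67), ('YT5A', 2, 67)]
```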

Similarly, the ten stations with the largest number of appearances in CQ WW CW 2020 were:

Callsign Appearances % logs
CR3W 14,512 69
LZ9W 11,614 67
YT5A 11,190 65
TI7W 9,471 48
ZF1A 9,157 47
LN8W 8,993 55
CR6K 8,563 49
RM9A 8,476 51
LZ5R 8,332 54
OM7M 8,251 55

Note the substantial difference between the SSB and CW tables.

I find it interesting to see which stations have had the most long-term activity on the contests. For the ten years from 2011 to 2020 on SSB we find:

Callsign Appearances % logs
LZ9W 89,249 55
CN3A 81,518 53
DF0HQ 80,576 53
PJ2T 66,941 41
K3LR 66,338 46
OT5A 64,399 43
A73A 62,605 43
P33W 62,540 43
HG7T 60,871 44
TM6M 58,211 43

And for the same years on CW:

Callsign Appearances % logs
LZ9W 102,880 66
9A1A 97,305 61
PJ2T 88,148 52
P33W 78,611 52
DF0HQ 78,010 53
W3LPL 74,284 49
K3LR 70,112 48
LZ5R 69,355 53
PJ4A 67,404 48
ES9C 67,220 45



2021-01-16

New CQ WW Video Maps

I have updated the set of CQ WW video maps on my YouTube channel (channel N7DR). These video maps cover all the years for which public CQ WW logs are currently available (2005 to 2020).

To access individual videos directly:


The videos are created with time steps of ten minutes; when playing the video, each time step is displayed for five seconds. The videos are presented as animated GIF files, so they should display correctly without any specialised video software installed on your computer.

The videos assume that all communication is via the great-circle short path route, and include only inter-zone contacts. The width of the arcs is an absolute measure of the number of QSOs taking place over that path in the particular 10-minute segment. The colour of the arc reflects the relative number of QSOs taking place over the path. Each separate image (i.e., 10-minute segment) is normalized so that the path with the greatest number of QSOs is rendered in white. Paths with fewer QSOs are in progressively darker colours. Thus, arc colour should not be compared from one still image to another; arc width, however, is meaningful. The width of an arc in pixels is one plus the natural logarithm of the number of QSOs represented by the arc.
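The width rule and the per-image normalisation can be written out as a short sketch. The width formula is as stated above; the linear mapping from QSO count to brightness is an assumption made purely for illustration (the actual colour scale used in the videos is not specified here), and the function names are hypothetical.

```python
import math

def arc_width(n_qsos):
    """Width in pixels: one plus the natural logarithm of the QSO count."""
    return 1 + math.log(n_qsos)

def relative_brightness(counts):
    """Scale the counts in one image so that the busiest path has brightness 1.0.

    counts: dict mapping a path, e.g. (zone_a, zone_b), to its QSO count.
    The linear scaling is an assumption, not the documented colour map.
    """
    peak = max(counts.values())
    return {path: n / peak for path, n in counts.items()}

print(arc_width(1))                                   # 1.0
print(relative_brightness({(4, 14): 50, (5, 15): 25}))
```

Because the brightness is rescaled for every image while the width is not, only the width can be compared between frames.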

2021-01-15

Cleaned and Augmented Logs (including RBN data) for CQ WW CW and SSB Contests, 2005 to 2020

 

Cleaned and augmented versions of the logs for the CQ WW CW and SSB contests are now available for the period 2005 to 2020.

Links to the cleaned and augmented logs may be followed here.

The cleaned logs are the result of processing the QSO: lines from the entrants' submitted Cabrillo files to ensure that all fields contain valid values and all the data match the format required in the rules. Any line containing illegal data in a field (for example, a zone number greater than 40, or a date/time stamp that is outside the contest period) has simply been removed. Also, only the QSO: lines are retained, so that each line in the file can be processed easily. All zones are rendered with two digits, so as to further simplify processing by scripts or programs.
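The kind of check involved can be sketched as follows. This is a minimal illustration rather than the actual cleaning code: it assumes the usual CQ WW Cabrillo layout of eleven whitespace-separated fields on a QSO: line, checks only the timestamp and the two zones, and rejoins the fields with single spaces rather than preserving column alignment. The function name and the start time in the example are illustrative.

```python
from datetime import datetime, timedelta

def clean_qso_line(line, start, hours=48):
    """Return the line with two-digit zones, or None if it fails a check."""
    fields = line.split()
    if len(fields) != 11 or fields[0] != "QSO:":
        return None
    try:
        when = datetime.strptime(fields[3] + " " + fields[4], "%Y-%m-%d %H%M")
        zones = int(fields[7]), int(fields[10])
    except ValueError:
        return None                      # malformed date, time or zone
    if not (start <= when < start + timedelta(hours=hours)):
        return None                      # outside the contest period
    if not all(1 <= z <= 40 for z in zones):
        return None                      # illegal zone number
    fields[7], fields[10] = (f"{z:02d}" for z in zones)
    return " ".join(fields)

start = datetime(2020, 11, 28)           # illustrative contest start, 0000 UTC
good = "QSO: 14000 CW 2020-11-28 0005 N7DR 599 4 LZ9W 599 20"
bad = "QSO: 14000 CW 2020-11-28 0005 N7DR 599 4 LZ9W 599 41"
print(clean_qso_line(good, start))
print(clean_qso_line(bad, start))        # None
```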

The augmented logs contain the same information as the cleaned logs, but with the addition of some useful (derived) information on each line. In addition to the actual logs, two additional sources of information are used when appropriate:

  1. AD1C has recently made accessible historical cty.dat and associated files. A copy of the cty.dat files is here. These allow us to use callsign-based multiplier lists as they would have existed at the time of each contest.

  2. From 2009 onwards, the Reverse Beacon Network (RBN) has been available for the CW contests. This allows us to include the time since a station was last posted by the RBN (see below for details).

The information added to each line of the augmented logs comprises:
  1. A sequence of four characters that are the same for each entry in a particular log:
    •  a. letter "A" or "U" indicating "assisted" or "unassisted"
    •  b. letter "Q", "L", "H" or "U", indicating respectively QRP, low power, high power or unknown power level
    •  c. letter "S", "M", "C" or "U", indicating respectively a single-operator, multi-operator, checklog or unknown operator category [ the contest organisers have stated that checklogs are not made public, but in fact at least some of them from the early years have been, hence the need for the "C" category ]
    •  d. character "1", "2", "+" or "U", indicating respectively that the number of transmitters is one, two, unlimited or unknown
  2. A four-digit number representing the time of the contact in minutes measured from the start of the contest. (I realise that this can be calculated from the other information on the line, but it saves subsequent script-based processors of the file considerable time to have the number readily available in the file without having to calculate it for each QSO.)
  3. Band
  4. A set of fourteen flags, each -- apart from column k and column n -- encoded as T/F: 
    • a. QSO is confirmed by a log from the second party 
    • b. QSO is a reverse bust (i.e., the second party appears to have bust the call of the first party) 
    • c. QSO is an ordinary bust (i.e., the first party appears to have bust the call of the second party) 
    • d. the call of the second party is unique 
    • e. QSO appears to be a NIL 
    • f. QSO is with a station that did not send in a log, but who did make 20 or more QSOs in the contest 
    • g. QSO appears to be a country mult 
    • h. QSO appears to be a zone mult 
    • i. QSO is a zone bust (i.e., the received zone appears to be a bust)
    • j. QSO is a reverse zone bust (i.e. the second party appears to have bust the zone of the first party)
    • k. This entry has three possible values rather than just T/F:
      • T: QSO appears to be made during a run by the first party
      • F: QSO appears not to be made during a run by the first party
      • U: the run status is unknown because insufficient frequency information is available in the first party's log
    • l. QSO is a dupe
    • m. QSO is a dupe in the second party's log
    • n. RBN information (see below)
  5. If the QSO is a reverse bust, the call logged by the second party; otherwise, the placeholder "-"
  6. If the QSO is an ordinary bust, the correct call that should have been logged by the first party; otherwise, the placeholder "-"
  7. If the QSO is a reverse zone bust, the zone logged by the second party; otherwise, the placeholder "-"
  8.  If the QSO is an ordinary zone bust, the correct zone that should have been logged by the first party; otherwise, the placeholder "-" 
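As an aid to anyone writing a script against these files, the four-character category code and the flags can be decoded along the following lines. The sketch assumes that the two items have already been extracted from a line as strings, with the fourteen flags in the order a to n; where exactly they sit on the line is not assumed here, and the function and dictionary names are hypothetical.

```python
ASSISTANCE = {"A": "assisted", "U": "unassisted"}
POWER = {"Q": "QRP", "L": "low power", "H": "high power", "U": "unknown"}
OPERATOR = {"S": "single-operator", "M": "multi-operator",
            "C": "checklog", "U": "unknown"}
TRANSMITTERS = {"1": "one", "2": "two", "+": "unlimited", "U": "unknown"}

def decode_category(code):
    """Decode the four-character per-log category code."""
    if len(code) != 4:
        raise ValueError("expected four characters")
    keys = ("assistance", "power", "operator", "transmitters")
    tables = (ASSISTANCE, POWER, OPERATOR, TRANSMITTERS)
    return {key: table[ch] for key, table, ch in zip(keys, tables, code)}

def decode_flags(flags):
    """Map flags a..n to values; T/F become booleans, k and n are kept as-is."""
    if len(flags) != 14:
        raise ValueError("expected fourteen flags")
    return {name: (value if name in "kn" else value == "T")
            for name, value in zip("abcdefghijklmn", flags)}

print(decode_category("AHS1"))
print(decode_flags("TFFFFFFTFFUFF-"))
```

Columns k and n are left as raw characters because they are not simple T/F values: k may be U, and n carries the RBN character.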

RBN Information


In the CW contests from 2009 onwards, the RBN was active, automatically spotting the frequency at which any station calling CQ was transmitting. To reflect possible use of RBN information, the augmented files now include a fourteenth flag. For the sake of uniformity, this column is present in all the augmented files, regardless of whether the RBN actually contributed useful information to a particular contest.

Each QSO has one of several characters in the fourteenth column of flags. These characters should be interpreted as follows:

'-'
  No useful RBN-derived information is available for this QSO.

'0'
  The worked station (i.e., the second call on the log line) appears to have begun to CQ on this frequency within (roughly) 60 seconds prior to the QSO.

'A' to 'Z'
  For the nth letter of the alphabet: the worked station appears to have been CQing on this frequency for (roughly) n minutes prior to the QSO.

'+'
  The worked station appears to have been CQing for more than 26 minutes on this frequency.

'<'
  Because the RBN is distributed, and because each contest entrant station has its own clock, there is generally a skew between the reading of the clock of the station making the QSO and the timestamp from the RBN at which it believes a posting was made (indeed, it's unclear from the RBN's [lack of] documentation exactly how the timestamp on an individual RBN posting is to be interpreted). If the character '<' appears in the RBN column, it indicates that the raw values of the clocks suggest that the QSO took place up to two minutes before the RBN reported the worked station commencing to CQ at this frequency. When this occurs, the most likely interpretation is that there is non-negligible skew between the two clocks, and the station was actually worked almost as soon as a CQ was posted by the RBN. This character also appears if the RBN erroneously posts the worked station as CQing at this frequency shortly after the QSO. But it might also mean that the entrant was simply lucky and found the CQing station just as it fired up on a new frequency.
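The interpretation of the RBN character can be expressed compactly. The following is a sketch only; the function name and the status labels it returns are hypothetical.

```python
def decode_rbn_flag(flag):
    """Return (status, minutes) for the character in the RBN column.

    minutes is the approximate time spent CQing on the frequency before the
    QSO, where the character encodes one; otherwise it is None.
    """
    if flag == "-":
        return ("no_info", None)
    if flag == "0":
        return ("cq", 0)                          # began within roughly 60 seconds
    if len(flag) == 1 and "A" <= flag <= "Z":
        return ("cq", ord(flag) - ord("A") + 1)   # nth letter: roughly n minutes
    if flag == "+":
        return ("cq_over_26", None)               # more than 26 minutes
    if flag == "<":
        return ("qso_before_post", None)          # probable clock skew
    raise ValueError(f"unexpected RBN flag: {flag!r}")

print(decode_rbn_flag("C"))   # ('cq', 3)
```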

Notes:
  • The encoding of some of the flags requires subjective decisions to be made as to whether the flag should be true or false; consequently, and because CQ has yet to understand the importance of making their scoring code public, the value of a flag for a specific QSO line in some circumstances might not match the value that CQ would assign. (Also, CQ has more data available in the form of check logs, which are generally not made public.)
  • I made no attempt to deduce or infer the run status of a QSO in the second party's log (if such exists), regardless of the status in the first party's log. This allows one cleanly to perform correct statistical analyses anent the number of QSOs made by running stations merely by excluding QSOs marked with a U in column k.
  • No attempt is made to detect the case in which both participants of a QSO bust the other station's call. This is a problematic situation because of the relatively high probability of a false positive unless both stations accurately log the frequency as opposed to merely the band. (Also, on bands on which split-frequency QSOs are common, the absence of both transmit and receive frequency is a problem; I confess that I have never understood why Cabrillo was not designed to report both transmit and receive frequencies -- or even to define clearly which frequency is to be reported. I digress.) Because of the likelihood of false positives, it seems better, given the presumed rarity of double-bust QSOs, that no attempt be made to mark them.
  • The entries for the zones in the case of zone or reverse zone busts are normalised to two-digit values.

2021-01-02

Reverse Beacon Network Activity: 2009-2020

I here show various plots of the grid-based scatter metric, G(15, 100), for the Reverse Beacon Network (RBN), using data from the inception of the RBN up to the end of 2020.

As in the past I note that a reasonable a priori case can be made on the basis of propagation characteristics that somewhat different metrics in the G(Δ, n) series might be better representations of RBN coverage on some of the bands. However, rather than make this into a full-scale research project, I shall here simply continue to use the G(15, 100) metric on the basis that it seems "good enough" on all bands.

RBN Posting Stations as a Function of Time


We begin by looking simply at how the number of per-band posters to the RBN has varied since the RBN's inception. (NB Throughout this post, we ignore posters for which the location is not recorded by the RBN; plots for which the abscissa is time show one datum per month.)

First, a plot of the total number of posters as a function of time:


This can be more compactly represented, along with similar per-band data for 160m through 10m (excluding 60m):


G(15, 100) as a Function of Time


Turning now to the geographical distribution of the posting stations, we can display the mensal values of G(15, 100) in a similar manner:


These figures seem to make rather clearly the rather depressing point that, with the exception of 2020, which by definition was an exceptional year because of the prevalence of COVID-19, since early 2017 there has been no substantive or sustained increase in either the number or geographical distribution of the stations posting to the RBN. It will be interesting to see what happens in 2021.

G(15, 100) as a Function of the Number of Posters


Finally, we can combine the mensal values of G(15, 100) and the number of posters. Firstly, including all bands:


The summary plot for these data is slightly different, as the ordinate is multi-valued for some values of the abscissa. So, in this summary plot, we take the mean value of G(15, 100) in bins of width equivalent to ten posters, and plot rectangles in the equivalent colours:
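The binning step can be sketched as below. It assumes the monthly data are available as (number of posters, G value) pairs; the function name is hypothetical and the numbers in the example are made up, not real values of G(15, 100).

```python
from collections import defaultdict
from statistics import mean

def mean_by_bin(points, width=10):
    """Average the metric within bins of `width` posters.

    points: iterable of (posters, g_value) pairs, one per month.
    Returns a dict mapping the lower edge of each bin to the mean value.
    """
    bins = defaultdict(list)
    for posters, g in points:
        bins[(posters // width) * width].append(g)
    return {lower: mean(values) for lower, values in sorted(bins.items())}

sample = [(101, 10), (108, 20), (115, 30)]   # made-up values
print(mean_by_bin(sample))                   # {100: 15, 110: 30}
```

Averaging within each bin gives a single ordinate per bin, which removes the multi-valued problem at the cost of hiding the month-to-month scatter.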


All in all, a rather unhappy picture emerges, in which the RBN, after expanding and increasing coverage rather nicely for the better part of a decade, became essentially static in early 2017 and effectively failed to expand numerically or in geographical coverage until the emergence of the pandemic in 2020. It will be interesting to see whether 2021 brings a return to stasis or whether the renewed slight improvement in coverage will continue.


2021-01-01

2020 RBN Data

 

All the postings to the Reverse Beacon Network in 2020, along with the postings from prior years, are now available in this directory.

Some simple annual statistics for the period 2009 to 2020 follow (the 2009 numbers cover only part of that year, as the RBN was instantiated partway through that year).

Total posts:
2009:   5,007,040
2010:  25,116,810
2011:  49,705,539
2012:  71,584,195
2013:  92,875,152
2014:  108,862,505
2015:  116,385,762
2016:  111,027,068
2017:  117,973,111
2018:  131,930,432
2019:  135,558,461
2020:  173,655,453 

Total posting stations:
2009: 151
2010: 265
2011: 320
2012: 420
2013: 473
2014: 515
2015: 511
2016: 590
2017: 625
2018: 550
2019: 583
2020: 616

Total posted distinct callsigns:
2009: 143,724
2010: 266,189
2011: 271,133
2012: 308,010
2013: 353,952
2014: 398,293
2015: 433,197
2016: 375,613
2017: 356,461
2018: 361,058
2019: 337,246
2020: 369,580

Obviously, statistics that are considerably more comprehensive may be derived rather easily from the files in the directory.

Note that if you intend to use the database's reported signal strengths in an analysis, you should be sure that you understand the ramifications of what the RBN means by SNR.

2020-12-30

Creating Local FCC Database From Version 5 Data

 

This is a followup to this post, required because the FCC has changed the format of their files from version 4 to version 5.

The contents of the eight FCC files are now (as described in the version 5 document):

Amateur
[AM] -- unchanged from version 4
1   Record Type [AM]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   EBF Number                  varchar(30)
5   Call Sign                   char(10)
6   Operator Class              char(1)
7   Group Code                  char(1)
8   Region Code                 tinyint
9   Trustee Call Sign           char(10)
10  Trustee Indicator           char(1)
11  Physician Certification     char(1)
12  VE Signature                char(1)
13  Systematic Call Sign Change char(1)
14  Vanity Call Sign Change     char(1)
15  Vanity Relationship         char(12)
16  Previous Call Sign          char(10)
17  Previous Operator Class     char(1)
18  Trustee Name                varchar(50)

Comments
[CO] -- unchanged from version 4
1   Record Type [CO]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   Call Sign                   char(10)
5   Comment Date                mm/dd/yyyy
6   Description                 varchar(255)
7   Status Code                 char(1)
8   Status Date                 mm/dd/yyyy

Entity
[EN]
1   Record Type [EN]                char(2)
2   Unique System Identifier        numeric(9,0)
3   ULS File Number                 char(14)
4   EBF Number                      varchar(30)
5   Call Sign                       char(10)
6   Entity Type                     char(2)
7   Licensee ID                     char(9)
8   Entity Name                     varchar(200)
9   First Name                      varchar(20)
10  MI                              char(1)
11  Last Name                       varchar(20)
12  Suffix                          char(3)
13  Phone                           char(10)
14  Fax                             char(10)
15  Email                           varchar(50)
16  Street Address                  varchar(60)
17  City                            varchar(20)
18  State                           char(2)
19  Zip Code                        char(9)
20  PO Box                          varchar(20)
21  Attention Line                  varchar(35)
22  SGIN                            char(3)
23  FCC Registration Number (FRN)   char(10)
24  Applicant Type Code             char(1)
25  Applicant Type Code Other       char(40)
26  Status Code                     char(1)
27  Status Date                     mm/dd/yyyy
28  3.7 GHz License Type            char(1)
29  Linked Unique System Identifier numeric(9,0)
30  Linked Call Sign                 char(10)
 
Application/License Header -- unchanged from version 4
[HD]
1   Record Type [HD]                            char(2)
2   Unique System Identifier                    numeric(9,0)
3   ULS File Number                             char(14)
4   EBF Number                                  varchar(30)
5   Call Sign                                   char(10)
6   License Status                              char(1)
7   Radio Service Code                          char(2)
8   Grant Date                                  mm/dd/yyyy
9   Expired Date                                mm/dd/yyyy
10  Cancellation Date                           mm/dd/yyyy
11  Eligibility Rule Num                        char(10)
12  Reserved                                    char(1)
13  Alien                                       char(1)
14  Alien Government                            char(1)
15  Alien Corporation                           char(1)
16  Alien Officer                               char(1)
17  Alien Control                               char(1)
18  Revoked                                     char(1)
19  Convicted                                   char(1)
20  Adjudged                                    char(1)
21  Reserved                                    char(1)
22  Common Carrier                              char(1)
23  Non Common Carrier                          char(1)
24  Private Comm                                char(1)
25  Fixed                                       char(1)
26  Mobile                                      char(1)
27  Radiolocation                               char(1)
28  Satellite                                   char(1)
29  Developmental or STA or Demonstration       char(1)
30  InterconnectedService                       char(1)
31  Certifier First Name                        varchar(20)
32  Certifier MI                                char(1)
33  Certifier Last Name                         varchar(20)
34  Certifier Suffix                            char(3)
35  Certifier Title                             char(40)
36  Female                                      char(1)
37  Black or African-American                   char(1)
38  Native American                             char(1)
39  Hawaiian                                    char(1)
40  Asian                                       char(1)
41  White                                       char(1)
42  Hispanic                                    char(1)
43  Effective Date                              mm/dd/yyyy
44  Last Action Date                            mm/dd/yyyy
45  Auction ID                                  integer
46  Broadcast Services - Regulatory Status      char(1)
47  Band Manager - Regulatory Status            char(1)
48  Broadcast Services - Type of Radio Service  char(1)
49  Alien Ruling                                char(1)
50  Licensee Name Change                        char(1)
51  Whitespace Indicator                        char(1)
52  Operation/Performance Requirement Choice    char(1)
53  Operation/Performance Requirement Answer    char(1)
54  Discontinuation of Service                  char(1)
55  Regulatory Compliance                       char(1)

History
[HS] -- unchanged from version 4
1   Record Type [HS]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   Call Sign                   char(10)
5   Log Date                    mm/dd/yyyy
6   Code                        char(6)

License Attachment
[LA] -- unchanged from version 4
1   Record Type [LA]            char(2)
2   Unique System Identifier    numeric(9,0)
3   Call Sign                   char(10)
4   Attachment Code             char(1)
5   Attachment Description      varchar(60)
6   Attachment Date             mm/dd/yyyy
7   Attachment File Name        varchar(60)
8   Action Performed            char(1)

Special Condition
[SC] -- unchanged from version 4
1   Record Type [SC]            char(2)
2   Unique System Identifier    numeric(9,0)
3   ULS File Number             char(14)
4   EBF Number                  varchar(30)
5   Call Sign                   char(10)
6   Special Condition Type      char(1)
7   Special Condition Code      int
8   Status Code                 char(1)
9   Status Date                 mm/dd/yyyy

License Free Form Special Condition
Position Data Element Definition
[SF] -- unchanged from version 4
1   Record Type [SF]                    char(2)
2   Unique System Identifier            numeric(9,0)
3   ULS File Number                     char(14)
4   EBF Number                          varchar(30)
5   Call Sign                           char(10)
6   License Free Form Type              char(1)
7   Unique License Free Form Identifier numeric(9,0)
8   Sequence Number                     integer
9   License Free Form Condition         varchar(255)
10  Status Code                         char(1)
11  Status Date                         mm/dd/yyyy
The following extract from the code that creates the output database maps these fields one-to-one to internal identifiers. The four new fields in the [HD] records don't seem to be important -- at least for now -- so, although the new fields are processed, the output is unchanged from that produced when processing version 3 files:

[AM]
RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
OPERATOR_CLASS,
GROUP_CODE,
REGION_CODE,
TRUSTEE_CALLSIGN,
TRUSTEE_INDICATOR,
PHYSICIAN_CERTIFICATION,
VE_SIGNATURE,
SYSTEMATIC_CALLSIGN_CHANGE,
VANITY_CALLSIGN_CHANGE,
VANITY_RELATIONSHIP,
PREVIOUS_CALLSIGN,
PREVIOUS_OPERATOR_CLASS,
TRUSTEE_NAME

[CO]
RECORD_TYPE,
ID,
ULS_NUMBER,
CALLSIGN,
COMMENT_DATE,
DESCRIPTION,
STATUS_CODE,
STATUS_DATE

[EN]
RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
ENTITY_TYPE,
LICENSE_ID,
ENTITY_NAME,
FIRST_NAME,
MIDDLE_INITIAL,
LAST_NAME,
SUFFIX,
PHONE,
FAX,
EMAIL,
STREET_ADDRESS,
CITY,
STATE,
ZIP_CODE,
PO_BOX,
ATTENTION_LINE,
SGIN,
FRN,
APPLICANT_TYPE_CODE,
APPLICANT_TYPE_CODE_OTHER,
STATUS_CODE,
STATUS_DATE,
LICENSE_TYPE_37,
LINKED_ID,
LINKED_CALLSIGN

[HD]
RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
LICENSE_STATUS,
RADIO_SERVICE_CODE,
GRANT_DATE,
EXPIRED_DATE,
CANCELLATION_DATE,
ELIGIBILITY_RULE_NUM,
RESERVED_1,
ALIEN,
ALIEN_GOVERNMENT,
ALIEN_CORPORATION,
ALIEN_OFFICER,
ALIEN_CONTROL,
REVOKED,
CONVICTED,
ADJUDGED,
RESERVED_2,
COMMON_CARRIER,
NON_COMMON_CARRIER,
PRIVATE_COMM,
FIXED,
MOBILE,
RADIOLOCATION,
SATELLITE,
DEVELOPMENTAL_STA_DEMONSTRATION,
INTERCONNECTED_SERVICE,
CERTIFIER_FIRST_NAME,
CERTIFIER_MIDDLE_INITIAL,
CERTIFIER_LAST_NAME,
CERTIFIER_SUFFIX,
CERTIFIER_TITLE,
FEMALE,
BLACK_AFRICAN_AMERICAN,
NATIVE_AMERICAN,
HAWAIIAN,
ASIAN,
WHITE,
HISPANIC,
EFFECTIVE_DATE,
LAST_ACTION_DATE,
AUCTION_ID,
BROADCAST_SERVICES_REGULATORY_STATUS,
BAND_MANAGER_REGULATORY_STATUS,
BROADCAST_SERVICES_SERVICE_TYPE,
ALIEN_RULING,
LICENSEE_NAME_CHANGE,
WHITESPACE_INDICATOR

[HS]
RECORD_TYPE,
ID,
ULS_NUMBER,
CALLSIGN,
LOG_DATE,
CODE

[LA]
RECORD_TYPE,
ID,
CALLSIGN,
ATTACHMENT_CODE,
ATTACHMENT_DESCRIPTION,
ATTACHMENT_DATE,
ATTACHMENT_FILENAME,
ACTION_PERFORMED

[SC]
RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
SPECIAL_CONDITION_TYPE,
SPECIAL_CONDITION_CODE,
STATUS_CODE,
STATUS_DATE

[SF]
RECORD_TYPE,
ID,
ULS_NUMBER,
EBF_NUMBER,
CALLSIGN,
LICENSE_FREEFORM_TYPE,
UNIQUE_LICENSE_FREEFORM_ID,
SEQUENCE_NUMBER,
LICENSE_FREEFORM_CONDITION,
STATUS_CODE,
STATUS_DATE
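
As a rough illustration of what a one-to-one mapping from field position to identifier looks like, here is a minimal Python sketch (not the actual program's code). It assumes that each record arrives as a single pipe-delimited line, which is not stated above; the function name parse_record and the sample [HS] record are invented for the example.

```python
# Identifiers for the [HS] record, in the same order as the positions in the table.
HS_FIELDS = ["RECORD_TYPE", "ID", "ULS_NUMBER", "CALLSIGN", "LOG_DATE", "CODE"]


def parse_record(line, field_names):
    """Split one record into a dict keyed by identifier.

    Assumption: fields are separated by '|' and the record occupies one line.
    """
    values = line.rstrip("\r\n").split("|")
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, found {len(values)}"
        )
    return dict(zip(field_names, values))


# A made-up record, purely for illustration.
record = parse_record("HS|123456789||N0CALL|02/07/2021|CODE1\n", HS_FIELDS)
print(record["CALLSIGN"], record["LOG_DATE"])
```

Because a field may itself contain a line feed, a real reader cannot rely on one record per line; the sketch ignores that complication.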

The 50 output fields selected from the above lists are (arranged in groups of ten for easy counting):

ID,
CALLSIGN,
OPERATOR_CLASS,
GROUP_CODE,
REGION_CODE,
TRUSTEE_CALLSIGN,
TRUSTEE_INDICATOR,
SYSTEMATIC_CALLSIGN_CHANGE,
VANITY_CALLSIGN_CHANGE,
VANITY_RELATIONSHIP,

PREVIOUS_CALLSIGN,
PREVIOUS_OPERATOR_CLASS,
TRUSTEE_NAME,
COMMENT_DATE,
DESCRIPTION,
CO_STATUS_CODE, (i.e., STATUS_CODE from [CO])
CO_STATUS_DATE, (i.e., STATUS_DATE from [CO])
ENTITY_NAME,
FIRST_NAME,
MIDDLE_INITIAL,

LAST_NAME,
SUFFIX,
PHONE,
FAX,
EMAIL,
STREET_ADDRESS,
CITY,
STATE,
ZIP_CODE,
PO_BOX,

ATTENTION_LINE,
FRN,
APPLICANT_TYPE_CODE,
APPLICANT_TYPE_CODE_OTHER,
EN_STATUS_CODE, (i.e., STATUS_CODE from [EN])
EN_STATUS_DATE, (i.e., STATUS_DATE from [EN])
LICENSE_STATUS,
RADIO_SERVICE_CODE,
GRANT_DATE,
EXPIRED_DATE,

CANCELLATION_DATE,
ELIGIBILITY_RULE_NUM,
REVOKED,
CONVICTED,
ADJUDGED,
EFFECTIVE_DATE,
LAST_ACTION_DATE,
LICENSEE_NAME_CHANGE,
LINKED_ID,
LINKED_CALLSIGN
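
The CO_ and EN_ prefixes are needed because STATUS_CODE and STATUS_DATE occur in more than one record type. The following Python sketch shows one way such a renaming could be done when combining records; it is only an illustration of the naming, not the method used by the actual program. It assumes that the records for a single licence have already been matched up and are held as dictionaries keyed by identifier; the function name merge_records and the sample values are invented.

```python
AMBIGUOUS = ("STATUS_CODE", "STATUS_DATE")


def merge_records(am, co, en, hd):
    """Combine one record of each type into a single dict.

    STATUS_CODE and STATUS_DATE are prefixed with CO_ or EN_ so that the
    values from the [CO] and [EN] records do not overwrite each other.
    For any other identifier shared between records, the first value wins.
    """
    merged = {}
    for prefix, record in (("", am), ("CO_", co), ("EN_", en), ("", hd)):
        for name, value in record.items():
            key = prefix + name if prefix and name in AMBIGUOUS else name
            merged.setdefault(key, value)
    return merged


# Invented sample values, heavily abbreviated.
merged = merge_records(
    {"ID": "1", "CALLSIGN": "N0CALL"},
    {"ID": "1", "STATUS_CODE": "A"},
    {"ID": "1", "STATUS_CODE": "B"},
    {"ID": "1", "LICENSE_STATUS": "A"},
)
print(merged["CO_STATUS_CODE"], merged["EN_STATUS_CODE"])
```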
The contents of these fields are taken from the equivalent entries in the original data files. Each entry is subject to the following transformations before being written to the output file:
  • The entry is converted to upper case;
  • Any line feeds (yes, the FCC allows line feeds within a field) are converted to the four-character sequence: <LF>;
  • Leading and trailing spaces are removed;
  • If the field is a date, it is converted from FCC format (mm/dd/yyyy) to ISO 8601 extended format: YYYY-MM-DD.
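
The four transformations above can be sketched in a few lines of Python. This is a minimal sketch rather than the program's own code: the function name normalize_field and the is_date flag are invented here, and leaving a malformed date unchanged is my assumption about what to do in that case.

```python
from datetime import datetime


def normalize_field(raw, is_date=False):
    """Apply the four listed transformations to a single field value."""
    value = raw.upper()                  # convert to upper case
    value = value.replace("\n", "<LF>")  # line feed -> the four characters <LF>
    value = value.strip(" ")             # remove leading and trailing spaces
    if is_date and value:
        try:
            # FCC format mm/dd/yyyy -> ISO 8601 extended format YYYY-MM-DD
            value = datetime.strptime(value, "%m/%d/%Y").date().isoformat()
        except ValueError:
            pass  # assumption: a malformed date is left as it is
    return value


print(normalize_field("  Smith\nJones "))            # SMITH<LF>JONES
print(normalize_field("02/07/2021", is_date=True))   # 2021-02-07
```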
The latest output file created in this manner (and its MD5 checksum) may be downloaded from this directory.

The full source code to generate the output file may be downloaded here.

To create the binary from the source code, go to the directory that contains the makefile and type:
make fcc-db
This should generate the executable program as: bin/fcc-db. The program may be executed from within the bin directory as:
fcc-db [directory]
where [directory] is the name of the directory that contains the input FCC AM.dat, CO.dat, EN.dat and HD.dat files. The program processes those files and writes the output to stdout.

For what it's worth, it takes somewhat less than 15 seconds for the program to execute to completion on my desktop computer if stdout is redirected to an output file.