2017-06-26

New Augmented Logs for CQ WW, 2005 to 2016

Now available are new augmented versions of the public logs for CQ WW CW and SSB for the period 2005 to 2016.

The cleaned logs are the result of processing the QSO: lines from the entrants' submitted Cabrillo files to ensure that all fields contain valid values and all the data match the format required in the rules. Any line containing illegal data in a field (for example, a zone number greater than 40, or a date/time stamp that is outside the contest period) has simply been removed. Also, only the QSO: lines are retained, so that each line in the file can be processed easily. The MD5 checksum for the file of cleaned logs is: 1b47059d1f2431b55d89a5eb954a05cc.
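
As a rough illustration of the kind of filtering described above, here is a minimal Python sketch that applies just the two example checks mentioned (zone range and contest period). It is not the code used to produce the cleaned logs, which validates every field. The function name, the assumed field order of a CQ WW QSO: line, and the 48-hour contest window are assumptions made for the example, and the callsigns are made up.

```python
from datetime import datetime, timedelta

CONTEST_LENGTH = timedelta(hours=48)  # assumption: the contest lasts 48 hours


def looks_valid(line, contest_start):
    """Return True if a QSO: line passes two illustrative checks.

    Assumed field order after the 'QSO:' tag:
    freq mode date time call1 rst1 zone1 call2 rst2 zone2 [tx-id]
    """
    fields = line.split()
    if len(fields) < 11 or fields[0] != "QSO:":
        return False
    try:
        when = datetime.strptime(fields[3] + " " + fields[4], "%Y-%m-%d %H%M")
        zones = (int(fields[7]), int(fields[10]))
    except ValueError:
        return False  # malformed date, time or zone
    in_contest = contest_start <= when < contest_start + CONTEST_LENGTH
    zones_ok = all(1 <= zone <= 40 for zone in zones)
    return in_contest and zones_ok


# Example with a made-up line; the start time is supplied by the caller.
start = datetime(2016, 11, 26, 0, 0)
sample = "QSO: 14000 CW 2016-11-26 0001 K1ABC 599 05 G3XYZ 599 14"
keep = looks_valid(sample, start)
```

Lines for which such a check fails would simply be dropped, as described above.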

The augmented logs contain the same information as the cleaned logs, with the addition of some useful information on each line. The MD5 checksum for the compressed (~800 MB) file of augmented logs is: 1e981765c9c1d8edb76fb18d2cd32460. The information added to each line now includes two new fields: the callsign copied by the second party in the case that the second party bust the call of the first party; and the correct callsign of the second party in the case that the first party bust the second party's call.
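
A downloaded file can be checked against the checksum quoted above with a few lines of Python. This is only a sketch using the standard hashlib module; the helper names are invented for the example, and the path of the downloaded file has to be supplied by the reader, since no file name is given here.

```python
import hashlib

# Checksum quoted above for the compressed file of augmented logs.
EXPECTED_AUGMENTED_MD5 = "1e981765c9c1d8edb76fb18d2cd32460"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    an ~800 MB download does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_AUGMENTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected(path)` with the path of the downloaded file should return True if the download is intact; the same helper works for the cleaned logs if the other checksum is passed as `expected`.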

In all, the additional fields in the augmented file comprise:
  1. The letter "A" or "U" indicating "assisted" or "unassisted"
  2. A four-digit number representing the time of the contact in minutes measured from the start of the contest. (I realise that this can be calculated from the other information on the line, but it saves a lot of time to have the number readily available in the file without having to calculate it each time.)
  3. Band
  4. A set of eleven flags, each -- apart from column k -- encoded as T/F: 
    • a. QSO is confirmed by a log from the second party 
    • b. QSO is a reverse bust (i.e., the second party appears to have bust the call of the first party) 
    • c. QSO is an ordinary bust (i.e., the first party appears to have bust the call of the second party) 
    • d. the call of the second party is unique 
    • e. QSO appears to be a NIL 
    • f. QSO is with a station that did not send in a log, but who did make 20 or more QSOs in the contest 
    • g. QSO appears to be a country mult 
    • h. QSO appears to be a zone mult 
    • i. QSO is a zone bust (i.e., the received zone appears to be a bust)
    • j. QSO is a reverse zone bust (i.e., the second party appears to have bust the zone of the first party)
    • k. This entry has three possible values rather than just T/F:
      • T: QSO appears to be made during a run by the first party
      • F: QSO appears not to be made during a run by the first party
      • U: the run status is unknown because insufficient frequency information is available in the first party's log 
  5. If the QSO is a reverse bust, the call logged by the second party; otherwise, the placeholder "-"
  6. If the QSO is an ordinary bust, the correct call that should have been logged by the first party; otherwise, the placeholder "-"
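
The six additional fields listed above could be pulled off the end of a line along the following lines. This sketch assumes that the fields are separated by whitespace and that the eleven flags are written as a single eleven-character token; neither detail is stated above, so check a few lines of the actual file first. The names and the sample line are made up for illustration.

```python
from typing import NamedTuple

FLAG_LETTERS = "abcdefghijk"  # the letters used in the list above


class Augmented(NamedTuple):
    assisted: str           # "A" or "U"
    minutes: int            # minutes since the start of the contest
    band: str
    flags: dict             # letter -> "T" or "F" (k may also be "U")
    reverse_bust_call: str  # "-" unless the QSO is a reverse bust
    correct_call: str       # "-" unless the QSO is an ordinary bust


def parse_augmented(line):
    """Parse the six trailing fields of one augmented line.

    Assumption: whitespace-separated fields, flags as one 11-character token.
    """
    tokens = line.split()
    if len(tokens) < 6:
        raise ValueError("too few fields: " + line)
    assisted, minutes, band, flags, reverse_call, correct_call = tokens[-6:]
    if len(flags) != len(FLAG_LETTERS):
        raise ValueError("expected 11 flags, got: " + flags)
    return Augmented(assisted, int(minutes), band,
                     dict(zip(FLAG_LETTERS, flags)), reverse_call, correct_call)


# Made-up example line, not taken from the real file.
sample = ("QSO: 14000 CW 2016-11-26 0012 K1ABC 599 05 G3XYZ 599 14 "
          "U 0012 20 TFFFFFFFFFT - -")
record = parse_augmented(sample)
```

Counting from the end of the line means the sketch does not depend on how many fields the original QSO: line carries.
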
Notes:
  • The encoding of some of the flags requires subjective decisions to be made as to whether the flag should be true or false; consequently, and because CQ has yet to understand the importance of making their scoring code public, the value of a flag for a specific QSO line in some circumstances might not match the value that CQ would assign. (Also, CQ has more data available in the form of check logs, which are not made public.)
  • I made no attempt to deduce the run status of a QSO in the second party's log (if such exists), regardless of the status in the first party's log. This allows one cleanly to perform correct statistical analyses anent the number of QSOs made by running stations merely by excluding QSOs marked with a U in column k.
  • No attempt is made to detect the case in which both participants of a QSO bust the other station's call. This is a problematic situation because of the relatively high probability of a false positive unless both stations log the frequency as opposed to the band. (Also, on bands on which split-frequency QSOs are common, the absence of both transmit and receive frequency is a problem.) Because of the likelihood of false positives, it seems better, given the presumed rarity of double-bust QSOs, that no attempt be made to mark them.
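
As an example of the kind of analysis mentioned in the notes, the fraction of QSOs made while running can be estimated by dropping the lines marked U in column k. The sketch below rests on the same unverified layout assumption (the flags form a single token, third from the end of the line, with k as its last character); the function names are invented for the example.

```python
def run_status(line):
    """Return the column-k value ('T', 'F' or 'U') of one augmented line.

    Assumption: the flags are the third whitespace-separated token from
    the end of the line, and k is the last character of that token.
    """
    return line.split()[-3][-1]


def run_fraction(lines):
    """Fraction of QSOs made during a run, ignoring lines whose run
    status is unknown. Returns None if no line has a known status."""
    known = [status for status in map(run_status, lines) if status != "U"]
    if not known:
        return None
    return known.count("T") / len(known)
```

Passing an open file of augmented lines (or any iterable of non-empty lines from it) to `run_fraction` gives the proportion for that set of QSOs.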
