2017-08-28

More on Reporting Bust Rates in Contests

In an earlier post, I began to address the problem of how to report and compare bust rates in contests in a defensible, objective manner.

At the end of that post, we decided that, rather than simply quoting some kind of "bust rate", a range should be quoted for each station, representing, say, a 99% confidence interval for the rate.

If we do that, and reorder the stations in order of decreasing upper limit (which seems like the most reasonable ordering: it means that we are 99.5% sure that the actual bust rate is less than this number), then we obtain the kind of table with which that post concluded.

Let us apply this methodology to an example contest (CQ WW SSB for 2005, as that is the first CQ WW contest for which public logs are available), and look at the stations that appear to have the best rate of copying. What we tabulate is the 99.5% confidence limit for the probability of a bust, $p_{bust}$, as in this table:

2005 CQ WW SSB -- 99.5% confidence upper bound to $p_{bust}$
Position Call $p_{995}$
1 OH5BM 0.0005
2 9A2EU 0.0008
3 N5AU 0.0012
4 UA1OMX 0.0015
5 V31MQ 0.0018
6 UR6IJ 0.0019
7 KF2O 0.0020
8 DL3BRA 0.0021
9 EA1JO 0.0022
10 W3YY 0.0022

Let's expand the table so as to include the verified number of QSOs, $Q_v$, and the number of busts, $B$, for each station:

2005 CQ WW SSB -- 99.5% confidence upper bound to $p_{bust}$
Position Call $p_{995}$ $Q_v$ $B$
1 OH5BM 0.0005 1904 0
2 9A2EU 0.0008 1175 0
3 N5AU 0.0012 802 0
4 UA1OMX 0.0015 649 0
5 V31MQ 0.0018 563 0
6 UR6IJ 0.0019 533 0
7 KF2O 0.0020 488 0
8 DL3BRA 0.0021 470 0
9 EA1JO 0.0022 457 0
10 W3YY 0.0022 447 0
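
As a concrete illustration, here is a minimal sketch of how such an upper bound can be computed from $Q_v$ and $B$. It assumes a binomial model for busts with a uniform prior on $p_{bust}$, so that the distribution of $p_{bust}$ given a log is Beta($B+1$, $Q_v-B+1$); the prior and numerical details behind the tables in this post may differ, so the values it produces are illustrative only.

#!/usr/bin/env python
# Sketch only: upper confidence bound for p_bust, given Q_v verified QSOs
# and B busts, assuming a binomial model with a uniform Beta(1, 1) prior,
# so that p_bust given the log is distributed as Beta(B + 1, Q_v - B + 1).

from scipy.stats import beta

def p_bust_upper(q_v, busts, confidence=0.995):
    """Return the `confidence` quantile of the assumed distribution of p_bust."""
    return beta.ppf(confidence, busts + 1, q_v - busts + 1)

if __name__ == '__main__':
    # A hypothetical log: 500 verified QSOs and 2 busts.
    print(p_bust_upper(500, 2))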

It should be comforting to observe that by ordering the stations in increasing order of $p_{995}$, we have also sorted them in decreasing order of $Q_v$.

However, note that none of the listed stations have any busts. What happens when we look at stations that have different numbers of busts? Let us look at an example further down the table:

2005 CQ WW SSB -- 99.5% confidence upper bound to $p_{bust}$
Position Call $p_{995}$ $Q_v$ $B$
4025 EC7ALM 0.6096 19 6
4026 GM8KSJ 0.6307 7 1

Let's now look at the actual probability distribution of $p_{bust}$ for these two stations (the vertical lines are at the two values of $p_{995}$):
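
The plot itself is not reproduced here, but a minimal sketch that regenerates something equivalent follows, assuming that the probability function for a log with $Q_v$ verified QSOs and $B$ busts is a Beta($B+1$, $Q_v-B+1$) distribution (the curves behind the original figure may have been computed differently):

#!/usr/bin/env python
# Sketch only: plot the assumed Beta(B + 1, Q_v - B + 1) probability curves
# of p_bust for EC7ALM (Q_v = 19, B = 6) and GM8KSJ (Q_v = 7, B = 1),
# with vertical lines at the corresponding 99.5% upper bounds.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

stations = {'EC7ALM': (19, 6), 'GM8KSJ': (7, 1)}    # call: (Q_v, B)

p = np.linspace(0.0, 1.0, 1001)

for call, (q_v, busts) in stations.items():
    dist = beta(busts + 1, q_v - busts + 1)
    plt.plot(p, dist.pdf(p), label=call)
    plt.axvline(dist.ppf(0.995), linestyle='--')

plt.xlabel('$p_{bust}$')
plt.ylabel('probability density')
plt.legend()
plt.show()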


It is obvious from this plot that the order of these two stations should be reversed in any meaningful table that purports to order stations in increasing order of estimates of $p_{bust}$. What is not so obvious is that the order of the two stations depends on the particular confidence limit chosen. For the 99.5% limit, the order is as stated, but (for example) for the 98% limit, the order is reversed. This arbitrariness is obviously as unacceptable as the basic notion that somehow GM8KSJ has a higher value of $p_{bust}$ than does EC7ALM.

So we need a different approach: although quoting a particular confidence limit is useful for defining $p_{bust}$ for a single station, it is inappropriate for ordering multiple stations, particularly in the case when different numbers of busts are involved (because the shapes of the probability curves differ markedly as $B$ varies).

A better way to order two stations is to determine the weighted mean value of $p_{bust}$ as distributed according to the probability function of each station. This automatically results in an ordering such that, if one selects a large number of values of $p_{bust}$ distributed according to the two relevant probability functions, the mean value for the first station will be less than the mean value for the second station.

[It is probably worth noting that, because of the lack of symmetry in the probability curves, this is not the same as the first station being necessarily more likely to have a lower value of $p_{bust}$ than is the second. If this isn't obvious, just plot the difference between the probability curves for two stations with different values of $B$, such as EC7ALM and GM8KSJ.]

Note also that this procedure will leave the order of stations with equal numbers of busts unchanged, which is comforting.
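
Under the same assumed Beta($B+1$, $Q_v-B+1$) form for the probability function, the weighted mean has the closed form $(B+1)/(Q_v+2)$, so the calculation reduces to a one-liner; the values actually tabulated below depend on the precise probability function used.

#!/usr/bin/env python
# Sketch only: weighted mean of p_bust over the assumed
# Beta(B + 1, Q_v - B + 1) probability function, whose mean is
# (B + 1) / (Q_v + 2).

def p_bust_weighted_mean(q_v, busts):
    return float(busts + 1) / (q_v + 2)

if __name__ == '__main__':
    print(p_bust_weighted_mean(19, 6))    # EC7ALM: about 0.33
    print(p_bust_weighted_mean(7, 1))     # GM8KSJ: about 0.22, now ranked ahead of EC7ALM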

Applying this procedure, our top-ten table now looks like this:

2005 CQ WW SSB -- weighted mean values of $p_{bust}$
Position Call weighted mean $Q_v$ $B$
1 OH5BM 0.0005 1904 0
2 9A2EU 0.0008 1175 0
3 N5AU 0.0012 802 0
4 UA1OMX 0.0015 649 0
5 V31MQ 0.0017 563 0
6 UR6IJ 0.0018 533 0
7 ES5RY 0.0019 1068 1
8 KF2O 0.0020 488 0
9 DL3BRA 0.0021 470 0
10 EA1JO 0.0022 457 0

This looks reasonable (to me, anyway).


2017-08-09

Further Sanity Checking of the LANL GPS Charged-Particle Dataset

Prior posts in this series may be found at:
In this post I continue looking at the values of fields for the CXD experiment carried on ns53 through ns73.

Sanity check for collection_interval

The values for collection_interval seem reasonable. (Recall that we processed these values earlier, at stage 4 and stage 10.)

Sanity check for year

The values for year seem reasonable.

Sanity check for decimal_year

The documentation for decimal_year states:

Column Variable name type Dim. description
24 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)

But if we look at the values in the files, we see an immediate problem. The values are formatted as, for example:
  2.007879e+03
That is, the smallest increment is 0.000001e+03 years, which is 0.001 years, or somewhat less than nine hours. This is obviously useless.

The decimal_year field is redundant, as it can be derived, as stated in the documentation, from the year field and the decimal_day field. But if the decimal_year field is going to be present, it should at least have the same value as the result of the calculation in the documentation. There is an obvious use for this field if it has an accurate value, so that brings us to define stage 26:

Stage 26: Recalculate decimal_year


The values of decimal_day are recorded to 10⁻⁶ day, so decimal_year should be recorded to (roughly) 10⁻⁹ year (10⁻⁶ day is about 2.7 × 10⁻⁹ year). We can accomplish the reformatting easily, by applying the following script, do-stage-26.py, to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

# do-stage-26.py: recalculate decimal_year (column 24) from year (column 23)
# and decimal_day (column 1), and write it with nine decimal places.

import calendar
import sys

filename = sys.argv[1]

with open(filename) as records:
  for line in records:
    fields = line.split()
    decimal_day_value = float(fields[0])    # column 1: decimal_day
    year_value = int(fields[22])            # column 23: year

    # 366 days in leap years (2004, 2008, 2012, 2016 for these data), else 365
    days_in_year = 366 if calendar.isleap(year_value) else 365

    decimal_year_value = year_value + (decimal_day_value - 1.0) / days_in_year

    fields[23] = '{:14.9f}'.format(decimal_year_value)    # column 24: decimal_year

    newline = " ".join(fields)

    print newline


Applying this to each file:

for file in ns[567]*
do
  ./do-stage-26.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-26/$file
done

The resulting maximum and minimum values of decimal_year look like this:

decimal_year
Satellite Minimum Maximum
53 2005.769864123 2016.999996964
54 2001.169868310 2016.999995825
55 2007.879454718 2017.000000000
56 2003.145206589 2016.999995066
57 2008.051914464 2017.000000000
58 2006.939732433 2016.999995825
59 2004.256832087 2016.999999620
60 2004.562844536 2016.999993169
61 2004.887981525 2016.999998104
62 2010.465755041 2016.999998325
63 2011.578088408 2016.999999653
64 2014.164385844 2016.923493000
65 2012.784155790 2016.999996205
66 2013.435620655 2016.999996964
67 2014.394527493 2016.999996396
68 2014.624659342 2016.999995003
69 2014.912332318 2016.999999716
70 2016.120224178 2016.999992601
71 2015.276714230 2016.999999874
72 2015.583565671 2016.999995825
73 2015.871238110 2016.999997628
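
For reference, here is one way to regenerate the per-satellite extremes in this table (and, with the column number changed as appropriate, the similar tables below), assuming that the files for each satellite are named ns<nn>... as above and that decimal_year is column 24; the script actually used to extract them is not shown here.

for sat in ns53 ns54 ns55    # ...and so on through ns73
do
  cat ${sat}* | awk -v sat="$sat" '
    $24 == "NA"    { next }                   # ignore invalid entries
    min == ""      { min = $24; max = $24 }   # first valid value seen
    $24+0 < min+0  { min = $24 }
    $24+0 > max+0  { max = $24 }
    END            { print sat, min, max }'
done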

Sanity check for SVN_number


The documentation for SVN_number states:

Column Variable name type Dim. description
25 SVN_number int 1 SVN number of satellite

The values for SVN_number seem to be correct, except for the obvious silliness that the field name, when expanded, is Satellite_Vehicle_Number_number, and the description includes the nonsensical term SVN number. Consequently I simply redefine this field as:

Column Variable name type Dim. description
25 SVN int 1 Satellite Vehicle Number

Sanity check for b_coord_radius


The maximum and minimum values for b_coord_radius are:

b_coord_radius
Satellite Minimum Maximum
53 6.828054e-01 4.200857e+00
54 8.312735e-01 4.211072e+00
55 8.557934e-01 4.198010e+00
56 7.165368e-01 4.202026e+00
57 7.242072e-01 4.171993e+00
58 7.577640e-01 4.188587e+00
59 NA 4.214008e+00
60 NA 4.205339e+00
61 NA 4.223202e+00
62 1.304388e+00 4.193189e+00
63 6.851737e-01 4.209296e+00
64 1.461084e+00 4.209824e+00
65 1.361361e+00 4.202571e+00
66 1.049070e+00 4.182467e+00
67 7.584448e-01 4.211562e+00
68 7.835127e-01 4.211309e+00
69 7.074734e-01 4.169802e+00
70 8.317451e-01 4.207506e+00
71 1.880229e+00 4.171600e+00
72 1.366415e+00 4.170103e+00
73 7.233613e-01 4.213620e+00

There are two obvious problems with these values: some satellites contain invalid (NA) values; and the precision of the field varies as a function of the value of the field.

We can remove all the records that contain an invalid value easily enough, which gives us stage 27.

Stage 27: Remove records with invalid values of b_coord_radius


for file in ns[567]*
do
  awk '$26!="NA" { print $0 }' $file > ../gps-stage-27/$file
done

This gives us:

b_coord_radius
Satellite Minimum Maximum
53 6.828054e-01 4.200857e+00
54 8.312735e-01 4.211072e+00
55 8.557934e-01 4.198010e+00
56 7.165368e-01 4.202026e+00
57 7.242072e-01 4.171993e+00
58 7.577640e-01 4.188587e+00
59 6.926215e-01 4.214008e+00
60 8.069324e-01 4.205339e+00
61 7.764630e-01 4.223202e+00
62 1.304388e+00 4.193189e+00
63 6.851737e-01 4.209296e+00
64 1.461084e+00 4.209824e+00
65 1.361361e+00 4.202571e+00
66 1.049070e+00 4.182467e+00
67 7.584448e-01 4.211562e+00
68 7.835127e-01 4.211309e+00
69 7.074734e-01 4.169802e+00
70 8.317451e-01 4.207506e+00
71 1.880229e+00 4.171600e+00
72 1.366415e+00 4.170103e+00
73 7.233613e-01 4.213620e+00

Stage 28: Reformat values of b_coord_radius


As noted above, the precision of the numbers reported varies as a function of the value of the field. This is not a desirable situation. Since values greater than unity are reported with a precision of a millionth of a terrestrial radius, there seems no point in reporting any value at greater precision (i.e., better than about 6 m), especially since such a precision surely vastly exceeds the accuracy of the magnetic field model on which the numbers are based.

Consequently, we can reformat the values so that they have a consistent precision of one millionth of the terrestrial radius. We define a script called do-stage-28.py:

#!/usr/bin/env python
# -*- coding: utf8 -*-

# do-stage-28.py: reformat b_coord_radius (column 26) to a fixed precision
# of one millionth of an Earth radius.

import sys

filename = sys.argv[1]

with open(filename) as records:
  for line in records:
    fields = line.split()
    value = float(fields[25])      # column 26: b_coord_radius

    fields[25] = '{:8.6f}'.format(value)

    newline = " ".join(fields)

    print newline

And operate on the files:

for file in ns[567]*
do 
  ./do-stage-28.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-28/$file
done

Now we have:

b_coord_radius
Satellite Minimum Maximum
53 0.682805 4.200857
54 0.831273 4.211072
55 0.855793 4.198010
56 0.716537 4.202026
57 0.724207 4.171993
58 0.757764 4.188587
59 0.692622 4.214008
60 0.806932 4.205339
61 0.776463 4.223202
62 1.304388 4.193189
63 0.685174 4.209296
64 1.461084 4.209824
65 1.361361 4.202571
66 1.049070 4.182467
67 0.758445 4.211562
68 0.783513 4.211309
69 0.707473 4.169802
70 0.831745 4.207506
71 1.880229 4.171600
72 1.366415 4.170103
73 0.723361 4.213620

Sanity check for b_coord_height


The maximum and minimum values for b_coord_height are:

b_coord_height
Satellite Minimum Maximum
53 -4.105137e+00 4.082754e+00
54 -4.011254e+00 4.119900e+00
55 -4.097133e+00 3.981823e+00
56 -4.091484e+00 4.110250e+00
57 -4.076309e+00 4.113386e+00
58 -4.051769e+00 4.112167e+00
59 -4.092374e+00 4.107748e+00
60 -3.965876e+00 4.114727e+00
61 -4.095934e+00 4.074463e+00
62 -3.918038e+00 3.950894e+00
63 -4.087862e+00 4.103668e+00
64 -3.801516e+00 3.899932e+00
65 -3.946078e+00 3.755339e+00
66 -3.898498e+00 4.027861e+00
67 -4.095983e+00 4.105099e+00
68 -4.039725e+00 4.085420e+00
69 -4.077435e+00 4.103233e+00
70 -4.077986e+00 4.069141e+00
71 -3.667803e+00 3.713735e+00
72 -3.727146e+00 3.938838e+00
73 -4.134635e+00 4.132903e+00

These require the same kind of massaging as b_coord_radius.

Stage 29: Reformat values of b_coord_height


We define do-stage-29.py:

#!/usr/bin/env python
# -*- coding: utf8 -*-

# do-stage-29.py: reformat b_coord_height (column 27) to a fixed precision
# of one millionth of an Earth radius.

import sys

filename = sys.argv[1]

with open(filename) as records:
  for line in records:
    fields = line.split()
    value = float(fields[26])      # column 27: b_coord_height

    fields[26] = '{:8.6f}'.format(value)

    newline = " ".join(fields)

    print newline


and apply it to the files:

for file in ns[567]*
do 
  ./do-stage-29.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-29/$file
done

with the result:

b_coord_height
Satellite Minimum Maximum
53 -4.105137 4.082754
54 -4.011254 4.119900
55 -4.097133 3.981823
56 -4.091484 4.110250
57 -4.076309 4.113386
58 -4.051769 4.112167
59 -4.092374 4.107748
60 -3.965876 4.114727
61 -4.095934 4.074463
62 -3.918038 3.950894
63 -4.087862 4.103668
64 -3.801516 3.899932
65 -3.946078 3.755339
66 -3.898498 4.027861
67 -4.095983 4.105099
68 -4.039725 4.085420
69 -4.077435 4.103233
70 -4.077986 4.069141
71 -3.667803 3.713735
72 -3.727146 3.938838
73 -4.134635 4.132903

Sanity check for magnetic_longitude


The maximum and minimum values for magnetic_longitude are:

magnetic_longitude
Satellite Minimum Maximum
53 4.185175e-03 3.599997e+02
54 1.637927e-04 3.599992e+02
55 3.244386e-05 3.600000e+02
56 3.396840e-04 3.599996e+02
57 5.629913e-04 3.599997e+02
58 1.443975e-03 3.599997e+02
59 2.364732e-05 3.600000e+02
60 3.059084e-04 3.599989e+02
61 6.888084e-04 3.599998e+02
62 9.731220e-04 3.599994e+02
63 1.367992e-03 3.599995e+02
64 3.209574e-04 3.599982e+02
65 1.819354e-03 3.599992e+02
66 8.708465e-04 3.599990e+02
67 1.153824e-05 3.599956e+02
68 6.599287e-04 3.599986e+02
69 5.731681e-03 3.599997e+02
70 3.041434e-02 3.599747e+02
71 5.421115e-04 3.599966e+02
72 1.369661e-03 3.599986e+02
73 6.566200e-04 3.599996e+02


We apply changes similar to stage 23, as follows.

Stage 30: Reformat values of magnetic_longitude

We define do-stage-30.py:

#!/usr/bin/env python
# -*- coding: utf8 -*-

# do-stage-30.py: fold magnetic_longitude (column 28) into the range
# [0, 360) and reformat it to four decimal places.

import sys

filename = sys.argv[1]

field_nr = 27    # zero-based index of column 28: magnetic_longitude

with open(filename) as records:
  for line in records:
    fields = line.split()
    value = float(fields[field_nr])

    if value < 0:
      value += 360
    if value >= 360:
      value -= 360

    fields[field_nr] = '{:9.4f}'.format(value)

    newline = " ".join(fields)

    print newline

And apply it:

for file in ns[567]*
do 
  ./do-stage-30.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-30/$file
done

With the result:


magnetic_longitude
Satellite Minimum Maximum
53 0.0042 359.9997
54 0.0002 359.9992
55 0.0000 359.9998
56 0.0003 359.9996
57 0.0006 359.9997
58 0.0014 359.9997
59 0.0000 359.9999
60 0.0003 359.9989
61 0.0007 359.9998
62 0.0010 359.9994
63 0.0014 359.9995
64 0.0003 359.9982
65 0.0018 359.9992
66 0.0009 359.9990
67 0.0000 359.9956
68 0.0007 359.9986
69 0.0057 359.9997
70 0.0304 359.9747
71 0.0005 359.9966
72 0.0014 359.9986
73 0.0007 359.9996


We checkpoint the stage 30 dataset. The checkpoint file has the MD5 checksum: eba369c61f7974a636358758e1035ae9.
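
The checkpointing procedure itself is not shown here; assuming that a checkpoint is simply a compressed archive of the stage directory, something along these lines would produce the file and its checksum (the archive name is illustrative):

# Sketch only: archive the stage-30 directory and record its MD5 checksum.
tar cjf gps-stage-30-checkpoint.tar.bz2 ../gps-stage-30
md5sum gps-stage-30-checkpoint.tar.bz2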

The data table for ns41 and ns48 still looks like this:


Column Variable name type Dim. Description
1 decimal_day double 1 GPS time -- a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years
2 Geographic_Latitude double 1 Latitude of satellite (deg)
3 Geographic_Longitude double 1 Longitude of satellite (deg)
4 Rad_Re double 1 (radius of satellite)/Rearth
5-12 rate_electron_measured double 8 Measured rate (Hz) in each of the 8 BDD electron channels (E1-E8)
13-20 rate_proton_measured double 8 Measured rate (Hz) in each of the 8 BDD proton channels (P1-P8)
21 collection_interval int 1 dosimeter collection period (seconds)
22 year int 1 year (e.g. 2015)
23 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
24 svn_number int 1 SVN number of satellite
25 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
26 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
27 magnetic_longitude double 1 Magnetic longitude (degrees)
28 L_shell double 1 L_shell (earth radii) -- I do not clearly understand the origin of the calculation, but it seems to be a dipole field/T-89
29 bfield_ratio double 1 Bsatellite/Bequator
30 local_time double 1 magnetic local time (0-24 hours)
31 b_sattelite double 1 B field at satellite (gauss)
32 b_equator double 1 B field at equator (on this field line I think) (gauss)
33-40 electron_background double 8 estimated background in electron channels E1-E8 (Hz)
41-48 proton_background double 8 estimated background in proton channels P1-P8 (Hz)
49 proton_activity int 1 =1 if there is significant proton activity
50 electron_temperature double 1 electron temperature from a one Maxwellian fit (MeV)
51 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
52-59 model_counts_electron_fit double 8 E1-E8 rates from the 2-parameter Maxwellian fit to the electron data
60-67 dtc_counts_electron double 8 Dead time corrected electron rates (from data, not fit)
68-97 integral_flux_instrument double 30 (based on 2 parameter Maxwellian fit) integral of electron flux above integral_flux_energy[i] particles/(cm2sec)
98-127 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
128-142 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
143-157 electron_diff_flux double 15 (based on 2 parameter Maxwellian fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))

And for the remaining satellites we now have (I have clarified the descriptions of the b_coord_radius and b_coord_height fields):


Column Variable name type Dim. description
1 decimal_day double 1 GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00).
2 Geographic_Latitude double 1 Latitude of satellite (°, N +ve)
3 Geographic_Longitude double 1 Longitude of satellite (°, E +ve, measured from Greenwich meridian)
4 Rad_Re double 1 Distance from centre of Earth, in units of Earth radii.
5-15 rate_electron_measured double 11 Measured rate (Hz) in each of the 11 CXD electron channels
16-20 rate_proton_measured double 5 Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5)
21 LEP_thresh double 1 LEP threshold in E1 channels, in keV
22 collection_interval int 1 dosimeter collection period (seconds)
23 year int 1 year (e.g. 2015)
24 decimal_year double 1 decimal year = year + (decimal_day-1.0) / (days in year)
25 SVN int 1 Satellite Vehicle Number
26 b_coord_radius double 1 Distance from dipole axis, in units of Earth radii.
27 b_coord_height double 1 Distance from dipole equatorial plane, in units of Earth radii (N +ve).
28 magnetic_longitude double 1 Magnetic longitude (degrees)
29 L_shell double 1 L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field.
30 L_LGM_TS04IGRF double 1 LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field.
31 L_LGM_OP77IGRF double 1 LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled)
32 L_LGM_T89CDIP double 1 LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field
33 bfield_ratio double 1 Bsatellite/Bequator
34 local_time double 1 magnetic local time (0-24 hours)
35 utc_lgm double 1 UTC (0-24 hours)
36 b_sattelite double 1 B field at satellite (gauss)
37 b_equator double 1 B field at equator (on this field line I think) (gauss)
38-48 electron_background double 11 estimated background in electron channels E1-E11 (Hz)
49-53 proton_background double 5 estimated background in proton channels P1-P5 (Hz)
54 proton_activity int 1 =1 if there is significant proton activity
55 proton_temperature_fit double 1 characteristic momentum -- R0 in the expression given above (MeV/c)
56 proton_density_fit double 1 N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV))
57 electron_temperature_fit double 1 electron temperature from a one Maxwellian fit (MeV)
58 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
59-63 model_counts_proton_fit_pf double 5 P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit)
64-74 model_counts_electron_fit double 11 E1-E11 rates from the 9-parameter electron flux model
75-80 proton_integrated_flux_fit double 6 integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy)
81-110 integral_flux_instrument double 30 (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec)
111-140 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
141-155 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
156-170 electron_diff_flux double 15 (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))
171-179 Efitpars double 9 fit parameters for 9 parameter electron fit


Finally, here are the numbers of records for each satellite at the end of the processing for stage 30:


Satellite Stage 30 Records
ns41 1,990,340
ns48 1,105,300
ns53 1,331,017
ns54 1,938,603
ns55 1,054,569
ns56 1,680,369
ns57 1,082,028
ns58 1,174,629
ns59 1,513,323
ns60 1,492,497
ns61 1,467,283
ns62 774,976
ns63 651,078
ns64 343,643
ns65 480,017
ns66 446,139
ns67 327,174
ns68 304,964
ns69 262,021
ns70 110,260
ns71 220,694
ns72 181,519
ns73 145,292
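
These counts can be regenerated directly from the stage-30 files; a minimal sketch, assuming that all the per-satellite files live in the gps-stage-30 directory and are named ns<nn>... as above:

# Sketch only: count stage-30 records per satellite.
cd ../gps-stage-30
for sat in ns41 ns48 ns53 ns54 ns55    # ...and so on through ns73
do
  printf '%s %s\n' "$sat" "$(cat ${sat}* | wc -l)"
done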