We continue to rationalise the data by removing fields that are unused or provide no useful information, and marking invalid values.
Stage 21: mark invalid values of L-shell fields for the CXD experiment
I have been informed in a private communication that, for satellites ns53 through ns73, values of the L_shell (i.e., L_LGM_T89IGRF; see stage 14) L_LGM_TS04IGRF, L_LGM_OP77IGRF and L_LGM_T89CDIP fields that are set to 999 indicate a value that is "too large to calculate"; at the very least, L-shell values more than ten or so would in any case seem to be too large to be useful. So we convert all the instances of the value 999 to NA:
for file in ns[567]*
do
awk '{for(i=29;i<=32;++i) if ($i=="9.990000e+02") $i="NA"; {print $0}}' $file > ../gps-stage-21/$file
done
Sanity checks
At this point we have modified what seem to me to be the most obvious problems with the datasets. However, the unfortunately large number of modifications that have been necessary suggest that it is worthwhile to perform basic sanity checks on the values of all the fields, in an attempt to uncover and correct other issues.
We start by considering, one at a time, the fields in the CXD datasets.
decimal_day
The documentation for decimal_day states:
Column | Variable name | type | Dim. | description |
1 | decimal_day | double | 1 | GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years. |
Reviewing the minimum and maximum values, we find:
decimal_day | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | 1.000394 | 366.998889 |
54 | 1.000405 | 366.999583 |
55 | 1.000000 | 366.998194 |
56 | 1.000972 | 366.998472 |
57 | 1.000000 | 366.997917 |
58 | 1.001250 | 366.999306 |
59 | 1.000532 | 366.999861 |
60 | 1.000278 | 366.998322 |
61 | 1.001782 | 366.999306 |
62 | 1.001424 | 366.999387 |
63 | 1.001157 | 366.999873 |
64 | 1.001215 | 365.999803 |
65 | 1.001389 | 366.998611 |
66 | 1.001667 | 366.998889 |
67 | 1.001458 | 366.998681 |
68 | 1.000660 | 366.998171 |
69 | 1.000868 | 366.999896 |
70 | 45.002049 | 366.997292 |
71 | 1.002720 | 366.999954 |
72 | 1.001470 | 366.998472 |
72 | 1.001910 | 366.999132 |
Pending more detailed review, these values look not unreasonable; there is therefore no need to change or mark any of the values.
However, we can edit the description of the field in order to make it a little clearer:
Column | Variable name | type | Dim. | description |
1 | decimal_day | double | 1 | GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00). |
Geographic_Latitude
The documentation for Geographic_Latitude states:
Column | Variable name | type | Dim. | description |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (deg) |
Reviewing the minimum and maximum values, we find:
Geographic_Latitude | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | -5.607510e+01 | 5.607590e+01 |
54 | -5.527530e+01 | 5.527490e+01 |
55 | -5.488720e+01 | 5.488530e+01 |
56 | -5.678880e+01 | 5.679100e+01 |
57 | -5.613660e+01 | 5.613600e+01 |
58 | -5.676120e+01 | 5.676080e+01 |
59 | -5.591630e+01 | 5.591610e+01 |
60 | -5.567280e+01 | 5.567290e+01 |
61 | -5.500000e+01 | 5.486250e+01 |
62 | -5.608950e+01 | 5.608750e+01 |
63 | -5.537850e+01 | 5.537940e+01 |
64 | -5.499830e+01 | 5.499820e+01 |
65 | -5.497930e+01 | 5.497810e+01 |
66 | -5.578240e+01 | 5.578390e+01 |
67 | -5.536860e+01 | 5.536670e+01 |
68 | -5.498410e+01 | 5.498420e+01 |
69 | -5.500030e+01 | 5.500240e+01 |
70 | -5.502090e+01 | 5.501900e+01 |
71 | -5.505770e+01 | 5.505750e+01 |
72 | -5.531860e+01 | 5.531810e+01 |
72 | -5.501500e+01 | 5.501500e+01 |
These values look reasonable (i.e., the absolute values of the maxima and minima are roughly equal; and, of course they correspond to the inclination of the satellites). However, the data could be more sensibly reported as numbers with four deicmal places: this would be both more compact and more accurately represent the precision of the numbers. So that suggests:
Stage 22: reformat values of Geographic_Latitude
We can accomplish the reformatting easily, by applying the following script, do-stage-22.py, to each file:
#!/usr/bin/env python
# -*- coding: utf8 -*-
import re
import sys
filename = sys.argv[1]
with open(filename) as records:
for line in records:
fields = line.split()
value_str = fields[1] # latitude
value = float(value_str)
new_value_str = '{:10.4f}'.format(value)
fields[1] = new_value_str
newline = " ".join(fields)
print newline
Applying this to each file:
for file in ns[567]*
do
./do-stage-22.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-22/$file
done
The resulting maxima and minima now look like this:
Geographic_Latitude | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | -56.0751 | 56.0759 |
54 | -55.2753 | 55.2749 |
55 | -54.8872 | 54.8853 |
56 | -56.7888 | 56.7910 |
57 | -56.1366 | 56.1360 |
58 | -56.7612 | 56.7608 |
59 | -55.9163 | 55.9161 |
60 | -55.6728 | 55.6729 |
61 | -55.0000 | 54.8625 |
62 | -56.0895 | 56.0875 |
63 | -55.3785 | 55.3794 |
64 | -54.9983 | 54.9982 |
65 | -54.9793 | 54.9781 |
66 | -55.7824 | 55.7839 |
67 | -55.3686 | 55.3667 |
68 | -54.9841 | 54.9842 |
69 | -55.0003 | 55.0024 |
70 | -55.0209 | 55.0190 |
71 | -55.0577 | 55.0575 |
72 | -55.3186 | 55.3181 |
72 | -55.0150 | 55.0150 |
We can also improve the clarity of the description:
Column | Variable name | type | Dim. | description |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (°, N +ve) |
Geographic_Longitude
The documentation for Geographic_Longitude states
Column | Variable name | type | Dim. | description |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (deg) |
Reviewing the minimum and maximum values, we find:
Geographic_Longitude | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | -1.799853e+02 | 3.599998e+02 |
54 | -1.799993e+02 | 3.599999e+02 |
55 | 6.000000e-04 | 3.599999e+02 |
56 | -1.799998e+02 | 3.600000e+02 |
57 | 1.000000e-04 | 3.599983e+02 |
58 | -1.798043e+02 | 3.599996e+02 |
59 | -1.799988e+02 | 3.599999e+02 |
60 | -1.800000e+02 | 3.600000e+02 |
61 | -1.799970e+02 | 3.599998e+02 |
62 | 2.000000e-03 | 3.599986e+02 |
63 | 1.700000e-03 | 3.599990e+02 |
64 | 0.000000e+00 | 3.599998e+02 |
65 | 9.000000e-04 | 3.599995e+02 |
66 | 2.900000e-03 | 3.599995e+02 |
67 | 5.000000e-04 | 3.599987e+02 |
68 | 2.000000e-04 | 3.600000e+02 |
69 | 5.000000e-04 | 3.599997e+02 |
70 | 2.500000e-03 | 3.599793e+02 |
71 | 3.900000e-03 | 3.599997e+02 |
72 | 2.100000e-03 | 3.599992e+02 |
72 | 1.000000e-03 | 3.599954e+02 |
These numbers raise two issues:
1. Not all values are correctly mapped to the principal domain (i.e., [0, 360))
2. The accuracy to which a value is reported is a function of its value.
The second of these is not unreasonable when reporting some kinds of values (for example, counts), but it makes no sense when applied to geographic longitude. (If this isn't obvious, think about the difference in accuracy for the two values 0+ε and 0-ε; or, in the alternative, consider that the reported accuracy will change under a mere rotation transformation.)
Thus we are led to:
Stage 23: reformat values of Geographic_Longitude
We can accomplish the reformatting easily, by applying the following script, do-stage-23.py, to each file:
#!/usr/bin/env python
# -*- coding: utf8 -*-
import re
import sys
filename = sys.argv[1]
with open(filename) as records:
for line in records:
fields = line.split()
value_str = fields[2] # longitude
value = float(value_str)
if value < 0:
value += 360
if value >= 360:
value -= 360
new_value_str = '{:9.4f}'.format(value)
fields[2] = new_value_str
newline = " ".join(fields)
print newline
Applying this to each file:
for file in ns[567]*
do
./do-stage-23.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-23/$file
done
The resulting maxima and minima now look like this:
Geographic_Longitude | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | 0.0000 | 359.9998 |
54 | 0.0002 | 359.9999 |
55 | 0.0006 | 359.9999 |
56 | 0.0000 | 359.9999 |
57 | 0.0001 | 359.9983 |
58 | 0.0010 | 359.9996 |
59 | 0.0001 | 359.9999 |
60 | 0.0000 | 359.9999 |
61 | 0.0015 | 359.9998 |
62 | 0.0020 | 359.9986 |
63 | 0.0017 | 359.9990 |
64 | 0.0000 | 359.9998 |
65 | 0.0009 | 359.9995 |
66 | 0.0029 | 359.9995 |
67 | 0.0005 | 359.9987 |
68 | 0.0000 | 359.9965 |
69 | 0.0005 | 359.9997 |
70 | 0.0025 | 359.9793 |
71 | 0.0039 | 359.9997 |
72 | 0.0021 | 359.9992 |
72 | 0.0010 | 359.9954 |
We can also improve the clarity of the description:
Column | Variable name | type | Dim. | description |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (°, E +ve, measured from Greenwich meridian) |
Rad_Re
The documentation for Rad_Re states:
Column | Variable name | type | Dim. | description |
4 | Rad_Re | double | 1 | (radius of satellite)/Rearth |
Reviewing the minimum and maximum values, we find:
Rad_Re | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | 4.120330e+00 | 4.217263e+00 |
54 | 4.096529e+00 | 4.241498e+00 |
55 | 4.131586e+00 | 4.206511e+00 |
56 | 4.131644e+00 | 4.206146e+00 |
57 | 4.152958e+00 | 4.184506e+00 |
58 | 4.142916e+00 | 4.194755e+00 |
59 | 4.120665e+00 | 4.218718e+00 |
60 | 4.122245e+00 | 4.215404e+00 |
61 | 4.100667e+00 | 4.237281e+00 |
62 | 4.144375e+00 | 4.209335e+00 |
63 | 4.143611e+00 | 4.213319e+00 |
64 | 4.158991e+00 | 4.213424e+00 |
65 | 4.146987e+00 | 4.206427e+00 |
66 | 4.151798e+00 | 4.186115e+00 |
67 | 4.166345e+00 | 4.215303e+00 |
68 | 4.165405e+00 | 4.214924e+00 |
69 | 4.164891e+00 | 4.173357e+00 |
70 | 4.166151e+00 | 4.211028e+00 |
71 | 4.162719e+00 | 4.175120e+00 |
72 | 4.159129e+00 | 4.178379e+00 |
72 | 4.160420e+00 | 4.217236e+00 |
While these values look fine, the use of scientific notation wastes space. Accordingly:
Stage 24: reformat values of Rad_Re
We can accomplish the reformatting easily, by applying the following script, do-stage-24.py, to each file:
#!/usr/bin/env python
# -*- coding: utf8 -*-
import re
import sys
filename = sys.argv[1]
FIELD_NR = 3
with open(filename) as records:
for line in records:
fields = line.split()
value_str = fields[FIELD_NR] # Rad_Re
value = float(value_str)
new_value_str = '{:10.6f}'.format(value)
fields[FIELD_NR] = new_value_str
newline = " ".join(fields)
print newline
Applying this to each file:
for file in ns[567]*
do
./do-stage-24.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-24/$file
done
The resulting maxima and minima look like this:
Rad_Re | ||
---|---|---|
Satellite | Minimum | Maximum |
53 | 4.120330 | 4.217263 |
54 | 4.096529 | 4.241498 |
55 | 4.131586 | 4.206511 |
56 | 4.131644 | 4.206146 |
57 | 4.152958 | 4.184506 |
58 | 4.142916 | 4.194755 |
59 | 4.120665 | 4.218718 |
60 | 4.122245 | 4.215404 |
61 | 4.100667 | 4.237281 |
62 | 4.144375 | 4.209335 |
63 | 4.143611 | 4.213319 |
64 | 4.158991 | 4.213424 |
65 | 4.146987 | 4.206427 |
66 | 4.151798 | 4.186115 |
67 | 4.166345 | 4.215303 |
68 | 4.165405 | 4.214924 |
69 | 4.164891 | 4.173357 |
70 | 4.166151 | 4.211028 |
71 | 4.162719 | 4.175120 |
72 | 4.159129 | 4.178379 |
72 | 4.160420 | 4.217236 |
We can also improve the clarity of the description:
Column | Variable name | type | Dim. | description |
4 | Rad_Re | double | 1 | Distance from centre of Earth, in units of Earth radii. |
Stage 25: reformat values of LEP_thresh
The documentation for LEP_thresh states:
Column | Variable name | type | Dim. | description |
21 | LEP_thresh | double | 1 | LEP threshold in E1 channels (0 means low, 1 means high) |
Checking the actual values in the data files shows that the values are, as expected, always either zero or one. However, as the documentation states, the values are stores as doubles. It seems silly to store a simple integer value as a double, so we can reformat this field so that it explicitly contains an integer. Further, I have received via private communication an explanation of the meaning of zero and one in this field: zero means that the threshold is 75 keV, and one means that it is 120 keV. So we can write these values (as integers) directly into the data files:
for file in ns[567]*
do
awk '$21=="0.000000e+00" {$21="75"}; $21=="1.000000e+00" {$21="120"}; \
{print $0};' $file > ../gps-stage-25/$file
done
We can also amend the documentation for the field:
Column | Variable name | type | Dim. | description |
21 | LEP_thresh | int | 1 | LEP threshold in E1 channels, in keV |
Time to checkpoint the dataset again. The checkpoint file has the MD5 checksum: 00b632a911316fa9e44093450f56c3cb.
The data table for ns41 and ns48 still looks like this:
Column | Variable name | type | Dim. | Description |
1 | decimal_day | double | 1 | GPS time -- a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (deg) |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (deg) |
4 | Rad_Re | double | 1 | (radius of satellite)/Rearth |
5-12 | rate_electron_measured | double | 8 | Measured rate (Hz) in each of the 8 BDD electron channels (E1-E8) |
13-20 | rate_proton_measured | double | 8 | Measured rate (Hz) in each of the 8 BDD proton channels (P1-P8) |
21 | collection_interval | int | 1 | dosimeter collection period (seconds) |
22 | year | int | 1 | year (e.g. 2015) |
23 | decimal_year | double | 1 | decimal year = year + (decimal_day-1.0)/(days in year) |
24 | svn_number | int | 1 | SVN number of satellite |
25 | b_coord_radius | double | 1 | radius from earth's dipole axis (earth radii) |
26 | b_coord_height | double | 1 | height above the earth's dipole equatorial plane (earth radii) |
27 | magnetic_longitude | double | 1 | Magnetic longitude (degrees) |
28 | L_shell | double | 1 | L_shell (earth radii) -- I do not clearly understand the origin of the calculation, but it seems to be a dipole field/T-89 |
29 | bfield_ratio | double | 1 | Bsatellite/Bequator |
30 | local_time | double | 1 | magnetic local time (0-24 hours) |
31 | b_sattelite | double | 1 | B field at satellite (gauss) |
32 | b_equator | double | 1 | B field at equator (on this field line I think) (gauss) |
33-40 | electron_background | double | 8 | estimated background in electron channels E1-E8 (Hz) |
41-48 | proton_background | double | 8 | estimated background in proton channels P1-P8 (Hz) |
49 | proton_activity | int | 1 | =1 if there is significant proton activity |
50 | electron_temperature | double | 1 | electron temperature from a one Maxwellian fit (MeV) |
51 | electron_density_fit | double | 1 | electron number density from a one Maxwellian fit (cm-3) |
52-59 | model_counts_electron_fit | double | 8 | E1-E8 rates from the 2-parameter Maxwellian fit to the electron data |
60-67 | dtc_counts_electron | double | 8 | Dead time corrected electron rates (from data, not fit) |
68-97 | integral_flux_instrument | double | 30 | (based on 2 parameter Maxwellian fit) integral of electron flux above integral_flux_energy[i] particles/(cm2sec) |
98-127 | integral_flux_energy | double | 30 | energies for the integral of integral_flux_instrument (MeV) |
128-142 | electron_diff_flux_energy | double | 15 | energies for the fluxes in electron_diff_flux_energy (MeV) |
143-157 | electron_diff_flux | double | 15 | (based on 2 parameter Maxwellian fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec)) |
And for the remaining satellites we now have:
Column | Variable name | type | Dim. | description |
---|---|---|---|---|
1 | decimal_day | double | 1 | GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00). |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (°, N +ve) |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (°, E +ve, measured from Greenwich meridian) |
4 | Rad_Re | double | 1 | Distance from centre of Earth, in units of Earth radii. |
5-15 | rate_electron_measured | double | 11 | Measured rate (Hz) in each of the 11 CXD electron channels |
16-20 | rate_proton_measured | double | 5 | Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5) |
21 | LEP_thresh | double | 1 | LEP threshold in E1 channels, in keV |
22 | collection_interval | int | 1 | dosimeter collection period (seconds) |
23 | year | int | 1 | year (e.g. 2015) |
24 | decimal_year | double | 1 | decimal year = year + (decimal_day-1.0)/(days in year) |
25 | SVN_number | int | 1 | SVN number of satellite |
26 | b_coord_radius | double | 1 | radius from earth's dipole axis (earth radii) |
27 | b_coord_height | double | 1 | height above the earth's dipole equatorial plane (earth radii) |
28 | magnetic_longitude | double | 1 | Magnetic longitude (degrees) |
29 | L_shell | double | 1 | L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field. |
30 | L_LGM_TS04IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field. |
31 | L_LGM_OP77IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled) |
32 | L_LGM_T89CDIP | double | 1 | LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field |
33 | bfield_ratio | double | 1 | Bsatellite/Bequator |
34 | local_time | double | 1 | magnetic local time (0-24 hours) |
35 | utc_lgm | double | 1 | UTC (0-24 hours) |
36 | b_sattelite | double | 1 | B field at satellite (gauss) |
37 | b_equator | double | 1 | B field at equator (on this field line I think) (gauss) |
38-48 | electron_background | double | 11 | estimated background in electron channels E1-E11 (Hz) |
49-53 | proton_background | double | 5 | estimated background in proton channels P1-P5 (Hz) |
54 | proton_activity | int | 1 | =1 if there is significant proton activity |
55 | proton_temperature_fit | double | 1 | characteristic momentum -- R0 in the expression given above (MeV/c) |
56 | proton_density_fit | double | 1 | N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV)) |
57 | electron_temperature_fit | double | 1 | electron temperature from a one Maxwellian fit (MeV) |
58 | electron_density_fit | double | 1 | electron number density from a one Maxwellian fit (cm-3) |
59-63 | model_counts_proton_fit_pf | double | 5 | P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit) |
64-74 | model_counts_electron_fit | double | 11 | E1-E11 rates from the 9-parameter electron flux model |
75-80 | proton_integrated_flux_fit | double | 6 | integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy) |
81-110 | integral_flux_instrument | double | 30 | (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec) |
111-140 | integral_flux_energy | double | 30 | energies for the integral of integral_flux_instrument (MeV) |
141-155 | electron_diff_flux_energy | double | 15 | energies for the fluxes in electron_diff_flux_energy (MeV) |
156-170 | electron_diff_flux | double | 15 | (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec)) |
171-179 | Efitpars | double | 9 | fit parameters for 9 parameter electron fit |