We continue to rationalise the data by removing fields that are unused or provide no useful information, and marking invalid values.
Stage 11: remove double precision values that are out of range
In the discussion for stage 9, I briefly mentioned values that are less than the maximum subnormal double precision number representable in accordance with the current (2008) IEEE 754 standard. It turns out that such numbers appear throughout the dataset, so it is useful to replace them with a more explicit indication that data should be ignored. We can do this using the string "NA" to replace all such numbers.
To be more precise, we generate stage 11 by running the following code on the stage 10 files:
#!/usr/bin/env python
# -*- coding: utf8 -*-
# replace all fields that contain "e-31[0123456789]" with "NA"
import re
import sys
filename = sys.argv[1]
with open(filename) as records:
for line in records:
fields = line.split()
for i in xrange(len(fields)):
m = re.search('e-31[0123456789]', fields[i])
if m:
fields[i] = "NA"
newline = " ".join(fields)
print newline
I note that although the maximum subnormal double precision number exceeds e-310, no occurrences of such numbers appear in the dataset; hence the use above of the search term "e-31[0123456789]".
Stage 12: correct values of proton_activity
The documentation for the proton_activity field states only:
=1 if there is significant proton activity
Some expansion is in order. In private communication with the authors of the dataset, I have learned that the definition of significant in the above is:
For NS41 and NS48 (i.e., the Burst Detector-Dosimeter (BDD) instrument): the P1 rate is greater than 10 events per second, and the L shell is greater than 24.99;
For the other satellites (i.e., the Combined X-ray/Dosimeter (CXD) instrument): the P1 rate is greater than 5 events per second.
Although the documentation does not describe the meaning of values other than one, the dataset authors inform me that the intention is that the value zero implies no significant proton activity, and any non-zero value should be treated as if it were unity.
Consequently, it is useful to replace all values other than zero and one for the proton_activity field with the value one.
It turns out that only ns41 has invalid values of proton_activity; these can be replaced by:
awk '$49!="0" && $49!="1" {$49="1"}; {print $0};' ns41 > ../gps-stage-12/ns41
The equivalent command for the CXD instrument, although it is unnecessary with this dataset, is:
awk '$55!="0" && $55!="1" {$55="1"}; {print $0};' ns<nn> > ../gps-stage-12/ns<nn>
Stage 13: Mark bad values associated with L = 0
The ns41 and ns48 satellites have one field that is associated with the L shell (called, reasonably enough, L_shell). Some records have a value of zero for this field, which is obviously not a legitimate value. Such records appear to have illegal values for other fields associated with the magnetic field: b_coord_radius, b_coord_height, magnetic_longitude, bfield_ratio, b_sattelite (sic) and b_equator. Consequently, for these two satellites, if L_shell is zero, all these fields, and L_shell itself, should be set to "NA":
awk '$28=="0.000000e+00" {$25=$26=$27=$28=$29=$31=$32="NA"}; {print $0}' ns<nn> > ../gps-stage-13/ns<nn>
The L_shell field associated with most of the other satellites seems never to be set to zero; however, for ns59, ns60 and ns61, there are records that have many values, including L_shell, filled with zeroes. In these records, it seems that all values after the SVN_number field are set to zero. The records are not completely useless, though, since the data up to and including the SVN_number field appear reasonable. Hence for these spacecraft, all fields after number 25 (which corresponds to SVN_number) should be set to "NA":
awk '$29=="0.000000e+00" {for(i=26;i<=NF;++i)$i="NA"}; {print $0}' ns<nn> > ../gps-stage-13/ns<nn>
Stage 14: Remove redundant field L_LGM_T891GRF
On satellites ns53 to ns73 (i.e., the CXD instrument), the documentation for the L_shell field states: currently this is the same as L_LGM_T89IGRF. Thus, the field L_LGM_T89IGRF is redundant and can be removed without loss of information.
At stage 13, the data table for the CXD satellites looks like this:
Column | Variable name | type | Dim. | description |
1 | decimal_day | double | 1 | GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years. |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (deg) |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (deg) |
4 | Rad_Re | double | 1 | (radius of satellite)/Rearth |
5-15 | rate_electron_measured | double | 11 | Measured rate (Hz) in each of the 11 CXD electron channels |
16-20 | rate_proton_measured | double | 5 | Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5) |
21 | LEP_thresh | double | 1 | LEP threshold in E1 channels (0 means low, 1 means high) |
22 | collection_interval | int | 1 | dosimeter collection period (seconds) |
23 | year | int | 1 | year (e.g. 2015) |
24 | decimal_year | double | 1 | decimal year = year + (decimal_day-1.0)/(days in year) |
25 | SVN_number | int | 1 | SVN number of satellite |
26 | b_coord_radius | double | 1 | radius from earth's dipole axis (earth radii) |
27 | b_coord_height | double | 1 | height above the earth's dipole equatorial plane (earth radii) |
28 | magnetic_longitude | double | 1 | Magnetic longitude (degrees) |
29 | L_shell | double | 1 | L_shell (earth radii) – currently this is the same as L_LGM_T89IGRF but this is intended to be our suggested choice for the L shell calculation in the long run. |
30 | L_LGM_TS04IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field. |
31 | L_LGM_OP77IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled) |
32 | L_LGM_T89CDIP | double | 1 | LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field |
33 | L_LGM_T89IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, T89 External Field, IGRF Internal Field |
34 | bfield_ratio | double | 1 | Bsatellite/Bequator |
35 | local_time | double | 1 | magnetic local time (0-24 hours) |
36 | utc_lgm | double | 1 | UTC (0-24 hours) |
37 | b_sattelite | double | 1 | B field at satellite (gauss) |
38 | b_equator | double | 1 | B field at equator (on this field line I think) (gauss) |
39-49 | electron_background | double | 11 | estimated background in electron channels E1-E11 (Hz) |
50-54 | proton_background | double | 5 | estimated background in proton channels P1-P5 (Hz) |
55 | proton_activity | int | 1 | =1 if there is significant proton activity |
56 | proton_temperature_fit | double | 1 | characteristic momentum -- R0 in the expression given above (MeV/c) |
57 | proton_density_fit | double | 1 | N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV)) |
58 | electron_temperature_fit | double | 1 | electron temperature from a one Maxwellian fit (MeV) |
59 | electron_density_fit | double | 1 | electron number density from a one Maxwellian fit (cm-3) |
60-70 | model_counts_electron_fit_pf | double | 11 | E1-E11 rates due to proton background based on proton flux fit -- currently not filled (all -1's) |
71-75 | model_counts_proton_fit_pf | double | 5 | P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit) |
76-86 | model_counts_electron_fit | double | 11 | E1-E11 rates from the 9-parameter electron flux model |
87-91 | model_counts_proton_fit | double | 5 | P1-P5 rates from electron background -- currently not filled (all -1's) |
92-97 | proton_integrated_flux_fit | double | 6 | integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy) |
98-128 | proton_flux_fit | double | 31 | intended to be proton flux at 31 energies, not filled currently |
129-134 | proton_fluence_fit | double | 6 | not filled currently |
135-164 | integral_flux_instrument | double | 30 | (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec) |
165-194 | integral_flux_energy | double | 30 | energies for the integral of integral_flux_instrument (MeV) |
195-209 | electron_diff_flux_energy | double | 15 | energies for the fluxes in electron_diff_flux_energy (MeV) |
210-224 | electron_diff_flux | double | 15 | (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec)) |
225-233 | Efitpars | double | 9 | fit parameters for 9 parameter electron fit |
So we remove L_LGM_T89IGRF:
awk '{$33=""; print $0}' ns<nn> | tr -s " " > ../gps-stage-14/ns<nn>
Leaving us with the following data table (NB: I have edited the description so as to preserve the derivation information for the L_shell field):
Column | Variable name | type | Dim. | description |
1 | decimal_day | double | 1 | GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years. |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (deg) |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (deg) |
4 | Rad_Re | double | 1 | (radius of satellite)/Rearth |
5-15 | rate_electron_measured | double | 11 | Measured rate (Hz) in each of the 11 CXD electron channels |
16-20 | rate_proton_measured | double | 5 | Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5) |
21 | LEP_thresh | double | 1 | LEP threshold in E1 channels (0 means low, 1 means high) |
22 | collection_interval | int | 1 | dosimeter collection period (seconds) |
23 | year | int | 1 | year (e.g. 2015) |
24 | decimal_year | double | 1 | decimal year = year + (decimal_day-1.0)/(days in year) |
25 | SVN_number | int | 1 | SVN number of satellite |
26 | b_coord_radius | double | 1 | radius from earth's dipole axis (earth radii) |
27 | b_coord_height | double | 1 | height above the earth's dipole equatorial plane (earth radii) |
28 | magnetic_longitude | double | 1 | Magnetic longitude (degrees) |
29 | L_shell | double | 1 | L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field. |
30 | L_LGM_TS04IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field. |
31 | L_LGM_OP77IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled) |
32 | L_LGM_T89CDIP | double | 1 | LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field |
33 | bfield_ratio | double | 1 | Bsatellite/Bequator |
34 | local_time | double | 1 | magnetic local time (0-24 hours) |
35 | utc_lgm | double | 1 | UTC (0-24 hours) |
36 | b_sattelite | double | 1 | B field at satellite (gauss) |
37 | b_equator | double | 1 | B field at equator (on this field line I think) (gauss) |
38-48 | electron_background | double | 11 | estimated background in electron channels E1-E11 (Hz) |
49-53 | proton_background | double | 5 | estimated background in proton channels P1-P5 (Hz) |
54 | proton_activity | int | 1 | =1 if there is significant proton activity |
55 | proton_temperature_fit | double | 1 | characteristic momentum -- R0 in the expression given above (MeV/c) |
56 | proton_density_fit | double | 1 | N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV)) |
57 | electron_temperature_fit | double | 1 | electron temperature from a one Maxwellian fit (MeV) |
58 | electron_density_fit | double | 1 | electron number density from a one Maxwellian fit (cm-3) |
59-69 | model_counts_electron_fit_pf | double | 11 | E1-E11 rates due to proton background based on proton flux fit -- currently not filled (all -1's) |
70-74 | model_counts_proton_fit_pf | double | 5 | P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit) |
75-85 | model_counts_electron_fit | double | 11 | E1-E11 rates from the 9-parameter electron flux model |
86-90 | model_counts_proton_fit | double | 5 | P1-P5 rates from electron background -- currently not filled (all -1's) |
91-96 | proton_integrated_flux_fit | double | 6 | integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy) |
97-127 | proton_flux_fit | double | 31 | intended to be proton flux at 31 energies, not filled currently |
128-133 | proton_fluence_fit | double | 6 | not filled currently |
134-163 | integral_flux_instrument | double | 30 | (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec) |
164-193 | integral_flux_energy | double | 30 | energies for the integral of integral_flux_instrument (MeV) |
194-208 | electron_diff_flux_energy | double | 15 | energies for the fluxes in electron_diff_flux_energy (MeV) |
209-223 | electron_diff_flux | double | 15 | (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec)) |
224-232 | Efitpars | double | 9 | fit parameters for 9 parameter electron fit |
Stage 15: Remove proton_fluence_fit
The documentation for the CXD field proton_fluence_fit says simply:
128-133 | proton_fluence_fit | double | 6 | not filled currently |
I note that at least some records appear to have plausible data in these fields, but I take the documentation at its word -- after all, these data constitute an official release from LANL, and are provided by NOAA; although it is manifest that the data were not completely corrected before release, users must be able to trust the documentation. Further, and more pragmatically, to second-guess the documentation would render the task of trying to establish a ready-to-process version of the data almost endless.
So we remove the fields corresponding to the proton_fluence_fit variable:
awk '{$128=$129=$130=$131=$132=$133=""; print $0}' ns<nn> | tr -s " " > ../gps-stage-15/ns<nn>
Note that there is no proton_fluence_fit field for the BBD instrument, so we make no changes to the files for ns41 and ns48.
At this point, we checkpoint the data again. The checkpointed data file has the MD5 checksum 06ddd2268edb93c2c040d604a4b9fa7a.
The data table for ns41 and ns48 is unchanged from the last checkpoint (at stage 10):
Column | Variable name | type | Dim. | Description |
1 | decimal_day | double | 1 | GPS time -- a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (deg) |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (deg) |
4 | Rad_Re | double | 1 | (radius of satellite)/Rearth |
5-12 | rate_electron_measured | double | 8 | Measured rate (Hz) in each of the 8 BDD electron channels (E1-E8) |
13-20 | rate_proton_measured | double | 8 | Measured rate (Hz) in each of the 8 BDD proton channels (P1-P8) |
21 | collection_interval | int | 1 | dosimeter collection period (seconds) |
22 | year | int | 1 | year (e.g. 2015) |
23 | decimal_year | double | 1 | decimal year = year + (decimal_day-1.0)/(days in year) |
24 | svn_number | int | 1 | SVN number of satellite |
25 | b_coord_radius | double | 1 | radius from earth's dipole axis (earth radii) |
26 | b_coord_height | double | 1 | height above the earth's dipole equatorial plane (earth radii) |
27 | magnetic_longitude | double | 1 | Magnetic longitude (degrees) |
28 | L_shell | double | 1 | L_shell (earth radii) -- I do not clearly understand the origin of the calculation, but it seems to be a dipole field/T-89 |
29 | bfield_ratio | double | 1 | Bsatellite/Bequator |
30 | local_time | double | 1 | magnetic local time (0-24 hours) |
31 | b_sattelite | double | 1 | B field at satellite (gauss) |
32 | b_equator | double | 1 | B field at equator (on this field line I think) (gauss) |
33-40 | electron_background | double | 8 | estimated background in electron channels E1-E8 (Hz) |
41-48 | proton_background | double | 8 | estimated background in proton channels P1-P8 (Hz) |
49 | proton_activity | int | 1 | =1 if there is significant proton activity |
50 | electron_temperature | double | 1 | electron temperature from a one Maxwellian fit (MeV) |
51 | electron_density_fit | double | 1 | electron number density from a one Maxwellian fit (cm-3) |
52-59 | model_counts_electron_fit | double | 8 | E1-E8 rates from the 2-parameter Maxwellian fit to the electron data |
60-67 | dtc_counts_electron | double | 8 | Dead time corrected electron rates (from data, not fit) |
68-97 | integral_flux_instrument | double | 30 | (based on 2 parameter Maxwellian fit) integral of electron flux above integral_flux_energy[i] particles/(cm2sec) |
98-127 | integral_flux_energy | double | 30 | energies for the integral of integral_flux_instrument (MeV) |
128-142 | electron_diff_flux_energy | double | 15 | energies for the fluxes in electron_diff_flux_energy (MeV) |
143-157 | electron_diff_flux | double | 15 | (based on 2 parameter Maxwellian fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec)) |
The data table for ns53 to ns73 now looks like this:
Column | Variable name | type | Dim. | description |
1 | decimal_day | double | 1 | GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years. |
2 | Geographic_Latitude | double | 1 | Latitude of satellite (deg) |
3 | Geographic_Longitude | double | 1 | Longitude of satellite (deg) |
4 | Rad_Re | double | 1 | (radius of satellite)/Rearth |
5-15 | rate_electron_measured | double | 11 | Measured rate (Hz) in each of the 11 CXD electron channels |
16-20 | rate_proton_measured | double | 5 | Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5) |
21 | LEP_thresh | double | 1 | LEP threshold in E1 channels (0 means low, 1 means high) |
22 | collection_interval | int | 1 | dosimeter collection period (seconds) |
23 | year | int | 1 | year (e.g. 2015) |
24 | decimal_year | double | 1 | decimal year = year + (decimal_day-1.0)/(days in year) |
25 | SVN_number | int | 1 | SVN number of satellite |
26 | b_coord_radius | double | 1 | radius from earth's dipole axis (earth radii) |
27 | b_coord_height | double | 1 | height above the earth's dipole equatorial plane (earth radii) |
28 | magnetic_longitude | double | 1 | Magnetic longitude (degrees) |
29 | L_shell | double | 1 | L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field. |
30 | L_LGM_TS04IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field. |
31 | L_LGM_OP77IGRF | double | 1 | LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled) |
32 | L_LGM_T89CDIP | double | 1 | LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field |
33 | bfield_ratio | double | 1 | Bsatellite/Bequator |
34 | local_time | double | 1 | magnetic local time (0-24 hours) |
35 | utc_lgm | double | 1 | UTC (0-24 hours) |
36 | b_sattelite | double | 1 | B field at satellite (gauss) |
37 | b_equator | double | 1 | B field at equator (on this field line I think) (gauss) |
38-48 | electron_background | double | 11 | estimated background in electron channels E1-E11 (Hz) |
49-53 | proton_background | double | 5 | estimated background in proton channels P1-P5 (Hz) |
54 | proton_activity | int | 1 | =1 if there is significant proton activity |
55 | proton_temperature_fit | double | 1 | characteristic momentum -- R0 in the expression given above (MeV/c) |
56 | proton_density_fit | double | 1 | N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV)) |
57 | electron_temperature_fit | double | 1 | electron temperature from a one Maxwellian fit (MeV) |
58 | electron_density_fit | double | 1 | electron number density from a one Maxwellian fit (cm-3) |
59-69 | model_counts_electron_fit_pf | double | 11 | E1-E11 rates due to proton background based on proton flux fit -- currently not filled (all -1's) |
70-74 | model_counts_proton_fit_pf | double | 5 | P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit) |
75-85 | model_counts_electron_fit | double | 11 | E1-E11 rates from the 9-parameter electron flux model |
86-90 | model_counts_proton_fit | double | 5 | P1-P5 rates from electron background -- currently not filled (all -1's) |
91-96 | proton_integrated_flux_fit | double | 6 | integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy) |
97-127 | proton_flux_fit | double | 31 | intended to be proton flux at 31 energies, not filled currently |
128-157 | integral_flux_instrument | double | 30 | (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec) |
158-187 | integral_flux_energy | double | 30 | energies for the integral of integral_flux_instrument (MeV) |
188-202 | electron_diff_flux_energy | double | 15 | energies for the fluxes in electron_diff_flux_energy (MeV) |
203-217 | electron_diff_flux | double | 15 | (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec)) |
218-226 | Efitpars | double | 9 | fit parameters for 9 parameter electron fit |