2017-05-29

Further Rationalisation of the LANL GPS Charged-Particle Dataset

Earlier posts in this series:

We continue to rationalise the data by removing fields that are unused or provide no useful information, and marking invalid values.

Stage 11: remove double precision values that are out of range


In the discussion for stage 9, I briefly mentioned values that are less than the maximum subnormal double precision number representable in accordance with the current (2008) IEEE 754 standard. It turns out that such numbers appear throughout the dataset, so it is useful to replace them with a more explicit indication that data should be ignored. We can do this using the string "NA" to replace all such numbers.

To be more precise, we generate stage 11 by running the following code on the stage 10 files:

#!/usr/bin/env python
# -*- coding: utf8 -*-

# replace all fields that contain "e-31[0123456789]" with "NA"

import re
import sys

filename = sys.argv[1]

with open(filename) as records:
  for line in records:
    fields = line.split()
   
    for i in xrange(len(fields)):

      m = re.search('e-31[0123456789]', fields[i])
      if m:
        fields[i] = "NA"
   
    newline = " ".join(fields)
   
    print newline


I note that although the maximum subnormal double precision number exceeds e-310, no occurrences of such numbers appear in the dataset; hence the use above of the search term "e-31[0123456789]".

Stage 12: correct values of proton_activity


The documentation for the proton_activity field states only: 
  =1 if there is significant proton activity 

Some expansion is in order. In private communication with the authors of the dataset, I have learned that the definition of significant in the above is:
  For NS41 and NS48 (i.e., the Burst Detector-Dosimeter (BDD) instrument): the P1 rate is greater than 10 events per second, and the L shell is greater than 24.99;
  For the other satellites (i.e., the Combined X-ray/Dosimeter (CXD) instrument): the P1 rate is greater than 5 events per second.

Although the documentation does not describe the meaning of values other than one, the dataset authors inform me that the intention is that the value zero implies no significant proton activity, and any non-zero value should be treated as if it were unity.

Consequently, it is useful to replace all values other than zero and one for the proton_activity field with the value one.

It turns out that only ns41 has invalid values of proton_activity; these can be replaced by:

awk '$49!="0" && $49!="1" {$49="1"}; {print $0};'  ns41 > ../gps-stage-12/ns41

The equivalent command for the CXD instrument, although it is unnecessary with this dataset, is:

awk '$55!="0" && $55!="1" {$55="1"}; {print $0};'  ns<nn> > ../gps-stage-12/ns<nn>

Stage 13: Mark bad values associated with L = 0


The ns41 and ns48 satellites have one field that is associated with the L shell (called, reasonably enough, L_shell). Some records have a value of zero for this field, which is obviously not a legitimate value. Such records appear to have illegal values for other fields associated with the magnetic field: b_coord_radius, b_coord_height, magnetic_longitude, bfield_ratio, b_sattelite (sic) and b_equator. Consequently, for these two satellites, if L_shell is zero, all these fields, and L_shell itself, should be set to "NA":

awk '$28=="0.000000e+00" {$25=$26=$27=$28=$29=$31=$32="NA"}; {print $0}' ns<nn> > ../gps-stage-13/ns<nn>

The L_shell field associated with most of the other satellites seems never to be set to zero; however, for ns59, ns60 and ns61, there are records that have many values, including L_shell, filled with zeroes. In these records, it seems that all values after the SVN_number field are set to zero. The records are not completely useless, though, since the data up to and including the SVN_number field appear reasonable. Hence for these spacecraft, all fields after number 25 (which corresponds to SVN_number) should be set to "NA":

awk '$29=="0.000000e+00" {for(i=26;i<=NF;++i)$i="NA"}; {print $0}' ns<nn> > ../gps-stage-13/ns<nn>

Stage 14: Remove redundant field L_LGM_T891GRF


On satellites ns53 to ns73 (i.e., the CXD instrument), the documentation for the L_shell field states: currently this is the same as L_LGM_T89IGRF. Thus, the field L_LGM_T89IGRF is redundant and can be removed without loss of information.

At stage 13, the data table for the CXD satellites looks like this:

Column Variable name type Dim. description
1 decimal_day double 1 GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years.
2 Geographic_Latitude double 1 Latitude of satellite (deg)
3 Geographic_Longitude double 1 Longitude of satellite (deg)
4 Rad_Re double 1 (radius of satellite)/Rearth
5-15 rate_electron_measured double 11 Measured rate (Hz) in each of the 11 CXD electron channels
16-20 rate_proton_measured double 5 Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5)
21 LEP_thresh double 1 LEP threshold in E1 channels (0 means low, 1 means high)
22 collection_interval int 1 dosimeter collection period (seconds)
23 year int 1 year (e.g. 2015)
24 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
25 SVN_number int 1 SVN number of satellite
26 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
27 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
28 magnetic_longitude double 1 Magnetic longitude (degrees)
29 L_shell double 1 L_shell (earth radii) – currently this is the same as L_LGM_T89IGRF but this is intended to be our suggested choice for the L shell calculation in the long run.
30 L_LGM_TS04IGRF double 1 LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field.
31 L_LGM_OP77IGRF double 1 LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled)
32 L_LGM_T89CDIP double 1 LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field
33 L_LGM_T89IGRF double 1 LanlGeoMag L-shell McIlwain calculation, T89 External Field, IGRF Internal Field
34 bfield_ratio double 1 Bsatellite/Bequator
35 local_time double 1 magnetic local time (0-24 hours)
36 utc_lgm double 1 UTC (0-24 hours)
37 b_sattelite double 1 B field at satellite (gauss)
38 b_equator double 1 B field at equator (on this field line I think) (gauss)
39-49 electron_background double 11 estimated background in electron channels E1-E11 (Hz)
50-54 proton_background double 5 estimated background in proton channels P1-P5 (Hz)
55 proton_activity int 1 =1 if there is significant proton activity
56 proton_temperature_fit double 1 characteristic momentum -- R0 in the expression given above (MeV/c)
57 proton_density_fit double 1 N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV))
58 electron_temperature_fit double 1 electron temperature from a one Maxwellian fit (MeV)
59 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
60-70 model_counts_electron_fit_pf double 11 E1-E11 rates due to proton background based on proton flux fit -- currently not filled (all -1's)
71-75 model_counts_proton_fit_pf double 5 P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit)
76-86 model_counts_electron_fit double 11 E1-E11 rates from the 9-parameter electron flux model
87-91 model_counts_proton_fit double 5 P1-P5 rates from electron background -- currently not filled (all -1's)
92-97 proton_integrated_flux_fit double 6 integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy)
98-128 proton_flux_fit double 31 intended to be proton flux at 31 energies, not filled currently
129-134 proton_fluence_fit double 6 not filled currently
135-164 integral_flux_instrument double 30 (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec)
165-194 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
195-209 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
210-224 electron_diff_flux double 15 (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))
225-233 Efitpars double 9 fit parameters for 9 parameter electron fit


So we remove L_LGM_T89IGRF:

awk '{$33=""; print $0}' ns<nn> | tr -s " " > ../gps-stage-14/ns<nn>

Leaving us with the following data table (NB: I have edited the description so as to preserve the derivation information for the L_shell field):


Column Variable name type Dim. description
1 decimal_day double 1 GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years.
2 Geographic_Latitude double 1 Latitude of satellite (deg)
3 Geographic_Longitude double 1 Longitude of satellite (deg)
4 Rad_Re double 1 (radius of satellite)/Rearth
5-15 rate_electron_measured double 11 Measured rate (Hz) in each of the 11 CXD electron channels
16-20 rate_proton_measured double 5 Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5)
21 LEP_thresh double 1 LEP threshold in E1 channels (0 means low, 1 means high)
22 collection_interval int 1 dosimeter collection period (seconds)
23 year int 1 year (e.g. 2015)
24 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
25 SVN_number int 1 SVN number of satellite
26 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
27 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
28 magnetic_longitude double 1 Magnetic longitude (degrees)
29 L_shell double 1 L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field.
30 L_LGM_TS04IGRF double 1 LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field.
31 L_LGM_OP77IGRF double 1 LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled)
32 L_LGM_T89CDIP double 1 LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field
33 bfield_ratio double 1 Bsatellite/Bequator
34 local_time double 1 magnetic local time (0-24 hours)
35 utc_lgm double 1 UTC (0-24 hours)
36 b_sattelite double 1 B field at satellite (gauss)
37 b_equator double 1 B field at equator (on this field line I think) (gauss)
38-48 electron_background double 11 estimated background in electron channels E1-E11 (Hz)
49-53 proton_background double 5 estimated background in proton channels P1-P5 (Hz)
54 proton_activity int 1 =1 if there is significant proton activity
55 proton_temperature_fit double 1 characteristic momentum -- R0 in the expression given above (MeV/c)
56 proton_density_fit double 1 N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV))
57 electron_temperature_fit double 1 electron temperature from a one Maxwellian fit (MeV)
58 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
59-69 model_counts_electron_fit_pf double 11 E1-E11 rates due to proton background based on proton flux fit -- currently not filled (all -1's)
70-74 model_counts_proton_fit_pf double 5 P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit)
75-85 model_counts_electron_fit double 11 E1-E11 rates from the 9-parameter electron flux model
86-90 model_counts_proton_fit double 5 P1-P5 rates from electron background -- currently not filled (all -1's)
91-96 proton_integrated_flux_fit double 6 integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy)
97-127 proton_flux_fit double 31 intended to be proton flux at 31 energies, not filled currently
128-133 proton_fluence_fit double 6 not filled currently
134-163 integral_flux_instrument double 30 (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec)
164-193 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
194-208 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
209-223 electron_diff_flux double 15 (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))
224-232 Efitpars double 9 fit parameters for 9 parameter electron fit

 Stage 15: Remove proton_fluence_fit


The documentation for the CXD field proton_fluence_fit says simply:

128-133 proton_fluence_fit double 6 not filled currently

I note that at least some records appear to have plausible data in these fields, but I take the documentation at its word -- after all, these data constitute an official release from LANL, and are provided by NOAA; although it is manifest that the data were not completely corrected before release, users must be able to trust the documentation. Further, and more pragmatically, to second-guess the documentation would render the task of trying to establish a ready-to-process version of the data almost endless.

So we remove the fields corresponding to the proton_fluence_fit variable:

awk '{$128=$129=$130=$131=$132=$133=""; print $0}' ns<nn> | tr -s " " > ../gps-stage-15/ns<nn>

Note that there is no proton_fluence_fit field for the BBD instrument, so we make no changes to the files for ns41 and ns48.

At this point, we checkpoint the data again. The checkpointed data file has the MD5 checksum 06ddd2268edb93c2c040d604a4b9fa7a.

The data table for ns41 and ns48 is unchanged from the last checkpoint (at stage 10):


Column Variable name type Dim. Description
1 decimal_day double 1 GPS time -- a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years
2 Geographic_Latitude double 1 Latitude of satellite (deg)
3 Geographic_Longitude double 1 Longitude of satellite (deg)
4 Rad_Re double 1 (radius of satellite)/Rearth
5-12 rate_electron_measured double 8 Measured rate (Hz) in each of the 8 BDD electron channels (E1-E8)
13-20 rate_proton_measured double 8 Measured rate (Hz) in each of the 8 BDD proton channels (P1-P8)
21 collection_interval int 1 dosimeter collection period (seconds)
22 year int 1 year (e.g. 2015)
23 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
24 svn_number int 1 SVN number of satellite
25 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
26 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
27 magnetic_longitude double 1 Magnetic longitude (degrees)
28 L_shell double 1 L_shell (earth radii) -- I do not clearly understand the origin of the calculation, but it seems to be a dipole field/T-89
29 bfield_ratio double 1 Bsatellite/Bequator
30 local_time double 1 magnetic local time (0-24 hours)
31 b_sattelite double 1 B field at satellite (gauss)
32 b_equator double 1 B field at equator (on this field line I think) (gauss)
33-40 electron_background double 8 estimated background in electron channels E1-E8 (Hz)
41-48 proton_background double 8 estimated background in proton channels P1-P8 (Hz)
49 proton_activity int 1 =1 if there is significant proton activity
50 electron_temperature double 1 electron temperature from a one Maxwellian fit (MeV)
51 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
52-59 model_counts_electron_fit double 8 E1-E8 rates from the 2-parameter Maxwellian fit to the electron data
60-67 dtc_counts_electron double 8 Dead time corrected electron rates (from data, not fit)
68-97 integral_flux_instrument double 30 (based on 2 parameter Maxwellian fit) integral of electron flux above integral_flux_energy[i] particles/(cm2sec)
98-127 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
128-142 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
143-157 electron_diff_flux double 15 (based on 2 parameter Maxwellian fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))


The data table for ns53 to ns73 now looks like this:


Column Variable name type Dim. description
1 decimal_day double 1 GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years.
2 Geographic_Latitude double 1 Latitude of satellite (deg)
3 Geographic_Longitude double 1 Longitude of satellite (deg)
4 Rad_Re double 1 (radius of satellite)/Rearth
5-15 rate_electron_measured double 11 Measured rate (Hz) in each of the 11 CXD electron channels
16-20 rate_proton_measured double 5 Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5)
21 LEP_thresh double 1 LEP threshold in E1 channels (0 means low, 1 means high)
22 collection_interval int 1 dosimeter collection period (seconds)
23 year int 1 year (e.g. 2015)
24 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
25 SVN_number int 1 SVN number of satellite
26 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
27 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
28 magnetic_longitude double 1 Magnetic longitude (degrees)
29 L_shell double 1 L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field.
30 L_LGM_TS04IGRF double 1 LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field.
31 L_LGM_OP77IGRF double 1 LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled)
32 L_LGM_T89CDIP double 1 LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field
33 bfield_ratio double 1 Bsatellite/Bequator
34 local_time double 1 magnetic local time (0-24 hours)
35 utc_lgm double 1 UTC (0-24 hours)
36 b_sattelite double 1 B field at satellite (gauss)
37 b_equator double 1 B field at equator (on this field line I think) (gauss)
38-48 electron_background double 11 estimated background in electron channels E1-E11 (Hz)
49-53 proton_background double 5 estimated background in proton channels P1-P5 (Hz)
54 proton_activity int 1 =1 if there is significant proton activity
55 proton_temperature_fit double 1 characteristic momentum -- R0 in the expression given above (MeV/c)
56 proton_density_fit double 1 N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV))
57 electron_temperature_fit double 1 electron temperature from a one Maxwellian fit (MeV)
58 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
59-69 model_counts_electron_fit_pf double 11 E1-E11 rates due to proton background based on proton flux fit -- currently not filled (all -1's)
70-74 model_counts_proton_fit_pf double 5 P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit)
75-85 model_counts_electron_fit double 11 E1-E11 rates from the 9-parameter electron flux model
86-90 model_counts_proton_fit double 5 P1-P5 rates from electron background -- currently not filled (all -1's)
91-96 proton_integrated_flux_fit double 6 integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy)
97-127 proton_flux_fit double 31 intended to be proton flux at 31 energies, not filled currently
128-157 integral_flux_instrument double 30 (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec)
158-187 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
188-202 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
203-217 electron_diff_flux double 15 (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))
218-226 Efitpars double 9 fit parameters for 9 parameter electron fit