2017-06-30

Continued Rationalisation and Sanity Checking of the LANL GPS Charged-Particle Dataset

 Earlier posts in this series:
We continue to rationalise the data by removing fields that are unused or provide no useful information, and marking invalid values.

Stage 21: mark invalid values of L-shell fields for the CXD experiment


I have been informed in a private communication that, for satellites ns53 through ns73, values of the L_shell (i.e., L_LGM_T89IGRF; see stage 14) L_LGM_TS04IGRF, L_LGM_OP77IGRF and L_LGM_T89CDIP fields that are set to 999 indicate a value that is "too large to calculate"; at the very least, L-shell values more than ten or so would in any case seem to be too large to be useful. So we convert all the instances of the value 999 to NA:

  for file in ns[567]*
  do
    awk '{for(i=29;i<=32;++i) if ($i=="9.990000e+02") $i="NA"; {print $0}}' $file > ../gps-stage-21/$file
  done

Sanity checks


At this point we have modified what seem to me to be the most obvious problems with the datasets. However, the unfortunately large number of modifications that have been necessary suggest that it is worthwhile to perform basic sanity checks on the values of all the fields, in an attempt to uncover and correct other issues.

We start by considering, one at a time, the fields in the CXD datasets.

decimal_day


The documentation for decimal_day states:

Column Variable name type Dim. description
1 decimal_day double 1 GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years.

Reviewing the minimum and maximum values, we find:

decimal_day
Satellite Minimum Maximum
53 1.000394 366.998889
54 1.000405 366.999583
55 1.000000 366.998194
56 1.000972 366.998472
57 1.000000 366.997917
58 1.001250 366.999306
59 1.000532 366.999861
60 1.000278 366.998322
61 1.001782 366.999306
62 1.001424 366.999387
63 1.001157 366.999873
64 1.001215 365.999803
65 1.001389 366.998611
66 1.001667 366.998889
67 1.001458 366.998681
68 1.000660 366.998171
69 1.000868 366.999896
70 45.002049 366.997292
71 1.002720 366.999954
72 1.001470 366.998472
72 1.001910 366.999132

Pending more detailed review, these values look not unreasonable; there is therefore no need to change or mark any of the values.

However, we can edit the description of the field in order to make it a little clearer:

Column Variable name type Dim. description
1 decimal_day double 1 GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00).

Geographic_Latitude


The documentation for Geographic_Latitude states:

Column Variable name type Dim. description
2 Geographic_Latitude double 1 Latitude of satellite (deg)

Reviewing the minimum and maximum values, we find:

Geographic_Latitude
Satellite Minimum Maximum
53 -5.607510e+01 5.607590e+01
54 -5.527530e+01 5.527490e+01
55 -5.488720e+01 5.488530e+01
56 -5.678880e+01 5.679100e+01
57 -5.613660e+01 5.613600e+01
58 -5.676120e+01 5.676080e+01
59 -5.591630e+01 5.591610e+01
60 -5.567280e+01 5.567290e+01
61 -5.500000e+01 5.486250e+01
62 -5.608950e+01 5.608750e+01
63 -5.537850e+01 5.537940e+01
64 -5.499830e+01 5.499820e+01
65 -5.497930e+01 5.497810e+01
66 -5.578240e+01 5.578390e+01
67 -5.536860e+01 5.536670e+01
68 -5.498410e+01 5.498420e+01
69 -5.500030e+01 5.500240e+01
70 -5.502090e+01 5.501900e+01
71 -5.505770e+01 5.505750e+01
72 -5.531860e+01 5.531810e+01
72 -5.501500e+01 5.501500e+01

These values look reasonable (i.e., the absolute values of the maxima and minima are roughly equal; and, of course they correspond to the inclination of the satellites). However, the data could be more sensibly reported as numbers with four deicmal places: this would be both more compact and more accurately represent the precision of the numbers. So that suggests:

Stage 22: reformat values of Geographic_Latitude


We can accomplish the reformatting easily, by applying the following script, do-stage-22.py,  to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import re
import sys

filename = sys.argv[1]

with open(filename) as records:
  for line in records:
    fields = line.split()
    value_str = fields[1]      # latitude
    value = float(value_str)
  
    new_value_str = '{:10.4f}'.format(value)
  
    fields[1] = new_value_str
    newline = " ".join(fields)
 
    print newline


Applying this to each file:

  for file in ns[567]*
  do
    ./do-stage-22.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-22/$file
  done

The resulting maxima and minima now look like this:

Geographic_Latitude
Satellite Minimum Maximum
53 -56.0751 56.0759
54 -55.2753 55.2749
55 -54.8872 54.8853
56 -56.7888 56.7910
57 -56.1366 56.1360
58 -56.7612 56.7608
59 -55.9163 55.9161
60 -55.6728 55.6729
61 -55.0000 54.8625
62 -56.0895 56.0875
63 -55.3785 55.3794
64 -54.9983 54.9982
65 -54.9793 54.9781
66 -55.7824 55.7839
67 -55.3686 55.3667
68 -54.9841 54.9842
69 -55.0003 55.0024
70 -55.0209 55.0190
71 -55.0577 55.0575
72 -55.3186 55.3181
72 -55.0150 55.0150

We can also improve the clarity of the description:

Column Variable name type Dim. description
2 Geographic_Latitude double 1 Latitude of satellite (°, N +ve)

Geographic_Longitude


The documentation for Geographic_Longitude states

Column Variable name type Dim. description
3 Geographic_Longitude double 1 Longitude of satellite (deg)

Reviewing the minimum and maximum values, we find:

Geographic_Longitude
Satellite Minimum Maximum
53 -1.799853e+02 3.599998e+02
54 -1.799993e+02 3.599999e+02
55 6.000000e-04 3.599999e+02
56 -1.799998e+02 3.600000e+02
57 1.000000e-04 3.599983e+02
58 -1.798043e+02 3.599996e+02
59 -1.799988e+02 3.599999e+02
60 -1.800000e+02 3.600000e+02
61 -1.799970e+02 3.599998e+02
62 2.000000e-03 3.599986e+02
63 1.700000e-03 3.599990e+02
64 0.000000e+00 3.599998e+02
65 9.000000e-04 3.599995e+02
66 2.900000e-03 3.599995e+02
67 5.000000e-04 3.599987e+02
68 2.000000e-04 3.600000e+02
69 5.000000e-04 3.599997e+02
70 2.500000e-03 3.599793e+02
71 3.900000e-03 3.599997e+02
72 2.100000e-03 3.599992e+02
72 1.000000e-03 3.599954e+02

These numbers raise two issues:

1. Not all values are correctly mapped to the principal domain (i.e., [0, 360))
2. The accuracy to which a value is reported is a function of its value.

The second of these is not unreasonable when reporting some kinds of values (for example, counts), but it makes no sense when applied to geographic longitude. (If this isn't obvious, think about the difference in accuracy for the two values 0+ε and 0-ε; or, in the alternative, consider that the reported accuracy will change under a mere rotation transformation.)

Thus we are led to:

Stage 23: reformat values of Geographic_Longitude


We can accomplish the reformatting easily, by applying the following script, do-stage-23.py,  to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import re
import sys

filename = sys.argv[1]

with open(filename) as records:
  for line in records:
    fields = line.split()
    value_str = fields[2]      # longitude
    value = float(value_str)
   
    if value < 0:
      value += 360
     
    if value >= 360:
      value -= 360
     
    new_value_str = '{:9.4f}'.format(value)
   
    fields[2] = new_value_str
    newline = " ".join(fields)
  
    print newline


Applying this to each file:

  for file in ns[567]*
  do
    ./do-stage-23.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-23/$file
  done

The resulting maxima and minima now look like this:

Geographic_Longitude
Satellite Minimum Maximum
53 0.0000 359.9998
54 0.0002 359.9999
55 0.0006 359.9999
56 0.0000 359.9999
57 0.0001 359.9983
58 0.0010 359.9996
59 0.0001 359.9999
60 0.0000 359.9999
61 0.0015 359.9998
62 0.0020 359.9986
63 0.0017 359.9990
64 0.0000 359.9998
65 0.0009 359.9995
66 0.0029 359.9995
67 0.0005 359.9987
68 0.0000 359.9965
69 0.0005 359.9997
70 0.0025 359.9793
71 0.0039 359.9997
72 0.0021 359.9992
72 0.0010 359.9954

We can also improve the clarity of the description:

Column Variable name type Dim. description
3 Geographic_Longitude double 1 Longitude of satellite (°, E +ve, measured from Greenwich meridian)

Rad_Re


The documentation for Rad_Re states:

Column Variable name type Dim. description
4 Rad_Re double 1 (radius of satellite)/Rearth

Reviewing the minimum and maximum values, we find:

Rad_Re
Satellite Minimum Maximum
53 4.120330e+00 4.217263e+00
54 4.096529e+00 4.241498e+00
55 4.131586e+00 4.206511e+00
56 4.131644e+00 4.206146e+00
57 4.152958e+00 4.184506e+00
58 4.142916e+00 4.194755e+00
59 4.120665e+00 4.218718e+00
60 4.122245e+00 4.215404e+00
61 4.100667e+00 4.237281e+00
62 4.144375e+00 4.209335e+00
63 4.143611e+00 4.213319e+00
64 4.158991e+00 4.213424e+00
65 4.146987e+00 4.206427e+00
66 4.151798e+00 4.186115e+00
67 4.166345e+00 4.215303e+00
68 4.165405e+00 4.214924e+00
69 4.164891e+00 4.173357e+00
70 4.166151e+00 4.211028e+00
71 4.162719e+00 4.175120e+00
72 4.159129e+00 4.178379e+00
72 4.160420e+00 4.217236e+00

While these values look fine, the use of scientific notation wastes space. Accordingly:

Stage 24: reformat values of Rad_Re


We can accomplish the reformatting easily, by applying the following script, do-stage-24.py,  to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import re
import sys

filename = sys.argv[1]

FIELD_NR = 3

with open(filename) as records:
  for line in records:
    fields = line.split()
    value_str = fields[FIELD_NR]      # Rad_Re
    value = float(value_str)
   
    new_value_str = '{:10.6f}'.format(value)
   
    fields[FIELD_NR] = new_value_str
    newline = " ".join(fields)
  
    print newline


Applying this to each file:

  for file in ns[567]*
  do
    ./do-stage-24.py $file | tr -s " " | sed 's/ $//' > ../gps-stage-24/$file
  done

The resulting maxima and minima look like this:

Rad_Re
Satellite Minimum Maximum
53 4.120330 4.217263
54 4.096529 4.241498
55 4.131586 4.206511
56 4.131644 4.206146
57 4.152958 4.184506
58 4.142916 4.194755
59 4.120665 4.218718
60 4.122245 4.215404
61 4.100667 4.237281
62 4.144375 4.209335
63 4.143611 4.213319
64 4.158991 4.213424
65 4.146987 4.206427
66 4.151798 4.186115
67 4.166345 4.215303
68 4.165405 4.214924
69 4.164891 4.173357
70 4.166151 4.211028
71 4.162719 4.175120
72 4.159129 4.178379
72 4.160420 4.217236

We can also improve the clarity of the description:

Column Variable name type Dim. description
4 Rad_Re double 1 Distance from centre of Earth, in units of Earth radii.

Stage 25: reformat values of LEP_thresh


The documentation for LEP_thresh states:

Column Variable name type Dim. description
21 LEP_thresh double 1 LEP threshold in E1 channels (0 means low, 1 means high)

Checking the actual values in the data files shows that the values are, as expected, always either zero or one. However, as the documentation states, the values are stores as doubles. It seems silly to store a simple integer value as a double, so we can reformat this field so that it explicitly contains an integer. Further, I have received via private communication an explanation of the meaning of zero and one in this field: zero means that the threshold is 75 keV, and one means that it is 120 keV. So we can write these values (as integers) directly into the data files:

  for file in ns[567]*
  do 
    awk '$21=="0.000000e+00" {$21="75"}; $21=="1.000000e+00" {$21="120"}; \
 {print $0};' $file > ../gps-stage-25/$file
  done

We can also amend the documentation for the field:

Column Variable name type Dim. description
21 LEP_thresh int 1 LEP threshold in E1 channels, in keV


Time to checkpoint the dataset again. The checkpoint file has the MD5 checksum: 00b632a911316fa9e44093450f56c3cb.

The data table for ns41 and ns48 still looks like this:


Column Variable name type Dim. Description
1 decimal_day double 1 GPS time -- a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years
2 Geographic_Latitude double 1 Latitude of satellite (deg)
3 Geographic_Longitude double 1 Longitude of satellite (deg)
4 Rad_Re double 1 (radius of satellite)/Rearth
5-12 rate_electron_measured double 8 Measured rate (Hz) in each of the 8 BDD electron channels (E1-E8)
13-20 rate_proton_measured double 8 Measured rate (Hz) in each of the 8 BDD proton channels (P1-P8)
21 collection_interval int 1 dosimeter collection period (seconds)
22 year int 1 year (e.g. 2015)
23 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
24 svn_number int 1 SVN number of satellite
25 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
26 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
27 magnetic_longitude double 1 Magnetic longitude (degrees)
28 L_shell double 1 L_shell (earth radii) -- I do not clearly understand the origin of the calculation, but it seems to be a dipole field/T-89
29 bfield_ratio double 1 Bsatellite/Bequator
30 local_time double 1 magnetic local time (0-24 hours)
31 b_sattelite double 1 B field at satellite (gauss)
32 b_equator double 1 B field at equator (on this field line I think) (gauss)
33-40 electron_background double 8 estimated background in electron channels E1-E8 (Hz)
41-48 proton_background double 8 estimated background in proton channels P1-P8 (Hz)
49 proton_activity int 1 =1 if there is significant proton activity
50 electron_temperature double 1 electron temperature from a one Maxwellian fit (MeV)
51 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
52-59 model_counts_electron_fit double 8 E1-E8 rates from the 2-parameter Maxwellian fit to the electron data
60-67 dtc_counts_electron double 8 Dead time corrected electron rates (from data, not fit)
68-97 integral_flux_instrument double 30 (based on 2 parameter Maxwellian fit) integral of electron flux above integral_flux_energy[i] particles/(cm2sec)
98-127 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
128-142 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
143-157 electron_diff_flux double 15 (based on 2 parameter Maxwellian fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))


And for the remaining satellites we now have:


Column Variable name type Dim. description
1 decimal_day double 1 GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00).
2 Geographic_Latitude double 1 Latitude of satellite (°, N +ve)
3 Geographic_Longitude double 1 Longitude of satellite (°, E +ve, measured from Greenwich meridian)
4 Rad_Re double 1 Distance from centre of Earth, in units of Earth radii.
5-15 rate_electron_measured double 11 Measured rate (Hz) in each of the 11 CXD electron channels
16-20 rate_proton_measured double 5 Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5)
21 LEP_thresh double 1 LEP threshold in E1 channels, in keV
22 collection_interval int 1 dosimeter collection period (seconds)
23 year int 1 year (e.g. 2015)
24 decimal_year double 1 decimal year = year + (decimal_day-1.0)/(days in year)
25 SVN_number int 1 SVN number of satellite
26 b_coord_radius double 1 radius from earth's dipole axis (earth radii)
27 b_coord_height double 1 height above the earth's dipole equatorial plane (earth radii)
28 magnetic_longitude double 1 Magnetic longitude (degrees)
29 L_shell double 1 L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field.
30 L_LGM_TS04IGRF double 1 LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field.
31 L_LGM_OP77IGRF double 1 LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled)
32 L_LGM_T89CDIP double 1 LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field
33 bfield_ratio double 1 Bsatellite/Bequator
34 local_time double 1 magnetic local time (0-24 hours)
35 utc_lgm double 1 UTC (0-24 hours)
36 b_sattelite double 1 B field at satellite (gauss)
37 b_equator double 1 B field at equator (on this field line I think) (gauss)
38-48 electron_background double 11 estimated background in electron channels E1-E11 (Hz)
49-53 proton_background double 5 estimated background in proton channels P1-P5 (Hz)
54 proton_activity int 1 =1 if there is significant proton activity
55 proton_temperature_fit double 1 characteristic momentum -- R0 in the expression given above (MeV/c)
56 proton_density_fit double 1 N0 parameter in fit to proton flux ((protons/(cm2 sec sr MeV))
57 electron_temperature_fit double 1 electron temperature from a one Maxwellian fit (MeV)
58 electron_density_fit double 1 electron number density from a one Maxwellian fit (cm-3)
59-63 model_counts_proton_fit_pf double 5 P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit)
64-74 model_counts_electron_fit double 11 E1-E11 rates from the 9-parameter electron flux model
75-80 proton_integrated_flux_fit double 6 integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy)
81-110 integral_flux_instrument double 30 (based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm2 sec)
111-140 integral_flux_energy double 30 energies for the integral of integral_flux_instrument (MeV)
141-155 electron_diff_flux_energy double 15 energies for the fluxes in electron_diff_flux_energy (MeV)
156-170 electron_diff_flux double 15 (based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm2 sr MeV sec))
171-179 Efitpars double 9 fit parameters for 9 parameter electron fit



No comments:

Post a Comment

Note: Only a member of this blog may post a comment.