D. R. Evans (N7DR): Continued Rationalisation and Sanity Checking of the LANL GPS Charged-Particle Dataset

Earlier posts in this series:

We continue to rationalise the data by removing fields that are unused or provide no useful information, and marking invalid values.

Stage 21: mark invalid values of L-shell fields for the CXD experiment

I have been informed in a private communication that, for satellites ns53 through ns73, values of the L_shell (i.e., L_LGM_T89IGRF; see stage 14) L_LGM_TS04IGRF, L_LGM_OP77IGRF and L_LGM_T89CDIP fields that are set to 999 indicate a value that is "too large to calculate"; at the very least, L-shell values more than ten or so would in any case seem to be too large to be useful. So we convert all the instances of the value 999 to NA:

for file in ns[567]*
do
awk '{for(i=29;i<=32;++i) if (

$i=="9.990000e+02")$ i="NA"; {print

$0}}'$ file > ../gps-stage-21/$file
done

Sanity checks

At this point we have modified what seem to me to be the most obvious problems with the datasets. However, the unfortunately large number of modifications that have been necessary suggest that it is worthwhile to perform basic sanity checks on the values of all the fields, in an attempt to uncover and correct other issues.

We start by considering, one at a time, the fields in the CXD datasets.

decimal_day

The documentation for decimal_day states:

Column	Variable name	type	Dim.	description
1	decimal_day	double	1	GPS time, a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years.

Reviewing the minimum and maximum values, we find:

decimal_day
Satellite	Minimum	Maximum
53	1.000394	366.998889
54	1.000405	366.999583
55	1.000000	366.998194
56	1.000972	366.998472
57	1.000000	366.997917
58	1.001250	366.999306
59	1.000532	366.999861
60	1.000278	366.998322
61	1.001782	366.999306
62	1.001424	366.999387
63	1.001157	366.999873
64	1.001215	365.999803
65	1.001389	366.998611
66	1.001667	366.998889
67	1.001458	366.998681
68	1.000660	366.998171
69	1.000868	366.999896
70	45.002049	366.997292
71	1.002720	366.999954
72	1.001470	366.998472
72	1.001910	366.999132

Pending more detailed review, these values look not unreasonable; there is therefore no need to change or mark any of the values.

However, we can edit the description of the field in order to make it a little clearer:

Column	Variable name	type	Dim.	description
1	decimal_day	double	1	GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00).

Geographic_Latitude

The documentation for Geographic_Latitude states:

Column	Variable name	type	Dim.	description
2	Geographic_Latitude	double	1	Latitude of satellite (deg)

Reviewing the minimum and maximum values, we find:

Geographic_Latitude
Satellite	Minimum	Maximum
53	-5.607510e+01	5.607590e+01
54	-5.527530e+01	5.527490e+01
55	-5.488720e+01	5.488530e+01
56	-5.678880e+01	5.679100e+01
57	-5.613660e+01	5.613600e+01
58	-5.676120e+01	5.676080e+01
59	-5.591630e+01	5.591610e+01
60	-5.567280e+01	5.567290e+01
61	-5.500000e+01	5.486250e+01
62	-5.608950e+01	5.608750e+01
63	-5.537850e+01	5.537940e+01
64	-5.499830e+01	5.499820e+01
65	-5.497930e+01	5.497810e+01
66	-5.578240e+01	5.578390e+01
67	-5.536860e+01	5.536670e+01
68	-5.498410e+01	5.498420e+01
69	-5.500030e+01	5.500240e+01
70	-5.502090e+01	5.501900e+01
71	-5.505770e+01	5.505750e+01
72	-5.531860e+01	5.531810e+01
72	-5.501500e+01	5.501500e+01

These values look reasonable (i.e., the absolute values of the maxima and minima are roughly equal; and, of course they correspond to the inclination of the satellites). However, the data could be more sensibly reported as numbers with four deicmal places: this would be both more compact and more accurately represent the precision of the numbers. So that suggests:

Stage 22: reformat values of Geographic_Latitude

We can accomplish the reformatting easily, by applying the following script, do-stage-22.py, to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import re
import sys

filename = sys.argv[1]

with open(filename) as records:
for line in records:
    fields = line.split()
    value_str = fields[1]      # latitude
    value = float(value_str)

    new_value_str = '{:10.4f}'.format(value)

    fields[1] = new_value_str
    newline = " ".join(fields)

    print newline

Applying this to each file:

for file in ns[567]*
do
    ./do-stage-22.py

$file | tr -s " " | sed 's/$ //' > ../gps-stage-22/$file
done

The resulting maxima and minima now look like this:

Geographic_Latitude
Satellite	Minimum	Maximum
53	-56.0751	56.0759
54	-55.2753	55.2749
55	-54.8872	54.8853
56	-56.7888	56.7910
57	-56.1366	56.1360
58	-56.7612	56.7608
59	-55.9163	55.9161
60	-55.6728	55.6729
61	-55.0000	54.8625
62	-56.0895	56.0875
63	-55.3785	55.3794
64	-54.9983	54.9982
65	-54.9793	54.9781
66	-55.7824	55.7839
67	-55.3686	55.3667
68	-54.9841	54.9842
69	-55.0003	55.0024
70	-55.0209	55.0190
71	-55.0577	55.0575
72	-55.3186	55.3181
72	-55.0150	55.0150

We can also improve the clarity of the description:

Column	Variable name	type	Dim.	description
2	Geographic_Latitude	double	1	Latitude of satellite (°, N +ve)

Geographic_Longitude

The documentation for Geographic_Longitude states

Column	Variable name	type	Dim.	description
3	Geographic_Longitude	double	1	Longitude of satellite (deg)

Reviewing the minimum and maximum values, we find:

Geographic_Longitude
Satellite	Minimum	Maximum
53	-1.799853e+02	3.599998e+02
54	-1.799993e+02	3.599999e+02
55	6.000000e-04	3.599999e+02
56	-1.799998e+02	3.600000e+02
57	1.000000e-04	3.599983e+02
58	-1.798043e+02	3.599996e+02
59	-1.799988e+02	3.599999e+02
60	-1.800000e+02	3.600000e+02
61	-1.799970e+02	3.599998e+02
62	2.000000e-03	3.599986e+02
63	1.700000e-03	3.599990e+02
64	0.000000e+00	3.599998e+02
65	9.000000e-04	3.599995e+02
66	2.900000e-03	3.599995e+02
67	5.000000e-04	3.599987e+02
68	2.000000e-04	3.600000e+02
69	5.000000e-04	3.599997e+02
70	2.500000e-03	3.599793e+02
71	3.900000e-03	3.599997e+02
72	2.100000e-03	3.599992e+02
72	1.000000e-03	3.599954e+02

These numbers raise two issues:

1. Not all values are correctly mapped to the principal domain (i.e., [0, 360))
2. The accuracy to which a value is reported is a function of its value.

The second of these is not unreasonable when reporting some kinds of values (for example, counts), but it makes no sense when applied to geographic longitude. (If this isn't obvious, think about the difference in accuracy for the two values 0+ε and 0-ε; or, in the alternative, consider that the reported accuracy will change under a mere rotation transformation.)

Thus we are led to:

Stage 23: reformat values of Geographic_Longitude

We can accomplish the reformatting easily, by applying the following script, do-stage-23.py, to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import re
import sys

filename = sys.argv[1]

with open(filename) as records:
for line in records:
    fields = line.split()
    value_str = fields[2]      # longitude
    value = float(value_str)

    if value < 0:
      value += 360

    if value >= 360:
      value -= 360

    new_value_str = '{:9.4f}'.format(value)

    fields[2] = new_value_str
    newline = " ".join(fields)

    print newline

Applying this to each file:

for file in ns[567]*
do
    ./do-stage-23.py

$file | tr -s " " | sed 's/$ //' > ../gps-stage-23/$file
done

The resulting maxima and minima now look like this:

Geographic_Longitude
Satellite	Minimum	Maximum
53	0.0000	359.9998
54	0.0002	359.9999
55	0.0006	359.9999
56	0.0000	359.9999
57	0.0001	359.9983
58	0.0010	359.9996
59	0.0001	359.9999
60	0.0000	359.9999
61	0.0015	359.9998
62	0.0020	359.9986
63	0.0017	359.9990
64	0.0000	359.9998
65	0.0009	359.9995
66	0.0029	359.9995
67	0.0005	359.9987
68	0.0000	359.9965
69	0.0005	359.9997
70	0.0025	359.9793
71	0.0039	359.9997
72	0.0021	359.9992
72	0.0010	359.9954

We can also improve the clarity of the description:

Column	Variable name	type	Dim.	description
3	Geographic_Longitude	double	1	Longitude of satellite (°, E +ve, measured from Greenwich meridian)

Rad_Re

The documentation for Rad_Re states:

Column	Variable name	type	Dim.	description
4	Rad_Re	double	1	(radius of satellite)/Rearth

Reviewing the minimum and maximum values, we find:

Rad_Re
Satellite	Minimum	Maximum
53	4.120330e+00	4.217263e+00
54	4.096529e+00	4.241498e+00
55	4.131586e+00	4.206511e+00
56	4.131644e+00	4.206146e+00
57	4.152958e+00	4.184506e+00
58	4.142916e+00	4.194755e+00
59	4.120665e+00	4.218718e+00
60	4.122245e+00	4.215404e+00
61	4.100667e+00	4.237281e+00
62	4.144375e+00	4.209335e+00
63	4.143611e+00	4.213319e+00
64	4.158991e+00	4.213424e+00
65	4.146987e+00	4.206427e+00
66	4.151798e+00	4.186115e+00
67	4.166345e+00	4.215303e+00
68	4.165405e+00	4.214924e+00
69	4.164891e+00	4.173357e+00
70	4.166151e+00	4.211028e+00
71	4.162719e+00	4.175120e+00
72	4.159129e+00	4.178379e+00
72	4.160420e+00	4.217236e+00

While these values look fine, the use of scientific notation wastes space. Accordingly:

Stage 24: reformat values of Rad_Re

We can accomplish the reformatting easily, by applying the following script, do-stage-24.py, to each file:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import re
import sys

filename = sys.argv[1]

FIELD_NR = 3

with open(filename) as records:
for line in records:
    fields = line.split()
    value_str = fields[FIELD_NR]      # Rad_Re
    value = float(value_str)

    new_value_str = '{:10.6f}'.format(value)

    fields[FIELD_NR] = new_value_str
    newline = " ".join(fields)

    print newline

Applying this to each file:

for file in ns[567]*
do
    ./do-stage-24.py

$file | tr -s " " | sed 's/$ //' > ../gps-stage-24/$file
done

The resulting maxima and minima look like this:

Rad_Re
Satellite	Minimum	Maximum
53	4.120330	4.217263
54	4.096529	4.241498
55	4.131586	4.206511
56	4.131644	4.206146
57	4.152958	4.184506
58	4.142916	4.194755
59	4.120665	4.218718
60	4.122245	4.215404
61	4.100667	4.237281
62	4.144375	4.209335
63	4.143611	4.213319
64	4.158991	4.213424
65	4.146987	4.206427
66	4.151798	4.186115
67	4.166345	4.215303
68	4.165405	4.214924
69	4.164891	4.173357
70	4.166151	4.211028
71	4.162719	4.175120
72	4.159129	4.178379
72	4.160420	4.217236

We can also improve the clarity of the description:

Column	Variable name	type	Dim.	description
4	Rad_Re	double	1	Distance from centre of Earth, in units of Earth radii.

Stage 25: reformat values of LEP_thresh

The documentation for LEP_thresh states:

Column	Variable name	type	Dim.	description
21	LEP_thresh	double	1	LEP threshold in E1 channels (0 means low, 1 means high)

Checking the actual values in the data files shows that the values are, as expected, always either zero or one. However, as the documentation states, the values are stores as doubles. It seems silly to store a simple integer value as a double, so we can reformat this field so that it explicitly contains an integer. Further, I have received via private communication an explanation of the meaning of zero and one in this field: zero means that the threshold is 75 keV, and one means that it is 120 keV. So we can write these values (as integers) directly into the data files:

for file in ns[567]*
do
awk '

$21=="0.000000e+00" {$ 21="75"};

$21=="1.000000e+00" {$ 21="120"}; \
{print

$0};'$ file > ../gps-stage-25/$file
done

We can also amend the documentation for the field:

Column	Variable name	type	Dim.	description
21	LEP_thresh	int	1	LEP threshold in E1 channels, in keV

Time to checkpoint the dataset again. The checkpoint file has the MD5 checksum: 00b632a911316fa9e44093450f56c3cb.

The data table for ns41 and ns48 still looks like this:

Column	Variable name	type	Dim.	Description
1	decimal_day	double	1	GPS time -- a number from 1 (1-Jan 00:00) to 366 (31-Dec 24:00) or 367 in leap years
2	Geographic_Latitude	double	1	Latitude of satellite (deg)
3	Geographic_Longitude	double	1	Longitude of satellite (deg)
4	Rad_Re	double	1	(radius of satellite)/Rearth
5-12	rate_electron_measured	double	8	Measured rate (Hz) in each of the 8 BDD electron channels (E1-E8)
13-20	rate_proton_measured	double	8	Measured rate (Hz) in each of the 8 BDD proton channels (P1-P8)
21	collection_interval	int	1	dosimeter collection period (seconds)
22	year	int	1	year (e.g. 2015)
23	decimal_year	double	1	decimal year = year + (decimal_day-1.0)/(days in year)
24	svn_number	int	1	SVN number of satellite
25	b_coord_radius	double	1	radius from earth's dipole axis (earth radii)
26	b_coord_height	double	1	height above the earth's dipole equatorial plane (earth radii)
27	magnetic_longitude	double	1	Magnetic longitude (degrees)
28	L_shell	double	1	L_shell (earth radii) -- I do not clearly understand the origin of the calculation, but it seems to be a dipole field/T-89
29	bfield_ratio	double	1	Bsatellite/Bequator
30	local_time	double	1	magnetic local time (0-24 hours)
31	b_sattelite	double	1	B field at satellite (gauss)
32	b_equator	double	1	B field at equator (on this field line I think) (gauss)
33-40	electron_background	double	8	estimated background in electron channels E1-E8 (Hz)
41-48	proton_background	double	8	estimated background in proton channels P1-P8 (Hz)
49	proton_activity	int	1	=1 if there is significant proton activity
50	electron_temperature	double	1	electron temperature from a one Maxwellian fit (MeV)
51	electron_density_fit	double	1	electron number density from a one Maxwellian fit (cm^-3)
52-59	model_counts_electron_fit	double	8	E1-E8 rates from the 2-parameter Maxwellian fit to the electron data
60-67	dtc_counts_electron	double	8	Dead time corrected electron rates (from data, not fit)
68-97	integral_flux_instrument	double	30	(based on 2 parameter Maxwellian fit) integral of electron flux above integral_flux_energy[i] particles/(cm²sec)
98-127	integral_flux_energy	double	30	energies for the integral of integral_flux_instrument (MeV)
128-142	electron_diff_flux_energy	double	15	energies for the fluxes in electron_diff_flux_energy (MeV)
143-157	electron_diff_flux	double	15	(based on 2 parameter Maxwellian fit) electron flux at energies electron_diff_flux[i] (particle/(cm² sr MeV sec))

And for the remaining satellites we now have:

Column	Variable name	type	Dim.	description
1	decimal_day	double	1	GPS time: a decimal number in the range [1, 367) in leap years or [1, 366) otherwise, representing the day of the year (1-Jan 00:00 to 31-Dec 24:00).
2	Geographic_Latitude	double	1	Latitude of satellite (°, N +ve)
3	Geographic_Longitude	double	1	Longitude of satellite (°, E +ve, measured from Greenwich meridian)
4	Rad_Re	double	1	Distance from centre of Earth, in units of Earth radii.
5-15	rate_electron_measured	double	11	Measured rate (Hz) in each of the 11 CXD electron channels
16-20	rate_proton_measured	double	5	Measured rate (Hz) in each of the 5 CXD proton channels (P1-P5)
21	LEP_thresh	double	1	LEP threshold in E1 channels, in keV
22	collection_interval	int	1	dosimeter collection period (seconds)
23	year	int	1	year (e.g. 2015)
24	decimal_year	double	1	decimal year = year + (decimal_day-1.0)/(days in year)
25	SVN_number	int	1	SVN number of satellite
26	b_coord_radius	double	1	radius from earth's dipole axis (earth radii)
27	b_coord_height	double	1	height above the earth's dipole equatorial plane (earth radii)
28	magnetic_longitude	double	1	Magnetic longitude (degrees)
29	L_shell	double	1	L shell: McIlwain calculation according to model with T89 External Field, IGRF Internal Field.
30	L_LGM_TS04IGRF	double	1	LanlGeoMag L-shell McIlwain calculation, TS04 External Field, IGRF Internal Field.
31	L_LGM_OP77IGRF	double	1	LanlGeoMag L-shell McIlwain calculation, OP77 External Field, IGRF Internal Field (not currently filled)
32	L_LGM_T89CDIP	double	1	LanlGeoMag L-shell McIlwain calculation, T89 External Field, Centered Dipole Internal Field
33	bfield_ratio	double	1	Bsatellite/Bequator
34	local_time	double	1	magnetic local time (0-24 hours)
35	utc_lgm	double	1	UTC (0-24 hours)
36	b_sattelite	double	1	B field at satellite (gauss)
37	b_equator	double	1	B field at equator (on this field line I think) (gauss)
38-48	electron_background	double	11	estimated background in electron channels E1-E11 (Hz)
49-53	proton_background	double	5	estimated background in proton channels P1-P5 (Hz)
54	proton_activity	int	1	=1 if there is significant proton activity
55	proton_temperature_fit	double	1	characteristic momentum -- R₀ in the expression given above (MeV/c)
56	proton_density_fit	double	1	N₀ parameter in fit to proton flux ((protons/(cm² sec sr MeV))
57	electron_temperature_fit	double	1	electron temperature from a one Maxwellian fit (MeV)
58	electron_density_fit	double	1	electron number density from a one Maxwellian fit (cm^-3)
59-63	model_counts_proton_fit_pf	double	5	P1-P5 rate from proton fit (using proton_temperature_fit, proton_density_fit)
64-74	model_counts_electron_fit	double	11	E1-E11 rates from the 9-parameter electron flux model
75-80	proton_integrated_flux_fit	double	6	integral of proton flux (based on fit) above 10, 15.85, 25.11, 30, 40, 79.43 MeV (proton kinetic energy)
81-110	integral_flux_instrument	double	30	(based on 9 parameter fit) integral of electron flux above integral_flux_energy[i] particles/(cm² sec)
111-140	integral_flux_energy	double	30	energies for the integral of integral_flux_instrument (MeV)
141-155	electron_diff_flux_energy	double	15	energies for the fluxes in electron_diff_flux_energy (MeV)
156-170	electron_diff_flux	double	15	(based on 9 parameter fit) electron flux at energies electron_diff_flux[i] (particle/(cm² sr MeV sec))
171-179	Efitpars	double	9	fit parameters for 9 parameter electron fit

D. R. Evans (N7DR)

2017-06-30

Continued Rationalisation and Sanity Checking of the LANL GPS Charged-Particle Dataset

Stage 21: mark invalid values of L-shell fields for the CXD experiment

Sanity checks

decimal_day

Geographic_Latitude

Stage 22: reformat values of Geographic_Latitude

Geographic_Longitude

Stage 23: reformat values of Geographic_Longitude

Rad_Re

Stage 24: reformat values of Rad_Re

Stage 25: reformat values of LEP_thresh

No comments:

Post a Comment