Revised Versions of the LANL GPS Charged-Particle Dataset

In the most recent post on this subject, I described several revised versions of the LANL GPS charged-particle dataset, each intended to improve on the quality of the original release. I had intended to continue that process here but, as I describe below, the original dataset turns out to contain some invalid data, so I have instead reprocessed the data from the start and uploaded a new file containing the resulting records.

First, though, I should mention that I received confirmation via private communication with LANL that, as I suspected, the value of collection_interval is intended to be taken from the set { 24, 120, 240, 4608 }, thus supporting the processing performed last time for stage 4.

However, I also received notification that in the original file ns41_041226_v1.03.ascii, the day number is incorrect for all the data marked as being in 2005, except for the very first entry. That is, everything past line 2330 of the original file is invalid.

The easiest way to remove the bad data seems to be (unfortunately) to go back to the beginning of the processing and simply remove all the lines past line 2330 in file ns41_041226_v1.03.ascii, since I can think of no easy, foolproof way to remove the erroneous lines from the files created in stage 4 (or, indeed, any earlier stage).

Accordingly, we'll go back to the beginning and create new stages as follows:

Stage 1: Remove all lines past line 2330 in the file ns41_041226_v1.03.ascii

This is easily accomplished by copying all the data files unchanged to the stage-1 directory, and then running the command:

  sed -i '2331,$d' ns41_041226_v1.03.ascii
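
As a sanity check, the same truncation can be tried on a scratch file and the resulting line count confirmed before touching the real data. What follows is only a minimal sketch, assuming GNU sed (for -i without a suffix argument) and a POSIX shell; the function name truncate_after and the scratch file are hypothetical, introduced purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep the first N lines of FILE and delete the rest
# in place. Assumes GNU sed, where -i takes no suffix argument.
truncate_after() {
    n="$1"
    file="$2"
    # The sed script expands to e.g. "2331,$d": delete from line N+1 to the end.
    sed -i "$((n + 1)),\$d" "$file"
}

# Demonstration on a scratch file rather than on the real data file.
demo=$(mktemp)
seq 1 3000 > "$demo"
truncate_after 2330 "$demo"
lines=$(wc -l < "$demo")
echo "lines remaining: $lines"
rm -f "$demo"
```

Addressing the end of the file as $ rather than with a large fixed line number means the command does not depend on how long the file happens to be.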

We then execute several stages as before (except that what was stage n now becomes stage n+1) -- see the last post on this subject for details.

Stage 2: All the data for each satellite in a single file
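
For readers who want a rough picture of what this stage involves, here is a minimal sketch. It is not the procedure from the earlier post: it assumes only that the input files are named like ns41_041226_v1.03.ascii (satellite, then a YYMMDD date, then a version), so that the shell's sorted glob expansion lists each satellite's files in chronological order. The function name merge_satellite, the directory arguments and the output name are hypothetical, and any handling of header lines is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: concatenate all the files for one satellite into a
# single output file.
# Assumption: files are named <satellite>_<YYMMDD>_<version>.ascii, so the
# sorted glob expansion yields them in chronological order.
# Header lines, if any, are copied through unchanged.
merge_satellite() {
    sat="$1"
    in_dir="$2"
    out_dir="$3"
    cat "$in_dir/${sat}"_*.ascii > "$out_dir/${sat}.ascii"
}

# Example (directory names are placeholders):
# merge_satellite ns41 stage-1 stage-2
```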

Stage 3: Remove all records marked as bad

The next two stages are no longer necessary, because removing the tail of file ns41_041226_v1.03.ascii removed the records with bad values of decimal_day or collection_interval from the dataset. For the sake of consistency, though, I have retained the stages; if you are duplicating the dataset, there should be no need to create them (simply use the stage 3 files instead):

Stage 4: Remove all records marked with invalid day of year

Stage 5: Correct time information

Here is the number of records for each satellite at the end of the processing for stage 5:

Satellite Stage 5 Records
ns41 1,990,891
ns48 1,105,991
ns53 1,331,687
ns54 1,939,151
ns55 1,055,328
ns56 1,680,887
ns57 1,082,626
ns58 1,175,519
ns59 1,516,528
ns60 1,495,541
ns61 1,470,445
ns62 775,535
ns63 652,110
ns64 344,702
ns65 480,935
ns66 446,801
ns67 327,994
ns68 306,513
ns69 262,992
ns70 110,971
ns71 221,336
ns72 182,332
ns73 145,858

As a checkpoint, I have uploaded the new stage 5 dataset (I have also deleted the old stage 4 dataset). The MD5 checksum of this file is 774cc449fbd6fac836e17196cdaac363.
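
After downloading the file, the checksum can be compared with the value above. The following is a minimal sketch, assuming md5sum from GNU coreutils is available; the function name verify_md5 and the file name in the usage comment are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE's MD5 digest equals EXPECTED.
# Assumes md5sum (GNU coreutils), which prints "<digest>  <filename>".
verify_md5() {
    expected="$1"
    file="$2"
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Example with the checksum quoted above (the file name is a placeholder):
# verify_md5 774cc449fbd6fac836e17196cdaac363 downloaded-stage-5-file && echo OK
```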




