First, though, I should mention that I received confirmation via private communication with LANL that, as I suspected, the value of collection_interval is intended to be taken from the set { 24, 120, 240, 4608 }, thus supporting the processing performed last time for stage 4.
However, I also received notification that in the original file ns41_041226_v1.03.ascii, the day number is incorrect for all the data marked as being in 2005, except for the very first entry. That is, everything past line 2330 of the original file is invalid.
The easiest way to remove the bad data seems to be (unfortunately) to go back to the beginning of the processing and simply remove all the lines past line 2330 in file ns41_041226_v1.03.ascii, since I can think of no easy, foolproof way to remove the erroneous lines from the files created in stage 4 (or, indeed, any earlier stage).
Accordingly, we'll go back to the beginning and create new stages as follows:
Stage 1: Remove all lines past line 2330 in the file ns41_041226_v1.03.ascii
This is easily accomplished simply by transferring all the data files to the stage-1 directory unchanged, and then applying the command: sed -i '2331,$d' ns41_041226_v1.03.ascii
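The truncation can be sanity-checked with a quick line count afterwards. Here it is demonstrated on a synthetic file so the commands run as-is; substitute ns41_041226_v1.03.ascii when reproducing the real stage 1 (GNU sed is assumed for the in-place edit):

```shell
# Stand-in for the raw data file: 5000 numbered lines.
seq 1 5000 > demo.ascii
# Delete line 2331 through end-of-file, editing in place.
sed -i '2331,$d' demo.ascii
# The file should now contain exactly 2330 lines.
wc -l < demo.ascii
```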
We then execute several stages as before (except that what was stage n now becomes stage n+1) -- see the last post on this subject for details.
Stage 2: All the data for each satellite in a single file
Stage 3: Remove all records marked as bad
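The stage 2 merge amounts to concatenating each satellite's files into one. The per-day naming pattern and directory layout below are my assumptions (inferred from the ns41_041226_v1.03.ascii filename), demonstrated on synthetic two-record inputs:

```shell
# Hypothetical sketch of stage 2: one merged file per satellite.
mkdir -p stage1 stage2
printf 'record 1\n' > stage1/ns41_041226_v1.03.ascii
printf 'record 2\n' > stage1/ns41_050101_v1.03.ascii
for sat in ns41; do                          # extend the list to all 23 satellites
    # Glob expansion sorts lexically, so per-day files concatenate in date order.
    cat stage1/"$sat"_*.ascii > stage2/"$sat".ascii
done
wc -l < stage2/ns41.ascii                    # 2 records in the merged file
```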
The next two stages are no longer necessary, as removing the tail of file ns41_041226_v1.03.ascii removed the records with bad values of decimal_day or collection_interval from the dataset. For the sake of consistency, though, I have retained the stages, although if you are duplicating the dataset there should be no need to create them (simply use the stage 3 files instead):
Stage 4: Remove all records marked with invalid day of year
Stage 5: Correct time information
Here are the number of records for each satellite at the end of the processing for stage 5:
Satellite | Stage 5 Records |
---|---|
ns41 | 1,990,891 |
ns48 | 1,105,991 |
ns53 | 1,331,687 |
ns54 | 1,939,151 |
ns55 | 1,055,328 |
ns56 | 1,680,887 |
ns57 | 1,082,626 |
ns58 | 1,175,519 |
ns59 | 1,516,528 |
ns60 | 1,495,541 |
ns61 | 1,470,445 |
ns62 | 775,535 |
ns63 | 652,110 |
ns64 | 344,702 |
ns65 | 480,935 |
ns66 | 446,801 |
ns67 | 327,994 |
ns68 | 306,513 |
ns69 | 262,992 |
ns70 | 110,971 |
ns71 | 221,336 |
ns72 | 182,332 |
ns73 | 145,858 |
As a checkpoint, I have uploaded the new stage 5 dataset (I have also deleted the old stage 4 dataset). The MD5 checksum of this file is 774cc449fbd6fac836e17196cdaac363.
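To verify a download of the stage 5 dataset, run md5sum on the file you fetched and compare the output to the value above. The command is shown here on a small file with a well-known checksum, since the archive's local filename depends on how you downloaded it:

```shell
# md5sum prints "<hash>  <filename>"; compare the hash against the
# posted value 774cc449fbd6fac836e17196cdaac363 for the real dataset.
printf 'hello\n' > demo.dat
md5sum demo.dat    # b1946ac92492d2347c6235b4d2611184  demo.dat
```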