Observational Data Dumping at NCEP
Dennis Keyser - NOAA/NWS/NCEP/EMC
(Last Revised 2/12/2018)
Please take a moment to read the Disclaimer for this non-operational web page.

The dumping of observational data is the first step in each NCEP network production suite. At the appropriate network data cutoff time, up to two dump jobs are executed simultaneously. Once the two dump jobs have completed, a separate dump post-processing job is initiated.

===> Dump Job 1 performs the following steps in sequence:

A. Copying Files For Later Use By Analyses

In the Global Forecast System (GFS) and Global Data Assimilation System (GDAS) network runs, GRIB files containing current analyses of snow depth, ice distribution, and sea-surface temperature from NESDIS are copied from the NCEP Weather and Climate Operational Supercomputing System (WCOSS) /dcom database into network-specific ("/com") directories. These fields will be read later by the Global Gridpoint Statistical Interpolation (GSI) analysis. (Note: If the current day's files are not available, then files between one-day old and ten-days old, depending upon the product, are copied.)

In the North American Model (NAM) network runs, GRIB files containing current 16th mesh (24 km) analyses of snow/sea-ice coverage produced by the NOAA/NESDIS Interactive Multisensor Snow and Ice Mapping System (IMS) and 8th mesh (48 km) Northern Hemisphere snow-depth/sea-ice analyses produced by the Air Force are copied from the NCEP WCOSS /dcom database into network-specific ("/com") directories. These fields will be read later by the Regional Gridpoint Statistical Interpolation (GSI) analysis. (Note: If the current day's files are not available, then one-day old files are copied.)

In the full-cycle Rapid Refresh (RAP) network, GRIB files containing current 96th mesh (4 km) analyses of snow/sea-ice coverage produced by the NOAA/NESDIS Interactive Multisensor Snow and Ice Mapping System (IMS) are copied from the NCEP WCOSS /dcom database into network-specific ("/com") directories. These fields will be read later by the RAP Gridpoint Statistical Interpolation (GSI) analysis. (Note: If the current day's files are not available, then one-day old files are copied.)

B. Dumping of BUFR Observational Data (excluding WSR-88D Level II radial wind and reflectivity - see Dump Job 2 for the dumping of these data)

The process of accessing the observational database and retrieving a select set of observational data is accomplished in several stages by a number of FORTRAN codes. This retrieval process is run in all of the operational networks many times a day to assemble "dump" data for model assimilation. The script that manages the retrieval of observations provides users with a wide range of options. These include observational date/time windows, specification of geographic regions for filtering (via either a lat/lon box, a center-point lat/lon and radius, or a lat/lon grid point mask), data specification and combination, duplicate checking and bulletin "part" merging, and parallel processing.
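For illustration, here is a minimal sketch (in Python, not the operational FORTRAN or script code) of the two simplest geographic filters named above: a lat/lon box and a center point plus great-circle radius. The function names and the report layout are hypothetical.

```python
# Hypothetical sketch of two of the dump script's geographic filters.
# Names and data layout are illustrative only, not the operational interface.
import math

def in_latlon_box(lat, lon, south, north, west, east):
    """True if (lat, lon) falls inside the requested lat/lon box."""
    if west <= east:
        in_lon = west <= lon <= east
    else:                              # box straddles the dateline
        in_lon = lon >= west or lon <= east
    return south <= lat <= north and in_lon

def within_radius(lat, lon, clat, clon, radius_km, earth_radius_km=6371.0):
    """True if (lat, lon) lies within radius_km of the center point (clat, clon)."""
    phi1, phi2 = math.radians(clat), math.radians(lat)
    dphi = math.radians(lat - clat)
    dlam = math.radians(lon - clon)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    distance_km = 2.0 * earth_radius_km * math.asin(math.sqrt(a))
    return distance_km <= radius_km

# Example: keep only reports near a hypothetical center point.
reports = [(39.0, -77.0), (25.0, -140.0)]
kept = [r for r in reports if within_radius(r[0], r[1], 40.0, -75.0, 500.0)]
```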
The primary retrieval software performs the initial stage of all data dumping by retrieving the subsets of the NCEP WCOSS BUFR /dcom database that contain all of the database messages valid for the data type, geographical filter and time window requested by a user. (Recall that the /dcom database is continuously updated with new data as the GTS decoder and satellite ingest jobs run.) The retrieval software looks only at the date in Section 1 of each BUFR message to determine which messages to copy for a particular data type. This results in an observing set containing possibly more data than was requested, but it allows the software to function very efficiently. The second stage of the process performs a final "winnowing" of the data to an observing set with the exact time window requested¹. This is done within the codes which remove exact- or near-duplicate reports (the nature of which is data type dependent) and merge bulletin parts for upper-air reports from the TAC feed.

¹ Normally, the six-hour cycle GFS, GDAS, and Climate Data Assimilation System (CDAS) network runs dump BUFR data globally over a six-hour time window centered on the analysis time. The six-hour cycle NAM network runs normally dump data within the expanded WRF-NMM-model domain over a six-hour time window centered on the analysis time [the NAM runs six hourly update ("catchup") cycles to assimilate data hourly between 6 hours and 1 hour prior to cycle time]. The one-hour full-cycle RAP, partial-cycle RAP (RAP_PCYC), early-cycle RAP (RAP_ERLY), early-HRRR-cycle RAP (RAP_EHRRR), RTMA and URMA network runs normally dump BUFR data within the expanded WRF-NMM-model domain (a superset of the RAP, RTMA and URMA domains) over a one-hour time window centered on the analysis time. The 15-minute Rapid-Update RTMA (RTMA_RU) network runs normally dump BUFR data within the expanded WRF-NMM-model domain over a one-hour time window centered on the analysis times of 00, 15, 30 and 45 minutes past each hour. The RTMA and URMA dump domain includes Guam and surrounding waters. (Note: The full-cycle and early-cycle RAP share their PREPBUFR files with the HRRR. The early-cycle-HRRR RAP dumps WSR-88D Level II radial wind and reflectivity specifically for the HRRR.)
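The retrieve-then-winnow sequence described above can be sketched schematically as follows. The data structures are illustrative (a Section 1 date per message, an observation time per report), and the one-hour padding in the first stage is an assumption standing in for the coarse, message-level selection; this is not the operational logic.

```python
# Hypothetical two-stage dump: copy whole database messages by Section 1 date,
# then winnow individual reports to the exact time window and remove duplicates.
from datetime import timedelta

def coarse_select(messages, center, half_window_hrs):
    """Stage 1: keep whole BUFR messages by their Section 1 date (hour granularity)."""
    lo = center - timedelta(hours=half_window_hrs + 1)   # assumed padding: cheap but inclusive
    hi = center + timedelta(hours=half_window_hrs + 1)
    return [m for m in messages if lo <= m["section1_date"] <= hi]

def winnow(reports, center, half_window_hrs):
    """Stage 2: exact report-level time window plus a simple duplicate check."""
    lo = center - timedelta(hours=half_window_hrs)
    hi = center + timedelta(hours=half_window_hrs)
    seen, kept = set(), []
    for r in reports:
        if not (lo <= r["obs_time"] <= hi):
            continue
        key = (r["station_id"], r["obs_time"], round(r["lat"], 2), round(r["lon"], 2))
        if key in seen:                  # exact/near-duplicate: keep first occurrence
            continue
        seen.add(key)
        kept.append(r)
    return kept
```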
Each data type selected for dumping is associated with a unique mnemonic string which represents a particular BUFR type and subtype in the /dcom database. The complete list of BUFR data types is shown in Table 1.a. This includes obsolete data types, future data types, and data types which are not currently dumped in any network job. In order to limit the number of output dump files in the operational network jobs, like data types are grouped together and represented by sequence or group mnemonics. The data group mnemonics used to generate dump files in the various NCEP networks (including obsolete types) are read either by the subsequent PREPBUFR processing steps, by the subsequent analysis codes, or by neither, depending upon the network. See Table 1.b for a listing of data group mnemonic dumps read by the PREPBUFR processing steps and Table 1.c for a listing of data group mnemonic dumps read by the analysis codes.

C. Re-processing of BUFR Observational Data Dump Files

Some of the BUFR data dump files are re-processed into new BUFR files so that they can be used properly by the subsequent PREPBUFR processing or analysis programs. (A schematic sketch of the "superobbing" performed by several of these steps appears after item 5 below.)

1. SSM/I data - all network runs (NOTE: The SSM/I data went bad in November 2009, resulting in no data being processed. All processing here was permanently turned off in October 2010): The "reports" in the SSM/I products BUFR dump files (group mnemonics "ssmip" or "ssmipn", see Table 1.b) consist of orbital scans, each of which contains 64 retrieval footprints of one or more products. The program PREPOBS_PREPSSMI unpacks selected products out of the scans, superobs them onto a one-degree latitude/longitude grid (optional in some network runs), then encodes them as individual "reports" in the output, re-processed BUFR file, which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier "spssmi" (see Table 1.b, key for superscript 2 in "NET" column). The GDAS, GFS and CDAS network runs superob the "operational" rainfall rate product generated at FNMOC, and the surface ocean wind speed and total column precipitable water products generated using a Neural-Net 3 algorithm (OMBNN3) developed by the Marine Modeling Branch of NCEP/EMC. The NAM network runs superob the "operational" surface ocean wind speed and total column precipitable water products generated at FNMOC. The upper-air RUC network run processes the same products as the NAM network runs but does not superob the data.

2. QuikSCAT data - NAM, GFS, GDAS and CDAS network runs (NOTE: The QuikSCAT data went bad in November 2009, resulting in no data being processed. All processing here was permanently turned off in October 2010): Each "report" in the QuikSCAT BUFR dump file (group mnemonic "qkscat", see Table 1.b) consists of four sets of nudged wind vectors and other raw scatterometer information. The program WAVE_DCODQUIKSCAT unpacks each report, checking the report date for realism, selecting the proper nudged wind vector, and excluding reports over land, reports with a missing nudged wind vector, reports with missing model wind direction and speed, reports with a probability of rain greater than 10%, and reports at the edges of the orbital swath. Reports passing checks are then superobed onto a one-half degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file, which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier "qkswnd" (see Table 1.b, key for superscript 1 in "NET" column).

3. TRMM TMI data - GFS, GDAS and CDAS network runs (NOTE: The TRMM TMI data went bad in April 2015, resulting in no data being processed. All processing here was permanently turned off at that time): Each "report" in the TRMM TMI BUFR dump file (group mnemonic "trmm", see Table 1.c) is at full footprint resolution. The program BUFR_SUPERTMI unpacks each report, checking the validity of the satellite id, observation date and total precipitation observation. Reports passing checks are then superobed onto a one-degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file. The output filename contains the qualifier "sptrmm" (see Table 1.c, key for superscript 1 in "NET" column). The Global GSI analysis (GFS and GDAS network runs only) reads the superobed data directly from the re-processed "sptrmm" BUFR dump file (these data do not pass through the PREPBUFR processing steps).

4. WindSat data - NAM, RAP, RAP_PCYC, RAP_ERLY, RTMA, URMA, RTMA_RU, GFS, GDAS and CDAS network runs: Each "report" in the WindSat BUFR dump file (group mnemonic "wndsat", see Table 1.b) consists of four sets of nudged wind vectors and other raw scatterometer information.
The program BUFR_DCODWINDSAT unpacks each report, checking the report date for realism, selecting the proper nudged wind vector, and excluding reports not explicitly over ocean, reports with a missing nudged wind vector, reports with missing model wind direction and speed, and reports with a "bad" or "no retrieval" EDR quality flag. Reports passing checks are then superobed onto a one-degree lat/lon grid according to satellite id (GFS, GDAS and CDAS networks only) and encoded into the output, re-processed BUFR file, which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier "wdsatr" (see Table 1.b, key for superscript 5 in "NET" column). (NOTE: WindSat data have not been processed since August 2012 due to a format change in the raw files. In all likelihood these data will not be restored.)
5. ASCAT data - NAM, RAP, RAP_PCYC, RAP_ERLY, RTMA, URMA, RTMA_RU, GFS, GDAS and CDAS network runs: Each "report" in the ASCAT BUFR dump file (group mnemonic "ascatt", see Table 1.b) consists of two sets of nudged wind vectors and other raw scatterometer information. The program WAVE_DCODQUIKSCAT unpacks each report, checking the report date for realism, selecting the proper nudged wind vector, and excluding reports over land, reports with a missing nudged wind vector, reports with missing model wind direction and speed, and reports with one or more "critical" wind vector cell quality flags set. Reports passing checks are then encoded (without superobing) into the output, re-processed BUFR file, which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier "ascatw" (see Table 1.b, key for superscript 6 in "NET" column).
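The superobbing performed by several of the reprocessing programs above (PREPOBS_PREPSSMI, BUFR_SUPERTMI, BUFR_DCODWINDSAT) amounts to averaging all footprints that fall in the same grid box into a single report. Below is a minimal sketch assuming a simple (lat, lon, value) report layout and a one-degree grid; it is not the operational code.

```python
# Hypothetical one-degree "superob": average all reports sharing a 1-degree box.
from collections import defaultdict

def superob_one_degree(reports):
    """Collapse (lat, lon, value) reports to one averaged report per 1-degree box."""
    boxes = defaultdict(list)
    for lat, lon, value in reports:
        key = (int(lat // 1), int(lon // 1))     # 1-degree box index
        boxes[key].append((lat, lon, value))
    superobs = []
    for members in boxes.values():
        n = len(members)
        superobs.append((
            sum(m[0] for m in members) / n,      # mean latitude
            sum(m[1] for m in members) / n,      # mean longitude
            sum(m[2] for m in members) / n,      # mean observed value
        ))
    return superobs

# Example: three footprints, two of which share a box and are averaged.
print(superob_one_degree([(10.2, 140.1, 5.0), (10.7, 140.9, 7.0), (35.0, -80.0, 3.0)]))
```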
===> Dump Job 2 performs the following single step:

Dumping of WSR-88D Level II radial wind and reflectivity BUFR Data
This job currently runs only in the NAM network. The processing is identical to that described in Dump Job 1, Step B above. The dumping of WSR-88D Level II radial wind and reflectivity data is performed in a separate job from the dumping of all other data in the NAM network in order to save wall-clock time, since dumping the Level II data here takes almost as long as dumping all other observational data in Dump Job 1.
===> Tropical Cyclone Processing Job, running simultaneously with Dump Job 1 and, in the NAM network (NOTE: no longer in the NAM after March 2017), Dump Job 2, performs the following steps in sequence:
A. Quality Control of Tropical Cyclone Bulletin Data

(NOTE: This step no longer runs in the NAM after March 2017. After July 2017, the GFS and GDAS perform the quality control of tropical cyclone bulletin data in a process that is now upstream and separate from obs processing.)

In the GFS and GDAS network runs, tropical cyclone bulletins valid for the current cycle from the Joint Typhoon Warning Center (JTWC) and Fleet Numerical Meteorology and Oceanography Center (FNMOC) are read from the NCEP WCOSS /dcom database and merged into the proper record structure by the program SYNDAT_GETJTBUL. Next, tropical cyclone bulletins valid for the current cycle from the NCEP/Tropical Prediction Center (TPC) are read from the TPC directory on the NCEP WCOSS (these are already in the proper record format). Finally, manually generated tropical cyclone bulletins are read from the NCEP WCOSS database. The latter can be generated by the NCEP/NCO Senior Duty Meteorologist (SDM) in the event that data from other sources are not available.
Next, the program SYNDAT_QCTROPCY runs in order to merge the tropical cyclone records from the various sources and perform quality control on tropical cyclone position and intensity information. Some of the checks performed include duplicate records, appropriate date/time, proper record structure, storm name/id number, records from multiple institutions, secondary variables (e.g., central pressure), and storm position and direction/speed. The emphasis is on internal consistency between the reported storm location and the storm's prior motion (a schematic sketch of such a check appears after Note 1 below). The output tropical cyclone vital statistics (tcvitals) file is then copied to the network-specific /com directories on the NCEP WCOSS. This file is read in the next tropical cyclone relocation step in the GFS and GDAS networks.

B. Relocation of Tropical Cyclone Vortices in the Global Sigma (First) Guess

(NOTE: After March 2017 the NAM, and after July 2017 the GFS and GDAS, perform relocation of their own first guess in a process that is now separate from obs processing.)

In the GFS and GDAS network runs, the quality-controlled tropical storm position and intensity (tcvitals) file valid at the current time (output by the previous tropical cyclone record q.c. step), along with the tcvitals files valid 12 and 6 hours prior to the current time, and the "best" global sigma first guess and global pressure GRIB files valid 6 hours prior to the current time, 3 hours prior to the current time, at the current time, and 3 hours after the current time, are input to a series of programs (SUPVIT, GETTRK, RELOCATE_MV_NVORTEX). These programs relocate one or more tropical cyclone (or hurricane) vortices in the global sigma first guess files valid 3 hours prior to the current time, at the current time, and 3 hours after the current time. The updated global sigma guess file for the current time is later read in the PREPBUFR processing by the program PREPOBS_PREPDATA and used by the various quality control programs in the PREPBUFR processing stream. In the GFS and GDAS networks, the updated global sigma guess files for all three times (the current time, 3 hours prior to the current time, and 3 hours after the current time) are read by the subsequent Global GSI analysis. This processing may also (but usually does not) generate an updated tcvitals file valid at the current time. This file, if generated, contains only records for "weak" vortices which could not be used to update the global sigma first guess here. It would be read later in the PREPBUFR processing by the program SYNDAT_SYNDATA in the GFS and GDAS networks in order to generate tropical cyclone bogus wind reports. If this file is empty, no bogus reports will be generated by SYNDAT_SYNDATA. This updated tcvitals file is not considered in the NAM network runs, as the original tcvitals file, output by the previous tropical cyclone record q.c. step, is always input to SYNDAT_SYNDATA. Although tropical cyclone relocation is not used in the NAM runs (other than to provide a better guess for the PREPBUFR quality control programs), the t-06 NAM does start with a global sigma guess which reflects tropical cyclone relocation.
Note 1: This job runs only in the GFS and GDAS networks, and only if TPC and/or JTWC/FNMOC tropical storm records are originally present and valid at the current time.
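One of the SYNDAT_QCTROPCY checks emphasized above is consistency between the reported storm location and the storm's prior motion. The following is a conceptual sketch of such a check, assuming a flat-earth extrapolation and a hypothetical 300 km tolerance; it does not reproduce the operational algorithm or its thresholds.

```python
# Hypothetical position-vs-prior-motion consistency check for a tropical cyclone fix.
import math

def extrapolate(lat, lon, heading_deg, speed_kt, hours):
    """Advect a position along a constant heading/speed (flat-earth approximation)."""
    distance_km = speed_kt * 1.852 * hours           # 1 kt = 1.852 km/h
    dlat = (distance_km / 111.0) * math.cos(math.radians(heading_deg))
    dlon = (distance_km / 111.0) * math.sin(math.radians(heading_deg)) / max(
        math.cos(math.radians(lat)), 0.1)
    return lat + dlat, lon + dlon

def position_consistent(prev_fix, new_fix, hours, tol_km=300.0):
    """Flag a new fix whose position departs too far from the extrapolated track."""
    exp_lat, exp_lon = extrapolate(prev_fix["lat"], prev_fix["lon"],
                                   prev_fix["heading_deg"], prev_fix["speed_kt"], hours)
    dlat_km = (new_fix["lat"] - exp_lat) * 111.0
    dlon_km = (new_fix["lon"] - exp_lon) * 111.0 * math.cos(math.radians(new_fix["lat"]))
    return math.hypot(dlat_km, dlon_km) <= tol_km

# Example: a 6-hour-later fix that drifted well off the extrapolated track (returns False).
prev = {"lat": 20.0, "lon": -60.0, "heading_deg": 270.0, "speed_kt": 10.0}
print(position_consistent(prev, {"lat": 24.0, "lon": -55.0}, hours=6.0))
```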
===> Dump Post-Processing Job, running after both Dump Job 1 and Dump Job 2 have completed, performs the following single step:

Post-processing of BUFR Observational Data Dump Files

The completion of the data dump job(s) triggers a job which performs post-processing on the data dump files just created. This job does not produce any output necessary to the successful completion of the analysis/forecast network [indeed, it runs simultaneously with the PREPBUFR Processing Job, which is also triggered by the completion of the data dump job(s)].

The first job step prepares a table of data counts for the various reports just dumped via the execution of the program BUFR_DATACOUNT. These counts are compared to the running average over the past 30 days for each report type for the particular network and cycle time. If the current dump count for a particular type is considered abnormally low (for most report types this means more than 50% below the 30-day average), a dump alert is generated. The action taken for low dump counts depends upon the report type. For those types considered "critical" to the subsequent assimilation system, a low dump count generates diagnostics and triggers a code failure and a return code of 6 in the dump alert job. For those types considered "moderately-critical" (all types that are assimilated which are not in the "critical" category), a low dump count generates diagnostics and a non-fatal return code of 5 in the dump alert job. For those types considered "non-critical" (all types that are not assimilated in the particular network), a low dump count generates diagnostics and a non-fatal return code of 4 in the dump alert job. In all cases, a complete listing of dump counts vs. the 30-day average, along with those types which are either low or high (for most report types this means more than 200% above the 30-day average), is sent to the SDM. High dump counts do not generate non-zero return codes in the dump alert job, but they do generate diagnostics. Trends in the 30-day averages vs. those for 3, 6, 9 and 12 months ago are also recorded for the SDM (a report type trends low vs. one of these previous averaging periods if the current 30-day average is more than 20% below the 30-day average for that period, and trends high if the current 30-day average is more than 20% above the 30-day average for that period). Currently this dump count and alert processing runs only in the NAM (tm00 only), GFS and GDAS networks.
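A minimal sketch of the alert thresholds just described follows: the 50%-below-average "low" test, the 200%-above-average "high" test, and the return codes 6/5/4 by criticality category. The thresholds come from the text above; the function name and data layout are illustrative and are not the BUFR_DATACOUNT implementation.

```python
# Hypothetical dump-count alert classification per report type and cycle.
LOW_FRACTION = 0.50    # "more than 50% below the 30-day average"
HIGH_FRACTION = 2.00   # "more than 200% above the 30-day average"

def classify_dump_count(current, avg30, criticality):
    """Return (status, return_code) for one report type in one cycle."""
    if avg30 <= 0:
        return "no-average", 0
    if current < (1.0 - LOW_FRACTION) * avg30:
        rc = {"critical": 6, "moderately-critical": 5, "non-critical": 4}[criticality]
        return "low", rc
    if current > (1.0 + HIGH_FRACTION) * avg30:
        return "high", 0           # high counts are flagged but never fatal
    return "normal", 0

# Example: an assimilated-but-not-critical type at 40% of its 30-day average.
print(classify_dump_count(current=400, avg30=1000, criticality="moderately-critical"))
```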
The next job step executes the program BUFR_REMOREST, which removes or masks, from the appropriate dump files, certain data types that are restricted (either by the data producers themselves or by the WMO) from redistribution outside of NCEP. NCEP/NCO has established a very strict policy on who may or may not have access to restricted data. The resulting dump files, gleaned of all restricted data, are given a suffix qualifier of ".nr" in the network-specific ("/com") directories on the NCEP WCOSS.

The next dump post-processing job step executes the program BUFR_LISTDUMPS, which generates files containing text listings of all reports in the various BUFR data dump files. These text files are then copied to the network-specific ("/com") directories on the WCOSS in order to provide diagnostic information for troubleshooting problems in the data, etc. Files containing listings of dump files that have been stripped of all restricted data are given the suffix qualifier ".nr".

The post-processing job also contains a step which generates unblocked versions of the BUFR data dump files and copies them to the ("/com") directories (again, files containing unblocked forms of dump files that have been stripped of all restricted data are given the suffix qualifier ".nr"). The unblocked files are then copied to servers for use by organizations outside of NCEP. (The native blocking on the IBM-SP machine is Fortran 77.) Restricted data are not copied to these servers. (NOTE: This step is no longer invoked after the migration to WCOSS because BUFR files there are unblocked by default.)

Finally, in all networks, the final post-processing job of the day performs a data-averaging step via the execution of the program BUFR_AVGDATA (a schematic sketch of this bookkeeping appears at the end of this section). This updates the 30-day running average for each report type dumped, for each cycle for which a dump is generated. These "current" 30-day averages are saved in text files, according to the network, in either the "/com/arch/prod/avgdata" [NAM (tm00 only), RAP], "/com2/arch/prod/avgdata" (RTMA, URMA), or "/gpfs/hps/nco/ops/com/gfs/prod/sdm_rtdm/avgdata" (GFS, GDAS) directory on the NCEP WCOSS. These files are used by the dump alert processing in the NAM (tm00 only), GFS and GDAS networks in order to generate alerts for high or low dump counts for the current dump vs. the current 30-day average (see the dump alert discussion above). For the final post-processing job of a particular month, the current 30-day average for the NAM (tm00 only), GFS and GDAS networks is saved off in a separate file for that month in the same "/com" directory as the current 30-day average files. These past-month 30-day average files are used to check for high and low trends in the current NAM (tm00 only), GFS or GDAS 30-day average for a particular report type vs. the 30-day average for 3, 6, 9 and 12 months ago (again, see the dump alert discussion above). Only the most recent 12 months of 30-day averages are saved here for the NAM (tm00 only), GFS and GDAS networks.

The NCEP production suite schedule, for those networks which originate with a dump of observational data, is shown in Table 2. "DUMP" indicates the name of Dump Job 1, "DUMP2" indicates the name of Dump Job 2, "DPOST" indicates the name of the Dump Post-processing Job, "PREP" (and "PREP1" and "PREP2" in the CDAS network) indicates the name of the PREPBUFR Processing Job, "ANAL" indicates the name of the Analysis Job, "FCST" (and "FCSTH" and "FCSTL" in the GFS network) indicates the name of the Forecast Job, "PPOST" (and "PPOST1" and "PPOST2" in the CDAS network) indicates the name of the PREPBUFR Post-processing Job, "GESS" in the RTMA and URMA networks indicates the name of the job which retrieves the first guess, and "APOST" in the RTMA and URMA networks indicates the name of the Analysis Post-processing Job. The initiation of the dump jobs ("DUMP" and "DUMP2") and the tropical cyclone processing job ("TROPCY") are triggered by the clock at the times indicated. All subsequent jobs run in sequence. "RAP_PCYC" refers to the partial-cycle Rapid Refresh network runs. "RAP_ERLY" refers to the early-cycle Rapid Refresh network runs. "RAP_EH" refers to the early-cycle-HRRR Rapid Refresh network runs. "RTMA_RU" refers to the Rapid-Update RTMA runs.
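A minimal sketch of the BUFR_AVGDATA-style bookkeeping referenced above: a rolling 30-day window of daily dump counts per report type and cycle, plus the 20% trend test against an average archived 3, 6, 9 or 12 months earlier. The class, the file-free layout and all names are illustrative assumptions, not the operational code.

```python
# Hypothetical 30-day running-average maintenance and month-over-month trend check.
from collections import deque

class DumpAverage:
    def __init__(self):
        self.counts = deque(maxlen=30)          # most recent 30 daily counts

    def add_day(self, count):
        self.counts.append(count)

    def average(self):
        return sum(self.counts) / len(self.counts) if self.counts else 0.0

def trend(current_avg, past_avg, threshold=0.20):
    """'low'/'high'/'steady' vs. an average saved 3, 6, 9 or 12 months ago."""
    if past_avg <= 0:
        return "steady"
    if current_avg < (1.0 - threshold) * past_avg:
        return "low"
    if current_avg > (1.0 + threshold) * past_avg:
        return "high"
    return "steady"

# Example: today's 30-day average is 25% below the average saved three months ago.
print(trend(current_avg=750.0, past_avg=1000.0))
```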