WxVet
(Weather Data Vetter)

See the main page for the WxTools overview

Description

This program is intended for vetting the data collected from Oregon Scientific weather stations (WxLogger for the WMR180, WxLog for the WMR928) or from the Pro-Signal PSG04173 and similar Fine Offset clones (WxServer). The name WxVet derives from the common abbreviation WX for "weather conditions". The program was developed by Anthony S. Wilson under the supervision of Ken Turner.

Information logged from the weather station is occasionally corrupt. Although the serial interface uses a checksum, this is relatively weak at detecting errors. The weather station also seems (in my environment) to lose touch occasionally with its sensors. The net effect is that the data collected by WxLog may have errors.

The first purpose of WxVet is to report errors in the raw data. Fixing these errors requires the data files to be edited by hand, but this is usually easy. Suppose some observations are corrupt because of temporary loss of contact with a sensor (i.e. the data values are zero). Plausible values can then be interpolated from the nearby values. The raw data files are gzipped text, so they can easily be edited: gunzip the data file, edit it, then gzip it again. Some editors such as jEdit allow you to edit gzipped files directly.
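
As an illustration of the interpolation step, the following Java sketch fills runs of zero readings by linear interpolation between the nearest good values on either side. It is not part of WxVet: the class GapFiller and the method fillZeros are hypothetical names. The sketch assumes that a zero marks a lost reading, which does not hold for every measurement (zero rainfall or a temperature of zero is perfectly valid), so check any result before writing it back into a data file.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of WxVet: repairs runs of zero readings. */
final class GapFiller {
    private GapFiller() {}

    /**
     * Returns a copy of values in which each run of zeros lying between two
     * non-zero readings is replaced by linearly interpolated values.
     * Zeros at the very start or end are left alone, because there is no
     * reading on one side to interpolate from.
     */
    static double[] fillZeros(double[] values) {
        double[] out = Arrays.copyOf(values, values.length);
        int prev = -1; // index of the last non-zero reading seen so far
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 0.0) {
                continue; // assumed to be a lost reading
            }
            if (prev >= 0 && i - prev > 1) {
                double step = (values[i] - values[prev]) / (i - prev);
                for (int j = prev + 1; j < i; j++) {
                    out[j] = values[prev] + step * (j - prev);
                }
            }
            prev = i;
        }
        return out;
    }
}
```

For example, the readings 10, 0, 0, 16 would become 10, 12, 14, 16.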

The second purpose of WxVet is to summarise the data for use by other programs. WxVet condenses data in the sense of summarising it, not in the sense of using some data compression algorithm such as gzip. The following condensed files are produced by WxVet using raw day files named nn.dat.gz (gzipped data for day nn). These files are placed in the directory YYYY/MM, except for year summaries which appear in directory YYYY.

Cnn.dat.gz
day data condensed into a single-line summary
Wnn.dat.gz
week data comprising seven lines of condensed day summaries
CWnn.dat.gz
week data condensed into a single-line summary
Mnn.dat.gz
month data comprising 28 to 31 lines of condensed day summaries
CMnn.dat.gz
month data condensed into a single-line summary
Ynnnn.dat.gz
year data comprising 12 lines of condensed month summaries
CYnnnn.dat.gz
year data condensed into a single-line summary
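
The naming scheme above can be sketched in Java as follows. This is an illustration only, not code from WxVet: the class WxPaths and its methods are hypothetical. It assumes that nn and MM are zero-padded to two digits and that nnnn in the year files is the four-digit year.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper, not part of WxVet: builds paths following the README naming scheme. */
final class WxPaths {
    private WxPaths() {}

    /** Directory YYYY/MM below the data directory (zero-padding is an assumption). */
    static Path monthDirectory(String directory, int year, int month) {
        return Paths.get(directory,
                String.format(Locale.ROOT, "%04d", year),
                String.format(Locale.ROOT, "%02d", month));
    }

    /** Raw day file nn.dat.gz. */
    static Path rawDay(String directory, int year, int month, int day) {
        return monthDirectory(directory, year, month)
                .resolve(String.format(Locale.ROOT, "%02d.dat.gz", day));
    }

    /** Condensed day file Cnn.dat.gz, using the default prefix 'C'. */
    static Path condensedDay(String directory, int year, int month, int day) {
        return monthDirectory(directory, year, month)
                .resolve(String.format(Locale.ROOT, "C%02d.dat.gz", day));
    }

    /** Condensed year file CYnnnn.dat.gz, which lives in directory YYYY. */
    static Path condensedYear(String directory, int year) {
        return Paths.get(directory, String.format(Locale.ROOT, "%04d", year))
                .resolve(String.format(Locale.ROOT, "CY%04d.dat.gz", year));
    }
}
```

With these assumptions, condensedDay("data", 2021, 3, 7) names the file C07.dat.gz in directory data/2021/03.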

The following files are also produced for analysis with SPSS (Statistical Package for the Social Sciences). However, these files are in a simple format that can be read and analysed by any spreadsheet.

day.dat
day data comprising condensed single-line day summaries
week.dat
week data comprising condensed single-line week summaries
month.dat
month data comprising condensed single-line month summaries
year.dat
year data comprising condensed single-line year summaries

Customisation

The WxVet program is provided in source form. Store it where you wish. Compile everything with javac *.java. (This is what the included Unix shell script build does.) The code must be edited to adapt it to your local environment. Specifically, the following constants must be altered at the start of the main class (VetNCompress):

directory
where the weather data files are stored (no '/' or '\' at end of name)
firstMonth
number of the first month of data in the year.

Ideally the first month should be 1, so that the program summarises calendar years. Suppose your data collection starts on 3rd May. You could set firstMonth to 5, but years would then be summarised from 3rd May to 2nd May. Alternatively, you could keep firstMonth at 1 and invent fictitious data for the start of the year by creating copies of the 3rd May data.

In addition, you may wish to alter the following constants:

done
name for file defining days already processed
prefix
compressed data file prefix
spssday
filename for (SPSS) day summary
spssweek
filename for (SPSS) week summary
spssmonth
filename for (SPSS) month summary
spssyear
filename for (SPSS) year summary
start
name for file defining the next day to process
suffix
data file suffix.

If you use the separate WxDisplay program, its constants must match those for the vetter.

Class Checker checks the validity of the raw data. The start of this class defines the constant array limits, which gives the largest difference allowed between successive measurements, the minimum value and the maximum value. The current values are appropriate for the author's locality, but another locality might have much sharper changes in, say, pressure or temperature, so it may be necessary to edit the largest difference allowed.
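
The kind of test that such limits allow might look like the following Java sketch. It is not the actual Checker code: the class RangeCheck, its Result values and the numbers in the usage note are hypothetical, and the real layout of the limits array is not shown here.

```java
/** Hypothetical sketch, not the WxVet Checker class: validates one measurement. */
final class RangeCheck {
    /** Outcome of checking one reading. */
    enum Result { OK, BELOW_MINIMUM, ABOVE_MAXIMUM, CHANGE_TOO_LARGE }

    private final double maxDifference; // largest change allowed between successive readings
    private final double minimum;       // smallest plausible value
    private final double maximum;       // largest plausible value

    RangeCheck(double maxDifference, double minimum, double maximum) {
        this.maxDifference = maxDifference;
        this.minimum = minimum;
        this.maximum = maximum;
    }

    /** Checks the current reading against the range and against the previous reading. */
    Result check(double previous, double current) {
        if (current < minimum) {
            return Result.BELOW_MINIMUM;
        }
        if (current > maximum) {
            return Result.ABOVE_MAXIMUM;
        }
        if (Math.abs(current - previous) > maxDifference) {
            return Result.CHANGE_TOO_LARGE;
        }
        return Result.OK;
    }
}
```

For instance, new RangeCheck(5.0, -40.0, 50.0) would accept a change from 10 to 12 but flag a jump from 10 to 20. These numbers are purely illustrative and are not the values used by WxVet.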

Usage

The data format used by WxVet is that of the WxLog program, with minor variations according to the file. Run the vetter with java VetNCompress. (The included Unix shell script wxvet may be used to do this, though it may need some adaptation according to where files are stored.) Running the vetter will condense the data from the last processed file or from the start, according to the control files.

Control Files

The vetter is controlled by two files that record how much data has been vetted so far. This means that new day files can be added at any time. When the vetter is run, it catches up with the new files. Before the first run, set start.txt and done.txt according to the first day of data. To redo all vetting, re-initialise these control files and delete the corresponding condensed data files. (These begin with 'C' unless the prefix constant has been altered.) If the vetter is confused by an error, it is sometimes necessary to adjust the files manually in order to force it to re-vet the data. The included Unix shell script wxreset can be used to clean up vetted files prior to re-vetting, though it may need some adaptation according to where files are stored.

The start file has the following format:

DD MM YYYY    next day month year to be vetted and condensed
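
A start file in this format could be read with a Java sketch such as the one below. It is not taken from WxVet: the class StartFile is a hypothetical name. It assumes that the three numbers are separated by whitespace and that anything after them on the line can be ignored.

```java
import java.time.LocalDate;

/** Hypothetical helper, not part of WxVet: reads the "DD MM YYYY" start line. */
final class StartFile {
    private StartFile() {}

    /**
     * Parses a line of the form "DD MM YYYY" into a date.
     * Throws IllegalArgumentException if fewer than three fields are present,
     * NumberFormatException if a field is not a number, and
     * java.time.DateTimeException if the date does not exist.
     */
    static LocalDate parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected DD MM YYYY but got: " + line);
        }
        int day = Integer.parseInt(fields[0]);
        int month = Integer.parseInt(fields[1]);
        int year = Integer.parseInt(fields[2]);
        return LocalDate.of(year, month, day);
    }
}
```

Using LocalDate means that an impossible date such as 31 02 2021 is rejected when the file is read.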

The done file has the following format:

DD            week day number last condensed (relative to start)
WW            week number start last condensed (relative to start)
MM            month number last condensed (relative to start)
YY            year number last condensed (relative to start)
DD            day for next week to be condensed
MM            month for next week to be condensed
YYYY          year for next week to be condensed
Day           day name for last day condensed
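
The eight lines of the done file could be read into a small value object, as in this Java sketch. It is not taken from WxVet: the class DoneFile and its field names are hypothetical and simply follow the descriptions above. It assumes one value per line, taken as the first whitespace-separated token.

```java
import java.util.List;

/** Hypothetical holder, not part of WxVet: the eight values of the done file. */
final class DoneFile {
    final int weekDay;     // week day number last condensed (relative to start)
    final int week;        // week number last condensed (relative to start)
    final int month;       // month number last condensed (relative to start)
    final int year;        // year number last condensed (relative to start)
    final int nextDay;     // day for next week to be condensed
    final int nextMonth;   // month for next week to be condensed
    final int nextYear;    // year for next week to be condensed
    final String dayName;  // day name for last day condensed

    private DoneFile(int weekDay, int week, int month, int year,
                     int nextDay, int nextMonth, int nextYear, String dayName) {
        this.weekDay = weekDay;
        this.week = week;
        this.month = month;
        this.year = year;
        this.nextDay = nextDay;
        this.nextMonth = nextMonth;
        this.nextYear = nextYear;
        this.dayName = dayName;
    }

    /** Builds a DoneFile from the lines of the file, in the order listed above. */
    static DoneFile parse(List<String> lines) {
        if (lines.size() < 8) {
            throw new IllegalArgumentException("expected 8 lines but got " + lines.size());
        }
        return new DoneFile(
                number(lines.get(0)), number(lines.get(1)),
                number(lines.get(2)), number(lines.get(3)),
                number(lines.get(4)), number(lines.get(5)),
                number(lines.get(6)), token(lines.get(7)));
    }

    /** First whitespace-separated token on the line. */
    private static String token(String line) {
        return line.trim().split("\\s+")[0];
    }

    private static int number(String line) {
        return Integer.parseInt(token(line));
    }
}
```

The lines could be obtained with java.nio.file.Files.readAllLines on the path of the done file.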

The files start-init.txt and done-init.txt are provided as examples of what these files might start out as.

Licence

This program is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation - either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful but without any warranty, without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details. You may re-distribute this software provided you preserve this README file.

History

Version 1.0: First public release, Ken Turner, 7th July 2005

Version 2.0: Ken Turner, 19th July 2009

Version 2.1: Ken Turner, 18th February 2013

Version 3.0: Ken Turner, 13th February 2017

Version 4.0: Ken Turner, 1st March 2021