Database Structure

How are the data organized

There are seven tables of data, each of which is relationally linked to the others. The tables, their columns, primary, and secondary keys are shown in the figure below. The redundancy in the table "stations" is to make it easier to query for stations from a particular river, system, or basin without having to do joins to all the tables. The redunancy in "front_of_books" is the same - it enables queries by station, station type, or system without excessive joins.

Database structure
Details of database design. Table names are dark green, primary keys are light green, and secondary keys are blue.

List of tables and their items with descriptions

    river_systems

    This table gives the basic information about what river districts (such as the Yangtze, the Yunnan-Tibet International Rivers, or the Yellow River) are numbered in the database.

  1. system_id (PK) - this is the database number for the river system and the primary key for the table
  2. system_name - this is the name for the river system
  3. system_code - this is the Chinese book code for the river system

    basins

    This table defines all the basins for which data are available (for example, the Lancang Jiang (or Mekong))

  1. basin_id (PK) - this is the database number for the basin and the primary key for the table
  2. basin - this is the name of the basin
  3. system_id - this is the secondary key to the river_systems table

    rivers

    This is the list of rivers and which basins and systems they are in. There is some duplicating of river names because some small tributaries are given non-unique names such as "left branch" or "right branch".

  1. river_id (PK) - this is the database number for the river
  2. river_name - this is the name of the river
  3. system_id (SK) - this is the secondary key to the river_systems table
  4. basin_id (SK) - this is the secondary key to the basins table

    types

    This is the definitions of codes for hydrology and weather stations.

  1. station_type (PK) - this is a code for whether a station is a hydrology station or a weather station and is the primary key for this table
  2. type_description - this defines the two station_types

    stations

    This is the list of stations, what type they are, and what river, basin, and system they are in. The redunancy is to make it easy to search for all stations in a particular system, basin, or river, or of a certain type without too many joins.

  1. station_number (PK) - this is the database number for the station and the primary key for the table.
  2. station_name - this is the name of the station
  3. river_id (SK) - this is the secondary key to the rivers table
  4. basin_id (SK) - this is the secondary key to the basins table
  5. system_id (SK) - this is the secondary key to the river_systems table
  6. station_type (SK) - this is the secondary key to the types table

    front_of_books

    This is the information from the first few pages of each of the data books and gives the locations of all of the stations, their names, what rivers they are on, and what data are available for each. The records are unique for each station type, system, year, and book number. There is also a dummy primary key called record_number so that each row has a unique identifier.

  1. record_number (PK) - this is the dummy, unique, primary key for the table
  2. station_type - this is the secondary key to the types table
  3. system_id - this is the secondary key to the river_systems table
  4. station_number - this is the secondary key to the stations table
  5. year - this is the year the book records data for; if a book has multiple years, it is the first year in the range
  6. book_number - this is the unique number given to a station for that particular book; they are not consistent from book to book
  7. longitude - reported longitude of station (if given)
  8. latitude - reported latitude of station (if given)
  9. area - reported upstream area of station (if given)
  10. start_year - year the station started recording data
  11. start_month - month the station started recording data
  12. stage - this is a 0 for no data and a 1 for recorded data on river stage; it is null for weather stations
  13. flow - this is a 0 for no data and a 1 for recorded data on river discharge; it is null for weather stations
  14. sed - this is a 0 for no data and a 1 for recorded data on total suspended sediment; it is null for weather stations
  15. rain - this is a 0 for no data and a 1 for recorded data on precipitation; it is null for hydrology stations
  16. evap - this is a 0 for no data and a 1 for recorded data on evaporation; it is null for hydrology station

    data_table

    This is the bulk of the data and contains all the daily values which have been digitized. They are organized by station, year, month, and day and data is unique for each combination of these; there is also a dummy primary key for a unique single identifier for each record. The date information are not in date format so that it is easier to search for just a particular year or one month in all years.

  1. data_number (PK) - this is the dummy primary key for the table
  2. station_number (SK) - this is the secondary key to the stations table
  3. year - this is the year of the data record
  4. month - this is the month of the data record
  5. day - this is the day of the data record
  6. discharge - this is the discharge for the date at that station (null for weather stations or not recorded on that day)
  7. sed - this is the total suspended sediment for that date at that station (null for weather stations or if data are not recorded for that station on that day)
  8. rain - total precipitation for that date at that station (null for hydrology stations or if not recorded; 0 for no precipitation); this one is a bit confusing because sometimes rain was only recorded for half the year and entries are left blank for either not recorded or no precipitation - we used our best judgment to determine which was the case for each data point
  9. evap - total evaporation for that date at that station (null for hydrology stations)

 

Home  |  Overview  |  Stations  |  Structure  |  Directions  |  Data Sources

Web Curators