Howto – GIS mapping

Priority: high
Updating: mature

This is an evolved version of the workplan task for mapping.

Change log:

| When | Who | Comment |
|---|---|---|
| 2021 08 13 | Sp17 | Created from workplan and accumulated practice within the project to date. |
| 2021 09 15 | Sp17 | Updated for connecting from QAPP. |
| 2021 09 29 | Sp17 | More flesh on the bones. This is adequate for review with the QAPP. |
| 2021 10 01 | Sp17 | Reformatted for export in DOCX. |
| 2022 01 04 | Sp17 | Updates, bump to 90%. |
| 2023 06 04 | Sp17 | Mapping has been stable in general for a long time; this edit brings the documentation up to date cumulatively before the DEC review, bump to 95%. |
| 2023 06 15 | Sp17 | Convert to Markdown, minor updates. |
| 2023 06 21 | Sp17 | Began a new section about non-confidential exporting. |

We are using QGIS for GIS and SQLite (with DB Browser, https://sqlitebrowser.org/) for certain tabular data that are joined to maps.

See also:

1. Objectives

  • Provide digital mapping infrastructure for upstate-wide and individual sampling areas at three geographic scales: statewide, area (multi-site), and individual site.
  • Document data quality considerations.
  • Provide a way to export non-confidential data for DEC staff.

2. Data included in NYSDEC 2021-2025 pesticides project geographic collection

There are three geographic levels of map datasets (see section 4 for more detail):

  • Statewide simple.
  • Three substate regions, to be able to use county NRCS soil maps.
  • Individual sites. Derived from their substate regional map.

All map layers are downloaded to the laptop in advance, except for a few indicated in the list below.

The map layers in these mapsets are:

  • Aquifer locations and characteristics.
  • Principal aquifer moderate resolution map (USGS).
  • NYS State Geologist’s surficial geology maps. State is divided into several regions.

  • DEC well registry. Includes lat/long, usually total depth, depth to water, depth of screen. Source of some candidate wells.

    • Scanned well completion reports from drillers can be obtained from NYS DEC Division of Water using a FOIL request.
    • The completion reports include a strata log. These can be excerpted, geocoded, and arranged to pop up when a point is clicked on a map. This uses the ImportPhotos QGIS extension.
    • Map is periodically updated by DEC. We made a snapshot of this in 2021 and have not updated it, because almost none of our candidate sites are in the registry.
  • Point shapefile map of which upstate lakes have a FOLA member association; lake watershed maps for selected lakes are in CSLAP lake reports (non-GIS format). From Division of Water.

  • USDA NASS Cropscape for agricultural land uses. Have 2017-2019.

  • Topography: USGS topographic map images, loaded on the fly from internet. Good for basemaps.

  • Geology - NYS geology map. Rarely used. Surficial geology is much more used.

  • Hydrography - federal or NYS source. Rarely used.

  • Official watershed boundaries, as needed for EQUIS location tagging. We start with a 12-digit HUC map. Rarely used.

  • NRCS digital soils maps, per county. Omitted from Statewide simple mapset to speed map loading.

  • US EPA ecological regions.

  • Active railways map, from NYS DOT.

  • Base mapping: aerial photos (have color infrared and true color; both loaded on the fly from internet), county boundaries, roads, Census Places

  • Property ownership (tax) maps for selected areas. These are for individual counties and vary in vintage and availability. These are included only in individual site map sets.

  • LIDAR based topographic maps where available. This requires downloading per county, and has been in contours rather than DEM format. These are included only in individual site maps.

  • We made categorical candidate maps using Google Earth searches. Searches there geocode addresses and put the results in the Google Earth clipboard in KML text format. Search county by county for a key phrase (“sod farms in Onondaga County, New York”), append the consecutive KML results into a text file, then edit the appended search results into a single KML point file. It helps to have a text editor with KML syntax highlighting, such as Microsoft Visual Studio Code. The merged file will contain duplicates, which can be cleaned up by editing in QGIS: sort the records on site name and delete all but the first of each.

  • PSUR zip code level data: active ingredients and products. (Omitted from site mapsets.)

    • Five years coverage. Have data through 2018; 2018 data are preliminary from PSUR Cornell group as a prerelease made in 2021.
      • May update this during project as more years of data are released.
    • Zip code polygons matching the PSUR zip codes. [HAVE 2000 census ones]. The zip codes in the tabular data are supposed to be those of the application destination of use or sales [need to verify]; however, there are clearly high-density areas in the sales data, which suggests that pesticide sellers in populated areas are reporting their own zip codes.
      • Zip code areas are not polygons to the US Postal Service; they are road extents. The US Census Bureau makes an approximation in order to use polygons.
  • These data have many caveats for completeness and accuracy. They come from pesticide users and sellers. There are quite a few raw entries with invalid zip codes or invalid EPA product numbers. We use them primarily for relative pesticide intensity mapping, shown as 10-percentile ranges (deciles) rather than absolute values. This is an undemanding use.

  • In this project we are mainly interested in average to higher pesticide intensity, broadly defined rather than pesticide by pesticide. Thus our quantity of interest mainly has two or three intervals. We expect the data to be reasonable at this level of coarseness.

  • The only confidential layers are:

    • Point map of categorical sites, with locations digitized manually on aerial photo base. Lakes are included for convenience. This is stored in the Excel categorical site tracking spreadsheet.
    • Point map of long term sites. Made by address geocoding with Google Maps. In the Excel longterm site tracking spreadsheet.
    • Point map of categorical site and lake sampling locations within sites. Locations digitized manually on aerial photo base. Some sampling locations provided by volunteers. In project’s tabular database and a CSV file.
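
The Google Earth step in the list above (append per-county KML search results, then sort on site name and keep the first of each) could also be scripted. The following is only a minimal sketch in Python, not the procedure the project used (that was manual editing plus cleanup in QGIS). It assumes each pasted search result is a well-formed KML document in the standard KML 2.2 namespace; the function name `merge_kml_points` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the pasted search results use the standard KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)


def _site_name(placemark):
    """Return the Placemark's <name> text, trimmed ('' if missing)."""
    return (placemark.findtext(f"{{{KML_NS}}}name") or "").strip()


def merge_kml_points(kml_texts):
    """Merge Placemarks from several KML strings into one KML document.

    Placemarks are sorted by site name, and only the first of each name
    (compared case-insensitively) is kept. Unnamed Placemarks are all kept.
    """
    placemarks = []
    for text in kml_texts:
        root = ET.fromstring(text)
        placemarks.extend(root.iter(f"{{{KML_NS}}}Placemark"))

    # sorted() is stable, so among equal names the earliest pasted result wins.
    placemarks = sorted(placemarks, key=lambda pm: _site_name(pm).casefold())

    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    seen = set()
    for pm in placemarks:
        key = _site_name(pm).casefold()
        if key and key in seen:
            continue  # duplicate site name: drop all but the first
        seen.add(key)
        document.append(pm)
    return ET.tostring(kml, encoding="unicode")
```

Matching on name alone mirrors the manual rule; two different sites that happen to share a name would be collapsed, so the result still deserves a visual check in QGIS.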

3. Resource people and entities

  • Cornell PMEP: PSUR zip code level data. (Robert Warfield)
  • Cornell IRIS (Susan Hoskins, Diane Ayers) - technical advice
  • Cornell Mann Library CUGIR - online index to map data for New York
  • State GIS clearinghouse - various data, notably digital ortho aerials
  • DEC Division of Water for the well registry. We will need a means to contact the owners of registered wells for discussions about access for sampling. This can start with FOIL-obtained well completion reports which include the original owner identity, which can be 20 years out of date.
  • FOLA, and member lake associations. DEC Water Division publishes a lake map. Probably this uses lake outlets as points.
  • Federal agencies, mainly USGS, EPA, USDA.

4. The three geographic levels

Coarsest level: statewide. This does not include county soil data. We typically do not include Long Island and New York City in any accumulated maps in this set. However, the statewide layers do include the entire state boundaries. The statewide map includes the ecoregions and active railways layers; these are not in any other maps. The statewide map imports cooperator site geographic coordinates from the Excel spreadsheet we use for tracking status, allowing us to map (confidentially) where our sites are.

The midrange level is pragmatic, forced by the complexity and scale of adding county digital soil surveys. There are three vertical strips of the state (western, central, and eastern) with some overlap. All three contain the same statewide layers; only the county soil maps included differ.

The Statewide maps in the set contain New York City and Long Island, which is sometimes useful for base maps.

The smallest (finest) level maps, for individual categorical sites, are derived from one of the midrange maps. These are primarily for site assessment and reporting (to owner). County soil maps are trimmed back to the site’s county and adjacent counties. Selected site QGIS maps add property tax parcel and LIDAR topographic layers.

Site GIS data were originally partly intended to be taken onsite with landowners; however this has not been necessary; we have never used a computer with an owner while onsite. If we ever activate this, all data will need to be usable without an internet connection. The primary data sets that need to have special offline usability are the digital aerial photos and USGS topographic maps. In practice, site-level QGIS excerpts have been started as simple copies of one of the three substate regional maps, trimming out extraneous county soil maps and opening by default to the zoomed position of the site.

5. Prior practice inherited by this project

  • Map set statewide in QGIS formats.
    • This provides examples of how to create most of what is needed in this project, but some of it is out of date, some would be done better if redone from scratch, and some relies too heavily on a prior ArcGIS collection. None had systematic metadata. So we start from a fresh empty map.
  • SQLite 3.x database of pesticide products and active ingredients at zip code level from DEC’s Pesticide Sales and Use Reporting (PSUR) system, year by year. Early years include our conversion of product amounts to active ingredient amounts; later years include PMEP-provided active ingredients. Formerly this was in MS Access, but we outgrew that and moved to SQLite 3.x. The 2019 version of this database held data through 2016. We consider this part of the project’s geographic database rather than its tabular database, because we only use it for mapping.
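
As an illustration of how a zip-code-level table like this can feed the relative intensity mapping described in section 2 (10-percentile ranges rather than absolute values), here is a minimal Python sketch using the standard `sqlite3` module. The table name `psur_use`, its columns, and the function `decile_by_zip` are hypothetical and do not reflect the project's actual schema. Dropping anything that is not a 5-digit zip code is likewise an assumption about how invalid raw entries might be handled.

```python
import sqlite3


def decile_by_zip(rows):
    """Map each valid 5-digit zip code to a decile class from 1 (lowest) to 10.

    rows is an iterable of (zip_code, amount) pairs. The table and column
    names used below are illustrative, not the project's real schema.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE psur_use (zip TEXT, amount REAL)")
    con.executemany("INSERT INTO psur_use VALUES (?, ?)", rows)
    totals = con.execute(
        """
        SELECT zip, SUM(amount) AS total
        FROM psur_use
        WHERE zip GLOB '[0-9][0-9][0-9][0-9][0-9]'  -- drop malformed zip codes
        GROUP BY zip
        ORDER BY total, zip
        """
    ).fetchall()
    con.close()

    n = len(totals)
    # Rank r (0-based) in an ascending list of n totals falls in decile r*10//n + 1.
    return {zip_code: rank * 10 // n + 1 for rank, (zip_code, _) in enumerate(totals)}
```

The resulting zip-to-decile mapping could be written to a CSV or back to SQLite and joined to the zip code polygons in QGIS. Tied totals are split by zip code order here, which is adequate for a coarse two- or three-interval reading but is a simplification.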

6. Data access and protection

The QGIS maps are stored in Box for easy sharing across desktop and laptop computers, and across Cornell participants. We test this as we evolve the folder structure and bring in selected collaborators. An Extension collaborator in Horticulture was able to use a subset of the maps. In 2021 we exported a map subset for a groundwater class, because a teaching assistant was able to work in QGIS and provide support to students.

The data privacy implications are not yet worked out as of 2023 06 15. Box allows access rights to be set folder by folder: none, read-only, read-write (edit). There is no standard encryption mechanism available for Box data stored on a local computer, though online the data are encrypted.

Data stored in Box are replicated selectively across systems. They are stored in the Box internet cloud, where they are encrypted. The geographic data folders are marked for offline storage on three of Pacenka’s computers at Cornell and home. Other group staff tend not to mark Box folders for offline access, instead using the online Box instance via a web browser, and downloading.

Box Drive on a PC is necessary for QGIS and project database usage, and it is most time efficient to designate at least the maps folder for offline use so that data may be loaded and maps browsed without a fast internet connection. Box will download on the fly. Note that some map layers are not stored locally, such as digital aerial photographs and USGS topo maps.

“Box: DEC / maps / common” contains most map layer data.

“Box: DEC / maps / testing” contains the actual statewide simple mapset file and the three substate mapset files. It also contains the project tabular database.

“Box: DEC / maps / individual” contains the site mapsets. There is a deeper folder for some sites when there are LIDAR or property tax polygons incorporated for the site.

7. Exporting the collection for others

(section is a beginning, not a high priority)

The non-confidential parts of the spatial database assembled for the project could be helpful to NYSDEC, and perhaps to the public. Most of the data layers are drawn from public sources, so any extra value would lie in the combination.

There is one possibly non-public part of the collection: the captures of candidate categorical site locations from Google Earth. Google obtains its content from open web sources, then mines it for addresses and geocodes them to obtain latitude and longitude. We then copy/paste them and merge them into a 50+ county combination. We need to investigate whether Google places any copyright restrictions on its compilations.
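
One way an export could be organized is to keep a small manifest that tags each layer as public, confidential, or unresolved, and to export only what is explicitly public. The Python sketch below is an assumption about how that might look, not an existing project tool. The layer keys and the function `select_exportable` are hypothetical; the statuses follow the confidential layers listed in section 2 and the open Google Earth question above.

```python
# Hypothetical sharing statuses for layers in the collection.
PUBLIC, CONFIDENTIAL, UNRESOLVED = "public", "confidential", "unresolved"


def select_exportable(manifest):
    """Split layer names into (exportable, held_back), each sorted by name.

    Only layers explicitly tagged PUBLIC are exportable; anything else,
    including unknown statuses, is held back by default.
    """
    exportable = sorted(name for name, status in manifest.items() if status == PUBLIC)
    held_back = sorted(name for name, status in manifest.items() if status != PUBLIC)
    return exportable, held_back


# Illustrative manifest; the keys are made-up identifiers, not real file names.
example_manifest = {
    "usgs_principal_aquifers": PUBLIC,
    "nass_cropscape": PUBLIC,
    "categorical_sites": CONFIDENTIAL,
    "longterm_sites": CONFIDENTIAL,
    "sampling_locations": CONFIDENTIAL,
    "google_earth_candidates": UNRESOLVED,
}
```

Holding back everything that is not explicitly public means a newly added layer cannot leak into an export before someone has reviewed its status.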