Our adventures in groundwater data reconnaissance have brought us to the gates of CASGEM. The Department of Water Resources (DWR) has provided a mapping and data portal for groundwater elevation data. CASGEM, or "California Statewide Groundwater Elevation Monitoring", was launched in December of 2010 to help local agencies comply with Water Code Section 10927.
Generally, CASGEM has three different users in mind: entities attempting to meet compliance standards with the state Water Code, DWR managers, and the public. The last, which is the role I assume for the sake of this blog post, is the least developed use case. I will highlight the barriers and suggest opportunities for improvement. That said, the fact that this service exists, and that this data is available in the first place, is remarkable. However, the classic open data conundrum applies: just because the data is made available doesn't mean it is reasonably accessible or "open". To help close this gap, the Sunlight Foundation has published "Guidelines for Open Data Policies", which provides a vision for open data policies and serves as a living document to chronicle evolving priorities for data shared by government agencies.
During our Summer of Groundwater initiative, we here at the New California Water Atlas will be conflating discrete water elevation data from both DWR and the USGS. While the USGS has far fewer groundwater monitoring wells, it is the only source of instantaneous data (more on this later). Using the bulk data download option (see image and notes below to navigate the two tracks), I have retrieved, as a test, one year's worth of elevation data for the county of Fresno. The download returns 1612 rows of time-series data. Unfortunately, this method returns no latitude or longitude columns. If it were easy, I wouldn't be writing this :)
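A quick way to confirm what a download contains is to count its rows and look for coordinate columns. Here is a minimal Python sketch, assuming a CSV with a header row. The column names and sample rows are made up for illustration and are not the actual CASGEM export schema.

```python
import csv
import io

# Hypothetical header names that would indicate coordinates; the real
# CASGEM export may use different names.
COORD_NAMES = {"lat", "latitude", "lon", "long", "longitude"}


def summarize_csv(text):
    """Return (row_count, coordinate_columns) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    coord_cols = [f for f in fields if f.strip().lower() in COORD_NAMES]
    row_count = sum(1 for _ in reader)
    return row_count, coord_cols


# Made-up sample, not real CASGEM data.
SAMPLE = (
    "CASGEM_ID,MEASUREMENT_DATE,WATER_ELEVATION\n"
    "WELL_A,2012-01-15,101.2\n"
    "WELL_B,2012-02-15,99.8\n"
)

rows, coords = summarize_csv(SAMPLE)
print(rows, coords)  # 2 []
```

An empty list of coordinate columns is the signal that the locations will have to come from somewhere else.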
Steps 1-6 are a series of login and required registration fields. For everyone apart from DWR staff and partner organizations, these registration steps pose a considerable barrier. Users who simply visit the site to download data should not have to preregister. With proper analytics in place, tracking relevant user information and behaviors should provide sufficient feedback for later improvements. Nevertheless, once you are logged in, there are two routes to downloading data (denoted here as track "a" and "b" respectively).
Track "a" is a mapping interface where, after various filter criteria are entered, point data is displayed on the map. Steps 7a through 12a illustrate this workflow. For my geospatial purposes, track "a" is insufficient: the mapping portal does not allow downloads that exceed 1000 rows. As mentioned above, one year of Fresno data returns 1612 rows. As a user, I am interested in obtaining five years of data, not just one. This track does not work for a serious study of groundwater elevation, which requires a few years of remote sensor data to properly evaluate trends.
Track "b" is a series of dialogues that allows the user to export a CSV file with no size limit. There is one catch, though: the mapping interface in track "a" exports a CSV with latitude and longitude columns, while the bulk download in track "b" contains no latitude or longitude columns :) We will revisit this issue in my forthcoming blog post "Data Reconnaissance: Groundwater 2", where I will split the CASGEM_ID into lat. long. columns using Google Refine.
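As a rough preview of that split, here is a minimal sketch in Python rather than Google Refine. It assumes the CASGEM_ID packs the coordinates as six latitude digits, `N`, seven longitude digits, `W`, and a three-digit sequence number, with four implied decimal places. That layout and the sample ID are assumptions for illustration; check them against real IDs before trusting the output.

```python
import re

# Assumed layout (not confirmed in this post): DDdddd N DDDdddd W SSS
ID_PATTERN = re.compile(r"^(\d{6})N(\d{7})W(\d{3})$")


def split_casgem_id(casgem_id):
    """Return (latitude, longitude) in decimal degrees, or None if no match."""
    match = ID_PATTERN.match(casgem_id.strip())
    if match is None:
        return None
    lat = int(match.group(1)) / 10000
    lon = -int(match.group(2)) / 10000  # W means negative longitude
    return lat, lon


# Made-up ID for illustration only.
print(split_casgem_id("367500N1197500W001"))  # (36.75, -119.75)
```

Returning `None` for anything that does not match keeps malformed or differently formatted IDs from silently turning into bad coordinates.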
CASGEM could improve how it serves its public users by eliminating the login step or providing a lighter login. A less obtrusive login could ask users to confirm by email, or use OAuth. Either way, requiring any steps beyond a single login, or asking public users to fill out multiple fields in addition to email, is asking too much and amounts to an accessibility hurdle.