DS IN THE REAL WORLD

Capturing Climate & Digging Up Data

How to use two U.S. Databases (NOAA & USDA) to access climate records and soil surveys for your Data Science Project

Photo by Diego Romeo on Unsplash

I recently completed an analysis of climate and soil data for a blog about vineyards in Napa Valley, using two different U.S. databases.

The websites for both databases are relatively straightforward. After speaking with a few peers, I was surprised that none of them had considered these sites as data sources before.

So if you need data for an analysis of climate change, or you are curious about the soil of several geographic locations, or Earth Day is just around the corner and you feel inspired for your next project but don’t know where to begin, then this post is for you.

The following is a guided tour of extracting data from both the NOAA and USDA Web Soil Survey sites.

PART 1: Accessing Data from the National Oceanic and Atmospheric Administration (NOAA) Website

Once you get the hang of it, it is easy to make a request from NOAA’s website and access the data from the National Centers for Environmental Information. This includes information from weather stations around the country, allowing you to pull current and historic data on maximum and minimum temperatures, precipitation, weather events, and a host of other data points.

PRO TIP: Not all stations carry the same data, so it makes sense to do a few searches and be certain the data you seek is aligned between stations and time frames.

STEP 1:

Start here at the Climate Data Online Search page.

Source: NOAA Climate Data Online (https://www.ncdc.noaa.gov/cdo-web/)
  • You will choose your Weather Observation Type/Dataset — for example, Global Summary of the Year
  • Select your date range
  • Select a search level — you can look for individual stations by name or code, or search by city, county, state, or climate or hydrologically categorized divisions and regions
  • Enter a search term (a location name or other identifier)

STEP 2:

This takes you to the search results map. On the left, it lists the areas of interest that match your search, along with key information such as location ID, period of record, and a preview of the full details based on the data points tracked for each location.

  • The map is great because it provides you with a satellite view
  • If you are looking for a specific location and having trouble figuring out where it lies within their map, I would suggest opening another browser window and accessing Google Maps.
  • This gives you two visual references to compare as you work out which stations are closest to your location of interest.
  • Click the link for your location in the column on the left

STEP 3:

You have now arrived at a view of your Location Details. This will provide information such as the number of stations collecting data in the area, and the period of record.

In this example for New York City, there are 136 stations included and the period of record goes back to 1869.

  • Scrolling down, you will also see a list of stations and the summarized data inventory
  • This will provide more information on the data points you will receive in your report, broken down by which data types are available
  • For example, if you are looking into Air Temperature, there will be subcategory columns on your report that show the Maximum (TMAX) and Minimum (TMIN) temperatures for the date range you selected
  • To select this data set, hit ADD TO CART (Do not worry, it is FREE!)

STEP 4:

When you get to the CART, you can select your output options. Because you will likely be manipulating this data later:

  • Choose the option for a .csv file
  • Confirm your date range
  • Hit the CONTINUE button

STEP 5:

On the next page, you will be able to set some custom options.

  • Select the station name and geographic location (you never know what you might want to do with longitude and latitude later, and you are here now, so….)
  • Include Data Flags
  • Choose the units of measure: Standard or Metric. Why is this important? If you are comparing how warm London has become versus New York, data for London will be recorded in degrees Celsius, so to be consistent it’s easiest to keep everything in Celsius and convert later in your spreadsheet, if necessary.
  • Finally, what do you want in your report? Air temperature, Precipitation, Sunshine, Wind, or any or all the above? Just check them off and hit CONTINUE
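
If you do end up converting units later, it can also be done in pandas rather than a spreadsheet. Here is a minimal sketch; the function names and the sample values are placeholders of my own, not anything that comes from the NOAA report.

```python
import pandas as pd


def celsius_to_fahrenheit(celsius):
    """Convert a number or a pandas Series from degrees Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit):
    """Convert a number or a pandas Series from degrees Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9


# Placeholder values for illustration only, not real station readings.
temps_c = pd.Series([0.0, 10.0, 25.0], name="TMAX_C")
temps_f = celsius_to_fahrenheit(temps_c)
print(temps_f.tolist())  # [32.0, 50.0, 77.0]
```

Because both functions use plain arithmetic, they work the same way on a single number or on a whole column.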

PRO TIP: In some instances, your data request may exceed the threshold of station years (their limit is 1000).

  • In these cases, adjust your date range to split your data into manageable, bite-sized chunks that the system can handle. You may have to pull multiple times to cover the period you are analyzing.
  • You will then need to stitch the raw data together on your own later, in Python with pandas or in Excel, depending on what you are using it for.
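
To plan those pulls ahead of time, you can work out the date ranges in a few lines of Python. This is only a rough sketch: I am assuming that "station years" means the number of stations multiplied by the number of years requested, which may not be exactly how NOAA counts it. The function name is my own, and the end year of 2020 is an arbitrary placeholder. The 136 stations and the 1869 start come from the New York City example above.

```python
def plan_year_chunks(start_year, end_year, n_stations, limit=1000):
    """Split an inclusive year range into chunks that stay within the limit.

    Assumes station years = number of stations x number of years (an assumption).
    """
    if n_stations < 1 or n_stations > limit:
        raise ValueError("n_stations must be between 1 and the limit")
    years_per_chunk = limit // n_stations
    chunks = []
    year = start_year
    while year <= end_year:
        last = min(year + years_per_chunk - 1, end_year)
        chunks.append((year, last))
        year = last + 1
    return chunks


chunks = plan_year_chunks(1869, 2020, n_stations=136)
print(len(chunks))  # 22
print(chunks[0])    # (1869, 1875)
print(chunks[-1])   # (2016, 2020)
```

Since not every station has a record for every year, this estimate is on the conservative side, so you may be able to get away with larger chunks in practice.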

STEP 6:

On the last page, you will see a review of your order. If everything checks out, just enter your email address and your .csv file will be processed and sent to you. For the most part, this happens very quickly.

I usually received my reports within 15 to 30 minutes (unless I hit a day when they were performing system maintenance, which they will warn you about on the home page).

WHAT’S THE NEXT STEP?

You will receive an email confirming your submission, and later a second email with a link to the data file (which is generally good for about 5 days). The email also includes a link to the summary data of what you ordered and a link to the most helpful documentation to explain and decode the header columns of each report.

Download the .csv file and run your exploratory data analysis (EDA) on it as needed.

You can do this in Python with Pandas, or if you are using Tableau you might as well pull it straight into Excel and save it separately as an Excel file. Keep your raw .csv data safe and manipulate the Excel file. Eventually you will be moving this into your Tableau repository Datasource file.
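
If you go the pandas route and had to split your request into several pulls, a first pass might look something like the sketch below. It stitches the files together, drops any overlapping rows, and checks which stations actually carry each data type. I am assuming the report has STATION, NAME, DATE, TMAX, and TMIN columns, so check the documentation linked in your email for the real headers. The two in-memory "files" and every value in them are made-up placeholders so the example runs on its own.

```python
import io

import pandas as pd

# Stand-ins for two separate NOAA downloads. All values are made-up placeholders.
chunk_a = io.StringIO(
    "STATION,NAME,DATE,TMAX,TMIN\n"
    "STN001,Example Station A,2000,20.1,5.2\n"
    "STN001,Example Station A,2001,20.4,5.0\n"
)
chunk_b = io.StringIO(
    "STATION,NAME,DATE,TMAX,TMIN\n"
    "STN001,Example Station A,2001,20.4,5.0\n"
    "STN002,Example Station B,2001,18.9,\n"
)


def stitch_chunks(sources):
    """Read each CSV, stack them, and drop rows repeated across overlapping pulls."""
    frames = [pd.read_csv(src) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.drop_duplicates(subset=["STATION", "DATE"])
    return combined.sort_values(["STATION", "DATE"]).reset_index(drop=True)


def coverage_by_station(df, value_columns):
    """Show each station's first and last date plus non-missing counts per data type."""
    grouped = df.groupby("STATION")
    span = grouped["DATE"].agg(first_date="min", last_date="max")
    counts = grouped[value_columns].count()
    return span.join(counts)


climate = stitch_chunks([chunk_a, chunk_b])
print(coverage_by_station(climate, ["TMAX", "TMIN"]))
```

With real downloads, you would pass file paths to `stitch_chunks` instead of the in-memory placeholders. A count of zero in the coverage table is a quick signal that a station does not carry that data type for your time frame.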

Overall, if you are looking to access climate data for a data science project, this is a great resource to turn to. You have a lot of flexibility, and a great depth of geographic locations and history to pull from.

PART 2: Accessing Data from the USDA’s Web Soil Survey Website

The United States Department of Agriculture (USDA) provides a Web Soil Survey site through the Natural Resources Conservation Service. It can be accessed here.

Obtaining data from this website is a little more complex than the NOAA site, but with patience and practice you can navigate it fairly easily to scratch the surface or dig as deep as you want (yeah…sorry about that one…soil pun for my daughter….couldn’t resist).

STEP 1: Defining an Area of Interest (AOI)

There are a few ways to do this, depending on what you are looking for. You can use drop-down menus that range from latitude and longitude or state and county inputs to a list of Department of Defense locations:

  • The system allows you to create a Shapefile (.shp or .shx) that you can reuse to access your specific soil survey. If you expect to return to the same location repeatedly, I would recommend it, and the site will walk you through how to do it.
  • For our purposes, go to the first tab marked Area of Interest (AOI) and scroll down the left navigation bar to Quick Navigation >> Address. In the following example, I have input the address for the Mumm Napa Vineyard, and it returns a satellite image of the location.
  • Directly over the image are the controls for the AOI Interactive Map. Select one of the two red buttons labeled AOI to bring up your selection marker. Your cursor will change, and you can draw a rectangle over your selected area.
  • Once selected, you can move on to the next tab at the top (Soil Map)

STEP 2: Soil Map

The Soil Map will instantly generate numbered map unit symbols within your area of interest, as well as generate information about each map unit, accessible at the left of your screen.

For example, Map Unit 105 is at the center of the screen:

  • The legend tells us that the Map Unit Name is “Bale Clay Loam, 2 to 5 percent slopes”
  • There are 12.6 acres that fit this description in our AOI, comprising 3.9% of the AOI overall
  • If you click on the link in the legend (Map Unit Name) for that unit, you will get a detailed report including information about elevation, precipitation, air temperature, the frost-free period, landform, salinity, drainage, and more (the window on the right in the image above).
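
As a quick sanity check on the legend, you can back out the approximate size of the whole AOI from a single map unit's acreage and percentage. This is just a sketch with a function name of my own. Because the legend's percentages are rounded, treat the result as an estimate.

```python
def estimate_aoi_acres(unit_acres, unit_percent):
    """Estimate total AOI acreage from one map unit's acres and its percent of the AOI."""
    if unit_percent <= 0:
        raise ValueError("unit_percent must be greater than zero")
    return unit_acres / (unit_percent / 100)


# Map Unit 105 from the example above: 12.6 acres, 3.9% of the AOI.
total_acres = estimate_aoi_acres(12.6, 3.9)
print(round(total_acres, 1))  # 323.1
```

So the rectangle drawn in this example covers roughly 323 acres.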

STEP 3: Soil Data Explorer

Now let’s focus on the next tab to the right of “Soil Map” and move to the Explorer.

This section has four parts:

  • Intro to Soils: Provides a great overview of the information and terminology you will encounter in the report.
  • Suitabilities and Limitations for Use: Provides a drop-down menu that helps you assess how the AOI is rated for various uses. In the photo below you can see we selected Vegetative Productivity to see what is going on in the vineyards at Mumm Napa, which leads us to several measures of American Wine Grape Site Desirability and a Crop Productivity Index. We can select one of these measures and view an image of the rating and a corresponding report on these map units to understand the findings of the survey.
  • Soil Properties and Qualities: This section covers the chemical and physical properties, erosion factors, soil health, and other qualities and features. For example, below is the rating of pH (1 to 1 Water) in each of the map unit areas. Acidity or alkalinity is an important soil measurement when selecting crops or other plants, as it affects vegetative fertility and stabilization. Here, Map Unit Area 103 has the lowest rating at 6.1, and Map Unit Area 171 has the highest at 7.1.
  • Soil Reports: This feature lets you generate focused reports specific to your area of interest, so you have a detailed reference. The data is generated under the interactive map but can also be produced as a PDF.

STEP 4: Download Soils Data

After the Soil Data Explorer, you can obtain a file for the specific AOI you have selected at the Download Soils Data tab:

  • To get the file, just click the download link. It will generate a zip file.
  • PRO TIP: The system only generates the data in MICROSOFT ACCESS, and its template database is synced up with the 2003 version. So…
  • You will need access to MICROSOFT ACCESS
  • You can successfully download the file, open it in MICROSOFT ACCESS, and generate a report
  • You can convert that report to Microsoft Excel to use in Data Visualizations in Tableau
  • You can also save it in Excel as a .csv to bring it back to Python/Pandas
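
Once the report has been saved as a .csv, bringing it back into pandas could look something like this. The column names here are hypothetical, since yours will depend on the report you build in Access. The two rows reuse the pH ratings mentioned in the Soil Data Explorer step, and an in-memory string stands in for the saved file.

```python
import io

import pandas as pd

# Stand-in for the .csv saved from Excel. Column names are hypothetical.
soil_csv = io.StringIO(
    "map_unit_symbol,ph_rating\n"
    "103,6.1\n"
    "171,7.1\n"
)

# Read map unit symbols as text, since they are labels rather than quantities.
soil = pd.read_csv(soil_csv, dtype={"map_unit_symbol": str})

lowest = soil.loc[soil["ph_rating"].idxmin()]
highest = soil.loc[soil["ph_rating"].idxmax()]
print(lowest["map_unit_symbol"], lowest["ph_rating"])
print(highest["map_unit_symbol"], highest["ph_rating"])
```

With a real file, you would replace `soil_csv` with the path to your saved .csv.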

STEP 5: Shopping Cart

Yes, it is a shopping cart, but the data is still free. This provides you with an opportunity to generate a custom soil resource report for your selected Area of Interest as a .pdf file.

Time to UNEARTH some data…

I’ve found that getting perspective from someone who has walked the path before is always helpful.

The only thing that slowed me down my first time through both sites was the extra time it took to understand each site’s terminology and interface, so I could consider how best to collect and leverage the data going forward.

If you would like to see how I leveraged this data in my analysis, please check out my blog on Napa Valley.

My hope is that this overview helps you get to your data faster, and allows you more time to shine some sunlight on your research and dig deeper into your insights.

Until next time…

Cheers!

Tony

As an experienced marketer and business consultant to global brands, Tony embraced Data Science to help them drive new insights and visualize opportunities.
