[GEE] 4. Data import and export in Google Earth Engine

1 Introduction

In this module we will discuss the following concepts:

  1. How to bring your own data sets into GEE.
  2. How to relate values from remotely sensed data to your own data.
  3. How to export features from GEE.

2 Background

Understanding how animals respond to their environment is critical to understanding how to manage these species. While animals are forced to make choices to meet their basic needs, their choices are likely also influenced by dynamic factors such as local weather conditions. Beyond direct observation, it is difficult to link animal behavior to weather conditions. In this unit we integrate GPS collar data collected from mountain lions with daily temperature estimates from the Daymet climate dataset accessed via GEE.

This will require us to bring our own data into GEE, connect weather values to point locations, and bring this value-added data back out of GEE for further analysis.

Camera trap photos of mountain lions taken near one of the top tourist destinations in Los Angeles, California. Photo: Earth Island Magazine

2.1 GPS positioning data

One study (Mahoney et al. 2016) used GPS collars to track the movements of two mountain lions and 16 coyotes in central Utah. These data were used to understand some of the behavioral patterns of individuals of both species. The data collected during the study can be accessed by anyone on the Movebank website, which hosts animal movement datasets from around the world. While some Movebank datasets only list the author’s contact information, others allow you to display the information on the Movebank web map, and still others allow you to download the data.

An example of an interactive graph from Movebank.com that allows you to search for data on animal movement.

2.2 Daymet Weather Data

The Daymet dataset provides gridded estimates of daily weather parameters. Seven surface weather parameters are available for each day with a spatial resolution of 1 km x 1 km and a spatial extent of North America. ORNL DAAC provides access to Daymet datasets through a variety of tools and formats, providing a rich resource of daily surface meteorology. Source: Daymet/NASA

Daymet provides daily data at a spatial resolution of 1 km x 1 km, which makes it well matched to the temporal and spatial scales at which cougars interact with the landscape. There are seven measurements in total, so we can examine multiple aspects of weather to assess how it affects behavior.

Metadata associated with Daymet images in GEE.

If you are interested in learning more about climate data available around the world, check out Unit 6.

3 Bring your own data into Earth Engine

In this exercise, we’ll discuss how to move your own data to GEE, extract values from a dataset, and export those values from GEE. The process of bringing data into GEE has been changing rapidly, and like most things, it’s best to go directly to the documentation to see the latest updates. This information can be found here.

3.1 Cleaning data

Animal movement data are downloaded as csv files. To bring them into GEE we need to convert them to shapefiles. While there are many ways to convert a csv file to a shapefile, we will use R. The code below contains everything needed to make this conversion. Details on how to convert csv files to shapefiles in R can be found here.

Some of the complexity in the code comes from renaming the columns to remove the “.” This is necessary in order to comply with GEE’s requirements for naming conventions. While this specific detail is not in the documentation, it is described in a post on the help forum.
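The renaming rule described above can also be sketched generically. The snippet below is a minimal illustration in plain JavaScript, not part of the original workflow: `sanitizeColumnName` is a hypothetical helper, and replacing disallowed characters with underscores is an assumption about what is acceptable. The R code in this section instead assigns new names by hand.

```javascript
// Hypothetical helper (not from the original lesson): replace every character
// that is not a letter, digit, or underscore with an underscore.
function sanitizeColumnName(name) {
  return name.replace(/[^A-Za-z0-9_]/g, "_");
}

// Column names as they are selected from the Movebank CSV in this module.
var rawNames = ["event.id", "timestamp", "location.long", "location.lat"];
var safeNames = rawNames.map(sanitizeColumnName);

console.log(safeNames);
```

Whichever approach you use, check the resulting names before uploading, because the join back to the original data depends on them.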

You do not need to run the R code below; it is provided for your future reference.

# Load necessary libraries
library(sp)
library(rgdal)
library(dplyr)
 
# read in CSV of data
baseDir <- "The folder your csv is held in"
data <- read.csv(paste0( baseDir, "/Site fidelity in cougars and coyotes, Utah_Idaho USA (data from Mahoney et al. 2016).csv"))
 
# convert to spatial points data frame
# remove all NA values from lat and long columns
names(data)
noNAs <- data[complete.cases(data[ , 4:5]),]
# filter to select animal of interest
glimpse(noNAs)
cougarF53 <- noNAs %>%
  filter(individual.local.identifier == "F53") %>%
  dplyr::select("event.id", "timestamp", "location.long","location.lat")
 
# Unique GEE issue
# GEE does not accept column names with dots, so we will rename our columns
colnames(cougarF53) <- c("id", "timestamp", "lon", "lat")
# check the time line of data collection so we can match those dates in GEE
timeframe <- sort(cougarF53$timestamp)
 
print(paste0("The first time stamp is ", timeframe[1], " the last collection is ", timeframe[length(timeframe)] ))
 
 
# Create a spatial feature with the sp package
# only keep unique id as data; drop = FALSE keeps it a data frame, as sp requires
cougarF53Spatial <- sp::SpatialPointsDataFrame(coords = cougarF53[, 3:4], data = cougarF53[, 1, drop = FALSE])
# set coordinate reference system to WGS84, using the proj4 string
proj4string(cougarF53Spatial) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
 
# Export as shapefile
# the layer name is the file name without extension (no leading slash)
writeOGR(cougarF53Spatial, dsn = baseDir, layer = "cougarF53Locations", driver = "ESRI Shapefile")

We wrote the shapefile with only one attribute column: the unique id of each location. We did this because we planned to do most of the analysis outside of GEE, so there was no need to upload all the extra data. The unique id will allow us to connect the value-added data from GEE with the original dataset.
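To illustrate how the unique id reconnects the two tables, here is a minimal sketch in plain JavaScript. The rows, the `tmin` column name, and the `joinById` helper are all hypothetical; in practice the join would more likely be done in R or a spreadsheet after the export.

```javascript
// Hypothetical rows: original collar records and the table exported from GEE.
var original = [
  {id: 1, timestamp: "2014-02-01 06:00:00"},
  {id: 2, timestamp: "2014-02-02 06:00:00"}
];
var exported = [
  {id: 2, tmin: -3.5},
  {id: 1, tmin: -1.0}
];

// Index the exported rows by id, then merge each one onto its original record.
function joinById(left, right) {
  var lookup = {};
  right.forEach(function (row) { lookup[row.id] = row; });
  return left.map(function (row) {
    return Object.assign({}, row, lookup[row.id] || {});
  });
}

var joined = joinById(original, exported);
console.log(joined);
```

Records without a match in the exported table are kept unchanged, which makes missing samples easy to spot.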

3.2 Introduction of assets

  • Coordinate Reference System: First, it is important to note that the coordinate reference system used by GEE is WGS 1984 (EPSG:4326). Therefore, all data you want to bring into GEE should use the same coordinate reference system. Remember that WGS 1984 is a geographic coordinate system. You don’t want a projected coordinate system on your data.

  • Upload shapefile: In the R code above, we converted the csv file into a shapefile and defined the coordinate reference system (CRS) to match what GEE expects (WGS 1984). When you upload features into Google Earth Engine, they are added to the personal assets associated with your GEE account.

You will be able to monitor the upload progress in the task pane.

Once uploaded, you can edit the asset through the assets pane on the left side of the code editor. This allows you to set sharing parameters. For this example, anyone can read the asset. This means that anyone running the code will be able to use the dataset, even if they don’t own it or download it.

Examples of Sharing Personal Assets.

The process of uploading the shapefile can take a while, so instead of walking you through the process, we’ve provided a link to a script that already loads the data needed for this course. Code with preloaded dataset. Please use this script as a starting point for the rest of this lesson.

After running this, we recommend uploading your own shapefile. This can be your data, or if you want something quick and easy, try using shapefiles from Natural Earth Data. This is a great site for geographic data at various map scales. The 1:110m physical vectors that the link above will take you to are very generalized and therefore load much faster than more data-rich layers.

Each asset has sharing preferences similar to those of other files you might have on Google Drive.

import: adds the newly uploaded asset to your script. This is very similar to importing an imageCollection into a script.

share: lets you define who can view and edit the asset.

delete: use this to clear space, but remember, what’s gone is never returned.

Once the asset is loaded, import it into the script by double-clicking the asset name in the Assets panel or pressing the small arrow icon that appears to the right of the feature when hovering over the name. Rename the feature to something descriptive. Then visualize it on a map to make sure the feature looks how you expect it to.

In the preloaded script you can see that we have completed these steps. We also added a print statement to access the data structure.

// Imported the data; now add it to the map and print it.
Map.addLayer(cougarF53, {}, "cougar presence data");
print(cougarF53, "cougar data");

You can use the Inspector tool to view the property data associated with the new asset.

After visualizing these points, draw a geometry that contains our area of interest. We will use this geometry to filter our climate data.

You can do this by selecting the rectangle tool from the geometry drawing tools and drawing a box that contains the points.

Draw geometric features around these points to filter climate data.
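Drawing the box by hand works well for a single study area. As an alternative, the bounding box can be computed from the point coordinates. The sketch below is plain JavaScript with hypothetical coordinates and a hypothetical `boundingBox` helper; it only illustrates the arithmetic and is not part of the preloaded script.

```javascript
// Hypothetical [lon, lat] collar fixes; real values come from the uploaded asset.
var points = [[-111.9, 39.2], [-111.5, 39.6], [-111.7, 39.4]];

// Return [west, south, east, north], padded by `pad` degrees on every side.
function boundingBox(coords, pad) {
  var lons = coords.map(function (c) { return c[0]; });
  var lats = coords.map(function (c) { return c[1]; });
  return [
    Math.min.apply(null, lons) - pad,
    Math.min.apply(null, lats) - pad,
    Math.max.apply(null, lons) + pad,
    Math.max.apply(null, lats) + pad
  ];
}

var bbox = boundingBox(points, 0);
console.log(bbox);
// In the Code Editor, a list in this order could be passed to
// ee.Geometry.Rectangle(bbox) to build the filter geometry.
```

Inside the Code Editor, calling cougarF53.geometry().bounds() on the imported points is likely a shorter route to a similar rectangle.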

3.2.1 Upload Raster

The process of bringing in the raster is the same as what we just did with the shapefile. Image collections (raster collections) are a more complex data type and have some additional requirements, which you can read here.

3.3 Define weather variables

In this lesson, we use Google Earth Engine as a way to relate remotely sensed data (i.e. our raster) to our point locations. While the process is conceptually straightforward, it does require some work to complete. After loading our points, the next step is to import the Daymet weather variables.

3.3.1 Calling the Daymet climate data

We used the NASA-derived dataset Daymet V3 because it has a spatial resolution of 1 km and because it measures the environmental conditions experienced by mountain lions. We’ll import it by calling the dataset’s unique ID and filtering it to our bounding box geometry.

// Call in image and filter.
var Daymet = ee.ImageCollection("NASA/ORNL/DAYMET_V3")
  .filterDate("2014-01-01", "2017-08-01")
  .filterBounds(geometry)
  .select('tmin')
  .map(function(image){ return image.clip(geometry); });
 
print(Daymet, "Daymet");
Map.addLayer(Daymet, {}, "Daymet");

View of the Daymet data structure in the print statement.

From the print statement, we can see that this is an image collection containing 267 images (although your total number of images may vary depending on your dataset and date filter). Daymet images carry seven bands, each associated with a specific weather measurement; the select('tmin') call in the code above keeps only the daily minimum temperature band. Now that both datasets are loaded, we will relate the cougar occurrence data to the weather data.

3.4 Extract value

With our points and image loaded, we can call a function to extract values from the underlying raster based on the mountain lion’s known location. We will use the ee.Image.sampleRegions function to do this. Search for the ee.Image.sampleRegions() function under the Docs tab to familiarize yourself with the parameters it requires.

ee.Image.sampleRegions() is an image function, so if we try to call it on Daymet, which is an ImageCollection, we will get an error. To solve this problem, we will convert the collection of Daymet images into a single multiband image. Each selected measurement for each day will become its own band in the multiband image. This ultimately helps us, because each band name records the collection date and the variable it holds. We can use this information to determine which values are relevant to the cougar’s location on a specific date.

IMPORTANT: In an image collection where there are many images, we will create a single image with a large number of bands. Because GEE is very good at data manipulation, it can handle this type of request.

// Convert to a multiband image and clip.
// toBands() returns an ee.Image, so we clip it (filterBounds only applies to collections).
var DaymetImage = Daymet
  .toBands()
  .clip(geometry);
 
print(DaymetImage, "DaymetImage");

Print statement that displays the result of converting a collection of images into a multiband image.
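Because each band name records a date and a variable, the exported column names can be split back into those two parts. The following plain JavaScript sketch assumes the names follow the pattern "<YYYYMMDD>_<variable>" (for example "20140101_tmin"); this pattern is an assumption, so compare it with the band names shown by your own print statement. `parseBandName` is a hypothetical helper.

```javascript
// Assumed band-name pattern: "<YYYYMMDD>_<variable>", e.g. "20140101_tmin".
// Returns null for names that do not follow this pattern (such as "id").
function parseBandName(bandName) {
  var match = /^(\d{4})(\d{2})(\d{2})_(.+)$/.exec(bandName);
  if (match === null) {
    return null;
  }
  return {
    date: match[1] + "-" + match[2] + "-" + match[3],
    variable: match[4]
  };
}

var parsed = parseBandName("20140101_tmin");
console.log(parsed);
```

With the date recovered, each sampled value can be matched against the timestamp of the corresponding collar fix.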

Now that we have a multiband image, we can use the sampleRegions function. There are three arguments to this function that you need to consider.

Collection: The vector dataset to which the sampled data will be attached.

Properties: Defines which columns of the vector dataset will be retained. In this case, we want to keep the “id” column because we will use it to join this dataset back to the original data outside of GEE.

Scale: This refers to the spatial scale (cell size) of the dataset. The scale should always match the resolution of the raster data. If you are not sure what the resolution of your raster is, use the search bar to find the dataset; the resolution is listed in its description.

// Call the sample regions function.
var samples = DaymetImage.sampleRegions({
  collection: cougarF53,
  properties: ['id'],
  scale: 1000
});
print(samples, 'samples');

From the print statement, we can see that our point location now has a weather measurement associated with it. Again, your results may look slightly different.

3.5 Export

3.5.1 Export points as Shapefile

We now have a range of daily weather data related to cougar F53’s known locations. While we could make more use of these data in GEE, they are easily exported for use in R or Excel. There are several options for the final location of the exported data. Generally speaking, saving data to a Google Drive account is a safe option. We will use a dictionary (indicated by curly braces) to define the parameters of the Export.table.toDrive() function.

Shapefile field limit: A shapefile can only contain 255 fields, while the sampled table has one field per band; the 1869 fields reported here correspond to 267 images with seven variables each. Therefore, we exported the data as a csv file.

// Export value added data to your Google Drive.
Export.table.toDrive({
  collection: samples,
  description: 'cougarDaymetToDriveExample',
  fileFormat: 'CSV'
});
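The field-limit reasoning behind the csv export can be checked with a little arithmetic. This plain JavaScript sketch uses the numbers quoted in the text (267 images, seven variables, a limit of 255 fields); `fieldCount` is a hypothetical helper, and the count ignores extra columns such as the id.

```javascript
// Field limit for shapefiles, as quoted in the text.
var SHAPEFILE_FIELD_LIMIT = 255;

// One field per band: number of images multiplied by the bands kept per image.
function fieldCount(imageCount, bandsPerImage) {
  return imageCount * bandsPerImage;
}

var fields = fieldCount(267, 7);
var fitsInShapefile = fields <= SHAPEFILE_FIELD_LIMIT;

console.log(fields, fitsInShapefile);
```

Even with a single variable per image, 267 daily images would still exceed the limit, so csv remains the practical choice here.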

When you export something, your task pane will light up. You need to run the task individually by selecting the Run button.


Example of the task pane after running a script that contains an export function.

When you select the Run button, the following popup will appear. This allows you to edit the exported details.

Example of user-defined parameters when exporting features from GEE.

3.5.2 Export Raster

While working with all this spatial data, you may have realized that a raster showing the median value over the period during which the cougar data were collected could be very useful. For more information on using rasters, see 5.

To do this, we will apply the median() reducer to the collection of Daymet images to generate a median value for each parameter in each cell. Just like the tabular data, we will export this image to Google Drive. Once median() has converted the image collection into a single image, we can clip it to the geometry object. The result will be exported as a multiband raster.

// Apply a median reducer to the dataset.
var Daymet1 = Daymet
  .median()
  .clip(geometry);
  
print(Daymet1);
 
// Export the image to drive.
Export.image.toDrive({
  image: Daymet1,
  description: 'MedianValueForStudyArea',
  scale: 1000,
  region: geometry,
  maxPixels: 1e9
});

There are many export options. One of the most important is the maxPixels setting. By default, GEE raises an error when an export exceeds 10^8 pixels. Using the maxPixels parameter (set to 1e9 in the code above) you can raise this limit, up to approximately 10^12 pixels per image. If you are exporting data for an area larger than 10^12 pixels, you will need to get creative about how to get the information out of GEE. Sometimes this involves splitting the image into smaller parts, or reassessing the usefulness of such a large image outside of GEE.
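Before exporting, it can help to estimate how many pixels a region contains at the chosen scale. The sketch below is plain JavaScript with hypothetical region dimensions and hypothetical helpers (`estimatePixels`, `tilesNeeded`); it treats the region as a simple rectangle measured in metres and counts pixels for a single band, so take the result as a rough guide only.

```javascript
// Rough pixel count for a rectangular region.
// widthMeters and heightMeters are the side lengths; scaleMeters is the cell size.
function estimatePixels(widthMeters, heightMeters, scaleMeters) {
  return Math.ceil(widthMeters / scaleMeters) * Math.ceil(heightMeters / scaleMeters);
}

// Number of equal-sized tiles needed so that each tile stays within the pixel budget.
function tilesNeeded(totalPixels, maxPixelsPerTile) {
  return Math.ceil(totalPixels / maxPixelsPerTile);
}

// Hypothetical study area: 200 km by 150 km at the 1 km Daymet resolution.
var pixels = estimatePixels(200000, 150000, 1000);
var tiles = tilesNeeded(pixels, 1e9);

console.log(pixels, tiles);
```

For a study area of this size a single export is enough; the tile count only grows once the estimate exceeds the maxPixels value you set.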

4 Conclusion

While Google Earth Engine can be used for planetary-scale analysis, it is also an efficient resource for quickly accessing and analyzing large amounts of information using your own data. The methods described in this module are a great way to add value to your own data sets. In this example we used weather data, but this is by no means the only option! You can connect your data to many other datasets in Google Earth Engine. It’s up to you to decide what’s important and why.

4.1 Complete code

// Imported the data; now add it to the map and print it.
Map.addLayer(cougarF53, {}, "cougar presence data");
print(cougarF53, "cougar data");
 
// Call in image and filter.
var Daymet = ee.ImageCollection("NASA/ORNL/DAYMET_V3")
  .filterDate("2014-01-01", "2017-08-01")
  .filterBounds(geometry)
  .select('tmin')
  .map(function(image){ return image.clip(geometry); });
 
print(Daymet, "Daymet");
Map.addLayer(Daymet, {}, "Daymet");
 
// Convert to a multiband image and clip.
var DaymetImage = Daymet
  .toBands()
  .clip(geometry);
 
print(DaymetImage, "DaymetImage");
 
// Call the sample regions function.
var samples = DaymetImage.sampleRegions({
  collection: cougarF53,
  properties: ['id'],
  scale: 1000 });
print(samples,'samples');
 
// Export value added data to your Google Drive.
Export.table.toDrive({
  collection: samples,
  description: 'cougarDaymetToDriveExample',
  fileFormat: 'CSV'
});
 
// Apply a median reducer to the dataset.
var Daymet1 = Daymet
  .median()
  .clip(geometry);
print(Daymet1);
 
// Export the image to drive.
Export.image.toDrive({
  image: Daymet1,
  description: 'MedianValueForStudyArea',
  scale: 1000,
  region: geometry,
  maxPixels: 1e9
});