Paper Review 1

By: Chengyi (Jeff) Chen

Article: Representativeness-based sampling network design for the State of Alaska


What is the main problem/task addressed by the paper?

To figure out the optimal (best representation of the current state of the environment) areas to deploy equipment to keep track of the impact of climate changes on the ecological state of Alaska.


What was done before, and how does this paper improve on it?

Expert-derived ecoregion maps are static and have boundaries based on subjective consideration of geographic properties and expert judgment. In contrast, statistically derived ecoregions can vary with time and are delineated in the data space or state space representing all the characteristics under consideration. Moreover, the state space resolution can be varied by selecting different values of k, the level of division in the clustering algorithm.

  • Contrasting the 32 ecoregions in Alaska that have been identified by Nowacki, the authors of this study propose the statistical identification of ecoregions that are dynamically changing over time (attributed to increasing temperatures and reduced permafrost conditions) in order to minimize the amount of resources required to be deployed around alaska in order to get an accurate tracking of the climate changes occurring in these areas, especially under the constraints of limited resources and logistical accessibility.

In this study, MSTC was applied to derive ecoregions based on climate and topographic factors for the present and the future at multiple levels of division. The climate and topographic factors discussed in the “Input data layers” section describe the environmental conditions of each map cell and are the most important drivers controlling vegetation and primary production. Thus, groupings or clusters of similarly characterized map cells delineated based on these variables define unique ecoregions.

List of features used:

  1. Monthly mean air temperature

  2. Monthly mean precipitation

  3. Day of freeze

  4. Day of thaw

  5. Length of growing season

  6. Maximum active layer thickness

  7. Warming effect of snow

  8. Mean annual ground temperature at bottom of active layer

  9. Mean annual ground surface temperature

  10. Thermal offset

  11. Limnicity

  12. Elevation

  • After standardizing the features, they are fed into the k-means model with varying k-values ( \({k = 5,10,20,50,100,200,500,1000}\) ecoregions ). The study further contrasted the degree of overlap between the ecoregions identified when \({k = 10}\) and Nowacki’s “eight dominantly associated Level 2 ecological groups consisting of the 32 ecoregions defined”.

This point-to-point analysis through time is a novel method for quantifying relationships between sampling locations and how those relationships evolve over time due to environmental change.

  • The key takeaway point from this research study was that because ecoregions that displayed similar ecological characteristics do change frequently over time, which necessitates a model that could dynamically find the key locations that are most representative of the changing climate in that region, which was probably lacking in previous research studies. By using a model to track the evolution of ecoregions over time, we can more efficiently deploy resources to the most representative areas and get alerts faster when it is time to adjust the location of equipment to another spot more representative of the new ecoregion.


What is the one cool technique/idea/finding that was learned from this paper?

The primary technique used in the paper is Multivariate spatiotemporal clustering (MSTC), the k-means clustering algorithm used to identify k ecoregion clusters that have sufficiently distinct features which likely warrant the allocation of resources to observe the climate changes of that ecoregion so as to better track ecological changes in that cluster.


What part of the paper was difficult to understand?

A few key terms mostly specific to ecology were hard to digest:

  1. Upscaling / Downscaling measurements

  2. Edaphic

  3. Down-scaled general circulation model (GCM)

  4. Nominal resolution

  5. A1B emissions scenario

  6. Limnicity

  7. Hypervolume of environmental gradients

Areas of confusion: I couldn’t manage to find out how they predicted the future ecoregions. I understand that the present-time period data was extracted through the various data sources mentioned, but nothing much was explained about how the future ecoregion clusters were predicted - was there another model used to predict the future clusters, or were the predictions part of the MSTC procedure?


Brainstorming ideas: What generalization or extension of the paper could be done? Or what are other applications that can benefit from the described approach? What more could have been done in this application space that could leverage AI? Does this give you an idea about a possible project a student could do?

Alaska exhibits wide ranging heterogeneity in environmental conditions, which can be resolved by selecting larger numbers of clusters in the MSTC algorithm. While MSTC is a non-hierarchical procedure, inherently hierarchical relationships within the combinations of state variables automatically emerge when increasing the level of division.

  • Perhaps hierarchical clustering might be a better option to explore for the dataset. Depending on the just how much data was utilized for the study, hierarchical clustering, though \({O(n^2)}\) compared to the \({O(n)}\) K-means, might reveal clusters that have greater dissimilarity with each other and hence, give a better representation of where the equipment should be deployed.

  • I felt that more could have been done with the actual time series data received from the study. A possible use for that data would be to use a recurrent net like a Long Short-Term Memory Unit (LSTM) to predict and contrast the future features of the original 32 ecoregions presented by Nowacki, which can serve as a key predictive model to understand how the ecoregions will evolve over time and identify any underlying trends in the climate change affecting specific parts of the ecoregion like mapping out the rate in which permafrost melts over time - this is important to estimate the amount of carbon that would potentially be released and at when. An Autoregressive Integrated Moving Average Model (ARIMA) might also be useful in assisting to identify these trends if they do exist.

  • A very important application that can benefit from this study would be the conservation of wildlife that are native to each ecoregion. By forecasting potential changes in the climate of each ecoregion, we can better invest more effort into areas that are both sensitive to changes in the climate as observed through the data collected, as well as home to endangered / threatened species. For example, in the case of the Short-tailed albatross, by tracking trends in their movement / migration patterns across Alaska and by forecasting when major climate changes might occur in specific ecoregions, we can better protect these birds by first scouting their next most likely location to setup equipment to monitor their habitat.

  • As mentioned above, I believe that the clustering analysis performed in this research study serves as a solid bedrock for identifying “where” we should allocate efforts to. However, what’s even more important after this would be to create specific models for each of these key ecoregions to predict how the features used in the MSTC model like mean air temperature, mean precipitation, and other exogenous factors might evolve over time. This would be an extremely worthy project for a student to take on. However, it does require for a time series dataset which I’m not sure if it is readily available.