Currently, there are 9 functions associated with the
sample
verb in the sgsR
package:
Algorithm | Description | Reference |
---|---|---|
sample_srs() |
Simple random | |
sample_systematic() |
Systematic | |
sample_strat() |
Stratified | Queinnec, White, & Coops (2021) |
sample_sys_strat() |
Systematic Stratified | |
sample_nc() |
Nearest centroid | Melville & Stone (2016) |
sample_clhs() |
Conditioned Latin hypercube | Minasny & McBratney (2006) |
sample_balanced() |
Balanced sampling | Grafström, A. Lisic, J (2018) |
sample_ahels() |
Adapted hypercube evaluation of a legacy sample | Malone, Minasny, & Brungard (2019) |
sample_existing() |
Sub-sampling an existing sample |
sample_srs
We have demonstrated a simple example of using the
sample_srs()
function in vignette("sgsR")
. We
will demonstrate additional examples below.
raster
The input required for sample_srs()
is a
raster
. This means that sraster
and
mraster
are supported for this function.
#--- perform simple random sampling ---#
sample_srs(
raster = sraster, # input sraster
nSamp = 200, # number of desired sample units
plot = TRUE
) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431190 ymin: 5337710 xmax: 438450 ymax: 5343210
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> geometry
#> 1 POINT (432850 5340850)
#> 2 POINT (431910 5337930)
#> 3 POINT (435670 5342790)
#> 4 POINT (437350 5340290)
#> 5 POINT (435050 5338330)
#> 6 POINT (431370 5341410)
#> 7 POINT (432430 5338430)
#> 8 POINT (434550 5339410)
#> 9 POINT (435750 5340230)
#> 10 POINT (435090 5343150)
sample_srs(
raster = mraster, # input mraster
nSamp = 200, # number of desired sample units
access = access, # define access road network
mindist = 200, # minimum distance sample units must be apart from one another
buff_inner = 50, # inner buffer - no sample units within this distance from road
buff_outer = 200, # outer buffer - no sample units further than this distance from road
plot = TRUE
) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431330 ymin: 5337730 xmax: 438550 ymax: 5343230
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> geometry
#> 1 POINT (437050 5338350)
#> 2 POINT (432790 5340390)
#> 3 POINT (433670 5340750)
#> 4 POINT (435910 5343170)
#> 5 POINT (434470 5339710)
#> 6 POINT (432630 5343090)
#> 7 POINT (433990 5341390)
#> 8 POINT (437150 5337790)
#> 9 POINT (436210 5340810)
#> 10 POINT (433210 5341090)
sample_systematic
The sample_systematic()
function applies systematic
sampling across an area with the cellsize
parameter
defining the resolution of the tessellation. The tessellation shape can
be modified using the square
parameter. Assigning
TRUE
(default) to the square
parameter results
in a regular grid and assigning FALSE
results in a
hexagonal grid.
The location of sample units can also be adjusted using the
locations
parameter, where centers
takes the
center, corners
takes all corners, and random
takes a random location within each tessellation. Random start points
and translations are applied when the function is called.
#--- perform grid sampling ---#
sample_systematic(
raster = sraster, # input sraster
cellsize = 1000, # grid distance
plot = TRUE
) # plot
#> Simple feature collection with 35 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431296.7 ymin: 5337958 xmax: 438290.4 ymax: 5343023
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> geometry
#> 1 POINT (431296.7 5342688)
#> 2 POINT (432248 5342380)
#> 3 POINT (431939.7 5341428)
#> 4 POINT (431631.4 5340477)
#> 5 POINT (431323.1 5339526)
#> 6 POINT (433507.6 5343023)
#> 7 POINT (433199.3 5342071)
#> 8 POINT (432891 5341120)
#> 9 POINT (432582.7 5340169)
#> 10 POINT (432274.3 5339217)
#--- perform grid sampling ---#
sample_systematic(
raster = sraster, # input sraster
cellsize = 500, # grid distance
square = FALSE, # hexagonal tessellation
location = "random", # randomly sample within tessellation
plot = TRUE
) # plot
#> Simple feature collection with 176 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431104.1 ymin: 5337709 xmax: 438558.2 ymax: 5343198
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> geometry
#> 1 POINT (438309.9 5342777)
#> 2 POINT (438002.8 5343094)
#> 3 POINT (438039.9 5342311)
#> 4 POINT (437798.8 5342708)
#> 5 POINT (437979.7 5341729)
#> 6 POINT (438316.6 5340927)
#> 7 POINT (438433.1 5340167)
#> 8 POINT (437368.4 5343111)
#> 9 POINT (437840.1 5342171)
#> 10 POINT (437898.7 5341320)
sample_systematic(
raster = sraster, # input sraster
cellsize = 500, # grid distance
access = access, # define access road network
buff_outer = 200, # outer buffer - no sample units further than this distance from road
square = FALSE, # hexagonal tessellation
location = "corners", # take corners instead of centers
plot = TRUE
)
#> Simple feature collection with 614 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431273 ymin: 5337747 xmax: 438558.9 ymax: 5343235
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> geometry
#> 1 POINT (437694.1 5342885)
#> 2 POINT (437586.1 5343153)
#> 3 POINT (438018.4 5342082)
#> 4 POINT (437910.3 5342350)
#> 5 POINT (438558.9 5340744)
#> 6 POINT (437586.1 5343153)
#> 7 POINT (437300.2 5343193)
#> 8 POINT (437694.1 5342885)
#> 9 POINT (437910.3 5342350)
#> 10 POINT (437624.5 5342390)
sample_strat
The sample_strat()
contains two method
s to
perform sampling:
"Queinnec"
- Hierarchical sampling using a focal window to isolate contiguous groups of stratum pixels, which was originally developed by Martin Queinnec."random"
- Traditional stratified random sampling. Thismethod
ignores much of the functionality of the algorithm to allow users the capability to use standard stratified random sampling approaches without the use of a focal window to locate contiguous stratum cells.
method = "Queinnec"
Queinnec, M., White, J. C., & Coops, N. C. (2021). Comparing airborne and spaceborne photon-counting LiDAR canopy structural estimates across different boreal forest types. Remote Sensing of Environment, 262(August 2020), 112510.
This algorithm uses moving window (wrow
and
wcol
parameters) to filter the input sraster
to prioritize sample unit allocation to where stratum pixels are
spatially grouped, rather than dispersed individuals across the
landscape.
Sampling is performed using 2 rules:
Rule 1 - Sample within spatially grouped stratum pixels. Moving window defined by
wrow
andwcol
.Rule 2 - If no additional sample units exist to satisfy desired sample size(
nSamp
), individual stratum pixels are sampled.
The rule applied to a select each sample unit is defined in the
rule
attribute of output samples. We give a few examples
below:
#--- perform stratified sampling random sampling ---#
sample_strat(
sraster = sraster, # input sraster
nSamp = 200
) # desired sample size # plot
#> Simple feature collection with 200 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431330 ymin: 5337730 xmax: 438410 ymax: 5343210
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata type rule geometry
#> x 1 new rule1 POINT (436850 5338090)
#> x1 1 new rule1 POINT (433670 5341390)
#> x2 1 new rule1 POINT (433930 5341130)
#> x3 1 new rule1 POINT (437790 5338330)
#> x4 1 new rule1 POINT (437830 5339250)
#> x5 1 new rule1 POINT (437810 5342710)
#> x6 1 new rule1 POINT (436790 5339410)
#> x7 1 new rule1 POINT (436770 5338190)
#> x8 1 new rule1 POINT (434270 5343150)
#> x9 1 new rule1 POINT (437670 5339150)
In some cases, users might want to include an existing
sample within the algorithm. In order to adjust the total number of
sample units needed per stratum to reflect those already present in
existing
, we can use the intermediate function
extract_strata()
.
This function uses the sraster
and existing
sample units and extracts the stratum for each. These sample units can
be included within sample_strat()
, which adjusts total
sample units required per class based on representation in
existing
.
#--- extract strata values to existing samples ---#
e.sr <- extract_strata(
sraster = sraster, # input sraster
existing = existing
) # existing samples to add strata value to
TIP!
sample_strat()
requires the sraster
input
to have an attribute named strata
and will give an error if
it doesn’t.
sample_strat(
sraster = sraster, # input sraster
nSamp = 200, # desired sample size
access = access, # define access road network
existing = e.sr, # existing sample with strata values
mindist = 200, # minimum distance sample units must be apart from one another
buff_inner = 50, # inner buffer - no sample units within this distance from road
buff_outer = 200, # outer buffer - no sample units further than this distance from road
plot = TRUE
) # plot
#> Simple feature collection with 400 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431130 ymin: 5337730 xmax: 438510 ymax: 5343230
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata type rule geometry
#> 1 1 existing existing POINT (437630 5339110)
#> 2 1 existing existing POINT (433810 5341230)
#> 3 1 existing existing POINT (437630 5338910)
#> 4 1 existing existing POINT (433830 5340510)
#> 5 1 existing existing POINT (436050 5342170)
#> 6 1 existing existing POINT (435730 5342710)
#> 7 1 existing existing POINT (434830 5341970)
#> 8 1 existing existing POINT (433610 5341390)
#> 9 1 existing existing POINT (434230 5340730)
#> 10 1 existing existing POINT (434830 5341830)
The code in the example above defined the mindist
parameter, which specifies the minimum euclidean distance that new
sample units must be apart from one another.
Notice that the sample units have type
and
rule
attributes which outline whether they are
existing
or new
, and whether
rule1
or rule2
were used to select them. If
type
is existing (a user provided
existing
sample), rule
will be
existing as well as seen above.
sample_strat(
sraster = sraster, # input
nSamp = 200, # desired sample size
access = access, # define access road network
existing = e.sr, # existing samples with strata values
include = TRUE, # include existing sample in nSamp total
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE
) # plot
#> Simple feature collection with 200 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431190 ymin: 5337770 xmax: 438510 ymax: 5343190
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata type rule geometry
#> 1 1 existing existing POINT (437630 5339110)
#> 2 1 existing existing POINT (433810 5341230)
#> 3 1 existing existing POINT (437630 5338910)
#> 4 1 existing existing POINT (433830 5340510)
#> 5 1 existing existing POINT (436050 5342170)
#> 6 1 existing existing POINT (435730 5342710)
#> 7 1 existing existing POINT (434830 5341970)
#> 8 1 existing existing POINT (433610 5341390)
#> 9 1 existing existing POINT (434230 5340730)
#> 10 1 existing existing POINT (434830 5341830)
The include
parameter determines whether
existing
sample units should be included in the total
sample size defined by nSamp
. By default, the
include
parameter is set as FALSE
.
method = "random
Stratified random sampling with equal probability for all cells
(using default algorithm values for mindist
and no use of
access
functionality). In essence this method perform the
sample_srs
algorithm for each stratum separately to meet
the specified sample size.
#--- perform stratified sampling random sampling ---#
sample_strat(
sraster = sraster, # input sraster
method = "random", # stratified random sampling
nSamp = 200, # desired sample size
plot = TRUE
) # plot
#> Simple feature collection with 200 features and 2 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438550 ymax: 5343210
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata type geometry
#> x 1 new POINT (434230 5340790)
#> x1 1 new POINT (435410 5342650)
#> x2 1 new POINT (434010 5341070)
#> x3 1 new POINT (435190 5338790)
#> x4 1 new POINT (437970 5339330)
#> x5 1 new POINT (434250 5340770)
#> x6 1 new POINT (434970 5342750)
#> x7 1 new POINT (436350 5339450)
#> x8 1 new POINT (437630 5339010)
#> x9 1 new POINT (433610 5341510)
sample_sys_strat
sample_sys_strat()
function implements systematic
stratified sampling on an sraster
. This function uses the
same functionality as sample_systematic()
but takes an
sraster
as input and performs sampling on each stratum
iteratively.
#--- perform grid sampling on each stratum separately ---#
sample_sys_strat(
sraster = sraster, # input sraster with 4 strata
cellsize = 1000, # grid size
plot = TRUE # plot output
)
#> Warning: [readStart] source already open for reading
#> Processing strata : 1
#> Warning: [extract] source already open for reading
#> Processing strata : 2
#> Warning: [extract] source already open for reading
#> Processing strata : 3
#> Warning: [extract] source already open for reading
#> Processing strata : 4
#> Warning: [extract] source already open for reading
#> Simple feature collection with 32 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431398.7 ymin: 5337793 xmax: 438220.9 ymax: 5343199
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata geometry
#> 1 1 POINT (437666.9 5342715)
#> 2 1 POINT (436816.9 5339669)
#> 3 1 POINT (435818.2 5339620)
#> 4 1 POINT (435669.4 5342616)
#> 5 1 POINT (434769.8 5340569)
#> 6 1 POINT (434720.2 5341568)
#> 7 1 POINT (433820.6 5339520)
#> 8 1 POINT (433771 5340519)
#> 9 1 POINT (433721.4 5341518)
#> 10 2 POINT (437817.9 5342199)
Just like with sample_systematic()
we can specify where
we want our samples to fall within our tessellations. We specify
location = "corners"
below. Note that the tesselations are
all saved to a list file when details = TRUE
should the
user want to save them.
sample_sys_strat(
sraster = sraster, # input sraster with 4 strata
cellsize = 500, # grid size
square = FALSE, # hexagon tessellation
location = "corners", # samples on tessellation corners
plot = TRUE # plot output
)
#> Processing strata : 1
#> Warning: [extract] source already open for reading
#> Processing strata : 2
#> Warning: [extract] source already open for reading
#> Processing strata : 3
#> Warning: [extract] source already open for reading
#> Processing strata : 4
#> Warning: [extract] source already open for reading
#> Simple feature collection with 1148 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431127.3 ymin: 5337743 xmax: 438490.8 ymax: 5343211
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata geometry
#> 1 1 POINT (431445.5 5340817)
#> 2 1 POINT (431299.7 5340568)
#> 3 1 POINT (432757.6 5343060)
#> 4 1 POINT (431299.7 5340568)
#> 5 1 POINT (431445.5 5340817)
#> 6 1 POINT (431445.5 5340817)
#> 7 1 POINT (432754.7 5342560)
#> 8 1 POINT (432757.6 5343060)
#> 9 1 POINT (433046.3 5343058)
#> 10 1 POINT (432757.6 5343060)
This sampling approach could be especially useful incombination with
strat_poly()
to ensure consistency of sampling accross
specific management units.
#--- read polygon coverage ---#
poly <- system.file("extdata", "inventory_polygons.shp", package = "sgsR")
fri <- sf::st_read(poly)
#> Reading layer `inventory_polygons' from data source
#> `/home/runner/work/_temp/Library/sgsR/extdata/inventory_polygons.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 632 features and 3 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 431100 ymin: 5337700 xmax: 438560 ymax: 5343240
#> Projected CRS: UTM_Zone_17_Northern_Hemisphere
#--- stratify polygon coverage ---#
#--- specify polygon attribute to stratify ---#
attribute <- "NUTRIENTS"
#--- specify features within attribute & how they should be grouped ---#
#--- as a single vector ---#
features <- c("poor", "rich", "medium")
#--- get polygon stratification ---#
srasterpoly <- strat_poly(
poly = fri,
attribute = attribute,
features = features,
raster = sraster
)
#--- systematatic stratified sampling for each stratum ---#
sample_sys_strat(
sraster = srasterpoly, # input sraster from strat_poly() with 3 strata
cellsize = 500, # grid size
square = FALSE, # hexagon tessellation
location = "random", # randomize plot location
plot = TRUE # plot output
)
#> Processing strata : 1
#> Warning: [extract] source already open for reading
#> Processing strata : 2
#> Warning: [extract] source already open for reading
#> Processing strata : 3
#> Warning: [extract] source already open for reading
#> Simple feature collection with 162 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431132 ymin: 5337707 xmax: 438552.8 ymax: 5343187
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata geometry
#> 1 1 POINT (432124.2 5338968)
#> 2 1 POINT (432299 5339100)
#> 3 1 POINT (433755.7 5338427)
#> 4 1 POINT (432326.4 5339544)
#> 5 1 POINT (431200.5 5340662)
#> 6 1 POINT (433257.9 5339380)
#> 7 1 POINT (432028.6 5340405)
#> 8 1 POINT (435138.1 5337844)
#> 9 1 POINT (434337.2 5338793)
#> 10 1 POINT (435576.3 5338083)
sample_nc
sample_nc()
function implements the Nearest Centroid
sampling algorithm described in Melville &
Stone (2016). The algorithm uses kmeans clustering where the number
of clusters (centroids) is equal to the desired sample size
(nSamp
).
Cluster centers are located, which then prompts the nearest neighbour
mraster
pixel for each cluster to be selected (assuming
default k
parameter). These nearest neighbours are the
output sample units.
#--- perform simple random sampling ---#
sample_nc(
mraster = mraster, # input
nSamp = 25, # desired sample size
plot = TRUE
)
#> K-means being performed on 3 layers with 25 centers.
#> Simple feature collection with 25 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431210 ymin: 5338110 xmax: 438510 ymax: 5343190
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> zq90 pzabove2 zsd kcenter geometry
#> 37981 19.80 92.2 4.25 1 POINT (437250 5341210)
#> 60474 18.60 85.6 5.03 2 POINT (432050 5339990)
#> 49010 4.55 30.7 1.02 3 POINT (434030 5340610)
#> 8226 20.50 84.8 6.01 4 POINT (431490 5342790)
#> 95852 12.20 17.6 3.59 5 POINT (438370 5338110)
#> 57393 26.50 85.7 8.38 6 POINT (437570 5340170)
#> 64549 15.30 95.6 2.70 7 POINT (431490 5339770)
#> 29427 13.30 72.8 3.41 8 POINT (437750 5341670)
#> 21282 10.00 68.9 2.58 9 POINT (431510 5342090)
#> 19838 17.70 93.3 3.43 10 POINT (432470 5342170)
Altering the k
parameter leads to a multiplicative
increase in output sample units where total output samples = \(nSamp * k\).
#--- perform simple random sampling ---#
samples <- sample_nc(
mraster = mraster, # input
k = 2, # number of nearest neighbours to take for each kmeans center
nSamp = 25, # desired sample size
plot = TRUE
)
#> K-means being performed on 3 layers with 25 centers.
#--- total samples = nSamp * k (25 * 2) = 50 ---#
nrow(samples)
#> [1] 50
Visualizing what the kmeans centers and sample units looks like is
possible when using details = TRUE
. The $kplot
output provides a quick visualization of where the centers are based on
a scatter plot of the first 2 layers in mraster
. Notice
that the centers are well distributed in covariate space and chosen
sample units are the closest pixels to each center (nearest
neighbours).
#--- perform simple random sampling with details ---#
details <- sample_nc(
mraster = mraster, # input
nSamp = 25, # desired sample number
details = TRUE
)
#> K-means being performed on 3 layers with 25 centers.
#--- plot ggplot output ---#
details$kplot
sample_clhs
sample_clhs()
function implements conditioned Latin
hypercube (clhs) sampling methodology from the clhs
package.
TIP!
A number of other functions in the sgsR
package help to
provide guidance on clhs sampling including calculate_pop()
and calculate_lhsOpt()
. Check out these functions to better
understand how sample numbers could be optimized.
The syntax for this function is similar to others shown above,
although parameters like iter
, which define the number of
iterations within the Metropolis-Hastings process are important to
consider. In these examples we use a low iter
value for
efficiency. Default values for iter
within the
clhs
package are 10,000.
sample_clhs(
mraster = mraster, # input
nSamp = 200, # desired sample size
plot = TRUE, # plot
iter = 100
) # number of iterations
The cost
parameter defines the mraster
covariate, which is used to constrain the clhs sampling. An example
could be the distance a pixel is from road access
(e.g. from calculate_distance()
see example below), terrain
slope, the output from calculate_coobs()
, or many
others.
#--- cost constrained examples ---#
#--- calculate distance to access layer for each pixel in mr ---#
mr.c <- calculate_distance(
raster = mraster, # input
access = access, # define access road network
plot = TRUE
) # plot
#>
|---------|---------|---------|---------|
=========================================
sample_clhs(
mraster = mr.c, # input
nSamp = 250, # desired sample size
iter = 100, # number of iterations
cost = "dist2access", # cost parameter - name defined in calculate_distance()
plot = TRUE
) # plot
sample_balanced
The sample_balanced()
algorithm performs a balanced
sampling methodology from the stratifyR / SamplingBigData
packages.
sample_balanced(
mraster = mraster, # input
nSamp = 200, # desired sample size
plot = TRUE
) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431130 ymin: 5337710 xmax: 438550 ymax: 5343230
#> Projected CRS: +proj=utm +zone=17 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (435050 5343230)
#> 2 POINT (436050 5343230)
#> 3 POINT (437870 5343230)
#> 4 POINT (438390 5343230)
#> 5 POINT (433890 5343210)
#> 6 POINT (434530 5343210)
#> 7 POINT (431330 5343150)
#> 8 POINT (434110 5343150)
#> 9 POINT (433830 5343110)
#> 10 POINT (438450 5343110)
sample_balanced(
mraster = mraster, # input
nSamp = 100, # desired sample size
algorithm = "lcube", # algorithm type
access = access, # define access road network
buff_inner = 50, # inner buffer - no sample units within this distance from road
buff_outer = 200
) # outer buffer - no sample units further than this distance from road
#> Simple feature collection with 100 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431210 ymin: 5337770 xmax: 438350 ymax: 5343230
#> Projected CRS: +proj=utm +zone=17 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (431790 5343230)
#> 2 POINT (435750 5343190)
#> 3 POINT (437210 5343170)
#> 4 POINT (437990 5343050)
#> 5 POINT (431910 5343010)
#> 6 POINT (437430 5342990)
#> 7 POINT (436490 5342950)
#> 8 POINT (437790 5342950)
#> 9 POINT (435990 5342930)
#> 10 POINT (437750 5342910)
sample_ahels
The sample_ahels()
function performs the adapted
Hypercube Evaluation of a Legacy Sample (ahels) algorithm
usingexisting
sample data and an mraster
. New
sample units are allocated based on quantile ratios between the
existing
sample and mraster
covariate
dataset.
This algorithm was adapted from that presented in the paper below, which we highly recommend.
Malone BP, Minansy B, Brungard C. 2019. Some methods to improve the utility of conditioned Latin hypercube sampling. PeerJ 7:e6451 DOI 10.7717/peerj.6451
This algorithm:
Determines the quantile distributions of
existing
sample units andmraster
covariates.Determines quantiles where there is a disparity between sample units and covariates.
Prioritizes sampling within those quantile to improve representation.
To use this function, user must first specify the number of quantiles
(nQuant
) followed by either the nSamp
(total
number of desired sample units to be added) or the
threshold
(sampling ratio vs. covariate coverage ratio for
quantiles - default is 0.9) parameters.
#--- remove `type` variable from existing - causes plotting issues ---#
existing <- existing %>% select(-type)
sample_ahels(
mraster = mraster,
existing = existing, # existing sample
plot = TRUE
) # plot
#> Simple feature collection with 300 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431130 ymin: 5337710 xmax: 438510 ymax: 5343210
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> type.x zq90 pzabove2 zsd strata type.y rule
#> 1 existing 2.44 0.9 0.36 1 new rule1
#> 2 existing 6.14 45.9 1.47 1 new rule1
#> 3 existing 2.46 4.1 0.36 1 new rule1
#> 4 existing 5.42 25.3 1.41 1 new rule1
#> 5 existing 7.35 48.3 1.92 1 new rule1
#> 6 existing 3.54 25.4 0.70 1 new rule1
#> 7 existing 5.65 42.5 1.32 1 new rule1
#> 8 existing 4.70 34.6 1.16 1 new rule1
#> 9 existing 3.62 15.8 0.77 1 new rule1
#> 10 existing 6.07 31.2 1.50 1 new rule1
#> geometry
#> 1 POINT (437630 5339110)
#> 2 POINT (433810 5341230)
#> 3 POINT (437630 5338910)
#> 4 POINT (433830 5340510)
#> 5 POINT (436050 5342170)
#> 6 POINT (435730 5342710)
#> 7 POINT (434830 5341970)
#> 8 POINT (433610 5341390)
#> 9 POINT (434230 5340730)
#> 10 POINT (434830 5341830)
TIP!
Notice that no threshold
, nSamp
, or
nQuant
were defined. That is because the default setting
for threshold = 0.9
and nQuant = 10
.
The first matrix output shows the quantile ratios between the sample and the covariates. A value of 1.0 indicates that the sample is representative of quantile coverage. Values > 1.0 indicate over representation of sample units, while < 1.0 indicate under representation.
sample_ahels(
mraster = mraster,
existing = existing, # existing sample
nQuant = 20, # define 20 quantiles
nSamp = 300
) # desired sample size
#> Simple feature collection with 500 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438510 ymax: 5343230
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> type.x zq90 pzabove2 zsd strata type.y rule
#> 1 existing 2.44 0.9 0.36 1 new rule1
#> 2 existing 6.14 45.9 1.47 1 new rule1
#> 3 existing 2.46 4.1 0.36 1 new rule1
#> 4 existing 5.42 25.3 1.41 1 new rule1
#> 5 existing 7.35 48.3 1.92 1 new rule1
#> 6 existing 3.54 25.4 0.70 1 new rule1
#> 7 existing 5.65 42.5 1.32 1 new rule1
#> 8 existing 4.70 34.6 1.16 1 new rule1
#> 9 existing 3.62 15.8 0.77 1 new rule1
#> 10 existing 6.07 31.2 1.50 1 new rule1
#> geometry
#> 1 POINT (437630 5339110)
#> 2 POINT (433810 5341230)
#> 3 POINT (437630 5338910)
#> 4 POINT (433830 5340510)
#> 5 POINT (436050 5342170)
#> 6 POINT (435730 5342710)
#> 7 POINT (434830 5341970)
#> 8 POINT (433610 5341390)
#> 9 POINT (434230 5340730)
#> 10 POINT (434830 5341830)
Notice that the total number of samples is 500. This value is the sum
of existing units (200) and number of sample units defined by
nSamp = 300
.
sample_existing
Acknowledging that existing
sample networks are common
is important. There is significant investment into these samples, and in
order to keep inventories up-to-date, we often need to collect new data
for sample units. The sample_existing
algorithm provides
the user with methods for sub-sampling an existing
sample
network should the financial / logistical resources not be available to
collect data at all sample units. The functions allows users to choose
between algorithm types using (type = "clhs"
- default,
type = "balanced"
, type = "srs"
,
type = "strat"
). Differences in type result in calling
internal sample_existing_*()
functions
(sample_existing_clhs()
(default),
sample_existing_balanced()
,
sample_existing_srs()
,
sample_existing_strat()
). These functions are not exported
to be used stand-alone, however they employ the same functionality as
their sample_clhs()
etc counterparts.
While using sample_existing()
, should the user wish to
specify algorithm specific parameters
(e.g. algorithm = "lcube"
in sample_balanced()
or allocation = "equal"
in sample_strat()
),
they can specify within sample_existing()
as if calling the
function directly.
I give applied examples for all methods below that are based on the following scenario:
We have a systematic sample where sample units are 200m apart.
We know we only have resources to sample 300 of them.
We have some ALS data available (
mraster
), which we can use to improve knowledge of the metric populations.
See our existing
sample for the scenario below.
#--- generate existing samples and extract metrics ---#
existing <- sample_systematic(raster = mraster, cellsize = 200, plot = TRUE)
#--- sub sample using ---#
e <- existing %>%
extract_metrics(mraster = mraster, existing = .)
sample_existing(type = "clhs")
The algorithm is unique in that it has two fundamental approaches:
- Sample exclusively using
existing
and the attributes it contains.
#--- sub sample using ---#
sample_existing(existing = e, nSamp = 300, type = "clhs")
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431131.6 ymin: 5337705 xmax: 438535.1 ymax: 5343211
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 614 9.88 10.3 2.98 POINT (434614.2 5339481)
#> 601 18.20 77.0 5.75 POINT (437458.9 5340434)
#> 190 2.77 18.0 0.44 POINT (437644.9 5343027)
#> 149 22.80 81.3 7.17 POINT (431828.5 5341501)
#> 757 9.12 30.1 2.42 POINT (435626.9 5338977)
#> 508 17.70 84.6 4.54 POINT (437458 5341066)
#> 467 15.90 91.3 4.39 POINT (432210.5 5339731)
#> 523 17.00 86.9 4.34 POINT (434234 5339987)
#> 215 14.10 91.7 2.74 POINT (432714.1 5341376)
#> 350 21.50 81.0 4.83 POINT (434864.7 5341252)
- Sub-sampling using
raster
distributions
Our systematic sample of ~900 plots is fairly comprehensive, however
we can generate a true population distribution through the inclusion of
the ALS metrics in the sampling process. The metrics will be included in
internal latin hypercube sampling to help guide sub-sampling of
existing
.
#--- sub sample using ---#
sample_existing(
existing = existing, # our existing sample
nSamp = 300, # desired sample size
raster = mraster, # include mraster metrics to guide sampling of existing
plot = TRUE
) # plot
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431133.5 ymin: 5337708 xmax: 438536.9 ymax: 5343211
#> Projected CRS: +proj=utm +zone=17 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 92093 6.36 80.4 1.44 POINT (436640.4 5337840)
#> 92000 8.86 94.3 2.07 POINT (437840 5339296)
#> 91230 11.50 41.7 3.55 POINT (431510.9 5342449)
#> 91477 7.99 27.7 1.98 POINT (433789.4 5341314)
#> 91517 10.90 57.8 2.90 POINT (433094.3 5340870)
#> 91320 21.50 59.1 5.87 POINT (436380 5343026)
#> 92053 8.37 51.5 2.27 POINT (437651.3 5338600)
#> 91617 4.37 12.5 1.00 POINT (434233.1 5340619)
#> 91659 18.40 93.8 3.10 POINT (432779.5 5339921)
#> 91949 16.50 88.9 3.82 POINT (436195.8 5339167)
The sample distribution again mimics the population distribution quite well! Now lets try using a cost variable to constrain the sub-sample.
#--- create distance from roads metric ---#
dist <- calculate_distance(raster = mraster, access = access)
#>
|---------|---------|---------|---------|
=========================================
#--- sub sample using ---#
sample_existing(
existing = existing, # our existing sample
nSamp = 300, # desired sample size
raster = dist, # include mraster metrics to guide sampling of existing
cost = 4, # either provide the index (band number) or the name of the cost layer
plot = TRUE
) # plot
#> Simple feature collection with 300 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431131.6 ymin: 5337711 xmax: 438536.9 ymax: 5343218
#> Projected CRS: +proj=utm +zone=17 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd dist2access geometry
#> 91792 2.74 4.2 0.43 21.34254 POINT (438217.5 5340688)
#> 91483 9.61 39.3 2.40 8.01561 POINT (432651.5 5340933)
#> 91303 12.20 44.9 3.38 41.11341 POINT (434420 5342580)
#> 91760 11.70 65.0 2.48 57.54618 POINT (438343.6 5340941)
#> 91465 8.00 37.2 2.08 151.00008 POINT (436065.2 5342076)
#> 91839 11.40 45.7 3.33 299.13053 POINT (435815.6 5339673)
#> 91322 5.31 27.3 1.26 54.09515 POINT (436000.7 5342898)
#> 91307 17.50 85.6 3.67 130.48382 POINT (433661.5 5342326)
#> 91860 17.90 93.4 4.19 183.37888 POINT (431643.4 5338275)
#> 91536 24.80 91.3 5.64 320.29967 POINT (436761.1 5341888)
Finally, should the user wish to further constrain the sample based
on access
like other sampling approaches in
sgsR
that is also possible.
#--- ensure access and existing are in the same CRS ---#
sf::st_crs(existing) <- sf::st_crs(access)
#--- sub sample using ---#
sample_existing(
existing = existing, # our existing sample
nSamp = 300, # desired sample size
raster = dist, # include mraster metrics to guide sampling of existing
cost = 4, # either provide the index (band number) or the name of the cost layer
access = access, # roads layer
buff_inner = 50, # inner buffer - no sample units within this distance from road
buff_outer = 300, # outer buffer - no sample units further than this distance from road
plot = TRUE
) # plot
#> Simple feature collection with 300 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431134.4 ymin: 5337706 xmax: 438537.8 ymax: 5343215
#> Projected CRS: +proj=utm +zone=17 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd dist2access geometry
#> 91335 20.10 86.4 4.50 83.06137 POINT (432588 5341123)
#> 91562 18.80 92.9 4.24 69.15124 POINT (436132.3 5339357)
#> 91464 20.00 90.8 4.18 60.43957 POINT (432843 5339732)
#> 91457 13.10 80.1 3.17 274.05754 POINT (437773.8 5341383)
#> 91577 16.50 88.9 3.82 163.44897 POINT (436195.8 5339167)
#> 91331 2.92 7.1 0.48 46.70992 POINT (433536.2 5341440)
#> 91495 17.80 87.4 3.88 137.60996 POINT (432780.4 5339289)
#> 91297 5.61 39.5 1.37 71.33287 POINT (435938.1 5342456)
#> 91670 8.84 77.1 2.09 106.37089 POINT (437905.3 5337842)
#> 91228 17.70 66.0 4.37 282.42278 POINT (432965.5 5342515)
TIP!
The greater constraints we add to sampling, the less likely we will have strong correlations between the population and sample, so its always important to understand these limitations and plan accordingly.
sample_existing(type = "balanced")
When type = "balanced"
users can define all parameters
that are found within sample_balanced()
. This means that
one can change the algorithm
, p
etc.
sample_existing(existing = e, nSamp = 300, type = "balanced")
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431131.6 ymin: 5337706 xmax: 438535.1 ymax: 5343214
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 7 12.30 47.9 3.20 POINT (432142.5 5343083)
#> 8 20.80 81.4 5.61 POINT (431952.8 5343019)
#> 13 6.05 83.0 1.37 POINT (433154.2 5343211)
#> 17 14.00 52.9 3.87 POINT (432395.6 5342957)
#> 18 11.70 65.5 2.97 POINT (432206 5342893)
#> 26 12.50 2.2 3.48 POINT (433407.4 5343084)
#> 30 10.10 52.3 2.40 POINT (432648.8 5342830)
#> 37 12.60 70.0 3.29 POINT (431131.6 5342322)
#> 39 6.63 73.5 1.43 POINT (434229.5 5343149)
#> 42 4.19 34.3 0.89 POINT (433660.6 5342958)
sample_existing(existing = e, nSamp = 300, type = "balanced", algorithm = "lcube")
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431132.5 ymin: 5337706 xmax: 438537.8 ymax: 5343213
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 2 1.47 0.0 0.05 POINT (431889.3 5343209)
#> 19 12.70 67.7 3.18 POINT (432016.4 5342830)
#> 22 21.10 86.0 5.23 POINT (431447.4 5342639)
#> 24 5.20 39.1 1.16 POINT (433786.7 5343212)
#> 27 6.54 80.3 1.36 POINT (433217.7 5343021)
#> 28 16.30 87.1 3.91 POINT (433028.1 5342957)
#> 39 6.63 73.5 1.43 POINT (434229.5 5343149)
#> 41 13.20 69.6 3.51 POINT (433850.2 5343022)
#> 47 15.70 79.2 4.56 POINT (432522.7 5342577)
#> 50 15.30 94.2 2.87 POINT (431953.7 5342387)
sample_existing(type = "srs")
The simplest, type = srs
, randomly selects sample
units.
sample_existing(existing = e, nSamp = 300, type = "srs")
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431132.5 ymin: 5337705 xmax: 438536.9 ymax: 5343215
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 1 9.26 36.9 2.44 POINT (433790.3 5340682)
#> 2 13.70 3.8 3.47 POINT (435184.1 5339039)
#> 3 19.00 84.2 5.37 POINT (437142.2 5340750)
#> 4 22.20 94.1 5.15 POINT (432023.6 5337770)
#> 5 17.70 95.5 3.84 POINT (431451.9 5339477)
#> 6 14.20 97.6 3.42 POINT (437715.7 5337778)
#> 7 4.72 13.6 0.93 POINT (434549.8 5340303)
#> 8 7.23 30.0 1.83 POINT (434106.1 5340998)
#> 9 15.10 71.7 4.42 POINT (436254.8 5342140)
#> 10 13.00 78.3 3.39 POINT (431893.8 5340047)
sample_existing(type = "strat")
When type = "strat"
, existing
must have an
attribute named strata
(just like how
sample_strat()
requires a strata
layer). If it
doesnt exist you will get an error. Lets define an sraster
so that we are compliant.
sraster <- strat_kmeans(mraster = mraster, nStrata = 4)
e_strata <- extract_strata(sraster = sraster, existing = e)
When we do have a strata attribute, the function works very much the
same as sample_strat()
in that is allows the user to define
the allocation
method ("prop"
- defaults,
"optim"
, "manual"
, "equal"
).
#--- proportional stratified sampling of existing ---#
sample_existing(existing = e_strata, nSamp = 300, type = "strat", allocation = "prop")
#> Simple feature collection with 300 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431132.5 ymin: 5337707 xmax: 438536 ymax: 5343217
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata zq90 pzabove2 zsd geometry
#> 1 3 17.4 92.8 4.00 POINT (438407.1 5340752)
#> 2 3 15.5 88.1 3.73 POINT (432145.2 5341185)
#> 3 3 18.7 91.6 4.43 POINT (435939.9 5341191)
#> 4 3 16.8 77.6 4.92 POINT (434672.3 5343086)
#> 5 3 15.9 91.3 4.39 POINT (432210.5 5339731)
#> 6 3 13.6 97.7 2.50 POINT (435560.6 5341064)
#> 7 3 18.3 91.0 4.73 POINT (433538 5340175)
#> 8 3 17.4 84.7 4.26 POINT (436762.9 5340623)
#> 9 3 15.4 96.1 2.22 POINT (433914.6 5342200)
#> 10 3 19.0 90.6 2.97 POINT (431892.9 5340679)
TIP!
Remember that when allocation = "equal"
, the
nSamp
value will be allocated for each strata.
We get 400 sample units in our output below because we have 4 strata
and nSamp = 100
.
#--- equal stratified sampling of existing ---#
sample_existing(existing = e_strata, nSamp = 100, type = "strat", allocation = "equal")
#> Simple feature collection with 400 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431132.5 ymin: 5337708 xmax: 438536 ymax: 5343218
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata zq90 pzabove2 zsd geometry
#> 1 3 16.7 76.9 4.38 POINT (438277.4 5343028)
#> 2 3 14.6 78.8 3.01 POINT (433158.8 5340048)
#> 3 3 13.6 95.9 2.72 POINT (437900.8 5341004)
#> 4 3 16.7 88.3 3.96 POINT (431701.5 5341880)
#> 5 3 20.7 95.9 4.43 POINT (434423.7 5340050)
#> 6 3 16.2 89.2 4.22 POINT (436067.9 5340179)
#> 7 3 18.0 86.0 4.44 POINT (436443.5 5342836)
#> 8 3 15.6 90.9 3.33 POINT (432147 5339920)
#> 9 3 14.6 89.4 4.41 POINT (437140.4 5342015)
#> 10 3 16.2 76.5 4.80 POINT (432399.3 5340427)
#--- manual stratified sampling of existing with user defined weights ---#
s <- sample_existing(existing = e_strata, nSamp = 100, type = "strat", allocation = "manual", weights = c(0.2, 0.6, 0.1, 0.1))
We can check the proportion of samples from each strata with:
#--- check proportions match weights ---#
table(s$strata) / 100
#>
#> 1 2 3 4
#> 0.2 0.6 0.1 0.1
Finally, type = "optim
allows for the user to define a
raster
metric to be used to optimize within strata
variances.
#--- manual stratified sampling of existing with user defined weights ---#
sample_existing(existing = e_strata, nSamp = 100, type = "strat", allocation = "optim", raster = mraster, metric = "zq90")
#> Simple feature collection with 99 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431137.1 ymin: 5337711 xmax: 438537.8 ymax: 5343217
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata zq90 pzabove2 zsd geometry
#> 1 3 14.8 85.1 3.45 POINT (438151.3 5342775)
#> 2 3 15.8 90.1 3.50 POINT (435628.7 5337712)
#> 3 3 15.2 92.0 3.74 POINT (435115.1 5343024)
#> 4 3 15.8 92.6 3.23 POINT (434173.2 5338279)
#> 5 3 17.8 70.4 3.97 POINT (432144.3 5341818)
#> 6 3 16.1 84.7 3.68 POINT (436191.3 5342330)
#> 7 3 20.5 79.1 4.07 POINT (432843.9 5339099)
#> 8 3 14.8 72.6 3.74 POINT (434677.7 5339292)
#> 9 3 14.0 89.8 3.37 POINT (437519.7 5342142)
#> 10 3 15.8 89.2 3.45 POINT (436319.2 5341318)
We see from the output that we get 300 sample units that are a
sub-sample of existing
. The plotted output shows cumulative
frequency distributions of the population (all existing
samples) and the sub-sample (the 300 samples we requested).