Title: | Functions for Optimal Matching |
---|---|
Description: | Distance based bipartite matching using minimum cost flow, oriented to matching of treatment and control groups in observational studies ('Hansen' and 'Klopfer' 2006 <doi:10.1198/106186006X137047>). Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination. |
Authors: | Ben Hansen [aut], Mark Fredrickson [aut], Josh Errickson [cre, aut], Josh Buckner [aut], Adam Rauh [ctb] |
Maintainer: | Josh Errickson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.10.8.9001 |
Built: | 2024-11-19 17:28:29 UTC |
Source: | https://github.com/markmfredrickson/optmatch |
e1 + e2
returns the element-wise sum of
two InfinitySparseMatrix objects.
If either element is inf then
the resulting element will be inf.
e1 - e2
returns the element-wise subtraction of
two InfinitySparseMatrix objects.
If either element is inf then
the resulting element will be inf.
e1 * e2
returns the element-wise multiplication of
two InfinitySparseMatrix objects.
If either element is inf then
the resulting element will be inf.
e1 / e2
returns the element-wise division of
two InfinitySparseMatrix objects.
If either element is inf then
the resulting element will be inf.
## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 + e2 ## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 - e2 ## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 * e2 ## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 / e2
## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 + e2 ## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 - e2 ## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 * e2 ## S4 method for signature 'InfinitySparseMatrix,InfinitySparseMatrix' e1 / e2
e1 |
an InfinitySparseMatrix object |
e2 |
an InfinitySparseMatrix object |
an InfinitySparseMatrix object representing the element-wise sum of the two ISM objects
This function builds a distance specification where treated units are infinitely far away from control units that share the same level of a given factor variable. This can be useful for ensuring that matched groups come from qualitatively different groups.
antiExactMatch(x, z)
antiExactMatch(x, z)
x |
A factor across which matches should be allowed. |
z |
A logical or binary vector the same length as |
The exactMatch
function provides a way of specifying
a matching problem where only units within a factor level may be
matched. This function provides the reverse scenario: a matching
problem in which only units across factor levels are permitted to
match. Like exactMatch
, the results of this function will
most often be used as a within
argument to
match_on
or another distance specification creation
function to limit the scope of the final distance specification
(i.e., disallowing any match between units with the same value on
the factor variable x
).
A distance specification that encodes the across factor level constraint.
exactMatch
, match_on
, caliper
, fullmatch
, pairmatch
data(nuclearplants) # force entries to be within the same factor: em <- fullmatch(exactMatch(pr ~ pt, data = nuclearplants), data = nuclearplants) table(nuclearplants$pt, em) # force treated and control units to have different values of `pt`: z <- nuclearplants$pr names(z) <- rownames(nuclearplants) aem <- fullmatch(antiExactMatch(nuclearplants$pt, z), data = nuclearplants) table(nuclearplants$pt, aem)
data(nuclearplants) # force entries to be within the same factor: em <- fullmatch(exactMatch(pr ~ pt, data = nuclearplants), data = nuclearplants) table(nuclearplants$pt, em) # force treated and control units to have different values of `pt`: z <- nuclearplants$pr names(z) <- rownames(nuclearplants) aem <- fullmatch(antiExactMatch(nuclearplants$pt, z), data = nuclearplants) table(nuclearplants$pt, aem)
Convert an object to InfinitySparseMatrix
as.InfinitySparseMatrix(x)
as.InfinitySparseMatrix(x)
x |
An object which can be coerced into InfinitySparseMatrix, typically a matrix. |
An InfinitySparseMatrix
Splits a BlockedInfinitySparseMatrix into a list of InfinitySparseMatrices
## S3 method for class 'BlockedInfinitySparseMatrix' as.list(x, ...)
## S3 method for class 'BlockedInfinitySparseMatrix' as.list(x, ...)
x |
a BlockedInfinitySparseMatrix |
... |
Ignored |
A list of InfinitySparseMatrices
Blocked Infinity Sparse Matrices are similar to Infinity Sparse Matrices, but they also keep track of the groups of units via an additional slot, groups
groups
factor vector containing groups, with unit names as labels, when possible
colnames
vector containing names for all control units. This will either be a character vector or NULL if units have no names
rownames
vector containing names for all treated units. This will either be a character vector or NULL if units have no names
cols
vector of integers corresponding to control units
rows
vector of integers corresponding to treated units
dimension
integer vector containing the number of treated and control units, in that order
call
function call used to create the InfinitySparseMatrix
Mark M. Fredrickson
match_on
, exactMatch
, fullmatch
, InfinitySparseMatrix-class
Combine objects
## S4 method for signature 'SubProbInfo' c(x, ...) ## S4 method for signature 'NodeInfo' c(x, ...) ## S4 method for signature 'ArcInfo' c(x, ...) ## S4 method for signature 'MCFSolutions' c(x, ...) ## S4 method for signature 'FullmatchMCFSolutions' c(x, ...)
## S4 method for signature 'SubProbInfo' c(x, ...) ## S4 method for signature 'NodeInfo' c(x, ...) ## S4 method for signature 'ArcInfo' c(x, ...) ## S4 method for signature 'MCFSolutions' c(x, ...) ## S4 method for signature 'FullmatchMCFSolutions' c(x, ...)
x |
object of particular class |
... |
Various objects |
Combined objects
Combine Optmatch objects
## S3 method for class 'optmatch' c(...)
## S3 method for class 'optmatch' c(...)
... |
Optmatch objects to be concatenated |
A combined Optmatch object
Encodes calipers, or maximum allowable distances within which to
match. The result of a call to caliper
is itself a distance specification between
treated and control units that can be used with
pairmatch()
or fullmatch()
. Calipers can also be combined with
other distance specifications for richer matching problems.
caliper(x, width, exclude = c(), compare = `<=`, values = FALSE) ## S4 method for signature 'InfinitySparseMatrix' caliper(x, width, exclude = c(), compare = `<=`, values = FALSE) ## S4 method for signature 'matrix' caliper(x, width, exclude = c(), compare = `<=`, values = FALSE) ## S4 method for signature 'optmatch.dlist' caliper(x, width, exclude = c(), compare = `<=`, values = FALSE)
caliper(x, width, exclude = c(), compare = `<=`, values = FALSE) ## S4 method for signature 'InfinitySparseMatrix' caliper(x, width, exclude = c(), compare = `<=`, values = FALSE) ## S4 method for signature 'matrix' caliper(x, width, exclude = c(), compare = `<=`, values = FALSE) ## S4 method for signature 'optmatch.dlist' caliper(x, width, exclude = c(), compare = `<=`, values = FALSE)
x |
A distance specification created with |
width |
The width of the caliper: how wide of a margin to
allow in matches. Be careful in setting the width. Vector valued
arguments will be recycled for each of the finite entries in
|
exclude |
(Optional) A character vector of observations (corresponding to row and column names) to exclude from the caliper. |
compare |
A function that decides that whether two
observations are with the caliper. The default is
|
values |
Should the returned object be made of all zeros
( |
caliper
is a generic function with methods for any of the allowed distance
specifications: user created matrices, the results of match_on
, the results
of exactMatch
, or combinations (using `+`
) of these objects.
width
provides the size of the caliper, the allowable distance for
matching. If the distance between a treated and control pair is less than or
equal to this distance, it is allowed kept; otherwise, the pair is discarded
from future matching. The default comparison of "equal or less than can" be
changed to any other comparison function using the comparison
argument.
It is important to understand that width
argument is defined on the
scale of these distances. For univariate distances such as propensity scores,
it is common to specify calipers in standard deviations. If a caliper of
this nature is desired, you must either find the standard deviation directly
or use the match_on
function with its caliper
argument.
Since match_on
has access to the underlying univariate scores, for
example for the GLM method, it can determine the caliper width in standard
deviations.
If you wish to exclude specific units from the caliper requirements, pass the names of
these units in the exclude
argument. These units will be allowed to match any other
unit.
A matrix like object that is suitable to be given
as distance
argument to fullmatch
or
pairmatch
. The caliper will be only zeros and Inf
values,
indicating a possible match or no possible match, respectively.
You can combine the results of caliper
with other distances using the
`+`
operator. See the examples for usage.
Mark M. Fredrickson and Ben B. Hansen
P.~R. Rosenbaum and D.~B. Rubin (1985), ‘Constructing a control group using multivariate matched sampling methods that incorporate the propensity score’, The American Statistician, 39 33–38.
exactMatch
, match_on
, fullmatch
, pairmatch
data(nuclearplants) ### Caliper of 100 MWe on plant capacity caliper(match_on(pr~cap, data=nuclearplants, method="euclidean"), width=100) ### Caliper of 1/2 a pooled SD of plant capacity caliper(match_on(pr~cap, data=nuclearplants), width=.5) ### Caliper of .2 pooled SDs in the propensity score ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants) ppty.dist <- match_on(ppty) pptycaliper <- caliper(ppty.dist, width = .2) ### caliper on the Mahalanobis distance caliper(match_on(pr ~ t1 + t2, data = nuclearplants), width = 3) ### Combining a Mahalanobis distance matching with a caliper ### of 1 pooled SD in the propensity score: mhd.pptyc <- caliper(ppty.dist, width = 1) + match_on(pr ~ t1 + t2, data = nuclearplants) pairmatch(mhd.pptyc, data = nuclearplants) ### Excluding observations from caliper requirements: caliper(match_on(pr ~ t1 + t2, data = nuclearplants), width = 3, exclude = c("A", "f")) ### Returning values directly (equal up to the the attributes) all(abs((caliper(ppty.dist, 1) + ppty.dist) - caliper(ppty.dist, 1, values = TRUE)) < .Machine$Double.eps)
data(nuclearplants) ### Caliper of 100 MWe on plant capacity caliper(match_on(pr~cap, data=nuclearplants, method="euclidean"), width=100) ### Caliper of 1/2 a pooled SD of plant capacity caliper(match_on(pr~cap, data=nuclearplants), width=.5) ### Caliper of .2 pooled SDs in the propensity score ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants) ppty.dist <- match_on(ppty) pptycaliper <- caliper(ppty.dist, width = .2) ### caliper on the Mahalanobis distance caliper(match_on(pr ~ t1 + t2, data = nuclearplants), width = 3) ### Combining a Mahalanobis distance matching with a caliper ### of 1 pooled SD in the propensity score: mhd.pptyc <- caliper(ppty.dist, width = 1) + match_on(pr ~ t1 + t2, data = nuclearplants) pairmatch(mhd.pptyc, data = nuclearplants) ### Excluding observations from caliper requirements: caliper(match_on(pr ~ t1 + t2, data = nuclearplants), width = 3, exclude = c("A", "f")) ### Returning values directly (equal up to the the attributes) all(abs((caliper(ppty.dist, 1) + ppty.dist) - caliper(ppty.dist, 1, values = TRUE)) < .Machine$Double.eps)
This matches the syntax and semantics of cbind and rbind for matrices.
## S3 method for class 'InfinitySparseMatrix' cbind(x, y, ...) ## S3 method for class 'InfinitySparseMatrix' rbind(x, y, ...) ## S3 method for class 'BlockedInfinitySparseMatrix' cbind(x, y, ...) ## S3 method for class 'BlockedInfinitySparseMatrix' rbind(x, y, ...)
## S3 method for class 'InfinitySparseMatrix' cbind(x, y, ...) ## S3 method for class 'InfinitySparseMatrix' rbind(x, y, ...) ## S3 method for class 'BlockedInfinitySparseMatrix' cbind(x, y, ...) ## S3 method for class 'BlockedInfinitySparseMatrix' rbind(x, y, ...)
x |
An InfinitySparseMatrix or BlockedInfinitySparseMatrix,
agreeing with |
y |
An InfinitySparseMatrix or BlockedInfinitySparseMatrix,
agreeing with |
... |
Other arguments ignored. |
A combined InfinitySparseMatrix or BlockedInfinitySparseMatrix
Mark Fredrickson
This checks the equality of two optmatch objects. The only bits that matter are unit names and the grouping. Other bits such as attributes, group names, order, etc are ignored.
compare_optmatch(o1, o2)
compare_optmatch(o1, o2)
o1 |
First optmatch object. |
o2 |
Second optmatch object. |
The names of the units can differ on any unmatched units, e.g., units whose value in the optmatch
object is NA
. If matched objects have differing names, this is automatically FALSE
.
Note this ignores the names of the subgroups. So four members in subgroups either
c("a", "a", "b", "b")
or c("b", "b", "a", "a")
would be identical to this call.
TRUE if the two matches have the same memberships.
This function generates a single block-diagonal distance matrix given several distance matrices defined on subgroups.
dbind(..., force_unique_names = FALSE)
dbind(..., force_unique_names = FALSE)
... |
Any number of distance objects which can be converted to
|
force_unique_names |
Default |
When you've generated several distances matrices on subgroups in your
analysis, you may wish to combine them into a single block-diagonal distance
matrix. The dbind
function facilitates this.
Any BlockedInfinitySparseMatrix
include in ...
will be broken
into individual InfinitySparseMatrix
before being joined back
together. For example, if b
is a BlockedInfinitySparseMatrix
with 2 subgroups and m
is a distance without subgroups, then
dbind(b, m)
will be a BlockedInfinitySparseMatrix
with 3
subgroups.
If there are any shared names (either row or column) among all distances
passed in, by default all matrices will be renamed to ensure unique names by
appending "X." to each distance, where "X" is ascending lower case letters
("a.", "b.", etc). Setting the force_unique_names
argument to
TRUE
errors on this instead.
If the matrices need to be renamed and there are more than 26 separate matrices, after the first 26 single "X." prefixs, they will continue as "YX.", e.g "aa.", "ab.", "ac.". If more than 676 separate matrices, the prefix wil continue to "ZYX.", e.g. "aaa.", "aab.", "aac.". This scheme supports up to 18,278 unique matrices.
Note that you do not have to combine subgroup distances into a
single blocked distance using this function to ultimately obtain a single
matching set. Instead, take a look at the vignette
vignette("matching-within-subgroups", package = "optmatch")
for
details on combining multiple matches.
A BlockedInfinitySparseMatrix
containing a block-diagonal
distance matrix. If only a single distance is passed to dbind
and
it is not already a BlockedInfinitySparseMatrix
, the result will be
an InfinitySparseMatrix
instead.
data(nuclearplants) m1 <- match_on(pr ~ cost, data = subset(nuclearplants, pt == 0), caliper = 1) m2 <- match_on(pr ~ cost, data = subset(nuclearplants, pt == 1), caliper = 1.3) blocked <- dbind(m1, m2) dists <- list(m1, m2) blocked2 <- dbind(dists) identical(blocked, blocked2)
data(nuclearplants) m1 <- match_on(pr ~ cost, data = subset(nuclearplants, pt == 0), caliper = 1) m2 <- match_on(pr ~ cost, data = subset(nuclearplants, pt == 1), caliper = 1.3) blocked <- dbind(m1, m2) dists <- list(m1, m2) blocked2 <- dbind(dists) identical(blocked, blocked2)
InfinitySparseMatrix objects represent sparse matching problems with treated units as rows of a matrix and controls units as the columns of the matrix. The names of the units can be retrieved and set using these methods.
## S4 method for signature 'InfinitySparseMatrix' dimnames(x) ## S4 replacement method for signature 'InfinitySparseMatrix,list' dimnames(x) <- value ## S4 replacement method for signature 'InfinitySparseMatrix,NULL' dimnames(x) <- value
## S4 method for signature 'InfinitySparseMatrix' dimnames(x) ## S4 replacement method for signature 'InfinitySparseMatrix,list' dimnames(x) <- value ## S4 replacement method for signature 'InfinitySparseMatrix,NULL' dimnames(x) <- value
x |
An InfinitySparseMatrix object. |
value |
A list with two entries: the treated names and control names, respectively. |
A list with treated and control names.
Creates a new distance specification from the union of two or more
distance specifications. The constituent distances specifications
may have overlapping treated and control units (identified by the
rownames
and colnames
respectively).
distUnion(...)
distUnion(...)
... |
The distance specifications (as created with with
|
For combining multiple distance specifications with common
controls, but different treated units, rbind
provides
a way to combine the different objects. Likewise,
cbind
provides a way to combine distance
specifications over common treated units, but different control
units.
distUnion
can combine distance units that have common
treated and control units into a coherent single distance
object. If there are duplicate treated-control entries in multiple
input distances, the first entry will be used.
An InfinitySparseMatrix object with all treated and control
units from the arguments combined. Duplicate entries are resolved
in favor of the earliest argument (e.g., distUnion(A, B)
will favor entries in A
over entries in B
).
match_on
, exactMatch
,
fullmatch
, pairmatch
,
cbind
, rbind
The effective sample size is the sum of the harmonic means of the number units in treatment and control for each matched group. For k matched pairs, the effective sample size is k. As matched groups become more unbalanced, the effective sample size decreases.
effectiveSampleSize(x, z = NULL) ## S3 method for class 'factor' effectiveSampleSize(x, z = NULL) ## Default S3 method: effectiveSampleSize(x, z = NULL) ## S3 method for class 'table' effectiveSampleSize(x, z = NULL)
effectiveSampleSize(x, z = NULL) ## S3 method for class 'factor' effectiveSampleSize(x, z = NULL) ## Default S3 method: effectiveSampleSize(x, z = NULL) ## S3 method for class 'table' effectiveSampleSize(x, z = NULL)
x |
|
z |
A treatment indicator, a vector the same length as |
The equivalent number of pairs in this match.
summary.optmatch
, stratumStructure
Compute value of primal problem given flows and arc costs
evaluate_primal(distances, solution)
evaluate_primal(distances, solution)
distances |
An InfinitySparseMatrix giving distances |
solution |
A MCFSolutions object |
The value of the primal problem, i.e. sum of
products of distances
with flow along arcs in solution
Hansen
An exact match is one based on a factor. Within a level, all observations are allowed to be matched. An exact match can be combined with another distance matrix to create a set of matching subproblems.
exactMatch(x, ...) ## S4 method for signature 'vector' exactMatch(x, treatment) ## S4 method for signature 'formula' exactMatch(x, data = NULL, subset = NULL, na.action = NULL, ...)
exactMatch(x, ...) ## S4 method for signature 'vector' exactMatch(x, treatment) ## S4 method for signature 'formula' exactMatch(x, data = NULL, subset = NULL, na.action = NULL, ...)
x |
A factor vector or a formula, used to select method. |
... |
Additional arguments for methods. |
treatment |
A logical or binary vector the same length as |
data |
A |
subset |
an optional vector specifying a subset of observations to be used |
na.action |
A function which indicates what should happen when
the data contain |
exactMatch
creates a block diagonal matrix of 0s and
Inf
s. The pairs with 0 entries are within the same level of
the factor and legitimate matches. Inf
indicates units in
different levels. exactMatch
replaces the
structure.fmla
argument to several functions in previous
versions of optmatch. For the factor
method, the two
vectors x
and treatment
must be the same length. The
vector x
is interpreted as indicating the grouping factors
for the data, and the vector treatment
indicates whether a
unit is in the treatment or control groups. At least one of these
two vectors must have names. For the formula
method, the
data
argument may be omitted, in which case the method
attempts to find the variables in the environment from which the
function was called. This behavior, and the arguments subset
and na.action
, mimics the behavior of lm
.
A matrix like object, which is suitable to be given as
distance
argument to fullmatch
or
pairmatch
. The exact match will be only zeros and
Inf
values, indicating a possible match or no possible
match, respectively. It can be added to a another distance matrix
to create a subclassed matching problem.
Mark M. Fredrickson
caliper
, antiExactMatch
,
match_on
, fullmatch
,
pairmatch
data(nuclearplants) ### First generate a standard propensity score ppty <- glm(pr~.-(pr+cost), family = binomial(), data = nuclearplants) ppty.distances <- match_on(ppty) ### Only allow matches within the partial turn key plants pt.em <- exactMatch(pr ~ pt, data = nuclearplants) as.matrix(pt.em) ### Blunt matches: match.pt.em <- fullmatch(pt.em) print(match.pt.em, grouped = TRUE) ### Combine the propensity scores with the subclasses: match.ppty.em <- fullmatch(ppty.distances + pt.em) print(match.ppty.em, grouped = TRUE)
data(nuclearplants) ### First generate a standard propensity score ppty <- glm(pr~.-(pr+cost), family = binomial(), data = nuclearplants) ppty.distances <- match_on(ppty) ### Only allow matches within the partial turn key plants pt.em <- exactMatch(pr ~ pt, data = nuclearplants) as.matrix(pt.em) ### Blunt matches: match.pt.em <- fullmatch(pt.em) print(match.pt.em, grouped = TRUE) ### Combine the propensity scores with the subclasses: match.ppty.em <- fullmatch(ppty.distances + pt.em) print(match.ppty.em, grouped = TRUE)
Given a data.frame
or formula
and data,
fill.NAs()
returns an expanded data frame, including a new
missingness flag for each variable with missing values and
replacing each missing entry with a value representing a reasonable
default for missing values in its column. Functions in the formula
are supported, with transformations happening before NA
replacement. The expanded data frame is useful for propensity
modeling and balance checking when there are covariates with
missing values.
fill.NAs(x, data = NULL, all.covs = FALSE, contrasts.arg = NULL)
fill.NAs(x, data = NULL, all.covs = FALSE, contrasts.arg = NULL)
x |
Can be either a data frame (in which case the data
argument should be |
data |
If x is a formula, this must be a data.frame. Otherwise it will be ignored. |
all.covs |
Should the response variable be imputed? For
formula |
contrasts.arg |
(from |
fill.NAs
prepares data for use in a model or matching
procedure by filling in missing values with minimally invasive
substitutes. Fill-in is performed column-wise, with each column
being treated individually. For each column that is missing, a new
column is created of the form “ColumnName.NA” with
indicators for each observation that is missing a value for
“ColumnName”. Rosenbaum and Rubin (1984, Sec. 2.4 and
Appendix B) discuss propensity score models using this data
structure.
The replacement value used to fill in a missing value is simple
mean replacement. For transformations of variables, e.g. y ~
x1 * x2
, the transformation occurs first. The transformation
column will be NA
if any of the base columns are
NA
. Fill-in occurs next, replacing all missing values with
the observed column mean. This includes transformation columns.
Data can be passed to fill.NAs
in two ways. First, you can
simply pass a data.frame
object and fill.NAs
will
fill every column. Alternatively, you can pass a formula
and
a data.frame
. Fill-in will only be applied to columns
specifically used in the formula. Prior to fill-in, any functions
in the formula will be expanded. If any arguments to the functions
are NA
, the function value will also be NA
and
subject to fill-in.
By default, fill.NAs
does not impute the response
variable. This is to encourage more sophisticated imputation
schemes when the response is a treatment indicator in a matching
problem. This behavior can be overridden by setting all.covs
= TRUE
.
A data.frame
with all NA
values replaced with
mean values and additional indicator columns for each column
including missing values. Suitable for directly passing to
lm
or other model building functions to build
propensity scores.
Mark M. Fredrickson and Jake Bowers
Rosenbaum, Paul R. and Rubin, Donald B. (1984) ‘Reducing Bias in Observational Studies using Subclassification on the Propensity Score,’ Journal of the American Statistical Association, 79, 516 – 524.
Von Hipple, Paul T. (2009) ‘How to impute interactions, squares, and other transformed variables,’ Sociological Methodology, 39(1), 265 – 291.
data(nuclearplants) ### Extract some representative covariates: np.missing <- nuclearplants[c('t1', 't2', 'ne', 'ct', 'cum.n')] ### create some missingness in the covariates n <- dim(np.missing)[1] k <- dim(np.missing)[2] for (i in 1:n) { missing <- rbinom(1, prob = .1, size = k) if (missing > 0) { np.missing[i, sample(k, missing)] <- NA } } ### Restore outcome and treatment variables: np.missing <- data.frame(nuclearplants[c('cost', 'pr')], np.missing) ### Fit a propensity score but with missing covariate data flagged ### and filled in, as in Rosenbaum and Rubin (1984, Appendix): np.filled <- fill.NAs(pr ~ t1 * t2, np.missing) # Look at np.filled to establish what missingness flags were created head(np.filled) (np.glm <- glm(pr ~ ., family=binomial, data=np.filled)) (glm(pr ~ t1 + t2 + `t1:t2` + t1.NA + t2.NA, family=binomial, data=np.filled)) # In a non-interactive session, the following may help, as long as # the formula passed to `fill.NAs` (plus any missingness flags) is # the desired formula for the glm. (glm(formula(terms(np.filled)), family=binomial, data=np.filled)) ### produce a matrix of propensity distances based on the propensity model ### with fill-in and flagging. Then perform pair matching on it: pairmatch(match_on(np.glm, data=np.filled), data=np.filled) ## fill NAs without using treatment contrasts by making a list of contrasts for ## each factor ## following hints from https://stackoverflow.com/a/4569239/161808 np.missing$t1F<-factor(np.missing$t1) cov.factors <- sapply(np.missing[,c("t1F","t2")],is.factor) cov.contrasts <- lapply( np.missing[,names(cov.factors)[cov.factors],drop=FALSE], contrasts, contrasts = FALSE) ## make a data frame filling the missing covariate values, but without ## excluding any levels of any factors np.noNA2<-fill.NAs(pr~t1F+t2,data=np.missing,contrasts.arg=cov.contrasts)
data(nuclearplants) ### Extract some representative covariates: np.missing <- nuclearplants[c('t1', 't2', 'ne', 'ct', 'cum.n')] ### create some missingness in the covariates n <- dim(np.missing)[1] k <- dim(np.missing)[2] for (i in 1:n) { missing <- rbinom(1, prob = .1, size = k) if (missing > 0) { np.missing[i, sample(k, missing)] <- NA } } ### Restore outcome and treatment variables: np.missing <- data.frame(nuclearplants[c('cost', 'pr')], np.missing) ### Fit a propensity score but with missing covariate data flagged ### and filled in, as in Rosenbaum and Rubin (1984, Appendix): np.filled <- fill.NAs(pr ~ t1 * t2, np.missing) # Look at np.filled to establish what missingness flags were created head(np.filled) (np.glm <- glm(pr ~ ., family=binomial, data=np.filled)) (glm(pr ~ t1 + t2 + `t1:t2` + t1.NA + t2.NA, family=binomial, data=np.filled)) # In a non-interactive session, the following may help, as long as # the formula passed to `fill.NAs` (plus any missingness flags) is # the desired formula for the glm. (glm(formula(terms(np.filled)), family=binomial, data=np.filled)) ### produce a matrix of propensity distances based on the propensity model ### with fill-in and flagging. Then perform pair matching on it: pairmatch(match_on(np.glm, data=np.filled), data=np.filled) ## fill NAs without using treatment contrasts by making a list of contrasts for ## each factor ## following hints from https://stackoverflow.com/a/4569239/161808 np.missing$t1F<-factor(np.missing$t1) cov.factors <- sapply(np.missing[,c("t1F","t2")],is.factor) cov.contrasts <- lapply( np.missing[,names(cov.factors)[cov.factors],drop=FALSE], contrasts, contrasts = FALSE) ## make a data frame filling the missing covariate values, but without ## excluding any levels of any factors np.noNA2<-fill.NAs(pr~t1F+t2,data=np.missing,contrasts.arg=cov.contrasts)
Get all the subproblems from a distance specification
findSubproblems(d)
findSubproblems(d)
d |
a distance specification |
list of distance specifications
Mark M. Fredrickson
Given two groups, such as a treatment and a control group, and a method of creating a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches (or optionally an already created such discrepancy matrix), create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.
fullmatch( x, min.controls = 0, max.controls = Inf, omit.fraction = NULL, mean.controls = NULL, tol = 0.001, data = NULL, solver = "", ... ) full( x, min.controls = 0, max.controls = Inf, omit.fraction = NULL, mean.controls = NULL, tol = 0.001, data = NULL, solver = "", ... )
fullmatch( x, min.controls = 0, max.controls = Inf, omit.fraction = NULL, mean.controls = NULL, tol = 0.001, data = NULL, solver = "", ... ) full( x, min.controls = 0, max.controls = Inf, omit.fraction = NULL, mean.controls = NULL, tol = 0.001, data = NULL, solver = "", ... )
x |
Any valid input to If Alternatively, a precomputed distance may be entered. A matrix of
non-negative discrepancies, each indicating the permissibility and
desirability of matching the unit corresponding to its row (a 'treatment') to
the unit corresponding to its column (a 'control'); or, better, a distance
specification as produced by |
min.controls |
The minimum ratio of controls to treatments that is to
be permitted within a matched set: should be non-negative and finite. If
When matching within subclasses (such as those created by
|
max.controls |
The maximum ratio of controls to treatments that is
to be permitted within a matched set: should be positive and numeric.
If When matching within subclasses (such as those created by
|
omit.fraction |
Optionally, specify what fraction of controls or treated
subjects are to be rejected. If When matching within subclasses (such as those created by
At most one of |
mean.controls |
Optionally, specify the average number of controls per
treatment to be matched. Must be no less than than When matching within subclasses (such as those created by
At most one of |
tol |
Because of internal rounding, |
data |
Optional |
solver |
Choose which solver to use. Currently implemented are RELAX-IV
and LEMON. Default of To explicitly use RELAX-IV, pass string "RELAX-IV". To use LEMON, pass string "LEMON". Optionally, to specify which algorithm LEMON will use, pass the function LEMON with argument for the algorithm name, "CycleCancelling", "CapacityScaling", "CostScaling", and "NetworkSimplex". See this site for details on their differences: https://lemon.cs.elte.hu/pub/doc/latest/a00606.html. CycleCancelling is the default. The CycleCancelling algorithm seems to produce results most closely
resembling those of optmatch versions prior to 1.0. We have observed the
other LEMON algorithms to produce different results when the
|
... |
Additional arguments, passed to |
If passing an already created discrepancy matrix, finite entries indicate permissible matches, with smaller discrepancies indicating more desirable matches. The matrix must have row and column names.
If it is desirable to create the discrepancies matrix beforehand (for example,
if planning on running several different matching schemes), consider using
match_on
to generate the distances. This generic function has
several useful methods for handling propensity score models, computing
Mahalanobis distances (and other arbitrary distances), and using user supplied
functions. These distances can also be combined with those generated by
exactMatch
and caliper
to create very nuanced
matching specifications.
The value of tol
can have a substantial effect on computation time;
with smaller values, computation takes longer. Not every tolerance can be
met, and how small a tolerance is too small varies with the machine and with
the details of the problem. If fullmatch
can't guarantee that the
tolerance is as small as the given value of argument tol
, then
matching proceeds but a warning is issued.
By default, fullmatch
will attempt, if the given constraints are
infeasible, to find a feasible problem using the same constraints. This
will almost surely involve using a more restrictive omit.fraction
or
mean.controls
. (This will never automatically omit treatment units.)
Note that this does not guarantee that the returned match has the least
possible number of omitted subjects, it only gives a match that is feasible
within the given constraints. It may often be possible to loosen the
omit.fraction
or mean.controls
constraint and still find a
feasible match. The auto recovery is controlled by
options("fullmatch_try_recovery")
.
In full matching problems permitting many-one matches (min.controls
less than 1), the number of controls contributing to matches can exceed
what was requested by setting a value of mean.controls
or
omit.fraction
. I.e., in this setting mean.controls
sets
the minimum ratio of number of controls to number of treatments placed
into matched sets.
If the program detects that (what it thinks is) a large problem,
a warning is issued. Unless you have an older computer, there's a good
chance that you can handle larger problems (at the cost of increased
computation time). To check the large problem threshold, use
getMaxProblemSize
; to re-set it, use
setMaxProblemSize
.
A optmatch
object (factor
) indicating matched groups.
Hansen, B.B. and Klopfer, S.O. (2006), ‘ Optimal full matching and related designs via network flows’, Journal of Computational and Graphical Statistics, 15, 609–627.
Hansen, B.B. (2004), ‘Full Matching in an Observational Study of Coaching for the SAT’, Journal of the American Statistical Association, 99, 609–618.
Rosenbaum, P. (1991), ‘A Characterization of Optimal Designs for Observational Studies’, Journal of the Royal Statistical Society, Series B, 53, 597–610.
data(nuclearplants) ### Full matching on a Mahalanobis distance. ( fm1 <- fullmatch(pr ~ t1 + t2, data = nuclearplants) ) summary(fm1) ### Full matching with restrictions. ( fm2 <- fullmatch(pr ~ t1 + t2, min.controls = .5, max.controls = 4, data = nuclearplants) ) summary(fm2) ### Full matching to half of available controls. ( fm3 <- fullmatch(pr ~ t1 + t2, omit.fraction = .5, data = nuclearplants) ) summary(fm3) ### Full matching attempts recovery when the initial restrictions are infeasible. ### Limiting max.controls = 1 allows use of only 10 of 22 controls. ( fm4 <- fullmatch(pr ~ t1 + t2, max.controls = 1, data=nuclearplants) ) summary(fm4) ### To recover restrictions optmatch_restrictions(fm4) ### Full matching within a propensity score caliper. ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants) ### Note that units without counterparts within the caliper are automatically dropped. ### For more complicated models, create a distance matrix and pass it to fullmatch. mhd <- match_on(pr ~ t1 + t2, data = nuclearplants) + caliper(match_on(ppty), width = 1) ( fm5 <- fullmatch(mhd, data = nuclearplants) ) summary(fm5) ### Propensity balance assessment. Requires RItools package. if (require(RItools)) summary(fm5,ppty) ### The order of the names in the match factor is the same ### as the nuclearplants data.frame since we used the data argument ### when calling fullmatch. The order would be unspecified otherwise. cbind(nuclearplants, matches = fm5) ### Match in subgroups only. There are a few ways to specify this. m1 <- fullmatch(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m2 <- fullmatch(pr ~ t1 + t2 + strata(pt), data=nuclearplants) ### Matching on propensity scores within matching in subgroups only: m3 <- fullmatch(glm(pr ~ t1 + t2, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m4 <- fullmatch(glm(pr ~ t1 + t2 + pt, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m5 <- fullmatch(glm(pr ~ t1 + t2 + strata(pt), data=nuclearplants, family=binomial), data=nuclearplants) # Including `strata(foo)` inside a glm uses `foo` in the model as # well, so here m4 and m5 are equivalent. m3 differs in that it does # not include `pt` in the glm.
data(nuclearplants) ### Full matching on a Mahalanobis distance. ( fm1 <- fullmatch(pr ~ t1 + t2, data = nuclearplants) ) summary(fm1) ### Full matching with restrictions. ( fm2 <- fullmatch(pr ~ t1 + t2, min.controls = .5, max.controls = 4, data = nuclearplants) ) summary(fm2) ### Full matching to half of available controls. ( fm3 <- fullmatch(pr ~ t1 + t2, omit.fraction = .5, data = nuclearplants) ) summary(fm3) ### Full matching attempts recovery when the initial restrictions are infeasible. ### Limiting max.controls = 1 allows use of only 10 of 22 controls. ( fm4 <- fullmatch(pr ~ t1 + t2, max.controls = 1, data=nuclearplants) ) summary(fm4) ### To recover restrictions optmatch_restrictions(fm4) ### Full matching within a propensity score caliper. ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants) ### Note that units without counterparts within the caliper are automatically dropped. ### For more complicated models, create a distance matrix and pass it to fullmatch. mhd <- match_on(pr ~ t1 + t2, data = nuclearplants) + caliper(match_on(ppty), width = 1) ( fm5 <- fullmatch(mhd, data = nuclearplants) ) summary(fm5) ### Propensity balance assessment. Requires RItools package. if (require(RItools)) summary(fm5,ppty) ### The order of the names in the match factor is the same ### as the nuclearplants data.frame since we used the data argument ### when calling fullmatch. The order would be unspecified otherwise. cbind(nuclearplants, matches = fm5) ### Match in subgroups only. There are a few ways to specify this. m1 <- fullmatch(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m2 <- fullmatch(pr ~ t1 + t2 + strata(pt), data=nuclearplants) ### Matching on propensity scores within matching in subgroups only: m3 <- fullmatch(glm(pr ~ t1 + t2, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m4 <- fullmatch(glm(pr ~ t1 + t2 + pt, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m5 <- fullmatch(glm(pr ~ t1 + t2 + strata(pt), data=nuclearplants, family=binomial), data=nuclearplants) # Including `strata(foo)` inside a glm uses `foo` in the model as # well, so here m4 and m5 are equivalent. m3 differs in that it does # not include `pt` in the glm.
To prevent users from starting excessively large matching problems, the
maximum problem size is limited by options("optmatch_max_problem_size")
.
This function a quick helper to assist fetching this value as a scalar. If the
option isn't set, the function falls back to the default value, hard coded in
the optmatch
package.
getMaxProblemSize()
getMaxProblemSize()
logical
optmatch:::getMaxProblemSize() > 1 & optmatch:::getMaxProblemSize() < 1e100
optmatch:::getMaxProblemSize() > 1 & optmatch:::getMaxProblemSize() < 1e100
InfinitySparseMatrix
is a special class of distance specifications. Finite entries
indicate possible matches, while infinite or NA entries indicated non-allowed
matches. This data type can be more space efficient for sparse matching
problems.
Usually, users will create distance specification using match_on
, caliper
, or
exactMatch
. The ordering of units in an InfinitySparseMatrix
is not guaranteed to be maintained after subsetting and/or other operations are performed.
colnames
vector containing names for all control units. This will either be a character vector or NULL if units have no names
rownames
vector containing names for all treated units. This will either be a character vector or NULL if units have no names
cols
vector of integers corresponding to control units
rows
vector of integers corresponding to treated units
dimension
integer vector containing the number of treated and control units, in that order
call
function call used to create the InfinitySparseMatrix
Mark M. Fredrickson
match_on
, caliper
, exactMatch
, fullmatch
, pairmatch
(Internal) Helper function for accessing algorithms in LEMON solver
LEMON(algorithm = "CycleCancelling")
LEMON(algorithm = "CycleCancelling")
algorithm |
LEMON algorithm to use. Choices are "CycleCancelling", "CapacityScaling", "CostScaling", "NetworkSimplex". Default is "CycleCancelling". |
String of the form "LEMON.<algorithm>"
A function with which to produce matching distances, for instance Mahalanobis
distances, propensity score discrepancies or calipers, or combinations
thereof, for pairmatch
or fullmatch
to
subsequently “match on”. Conceptually, the result of a call
match_on
is a treatment-by-control matrix of distances. Because these
matrices can grow quite large, in practice match_on
produces either an
ordinary dense matrix or a special sparse matrix structure (that can make use
of caliper and exact matching constraints to reduce storage requirements).
Methods are supplied for these sparse structures,
InfinitySparseMatrix
es, so that they can be manipulated and modified
in much the same way as dense matrices.
match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, ...) ## S3 method for class 'glm' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, standardization.scale = NULL, ... ) ## S3 method for class 'bigglm' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, standardization.scale = NULL, ... ) ## S3 method for class 'formula' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, subset = NULL, method = "mahalanobis", ... ) ## S3 method for class ''function'' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, z = NULL, ... ) ## S3 method for class 'numeric' match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, z, ...) ## S3 method for class 'InfinitySparseMatrix' match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, ...) ## S3 method for class 'matrix' match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, ...)
match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, ...) ## S3 method for class 'glm' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, standardization.scale = NULL, ... ) ## S3 method for class 'bigglm' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, standardization.scale = NULL, ... ) ## S3 method for class 'formula' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, subset = NULL, method = "mahalanobis", ... ) ## S3 method for class ''function'' match_on( x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, z = NULL, ... ) ## S3 method for class 'numeric' match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, z, ...) ## S3 method for class 'InfinitySparseMatrix' match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, ...) ## S3 method for class 'matrix' match_on(x, within = NULL, caliper = NULL, exclude = NULL, data = NULL, ...)
x |
A model formula, fitted glm or other object implicitly specifying a distance; see blurbs on specific methods in Details. |
within |
A valid distance specification, such as the result of
|
caliper |
The width of a caliper to use to exclude treated-control pairs
with values greater than the width. For some methods, there may be a speed
advantage to passing a width rather than using the |
exclude |
A list of units (treated or control) to exclude from the
|
data |
An optional data frame. |
... |
Other arguments for methods. |
standardization.scale |
Function for rescaling of |
subset |
A subset of the data to use in creating the distance specification. |
method |
A string indicating which method to use in computing the
distances from the data. The current possibilities are
|
z |
A logical or binary vector indicating treatment and control for each unit in the study. TRUE or 1 represents a treatment unit, FALSE of 0 represents a control unit. Any unit with NA treatment status will be excluded from the distance matrix. |
match_on
is generic. There are several supplied methods, all providing
the same basic output: a matrix (or similar) object with treated units on the
rows and control units on the columns. Each cell [i,j] then indicates the
distance from a treated unit i to control unit j. Entries that are Inf
are said to be unmatchable. Such units are guaranteed to never be in a
matched set. For problems with many Inf
entries, so called sparse
matching problems, match_on
uses a special data type that is more
space efficient than a standard R matrix
. When problems are not
sparse (i.e. dense), match_on
uses the standard matrix
type.
match_on
methods differ on the types of arguments they take, making
the function a one-stop location of many different ways of specifying
matches: using functions, formulas, models, and even simple scores. Many of
the methods require additional arguments, detailed below. All methods take a
within
argument, a distance specification made using
exactMatch
or caliper
(or some additive
combination of these or other distance creating functions). All
match_on
methods will use the finite entries in the within
argument as a guide for producing the new distance. Any entry that is
Inf
in within
will be Inf
in the distance matrix
returned by match_on
. This argument can reduce the processing time
needed to compute sparse distance matrices.
Details for each particular first type of argument follow:
First argument (x
): glm
. The model is assumed to be
a fitted propensity score model. From this it extracts distances on the
linear propensity score: fitted values of the linear predictor, the
link function applied to the estimated conditional probabilities, as opposed
to the estimated conditional probabilities themselves (Rosenbaum & Rubin,
1985). For example, a logistic model (glm
with
family=binomial()
) has the logit function as its link, so from such
models match_on
computes distances in terms of logits of the
estimated conditional probabilities, i.e. the estimated log odds.
Optionally these distances are also rescaled. The default is to rescale, by
the reciprocal of an outlier-resistant variant of the pooled s.d. of
propensity scores; see standardization_scale
. (The
standardization.scale
argument of this function can be used to
change how this dispersion is calculated, e.g. to calculate an ordinary not
an outlier-resistant s.d.; it will be passed down
to standardization_scale
as its standardizer
argument.)
To skip rescaling, set argument standardization.scale
to 1.
The overall result records
absolute differences between treated and control units on linear, possibly
rescaled, propensity scores.
In addition, one can impose a caliper in terms of these distances by
providing a scalar as a caliper
argument, forbidding matches between
treatment and control units differing in the calculated propensity score by
more than the specified caliper. For example, Rosenbaum and Rubin's (1985)
caliper of one-fifth of a pooled propensity score s.d. would be imposed by
specifying caliper=.2
, in tandem either with the default rescaling
or, to follow their example even more closely, with the additional
specification standardization.scale=sd
. Propensity calipers are
beneficial computationally as well as statistically, for reasons indicated
in the below discussion of the numeric
method.
One can also specify exactMatching criteria by using strata(foo)
inside
the formula to build the glm
. For example, passing
glm(y ~ x + strata(s))
to match_on
is equivalent to passing
within=exactMatch(y ~ strata(s))
. Note that when combining with
the caliper
argument, the standard deviation used for the caliper will be
computed across all strata, not within each strata.
If data used to fit the glm have missing values in the left-hand side
(dependent) variable, these observations are omitted from the output of
match_on. If there are observations with missing values in right hand
side (independent) variables, then a re-fit of the model after imputing
these variables using a simple scheme and adding indicator variables of
missingness will be attempted, via the scores
function.
First argument (x
): bigglm
. This method works
analogously to the glm
method, but with bigglm
objects,
created by the bigglm
function from package ‘biglm’, which
can handle bigger data sets than the ordinary glm function can.
First argument (x
): formula
. The formula must have
Z
, the treatment indicator (Z=0
indicates control group,
Z=1
indicates treatment group), on the left hand side, and any
variables to compute a distance on on the right hand side. E.g. Z ~ X1
+ X2
. The Mahalanobis distance is calculated as the square root of d'Cd,
where d is the vector of X-differences on a pair of observations and C is an
inverse (generalized inverse) of the pooled covariance of Xes. (The pooling
is of the covariance of X within the subset defined by Z==0
and
within the complement of that subset. This is similar to a Euclidean
distance calculated after reexpressing the Xes in standard units, such that
the reexpressed variables all have pooled SDs of 1; except that it addresses
redundancies among the variables by scaling down variables contributions in
proportion to their correlations with other included variables.)
Euclidean distance is also available, via method="euclidean"
, and
ranked, Mahalanobis distance, via method="rank_mahalanobis"
.
The treatment indicator Z
as noted above must either be numeric
(1 representing treated units and 0 control units) or logical
(TRUE
for treated, FALSE
for controls). (Earlier versions of
the software accepted factor variables and other types of numeric variable; you
may have to update existing scripts to get them to run.)
As an alternative to specifying a within
argument, when x
is
a formula, the strata
command can be used inside the formula to specify
exact matching. For example, rather than using within=exactMatch(y ~
z, data=data)
, you may update your formula as y ~ x + strata(z)
. Do
not use both methods (within
and strata
simultaneously. Note
that when combining with the caliper
argument, the standard
deviation used for the caliper will be computed across all strata, not
separately by stratum.
A unit with NA treatment status (Z
) is ignored and will not be included in the distance output.
Missing values in variables on the right hand side of the formula are handled as follows. By default
match_on
will (1) create a matrix of distances between observations which
have only valid values for **all** covariates and then (2) append matrices of Inf values
for distances between observations either of which has a missing values on any of the right-hand-side variables.
(I.e., observations with missing values are retained in the output, but
matches involving them are forbidden.)
First argument (x
): function
. The passed function
must take arguments: index
, data
, and z
. The
data
and z
arguments will be the same as those passed directly
to match_on
. The index
argument is a matrix of two columns,
representing the pairs of treated and control units that are valid
comparisons (given any within
arguments). The first column is the row
name or id of the treated unit in the data
object. The second column
is the id for the control unit, again in the data
object. For each of
these pairs, the function should return the distance between the treated
unit and control unit. This may sound complicated, but is simple to
use. For example, a function that returned the absolute difference between
two units using a vector of data would be f <- function(index, data,
z) { abs(data[index[,1]] - data[index[,2]]) }
. (Note: This simple case is
precisely handled by the numeric
method.)
First argument (x
): numeric
. This returns
absolute differences between treated and control units' values of x
.
If a caliper is specified, pairings with x
-differences greater than
it are forbidden. Conceptually, those distances are set to Inf
;
computationally, if either of caliper
and within
has been
specified then only information about permissible pairings will be stored,
so the forbidden pairings are simply omitted. Providing a caliper
argument here, as opposed to omitting it and afterward applying the
caliper
function, reduces storage requirements and may
otherwise improve performance, particularly in larger problems.
For the numeric method, x
must have names. If z
is named
it must have the same names as x
, though it allows for a different
ordering of names. x
's name ordering is considered canonical.
First argument (x
): matrix
or InfinitySparseMatrix
. These just return their
arguments as these objects are already valid distance specifications.
A distance specification (a matrix or similar object) which is
suitable to be given as the distance
argument to
fullmatch
or pairmatch
.
P.~R. Rosenbaum and D.~B. Rubin (1985), ‘Constructing a control group using multivariate matched sampling methods that incorporate the propensity score’, The American Statistician, 39 33–38.
fullmatch
, pairmatch
,
exactMatch
, caliper
data(nuclearplants) match_on.examples <- list() ### Propensity score distances. ### Recommended approach: (aGlm <- glm(pr~.-(pr+cost), family=binomial(), data=nuclearplants)) match_on.examples$ps1 <- match_on(aGlm) ### A second approach: first extract propensity scores, then separately ### create a distance from them. (Useful when importing propensity ### scores from an external program.) plantsPS <- predict(aGlm) match_on.examples$ps2 <- match_on(pr~plantsPS, data=nuclearplants) ### Full matching on the propensity score. fm1 <- fullmatch(match_on.examples$ps1, data = nuclearplants) fm2 <- fullmatch(match_on.examples$ps2, data = nuclearplants) ### Because match_on.glm uses robust estimates of spread, ### the results differ in detail -- but they are close enough ### to yield similar optimal matches. all(fm1 == fm2) # The same ### Mahalanobis distance: match_on.examples$mh1 <- match_on(pr ~ t1 + t2, data = nuclearplants) ### Absolute differences on a scalar: tmp <- nuclearplants$t1 names(tmp) <- rownames(nuclearplants) (absdist <- match_on(tmp, z = nuclearplants$pr, within = exactMatch(pr ~ pt, nuclearplants))) ### Pair matching on the variable `t1`: pairmatch(absdist, data = nuclearplants) ### Propensity score matching within subgroups: match_on.examples$ps3 <- match_on(aGlm, exactMatch(pr ~ pt, nuclearplants)) fullmatch(match_on.examples$ps3, data = nuclearplants) ### Propensity score matching with a propensity score caliper: match_on.examples$pscal <- match_on.examples$ps1 + caliper(match_on.examples$ps1, 1) fullmatch(match_on.examples$pscal, data = nuclearplants) # Note that the caliper excludes some units ### A Mahalanobis distance for matching within subgroups: match_on.examples$mh2 <- match_on(pr ~ t1 + t2 , data = nuclearplants, within = exactMatch(pr ~ pt, nuclearplants)) ### Mahalanobis matching within subgroups, with a propensity score ### caliper: fullmatch(match_on.examples$mh2 + caliper(match_on.examples$ps3, 1), data = nuclearplants) ### Alternative methods to matching without groups (exact matching) m1 <- match_on(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, nuclearplants)) m2 <- match_on(pr ~ t1 + t2 + strata(pt), data=nuclearplants) # m1 and m2 are identical m3 <- match_on(glm(pr ~ t1 + t2 + cost, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m4 <- match_on(glm(pr ~ t1 + t2 + cost + pt, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m5 <- match_on(glm(pr ~ t1 + t2 + cost + strata(pt), data=nuclearplants, family=binomial), data=nuclearplants) # Including `strata(foo)` inside a glm uses `foo` in the model as # well, so here m4 and m5 are equivalent. m3 differs in that it does # not include `pt` in the glm.
data(nuclearplants) match_on.examples <- list() ### Propensity score distances. ### Recommended approach: (aGlm <- glm(pr~.-(pr+cost), family=binomial(), data=nuclearplants)) match_on.examples$ps1 <- match_on(aGlm) ### A second approach: first extract propensity scores, then separately ### create a distance from them. (Useful when importing propensity ### scores from an external program.) plantsPS <- predict(aGlm) match_on.examples$ps2 <- match_on(pr~plantsPS, data=nuclearplants) ### Full matching on the propensity score. fm1 <- fullmatch(match_on.examples$ps1, data = nuclearplants) fm2 <- fullmatch(match_on.examples$ps2, data = nuclearplants) ### Because match_on.glm uses robust estimates of spread, ### the results differ in detail -- but they are close enough ### to yield similar optimal matches. all(fm1 == fm2) # The same ### Mahalanobis distance: match_on.examples$mh1 <- match_on(pr ~ t1 + t2, data = nuclearplants) ### Absolute differences on a scalar: tmp <- nuclearplants$t1 names(tmp) <- rownames(nuclearplants) (absdist <- match_on(tmp, z = nuclearplants$pr, within = exactMatch(pr ~ pt, nuclearplants))) ### Pair matching on the variable `t1`: pairmatch(absdist, data = nuclearplants) ### Propensity score matching within subgroups: match_on.examples$ps3 <- match_on(aGlm, exactMatch(pr ~ pt, nuclearplants)) fullmatch(match_on.examples$ps3, data = nuclearplants) ### Propensity score matching with a propensity score caliper: match_on.examples$pscal <- match_on.examples$ps1 + caliper(match_on.examples$ps1, 1) fullmatch(match_on.examples$pscal, data = nuclearplants) # Note that the caliper excludes some units ### A Mahalanobis distance for matching within subgroups: match_on.examples$mh2 <- match_on(pr ~ t1 + t2 , data = nuclearplants, within = exactMatch(pr ~ pt, nuclearplants)) ### Mahalanobis matching within subgroups, with a propensity score ### caliper: fullmatch(match_on.examples$mh2 + caliper(match_on.examples$ps3, 1), data = nuclearplants) ### Alternative methods to matching without groups (exact matching) m1 <- match_on(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, nuclearplants)) m2 <- match_on(pr ~ t1 + t2 + strata(pt), data=nuclearplants) # m1 and m2 are identical m3 <- match_on(glm(pr ~ t1 + t2 + cost, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m4 <- match_on(glm(pr ~ t1 + t2 + cost + pt, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m5 <- match_on(glm(pr ~ t1 + t2 + cost + strata(pt), data=nuclearplants, family=binomial), data=nuclearplants) # Including `strata(foo)` inside a glm uses `foo` in the model as # well, so here m4 and m5 are equivalent. m3 differs in that it does # not include `pt` in the glm.
Given a bipartite matching (object of class optmatch
),
create a logical vector of the same length indicating which units
were and were not placed into matched sets.
matched(x) unmatched() matchfailed(x)
matched(x) unmatched() matchfailed(x)
x |
Vector of class |
matched
and unmatched
indicate which elements of
x
do and do not belong to matched sets, as indicated by
their character representations in x
.
When fullmatch
has been presented with an inconsistent
combination of constraints and discrepancies between potential
matches, so that there exists no matching (i) with finite total
discrepancy within matched sets that (ii) respects the given
constraints, then the matching problem is said to be infeasible.
TRUE
s in the output of matchfailed
indicate that
this has occurred.
A logical vector (without names).
To understand the output of matchfailed
element-wise,
note that fullmatch
handles a matching problem in three
steps. First, if fullmatch
has been directed to match
within subclasses, then it divides its matching problem into a
subproblem for each subclass. Second, fullmatch
removes
from each subproblem those individual units that lack
permissible potential matches (i.e. potential matches from which
they are separated by a finite discrepancy). Such "isolated"
units are flagged in such a way as to be indicated by
unmatched
, but not by matchfailed
. Third,
fullmatch
presents each subproblem, with isolated
elements removed, to an optimal matching routine. If such a
reduced subproblem is found at this stage to be infeasible, then
each unit contributing to it is so flagged as to be indicated by
matchfailed
.
Ben Hansen
data(plantdist) mxpl.fm0 <- fullmatch(plantdist) # A feasible matching problem c(sum(matched(mxpl.fm0)), sum(unmatched(mxpl.fm0))) sum(matchfailed(mxpl.fm0)) mxpl.fm1 <- fullmatch(plantdist, # An infeasible problem max.controls=3, min.controls=3) c(sum(matched(mxpl.fm1)), sum(unmatched(mxpl.fm1))) sum(matchfailed(mxpl.fm1)) mxpl.si <- factor(c('a', 'a', 'c', rep('d',4), 'b', 'c', 'c', rep('d', 16))) names(mxpl.si) <- LETTERS[1:26] mxpl.exactmatch <- exactMatch(mxpl.si, c(rep(1, 7), rep(0, 26 - 7))) # Subclass a contains two treated units but no controls; # subclass b contains only a control unit; # subclass c contains one treated and two control units; # subclass d contains the remaining twenty units. # only valid subproblems will be used mcl <- c(1, Inf) mxpl.fm2 <- fullmatch(plantdist + mxpl.exactmatch, max.controls=mcl) sum(matched(mxpl.fm2)) table(unmatched(mxpl.fm2), matchfailed(mxpl.fm2)) mxpl.fm2[matchfailed(mxpl.fm2)] mxpl.fm2[unmatched(mxpl.fm2) & # isolated units return as !matchfailed(mxpl.fm2)] # unmatched but not matchfailed
data(plantdist) mxpl.fm0 <- fullmatch(plantdist) # A feasible matching problem c(sum(matched(mxpl.fm0)), sum(unmatched(mxpl.fm0))) sum(matchfailed(mxpl.fm0)) mxpl.fm1 <- fullmatch(plantdist, # An infeasible problem max.controls=3, min.controls=3) c(sum(matched(mxpl.fm1)), sum(unmatched(mxpl.fm1))) sum(matchfailed(mxpl.fm1)) mxpl.si <- factor(c('a', 'a', 'c', rep('d',4), 'b', 'c', 'c', rep('d', 16))) names(mxpl.si) <- LETTERS[1:26] mxpl.exactmatch <- exactMatch(mxpl.si, c(rep(1, 7), rep(0, 26 - 7))) # Subclass a contains two treated units but no controls; # subclass b contains only a control unit; # subclass c contains one treated and two control units; # subclass d contains the remaining twenty units. # only valid subproblems will be used mcl <- c(1, Inf) mxpl.fm2 <- fullmatch(plantdist + mxpl.exactmatch, max.controls=mcl) sum(matched(mxpl.fm2)) table(unmatched(mxpl.fm2), matchfailed(mxpl.fm2)) mxpl.fm2[matchfailed(mxpl.fm2)] mxpl.fm2[unmatched(mxpl.fm2) & # isolated units return as !matchfailed(mxpl.fm2)] # unmatched but not matchfailed
From a match (as produced by pairmatch
or fullmatch
)
and a distance, extract the distances of matched units from their
matched counterparts.
matched.distances(matchobj, distance, preserve.unit.names = FALSE)
matched.distances(matchobj, distance, preserve.unit.names = FALSE)
matchobj |
Value of a call to |
distance |
Either a distance matrix or the value of a call to
or |
preserve.unit.names |
Logical. If TRUE, for each matched set
|
From a match (as produced by pairmatch
or fullmatch
)
and a distance, extract the distances of matched units from their
matched counterparts.
A list of numeric vectors (or matrices) of distances, one for each matched set. Note that a matched set with 1 treatment and k controls, or with k treatments and 1 control, has k, not k+1, distances.
Ben B. Hansen
data(plantdist) plantsfm <- fullmatch(plantdist) (plantsfm.d <- matched.distances(plantsfm,plantdist,pres=TRUE)) unlist(lapply(plantsfm.d, max)) mean(unlist(plantsfm.d))
data(plantdist) plantsfm <- fullmatch(plantdist) (plantsfm.d <- matched.distances(plantsfm,plantdist,pres=TRUE)) unlist(lapply(plantsfm.d, max)) mean(unlist(plantsfm.d))
Larger calipers permit more possible matches between treated and control
groups, which can be better for creating matches with larger effective sample
sizes. The downside is that wide calipers may make the matching problem too big
for processor or memory constraints. maxCaliper
attempts to find a
caliper value, for a given vector of scores and a treatment indicator, that
will be possible given the maximum problem size constraints imposed by
fullmatch
and pairmatch
.
maxCaliper(scores, z, widths, structure = NULL, exact = TRUE)
maxCaliper(scores, z, widths, structure = NULL, exact = TRUE)
scores |
A numeric vector of scores providing 1-D position of units |
z |
Treatment indicator vector |
widths |
A vector of caliper widths to try, will be sorted largest to smallest. |
structure |
Optional factor variable that groups the scores, as would
be used by |
exact |
A logical indicating if the exact problem size should be
computed ( |
numeric The value of the largest caliper that creates a feasible
problem. If no such caliper exists in widths
, an error will be
generated.
Functions to find the largest value of min.controls, or the smallest value of max.controls, for which a full matching problem is feasible. These are determined by constraints embedded in the matching problem's distance matrix.
maxControlsCap(distance, min.controls = NULL, solver = "") minControlsCap(distance, max.controls = NULL, solver = "")
maxControlsCap(distance, min.controls = NULL, solver = "") minControlsCap(distance, max.controls = NULL, solver = "")
distance |
Either a matrix of non-negative, numeric
discrepancies, or a list of such matrices. (See
|
min.controls |
Optionally, set limits on the minimum number
of controls per matched set. (Only makes sense for
|
solver |
Choose which solver to use. See |
max.controls |
Optionally, set limits on the maximum number
of controls per matched set. (Only makes sense for
|
The function works by repeated application of full matching, so on large problems it can be time-consuming.
For minControlsCap
,
strictest.feasible.min.controls
and
given.max.controls
. For maxControlsCap
,
given.min.controls
and
strictest.feasible.max.controls
.
strictest.feasible.min.controls |
The largest values of the
|
given.max.controls |
The |
given.min.controls |
The |
strictest.feasible.max.controls |
The smallest values of
the |
Essentially this is just a line search. I've done several
things to speed it up, but not everything that might be done.
At present, not very thoroughly tested either: you might check
the final results to make sure that fullmatch
works with the values of min.controls
(or
max.controls
) suggested by these functions, and that it
ceases to work if you increase (decrease) those values.
Comments appreciated.
Ben B. Hansen
Hansen, B.B. and S. Olsen Klopfer (2006), ‘Optimal full matching and related designs via network flows’, Journal of Computational and Graphical Statistics 15, 609–627.
match_on
) Create
matching distancesDeprecated in favor of match_on
mdist(x, structure.fmla = NULL, ...) ## S3 method for class 'optmatch.dlist' mdist(x, structure.fmla = NULL, ...) ## S3 method for class ''function'' mdist(x, structure.fmla = NULL, data = NULL, ...) ## S3 method for class 'formula' mdist(x, structure.fmla = NULL, data = NULL, subset = NULL, ...) ## S3 method for class 'glm' mdist(x, structure.fmla = NULL, standardization.scale = mad, ...) ## S3 method for class 'bigglm' mdist(x, structure.fmla = NULL, data = NULL, standardization.scale = mad, ...) ## S3 method for class 'numeric' mdist(x, structure.fmla = NULL, trtgrp = NULL, ...)
mdist(x, structure.fmla = NULL, ...) ## S3 method for class 'optmatch.dlist' mdist(x, structure.fmla = NULL, ...) ## S3 method for class ''function'' mdist(x, structure.fmla = NULL, data = NULL, ...) ## S3 method for class 'formula' mdist(x, structure.fmla = NULL, data = NULL, subset = NULL, ...) ## S3 method for class 'glm' mdist(x, structure.fmla = NULL, standardization.scale = mad, ...) ## S3 method for class 'bigglm' mdist(x, structure.fmla = NULL, data = NULL, standardization.scale = mad, ...) ## S3 method for class 'numeric' mdist(x, structure.fmla = NULL, trtgrp = NULL, ...)
x |
The object to use as the basis for forming the mdist. Methods exist for formulas, functions, and generalized linear models. |
structure.fmla |
A formula denoting the treatment variable on
the left hand side and an optional grouping expression on the
right hand side. For example, |
... |
Additional method arguments. Most methods require a 'data' argument. |
data |
Data where the variables references in |
subset |
If non- |
standardization.scale |
A function to scale the distances; by
default uses |
trtgrp |
Dummy variable for treatment group membership. |
The mdist
method provides three ways to construct a
matching distance (i.e., a distance matrix or suitably organized
list of such matrices): guided by a function, by a fitted model,
or by a formula. The class of the first argument given to
mdist
determines which of these methods is invoked.
The mdist.function
method takes a function of two
arguments. When called, this function will receive the treatment
observations as the first argument and the control observations as
the second argument. As an example, the following computes the raw
differences between values of t1
for treatment units (here,
nuclear plants with pr==1
) and controls (here, plants with
pr==0
), returning the result as a distance matrix:
sdiffs <- function(treatments, controls) {
abs(outer(treatments$t1, controls$t1, `-`))
}
The mdist.function
method does similar things as the
earlier optmatch function makedist
, although the interface
is a bit different.
The mdist.formula
method computes the squared Mahalanobis
distance between observations, with the right-hand side of the
formula determining which variables contribute to the Mahalanobis
distance. If matching is to be done within strata, the
stratification can be communicated using either the
structure.fmla
argument (e.g. ~ grp
) or as part of
the main formula (e.g. z ~ x1 + x2 | grp
).
An mdist.glm
method takes an argument of class glm
as first argument. It assumes that this object is a fitted
propensity model, extracting distances on the linear propensity
score (logits of the estimated conditional probabilities) and, by
default, rescaling the distances by the reciprocal of the pooled
s.d. of treatment- and control-group propensity scores. (The
scaling uses mad
, for resistance to outliers, by default;
this can be changed to the actual s.d., or rescaling can be
skipped entirely, by setting argument
standardization.scale
to sd
or NULL
,
respectively.) A mdist.bigglm
method works analogously
with bigglm
objects, created by the bigglm
function
from package ‘biglm’, which can handle bigger data sets
than the ordinary glm function can. In contrast with
mdist.glm
it requires additional data
and
structure.fmla
arguments. (If you have enough data to
have to use bigglm
, then you'll probably have to subgroup
before matching to avoid memory problems. So you'll have to use
the structure.fmla
argument anyway.)
Object of class optmatch.dlist
, which is suitable
to be given as distance
argument to
fullmatch
or pairmatch
.
Mark M. Fredrickson
P.~R. Rosenbaum and D.~B. Rubin (1985), ‘Constructing a control group using multivariate matched sampling methods that incorporate the propensity score’, The American Statistician, 39 33–38.
fullmatch
, pairmatch
,
match_on
The exactMatch
function creates a smaller matching problem by
stratifying observations into smaller groups. For a problem that is larger
than maximum allowed size, minExactMatch
provides a way to find the
smallest exact matching problem that will allow for matching.
minExactMatch(x, scores = NULL, width = NULL, maxarcs = 1e+07, ...)
minExactMatch(x, scores = NULL, width = NULL, maxarcs = 1e+07, ...)
x |
The object for dispatching. |
scores |
Optional vector of scores that will be checked against a caliper width. |
width |
Optional width of a caliper to place on the scores. |
maxarcs |
The maximum problem size to attempt to fit. |
... |
Additional arguments for methods. |
x
is a formula of the form Z ~ X1 + X2
, where
Z
is indicates treatment or control status, and X1
and X2
are variables
can be converted to factors. Any additional arguments are passed to model.frame
(e.g., a data
argument containing Z
, X1
, and X2
).
The the arguments scores
and width
must be passed together.
The function will apply the caliper implied by the scores and the width while
also adding in blocking factors.
A factor grouping units, suitable for exactMatch
.
The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960's and early 1970's. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers' subsidies may be hidden in the quoted capital costs.
nuclearplants
nuclearplants
A data frame with 32 rows and 11 columns
cost: The capital cost of construction in millions of dollars adjusted to 1976 base.
date: The date on which the construction permit was issued. The data are measured in years since January 1 1990 to the nearest month.
t1: The time between application for and issue of the construction permit.
t2: The time between issue of operating license and construction permit.
cap: The net capacity of the power plant (MWe).
pr: A binary variable where 1
indicates the prior
existence of a LWR plant at the same site.
ne: A binary variable where 1
indicates that the
plant was constructed in the north-east region of the U.S.A.
ct: A binary variable where 1
indicates the use of a
cooling tower in the plant.
bw: A binary variable where 1
indicates that the
nuclear steam supply system was manufactured by Babcock-Wilcox.
cum.n: The cumulative number of power plants constructed by each architect-engineer.
pt: A binary variable where 1
indicates those plants
with partial turnkey guarantees.
The data were obtained from the boot
package, for
which they were in turn taken from Cox and Snell (1981). Although
the data themselves are the same as those in the nuclear
data frame in the boot
package, the row names of the data
frame have been changed. (The new row names were selected to
ease certain demonstrations in optmatch
.)
This documentation page is also adapted from the boot
package, written by Angelo Canty and ported to R by Brian Ripley.
Cox, D.R. and Snell, E.J. (1981) Applied Statistics: Principles and Examples. Chapman and Hall.
This will return a list of the number of finite entries in a distance
matrix. If the distance has no subgroups, it will be a list of length 1. If
the distance has subgroups (i.e. x
is an
BlockedInfinitySparseMatrix
, it will be a named list.)
num_eligible_matches(x) ## S3 method for class 'optmatch.dlist' num_eligible_matches(x) ## S3 method for class 'matrix' num_eligible_matches(x) ## S3 method for class 'InfinitySparseMatrix' num_eligible_matches(x) ## S3 method for class 'BlockedInfinitySparseMatrix' num_eligible_matches(x)
num_eligible_matches(x) ## S3 method for class 'optmatch.dlist' num_eligible_matches(x) ## S3 method for class 'matrix' num_eligible_matches(x) ## S3 method for class 'InfinitySparseMatrix' num_eligible_matches(x) ## S3 method for class 'BlockedInfinitySparseMatrix' num_eligible_matches(x)
x |
Any distance object. |
A list counting the number of eligible matches in the distance.
The optmatch
class describes the results of an optimal full matching
(using either fullmatch
or pairmatch
). For the
most part, these objects can be treated as factors
.
The summary function quantifies optmatch
objects on the effective sample
size, the distribution of distances between matched units, and how well the
match reduces average differences.
## S3 method for class 'optmatch' summary( object, propensity.model = NULL, ..., min.controls = 0.2, max.controls = 5, quantiles = c(0, 0.5, 0.95, 1) )
## S3 method for class 'optmatch' summary( object, propensity.model = NULL, ..., min.controls = 0.2, max.controls = 5, quantiles = c(0, 0.5, 0.95, 1) )
object |
The |
propensity.model |
An optional propensity model (the result of
a call to |
... |
Additional arguments to pass to |
min.controls |
To minimize the the display of a groups with many treated and few controls, all groups with more than 5 treated units will be summarized as “5+”. This is the reciprocal of the default value (1/5 = 0.2). Lower this value to see more groups. |
max.controls |
Like |
quantiles |
A points in the ECDF at which the distances between units will be displayed. |
optmatch
objects descend from factor
.
Elements of this vector correspond to members of the treatment and control
groups in reference to which the matching problem was posed, and are named
accordingly; the names are taken from the row and column names of
distance
. Each element of the vector is either NA
, indicating
unavailability of any suitable matches for that element, or the
concatenation of: (i) a character abbreviation of the name of the subclass
(as encoded using exactMatch
) (ii) the string .
; and
(iii) a non-negative integer. In this last place, positive whole numbers
indicate placement of the unit into a matched set and NA
indicates
that all or part of the matching problem given to fullmatch
was found
to be infeasible. The functions matched
,
unmatched
, and matchfailed
distinguish these
scenarios.
Secondarily, fullmatch
returns various data about the matching
process and its result, stored as attributes of the named vector which is
its primary output. In particular, the exceedances
attribute gives
upper bounds, not necessarily sharp, for the amount by which the sum of
distances between matched units in the result of fullmatch
exceeds
the least possible sum of distances between matched units in a feasible
solution to the matching problem given to fullmatch
. (Such a bound
is also printed by print.optmatch
and summary.optmatch
.)
optmatch.summary
Returns the restrictions which were used to generate the match.
optmatch_restrictions(obj)
optmatch_restrictions(obj)
obj |
An optmatch object |
If mean.controls
was explicitly specified in the creation of the
optmatch object, it is returned; otherwise omit.fraction
is given.
Note that if the matching algorithm attempted to recover from initial infeasible restrictions, the output from this function may not be the same as the original function call.
A list of min.controls
, max.controls
and either
omit.fraction
or mean.controls
.
x
and y
can be
distances (InfinitySparseMatrix
, BlockedInfinitySparseMatrix
,
or DenseMatrix
), or they can be optmatch
objects.To save space, optmatch
objects merely store a hash of the distance
matrix instead of the original object. Any distance objects are hashed before
comparison.
optmatch_same_distance(x, y)
optmatch_same_distance(x, y)
x |
A distances ( |
y |
A distances ( |
Note that the distance is hashed with its call
set to NULL
.
(This avoids issues where, for example, match_on(Z~X, data=d,
caliper=NULL)
and match_on(Z~X, data=d)
produce identical matches but
have differing call
s.)
Boolean whether the two distance specifications are identical.
Over the course of time, several functions in optmatch have been removed in favor of new interfaces and functions.
pscore.dist(...) mahal.dist(...)
pscore.dist(...) mahal.dist(...)
... |
All arguments ignored. |
Given a treatment group, a larger control reservoir, and a method for creating discrepancies between each treatment and control unit (or optionally an already created such discrepancy matrix), finds a pairing of treatment units to controls that minimizes the sum of discrepancies.
pairmatch(x, controls = 1, data = NULL, remove.unmatchables = FALSE, ...) pair(x, controls = 1, data = NULL, remove.unmatchables = FALSE, ...)
pairmatch(x, controls = 1, data = NULL, remove.unmatchables = FALSE, ...) pair(x, controls = 1, data = NULL, remove.unmatchables = FALSE, ...)
x |
Any valid input to Alternatively, a precomputed distance may be entered. |
controls |
The number of controls to be matched to each treatment |
data |
Optional data set. |
remove.unmatchables |
Should treatment group members for which there are no eligible controls be removed prior to matching? |
... |
Additional arguments to pass to |
This is a wrapper to fullmatch
; see its documentation for more
information, especially on additional arguments to pass, additional discussion
of valid input for parameter x
, and feasibility recovery.
If remove.unmatchables
is FALSE
, then if there are unmatchable
treated units then the matching as a whole will fail and no units will be
matched. If TRUE
, then this unit will be removed and the function will
attempt to match each of the other treatment units. As of version 0.9-8,
if there are fewer matchable treated units than matchable controls then
pairmatch
will attempt to place each into a matched pair each of the
matchable controls and a strict subset of the matchable treated units.
(Previously matching would have failed for subclasses of this structure.)
Matching can still fail,
even with remove.unmatchables
set to TRUE
,
if there is too much competition for certain controls; if you
find yourself in that situation you should consider full matching, which
necessarily finds a match for everyone with an eligible match somewhere.
The units of the optmatch
object returned correspond to members of the
treatment and control groups in reference to which the matching problem was
posed, and are named accordingly; the names are taken from the row and column
names of distance
(with possible additions from the optional
data
argument). Each element of the vector is the concatenation of:
(i) a character abbreviation of subclass.indices
, if that argument was
given, or the string 'm
' if it was not; (ii) the string .
; and
(iii) a non-negative integer. Unmatched units have NA
entries.
Secondarily, fullmatch
returns various data about the matching process
and its result, stored as attributes of the named vector which is its primary
output. In particular, the exceedances
attribute gives upper bounds,
not necessarily sharp, for the amount by which the sum of distances between
matched units in the result of fullmatch
exceeds the least possible sum
of distances between matched units in a feasible solution to the matching
problem given to fullmatch
. (Such a bound is also printed by
print.optmatch
and by summary.optmatch
.)
A optmatch
object (factor
) indicating matched groups.
Hansen, B.B. and Klopfer, S.O. (2006), ‘Optimal full matching and related designs via network flows’, Journal of Computational and Graphical Statistics, 15, 609–627.
data(nuclearplants) ### Pair matching on a Mahalanobis distance ( pm1 <- pairmatch(pr ~ t1 + t2, data = nuclearplants) ) summary(pm1) ### Pair matching within a propensity score caliper. ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants) ### For more complicated models, create a distance matrix and pass it to fullmatch. mhd <- match_on(pr ~ t1 + t2, data = nuclearplants) + caliper(match_on(ppty), 2) ( pm2 <- pairmatch(mhd, data = nuclearplants) ) summary(pm2) ### Propensity balance assessment. Requires RItools package. if(require(RItools)) summary(pm2, ppty) ### 1:2 matched triples ( tm <- pairmatch(pr ~ t1 + t2, controls = 2, data = nuclearplants) ) summary(tm) ### Creating a data frame with the matched sets attached. ### match_on(), caliper() and the like cooperate with pairmatch() ### to make sure observations are in the proper order: all.equal(names(tm), row.names(nuclearplants)) ### So our data frame including the matched sets is just cbind(nuclearplants, matches=tm) ### In contrast, if your matching distance is an ordinary matrix ### (as earlier versions of optmatch required), you'll ### have to align it by observation name with your data set. cbind(nuclearplants, matches = tm[row.names(nuclearplants)]) ### Match in subgroups only. There are a few ways to specify this. m1 <- pairmatch(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m2 <- pairmatch(pr ~ t1 + t2 + strata(pt), data=nuclearplants) ### Matching on propensity scores within matching in subgroups only: m3 <- pairmatch(glm(pr ~ t1 + t2, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m4 <- pairmatch(glm(pr ~ t1 + t2 + pt, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m5 <- pairmatch(glm(pr ~ t1 + t2 + strata(pt), data=nuclearplants, family=binomial), data=nuclearplants) # Including `strata(foo)` inside a glm uses `foo` in the model as # well, so here m4 and m5 are equivalent. m3 differs in that it does # not include `pt` in the glm.
data(nuclearplants) ### Pair matching on a Mahalanobis distance ( pm1 <- pairmatch(pr ~ t1 + t2, data = nuclearplants) ) summary(pm1) ### Pair matching within a propensity score caliper. ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants) ### For more complicated models, create a distance matrix and pass it to fullmatch. mhd <- match_on(pr ~ t1 + t2, data = nuclearplants) + caliper(match_on(ppty), 2) ( pm2 <- pairmatch(mhd, data = nuclearplants) ) summary(pm2) ### Propensity balance assessment. Requires RItools package. if(require(RItools)) summary(pm2, ppty) ### 1:2 matched triples ( tm <- pairmatch(pr ~ t1 + t2, controls = 2, data = nuclearplants) ) summary(tm) ### Creating a data frame with the matched sets attached. ### match_on(), caliper() and the like cooperate with pairmatch() ### to make sure observations are in the proper order: all.equal(names(tm), row.names(nuclearplants)) ### So our data frame including the matched sets is just cbind(nuclearplants, matches=tm) ### In contrast, if your matching distance is an ordinary matrix ### (as earlier versions of optmatch required), you'll ### have to align it by observation name with your data set. cbind(nuclearplants, matches = tm[row.names(nuclearplants)]) ### Match in subgroups only. There are a few ways to specify this. m1 <- pairmatch(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m2 <- pairmatch(pr ~ t1 + t2 + strata(pt), data=nuclearplants) ### Matching on propensity scores within matching in subgroups only: m3 <- pairmatch(glm(pr ~ t1 + t2, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m4 <- pairmatch(glm(pr ~ t1 + t2 + pt, data=nuclearplants, family=binomial), data=nuclearplants, within=exactMatch(pr ~ pt, data=nuclearplants)) m5 <- pairmatch(glm(pr ~ t1 + t2 + strata(pt), data=nuclearplants, family=binomial), data=nuclearplants) # Including `strata(foo)` inside a glm uses `foo` in the model as # well, so here m4 and m5 are equivalent. m3 differs in that it does # not include `pt` in the glm.
This matrix gives discrepancies between light water nuclear power plants of two types, seven built on the site of an existing plant and 19 built on new sites. The discrepancies summarize differences in two covariates that are predictive of the cost of building a plant.
plantdist
plantdist
A matrix with 7 rows and 19 columns
The data appear in Cox, D.R. and Snell, E.J. (1981), Applied Statistics: Principles and Examples, p.82 (Chapman and Hall), and are due to W.E. Mooz.
Rosenbaum, P.R. (2002), Observational Studies, Second Edition, p.307 (Springer).
The CBPS package fits ‘covariate balancing propensity score’ for use in propensity score
weighting. In the binary treatment case they can also be used for matching. This method helps to
implement that process in a manner consistent with use of propensity scores elsewhere in optmatch; see
scores
documentation.
## S3 method for class 'CBPS' predict(object, newdata = NULL, type = c("link", "response"), ...)
## S3 method for class 'CBPS' predict(object, newdata = NULL, type = c("link", "response"), ...)
object |
A CBPS object |
newdata |
Unused. |
type |
Return inverse logits of fitted values (the default) or fitted values themselves |
... |
Unused. |
Inverse logit of the fitted values.
optmatch
objects.Printing optmatch
objects.
## S3 method for class 'optmatch' print(x, quote = FALSE, grouped = FALSE, ...)
## S3 method for class 'optmatch' print(x, quote = FALSE, grouped = FALSE, ...)
x |
|
quote |
A boolean indicating if the matched group names should be quoted or not (default is not to quote). |
grouped |
A logical indicating if the object should printed in
the style of a named |
... |
Arguments passed to |
fullmatch
, pairmatch
,
print
, summary.optmatch
data(nuclearplants) fm <- fullmatch(pr ~ t1 + t2, data = nuclearplants) print(fm) print(fm, grouped = TRUE)
data(nuclearplants) fm <- fullmatch(pr ~ t1 + t2, data = nuclearplants) print(fm) print(fm, grouped = TRUE)
(Internal) Helper function to create an InfinitySparseMatrix from a set of scores, a treatment indicator, and a caliper width.
scoreCaliper(x, z, caliper)
scoreCaliper(x, z, caliper)
x |
The scores, a vector indicating the 1-D location of each unit. |
z |
The treatment assignment vector (same length as |
caliper |
The width of the caliper with respect to the scores
|
An InfinitySparseMatrix
object, suitable to be
passed to match_on
as an within
argument.
This is a wrapper for predict
, adapted for use in matching. Given a
fitted model but no explicit newdata
to ‘predict’ from, it
constructs its own newdata
in a manner that's generally better suited
for matching.
scores(object, newdata = NULL, ...)
scores(object, newdata = NULL, ...)
object |
fitted model object determining scores to be generated. |
newdata |
(optional) data frame containing variables with which scores are produced. |
... |
additional arguments passed to |
Like predict
, its default predictions from a glm
are on
the scale of the linear predictor, not the scale of the response; see
Rosenbaum \ Rubin (1985). (This default can
be overridden by specifying type="response"
.)
In contrast to predict
, if scores
isn't given an explicit
newdata
argument then it attempts to reconstruct one from the context
in which it is called, rather than from its first argument. For example, if
it's called within the formula
argument of a call to glm
, its
newdata
is the same data frame that glm
evaluates that formula
in, as opposed to the model frame associated with object
. See
Examples.
The handling of missing independent variables also differs from that of
predict
in two ways. First, if the data used to generate object
has NA
values, they're mean-imputed using
fill.NAs
. Secondly, if newdata
(either the explicit
argument, or the implicit data generated from object
) has NA
values, they're likewise mean-imputed using fill.NAs
. Also,
missingness flags are added to the formula of object
, which is then
re-fit, using fill.NAs
, prior to calling predict
.
If newdata
is specified and contains no missing data, scores
returns the same value as predict
.
See individual predict
functions.
Josh Errickson
P.~R. Rosenbaum and D.~B. Rubin (1985), ‘Constructing a control group using multivariate matched sampling methods that incorporate the propensity score’, The American Statistician, 39 33–38.
data(nuclearplants) pg <- lm(cost~., data=nuclearplants, subset=(pr==0)) # The following two lines produce identical results. ps1 <- glm(pr~cap+date+t1+bw+predict(pg, newdata=nuclearplants), data=nuclearplants) ps2 <- glm(pr~cap+date+t1+bw+scores(pg), data=nuclearplants)
data(nuclearplants) pg <- lm(cost~., data=nuclearplants, subset=(pr==0)) # The following two lines produce identical results. ps1 <- glm(pr~cap+date+t1+bw+predict(pg, newdata=nuclearplants), data=nuclearplants) ps2 <- glm(pr~cap+date+t1+bw+scores(pg), data=nuclearplants)
Helper function to ease setting the largest problem size to be
accepted by pairmatch
or fullmatch
.
setMaxProblemSize(size = Inf)
setMaxProblemSize(size = Inf)
size |
Positive integer, or |
The function sets the optmatch_max_problem_size global option. The
option ships with the option pre-set to a value that's relatively small,
smaller than what most modern computers can handle. Invoking this
function with no argument
re-sets the optmatch_max_problem_size option to Inf
, effectively
disabling checks on problem size. Unless you're working with an older
computer, it probably makes sense for most users to do this, at least
until they determine what problem sizes are too large for their machines.
(You'll know that when your R crashes, or simply takes too long for
your taste.)
To determine the size of a problem without subproblems, i.e. exact
matching categories, use match_on
to set up and store
the problem distance, then apply length
to the result. If
there were exact matching constraints imposed during the creation
of the distance, then you'll want to look at the largest size of a
subproblem. Apply findSubproblems
to your distance,
creating a list, say dlist
, of your distances; then do
sapply(dlist, length)
to determine the sizes of the subproblems.
Ben B. Hansen
Displays each block of the BlockedInfinitySparseMatrix separately.
## S4 method for signature 'BlockedInfinitySparseMatrix' show(object)
## S4 method for signature 'BlockedInfinitySparseMatrix' show(object)
object |
An BlockedInfinitySparseMatrix to print. |
Specifically, displays an InfinitySparseMatrix by converting it to a matrix first.
## S4 method for signature 'InfinitySparseMatrix' show(object)
## S4 method for signature 'InfinitySparseMatrix' show(object)
object |
An InfinitySparseMatrix to print. |
Internally, an InfinitySparseMatrix
(Blocked or non) comprises of
vectors of values, row positions, and column positions. The ordering of
these vectors is not enforced. This function sorts the internal structure,
leaving the external structure unchanged (e.g. as.matrix(ism)
and
as.matrix(sort(ism))
will look identical despite sorting.)
## S3 method for class 'InfinitySparseMatrix' sort(x, decreasing = FALSE, ..., byCol = FALSE) ## S3 method for class 'BlockedInfinitySparseMatrix' sort(x, decreasing = FALSE, ..., byCol = FALSE)
## S3 method for class 'InfinitySparseMatrix' sort(x, decreasing = FALSE, ..., byCol = FALSE) ## S3 method for class 'BlockedInfinitySparseMatrix' sort(x, decreasing = FALSE, ..., byCol = FALSE)
x |
An |
decreasing |
Logical. Should the sort be increasing or decreasing?
Default |
... |
Additional arguments ignored. |
byCol |
Logical. Defaults to |
By default, the InfinitySparseMatrix
is row-dominant, meaning the row
positions are sorted first, then column positions are sorted within each
row. Use argument byCol
to change this.
An object of the same class as x
which is sorted according to
byCol
.
This is a special function used only in identifying the strata variables
when defining an exactMatch
during a call to fullmatch
,
pairmatch
, or match_on
. It should not be called externally.
strata(...)
strata(...)
... |
any number of variables of the same length. |
the variables with appropriate labels
data(nuclearplants) fullmatch(pr ~ cost + strata(pt), data = nuclearplants)
data(nuclearplants) fullmatch(pr ~ cost + strata(pt), data = nuclearplants)
Tabulate treatment:control ratios occurring in matched sets, and the frequency of their occurrence.
stratumStructure(stratum, trtgrp = NULL, min.controls = 0, max.controls = Inf) ## S3 method for class 'optmatch' stratumStructure(stratum, trtgrp, min.controls = 0, max.controls = Inf) ## Default S3 method: stratumStructure(stratum, trtgrp, min.controls = 0, max.controls = Inf) ## S3 method for class 'stratumStructure' print(x, ...)
stratumStructure(stratum, trtgrp = NULL, min.controls = 0, max.controls = Inf) ## S3 method for class 'optmatch' stratumStructure(stratum, trtgrp, min.controls = 0, max.controls = Inf) ## Default S3 method: stratumStructure(stratum, trtgrp, min.controls = 0, max.controls = Inf) ## S3 method for class 'stratumStructure' print(x, ...)
stratum |
|
trtgrp |
Dummy variable for treatment group membership. (Not
required if |
min.controls |
For display, the number of treatment group
members per stratum will be truncated at the reciprocal of
|
max.controls |
For display, the number of control group
members will be truncated at |
x |
stratumStructure object to be printed. |
... |
Additional arguments to |
A table showing frequency of occurrence of those treatment:control ratios that occur.
The ‘effective sample size’ of the stratification, in
matched pairs. Given as an attribute of the table, named
‘comparable.num.matched.pairs
’; see Note.
For comparing treatment and control groups both of size 10,
say, a stratification consisting of two strata, one with 9
treatments and 1 control, has a smaller ‘effective sample
size’, intuitively, than a stratification into 10 matched pairs,
despite the fact that both contain 20 subjects in
total. stratumStructure
first summarizes this aspect of
the structure of the stratification it is given, then goes on to
identify one number as the stratification's effective sample
size. The ‘comparable.num.matched.pairs
’
attribute returned by stratumStructure
is the sum of
harmonic means of the sizes of the treatment and control
subgroups of each stratum, a general way of calibrating such
differences as well as differences in the number of subjects
contained in a stratification. For example, by this metric the
9:1, 1:9 stratification is comparable to 3.6 matched pairs.
Why should effective sample size be calculated this way? The
phrase ‘effective sample size’ suggests the observations
are taken to be similar in information content. Modeling them
as random variables, this suggests that they be assumed to have
the same variance, , conditional on what
stratum they reside in. If that is the case, and if also
treatment and control observations differ in expectation by a
constant that is the same for each stratum, then it can be shown
that the optimum weights with which to combine treatment-control
contrasts across strata,
, are proportional to the
stratum-wise harmonic means of treatment and control counts,
(Kalton, 1968). The thus-weighted
average of contrasts then has variance
. This motivates the use of
as a measure of effective sample size (Hansen, 2011).
Somewhat different motivations of the same calculation appear
in Hansen (2004) and in Hansen and Bowers (2008). Since for a
matched pair
,
,
can be thought of as the number of matched pairs
needed to attain comparable precision.
Ben B. Hansen
Kalton, G. (1968), ‘Standardization: A technique to control for extraneous variables’, Applied Statistics, 17, 118–136.
Hansen, B.B. (2004), ‘Full Matching in an Observational Study of Coaching for the SAT’, Journal of the American Statistical Association, 99, 609–618.
Hansen B.B. and Bowers, J. (2008), ‘Covariate balance in simple, stratified and clustered comparative studies’, Statistical Science, 23 (2), 219–236.
Hansen, B.B. (2011), ‘Propensity score matching to extract latent experiments from nonexperimental data: A case study’. Ch. 9 of Looking Backwards: Proceedings from a Conference in Honor of Paul W. Holland, Springer.
data(plantdist) plantsfm <- fullmatch(plantdist) # A full match with unrestricted # treatment-control balance plantsfm1 <- fullmatch(plantdist,min.controls=2, max.controls=3) stratumStructure(plantsfm) stratumStructure(plantsfm1) stratumStructure(plantsfm, max.controls=4)
data(plantdist) plantsfm <- fullmatch(plantdist) # A full match with unrestricted # treatment-control balance plantsfm1 <- fullmatch(plantdist,min.controls=2, max.controls=3) stratumStructure(plantsfm) stratumStructure(plantsfm1) stratumStructure(plantsfm, max.controls=4)
Returns a list containing the dimensions of all valid subproblems.
subdim(x) ## S3 method for class 'InfinitySparseMatrix' subdim(x) ## S3 method for class 'matrix' subdim(x) ## S3 method for class 'BlockedInfinitySparseMatrix' subdim(x) ## S3 method for class 'optmatch.dlist' subdim(x)
subdim(x) ## S3 method for class 'InfinitySparseMatrix' subdim(x) ## S3 method for class 'matrix' subdim(x) ## S3 method for class 'BlockedInfinitySparseMatrix' subdim(x) ## S3 method for class 'optmatch.dlist' subdim(x)
x |
A distance specification to get the sub-dimensions of. |
A data frame listing the dimensions of each valid subproblem. Any subproblems with 0 controls or 0 treatments will be ignored. The names of the entries in the list will be the names of the subproblems, if they exist. There will be two rows, named "treatments" and "controls".
em <- exactMatch(pr ~ pt, data=nuclearplants) m1 <- fullmatch(pr ~ t1 + t2, within=em, data=nuclearplants) stratumStructure(m1) (subdims_em <- subdim(em)) m2 <- fullmatch(pr ~ t1 + t2, within=em, data=nuclearplants, mean.controls=pmin(1.5, subdims_em["controls",] / subdims_em["treatments",]) ) stratumStructure(m2)
em <- exactMatch(pr ~ pt, data=nuclearplants) m1 <- fullmatch(pr ~ t1 + t2, within=em, data=nuclearplants) stratumStructure(m1) (subdims_em <- subdim(em)) m2 <- fullmatch(pr ~ t1 + t2, within=em, data=nuclearplants, mean.controls=pmin(1.5, subdims_em["controls",] / subdims_em["treatments",]) ) stratumStructure(m2)
This matches the syntax and semantics of subset for matrices.
## S3 method for class 'InfinitySparseMatrix' subset(x, subset, select, ...) ## S4 method for signature 'InfinitySparseMatrix' x[i, j = NULL, ..., drop = TRUE] ## S4 replacement method for signature 'InfinitySparseMatrix' x[i, j] <- value
## S3 method for class 'InfinitySparseMatrix' subset(x, subset, select, ...) ## S4 method for signature 'InfinitySparseMatrix' x[i, j = NULL, ..., drop = TRUE] ## S4 replacement method for signature 'InfinitySparseMatrix' x[i, j] <- value
x |
InfinitySparseMatrix to be subset or bound. |
subset |
Logical expression indicating rows to keep. |
select |
Logical expression indicating columns to keep. |
... |
Other arguments are ignored. |
i |
Row indices. |
j |
Col indices. |
drop |
Ignored. |
value |
replacement values |
An InfinitySparseMatrix with only the selected elements.
Mark Fredrickson
Given a distance matrix, return information above it, including
dimension, sparsity information, unmatchable members, summary of
finite distances, and, in the case of
BlockedInfinitySparseMatrix
, block structure.
## S3 method for class 'InfinitySparseMatrix' summary(object, ..., distanceSummary = TRUE) ## S3 method for class 'BlockedInfinitySparseMatrix' summary( object, ..., distanceSummary = TRUE, printAllBlocks = FALSE, blockStructure = TRUE ) ## S3 method for class 'DenseMatrix' summary(object, ..., distanceSummary = TRUE)
## S3 method for class 'InfinitySparseMatrix' summary(object, ..., distanceSummary = TRUE) ## S3 method for class 'BlockedInfinitySparseMatrix' summary( object, ..., distanceSummary = TRUE, printAllBlocks = FALSE, blockStructure = TRUE ) ## S3 method for class 'DenseMatrix' summary(object, ..., distanceSummary = TRUE)
object |
A |
... |
Ignored. |
distanceSummary |
Default |
printAllBlocks |
If |
blockStructure |
If |
The output consists of several pieces.
Membership: Indicates the dimension of the distance.
Total (in)eligible potential matches: A measure of the sparsity of
the distance. Eligible matches have a finite distance between treatment and
control members; they could be matched. Ineligible matches have Inf
distance and can not be matched. A higher number of ineligible matches can
speed up matching, but runs the risk of less optimal overall matching
results.
Unmatchable treatment/control members: If any observations have no
eligible matches (e.g. their distance to every potential match is
Inf
) they are listed here. See Value below for details of how to
access lists of matchable and unmatchable treatment and control members.
Summary of minimum matchable distance per treatment member: To assist with choosing a caliper, this is a numeric summary of the smallest distance per matchable treatment member. If you provide a caliper that is less than the maximum value, at least one treatment member will become unmatchable.
Block structure: For BlockedInfinitySparseMatrix
, a quick
summary of the structure of each individual block. (The above will all be
across all blocks.) This may indicate which blocks, if any, are problematic.
A named list
. The summary for an
InfinitySparseMatrix
or DenseMatrix
contains the
following:
total: |
Contains the total number of treatment and control members, as well as eligible and ineligible matches. |
matchable: |
The names of all treatment and control members with at least one eligible match. |
unmatchable: |
The names of all treatment and control members with no eligible matches. |
distances: |
The summary of minimum matchable distances, if |
For BlockedInfinitySparseMatrix
, the named list
instead of contains one entry per block, named after each block
(i.e. the value of the blocking variable) as well as a block
named 'overall' which contains the summary ignoring blocks. Each
of these entries contains a list
with entries 'total',
'matchable', 'unmatchable' and 'distances', as described above.
optmatch
object.NB: THIS CODE IS CURRENTLY VERY MUCH ALPHA AND SOMEWHAT UNTESTED, ESPECIALLY CALLING update
ON AN
OPTMATCH OBJECT CREATED WITHOUT AN EXPLICIT DISTANCE ARGUMENT.
## S3 method for class 'optmatch' update(object, ...)
## S3 method for class 'optmatch' update(object, ...)
object |
|
... |
Additional arguments to the call, or arguments with changed values. |
Note that passing data
again is strongly recommended. A warning will be printed if the hash of the data used to generate the
optmatch
object differs from the hash of the new data
.
To obtain an updated call without performing the actual update, pass an additional evaluate = FALSE
argument.
An updated optmatch
object.