predict. (#223)
@call slot of objects may look slightly different (but should function
identically). (#234)
bibentry().
Minor revision to address a failing test.
match_on(), now more simply calculates the contrast to enable more intuitive
results. (Thanks Noah Greifer, #220)
dbind() will now properly support binding more than 26 unique matrices when
renaming is necessary; in fact it supports up to 18,278 uniquely renamed
matrices.
match_on() using the argument method = "rank_mahalanobis" was accidentally
returning the squared distance rather than the distance. This has been fixed.
To recover results using the squared distance, square the results, e.g.:
match_on(..., method = "rank_mahalanobis")^2. (Thanks Noah Greifer, #218)
as.list.BlockedInfinitySparseMatrix()
to split a single BlockedInfinitySparseMatrix into a list of
InfinitySparseMatrix based upon the separate blocks. (Called via as.list(b)
when b is a BlockedInfinitySparseMatrix.)
dbind() for binding several distance matrices into a single
BlockedInfinitySparseMatrix. Valid inputs include any distance convertible
into an InfinitySparseMatrix, or BlockedInfinitySparseMatrix, or lists of
these. (#65)
License_is_FOSS
and License_restricts_use flags after 0.10.0 transition to an open license.
optmatch::strata to be used in place of survival::strata. Loading survival and
masking strata should not cause issues either.
(Note: 0.10.1 and 0.10.2 were functionally equivalent releases updated to
address an issue with CRAN and the License_is_FOSS and License_restricts_use
flags.)
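The 18,278 figure in the dbind() entry above equals the number of distinct labels of one, two, or three letters drawn from a 26-letter alphabet. The base R sketch below only checks that arithmetic. That dbind() renames matrices with such letter labels is an assumption inferred from the number, not something this changelog states.

```r
# Count labels of length 1, 2 and 3 over a 26-letter alphabet.
# Assumption: this mirrors how the 18,278 limit arises; the changelog
# itself does not describe the renaming scheme.
n_letters <- length(letters)        # 26 lowercase letters in base R
label_counts <- n_letters^(1:3)     # labels of length 1, 2, 3
total_labels <- sum(label_counts)
print(total_labels)
```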
help(fullmatch) for a discussion on those, and the new argument to
fullmatch(), solver =.
survey::svyglm() (#194)
survey::mad and survey::med interface
within= arguments to match_on(), or functions calling match_on() such as
pairmatch() or fullmatch(), were sometimes ignored (#181).
fullmatch()
or pairmatch() found it infeasible to create matches within an exact matching
category, under some circumstances all members of that category were being
placed into a single category labeled 1.NA, or 2.NA etc. Instead, all members
of that category are now NA (#203).
match_on(), scores() to misinterpret propensity or other scores fitted with
survey::svyglm() (#194).
boxplot()
gains a method for svyglm objects, e.g. propensity score models fitted with
case weights via survey::svyglm() (#194).
match_on.glm()'s arguments has changed slightly: to circumvent scale
standardization when matching on a propensity score or other index, you should
now pass standardization.scale = 1, not standardization.scale = NULL (#194).
summary.optmatch() to fail b/c of NAs in the treatment variable (#155).
exclude argument to match_on() mirroring the exclude argument for caliper().
Optmatch
objects now support an update() function, update.Optmatch(). (#54)
Optmatch objects can be combined via a c() function, c.Optmatch(). (#68)
labelled treatment vectors which often arise when importing from Stata or
SPSS. (#159)
matchfailed(). (#175)
summary.optmatch().
if(vectorOfThings) usage that will give an error in upcoming R release.
controls times the number of treatments, it now attempts to match in that
stratum by leaving out some of the treatment units. (#116)
treatment_new = treatment == "T".
data argument is excluded from fullmatch() or pairmatch() and num_NA > 0
entries in the treatment status vector are NA, then the length of the vector
produced by fullmatch() or pairmatch() won't match the length of the treatment
status vector, having num_NA fewer observations. Don't forget to pass a data
argument!
min.controls/mean.controls/max.controls directives would have been mistakenly
applied to the wrong subclasses, resulting in strange warnings and,
potentially, spurious match failures or unintended structural restrictions in
some subclasses (#129).
fullmatch()
to automatically fail. I.e. we've restored the behaviour of the software prior
to version 0.8. (#132)
summary() methods for InfinitySparseMatrix (summary.InfinitySparseMatrix()),
BlockedInfinitySparseMatrix (summary.BlockedInfinitySparseMatrix()) and
DenseMatrix (summary.DenseMatrix()). I.e., you can call summary() on the
result of a call to match_on() or caliper(). The information this returns may
be useful for selecting caliper widths, and for managing computational burdens
with large matching problems.
pairmatch(), fullmatch() or match_on(), then the factor "fac" will both serve
as an independent variable for the propensity model and an exact matching
variable (#101). See the examples in the help documentation for fullmatch().
pairmatch()
and fullmatch() no longer generate "matched.distances" attributes for their
results. To get this information, use matched.distances().
fill.NAs() directly to glm() or similar. Use the traditional formula and data
argument version. See help documentation for fill.NAs() for examples.
boxplot() method for fitted propensities ignoring varwidth argument (#113);
various minor issues affecting package development and deployment (#110,...).
stratumStructure().
contr.match_on(), a new default contrasts function for making Mahalanobis and
Euclidean distances. Previously we used R defaults, which (a) generated
different answers for the same factor depending on the ordering of the levels
and (b) led to different distances for {0,1}-valued numeric variables and
two-level factors. (#80)
fullmatch()
with feasible combinations of min.controls, mean.controls/max.controls and
max.controls (#92)
fullmatch() or pairmatch() to create distance specifications directly.
glm() method for match_on() that caused observations with fixable NAs to be
dropped too often.
distUnion() allows combining arbitrary distance specifications.
antiExactMatch()
provides for matches that may only occur between treated and control units
with different values on a factor variable. This is the opposite of
exactMatch(), which ensures matches occur within factor levels.
data argument in more cases when using the summary() method when the RItools
package is present.
omit.fraction argument when there are unmatched controls.
minExactMatch() function.
optmatch_verbose_message option to provide additional warnings.
fullmatch().
caliper() function that allows returning values that fit the caliper instead
of just indicators of which entries fit the caliper width.
match_on().
Optmatch objects now preserves (and subsets) the subproblem attribute.
Solver limits now depend on machine limits, not arbitrary constants defined by
the optmatch maintainers. For large problems, users will see a warning, but
the solver will attempt to solve.
fullmatch() and pairmatch() can now take distance-generating arguments
directly, instead of having to first call match_on(). See the documentation
for these two functions for more details.
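A minimal sketch of the two calling styles described above, assuming the optmatch package is installed and using the nuclearplants data that ships with it. The choice of covariates (cap, t1) is purely illustrative.

```r
library(optmatch)
data(nuclearplants)

# Two-step approach: build the distance first, then match on it.
dist_cap <- match_on(pr ~ cap + t1, data = nuclearplants)
match_two_step <- fullmatch(dist_cap, data = nuclearplants)

# One-step approach: hand the formula straight to fullmatch(),
# which creates the distance behind the scenes.
match_one_step <- fullmatch(pr ~ cap + t1, data = nuclearplants)

# Both results hold one entry per row of the data.
length(match_two_step)
length(match_one_step)
```

Passing data = nuclearplants in both calls keeps the result in the same order as the rows of the data frame.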
Infeasibility recovery in fullmatch(). When passing a combination of
constraints (e.g. max.controls) that would make the matching infeasible,
fullmatch() will now attempt to find a feasible match that respects those
constraints, which will likely result in omitting some control units.
An additional argument to fullmatch(), mean.controls, is an alternative to the
previous omit.fraction. (Only one of the two arguments can be supplied.) The
match will attempt to achieve an average of mean.controls controls per
treatment.
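One way to read the relationship between mean.controls and omit.fraction is through simple counting: asking for a given average number of controls per treated unit fixes how many controls are used, and therefore what share is left out. The base R sketch below works this through for hypothetical stratum sizes. It illustrates the arithmetic only and is not taken from the package's internals.

```r
# Hypothetical stratum sizes, chosen only for illustration.
n_treated  <- 10
n_controls <- 22
mean_controls <- 2   # desired average number of controls per treated unit

# Controls that would be used, and the share of controls left unmatched.
controls_used <- mean_controls * n_treated
implied_omit_fraction <- 1 - controls_used / n_controls
round(implied_omit_fraction, 4)
```

With these numbers, 20 of the 22 controls are used, so 2 of 22 are omitted.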
Each Optmatch object now carries with it the constraints used to generate it
(e.g. max.controls) as well as a hashed version of the distance it matched on,
to help with debugging and error checking while avoiding having to carry the
entire distance matrix around.
Creating a distance matrix prior to matching is now optional. fullmatch() now
accepts arguments from which match_on() would create a distance, and creates
the match behind the scenes.
Performance enhancements for distance calculations.
Several new utility functions, including subdim(), optmatch_restrictions(),
optmatch_same_distance(), num_eligible_matches(). See their help documentation
for additional details.
Arithmetic operations between InfinitySparseMatrices and vectors are supported. The operation is carried out as column by vector steps.
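For intuition about "column by vector steps", base R combines a vector with an ordinary matrix one column at a time, pairing element i of the vector with row i. The sketch below uses a plain matrix. That InfinitySparseMatrix arithmetic follows the same convention is an assumption based on the entry above, not a verified description of the package.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
v <- c(10, 100)              # one value per row

# Base R recycles v down each column of m in turn.
scaled <- m * v

# The same result, spelled out column by column.
by_column <- sapply(seq_len(ncol(m)), function(j) m[, j] * v)

scaled
```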
scores() function allows including model predictions (such as propensity
scores) in formulas directly (such as combining multiple propensity scores).
The scores() function is preferred to predict() as it makes several smart
choices to avoid dropping observations due to partial missingness and other
useful preparations for matching.
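A short sketch of scores() inside a matching formula, assuming optmatch is installed and using its bundled nuclearplants data. The model specification is illustrative rather than a recommended propensity model.

```r
library(optmatch)
data(nuclearplants)

# Propensity model fitted the usual way.
ps_model <- glm(pr ~ cap + date + t1 + bw, data = nuclearplants,
                family = binomial)

# scores() places the model's predictions in the matching formula
# alongside a raw covariate.
m <- fullmatch(pr ~ t1 + scores(ps_model), data = nuclearplants)
summary(m)
```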
match_on() is now an S3 generic function, which solves several bugs using
propensity models from other packages.
summary() method was giving overly pessimistic warnings about failures.
fixed bug in how Optmatch objects were printing.
mdist() is now deprecated, in favor of match_on().
full() and pair() are now aliases to fullmatch() and pairmatch()
All match_on() methods take caliper arguments (formerly just the numeric
method and derived methods had this argument).
boxplot methods for fitted propensity score methods (glm() and bigglm())
fill.NAs() now takes contrasts.arg argument to mimic model.matrix()
Several bug fixes in examples, documentation
The methods pscore.dist() and mahal.dist() are now deprecated, with useful
error messages pointing users to replacements.
Significant performance improvements for sparse matching problems.
Functions unmatched() and matched() were backwards. Corrected.
More efficient data structure for sparse matching problems, those with
relatively few allowed (finite) distances between units. Sparse problems often
arise when calipers are employed. The new data structure
(InfinitySparseMatrix) behaves like a simple matrix, allowing cbind(),
rbind(), and subset() operations, making it easier to work with than the older
optmatch.dlist data structure.
match_on(): A series of methods to generate matching problems using the new
data structure when appropriate, or using a standard matrix when the problem
is dense. This function is being deployed alongside the mdist() function to
provide complete backward compatibility. New development will focus on this
function for distance creation, and users are encouraged to use it right away.
One difference for mdist() users is the within argument. This argument takes
an existing distance specification and limits the new comparisons to only
those pairs that have finite distances in the within argument. See the
match_on(), exactMatch(), and caliper() documentation for more details.
exactMatch(): A new function to create stratified matching problems (in which
cross-strata matches are forbidden). Users can specify the strata using either
a factor vector or a convenient formula interface. The results can be used in
calls to match_on() to limit distance calculations to only within-strata
treatment-control pairs.
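The combination of exactMatch() and the within argument might look like the following sketch, assuming optmatch is installed and using its bundled nuclearplants data. Stratifying on pt and matching on cap and t1 are illustrative choices only.

```r
library(optmatch)
data(nuclearplants)

# Forbid matches across levels of pt (formula interface).
strata_pt <- exactMatch(pr ~ pt, data = nuclearplants)

# Compute distances only for within-stratum treatment-control pairs.
dist_within <- match_on(pr ~ cap + t1, within = strata_pt,
                        data = nuclearplants)

# Match on the restricted distance.
m <- fullmatch(dist_within, data = nuclearplants)
summary(m)
```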
New data argument to fullmatch() and pairmatch(): This argument will set the
order of the match to that of the row.names, names, or contents of the passed
data.frame or vector. This avoids potential bugs caused when the optmatch
objects were in a different order than users' data.
Test suite expanded and now uses the testthat library.
fill.NAs() allows (optionally) filling in all columns (previously, the first
column was assumed to be an outcome or treatment indicator and was not filled
in).
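Elsewhere this changelog describes fill.NAs() as replacing NA values with the mean of the observed values and providing a missing indicator for each variable. The base R sketch below imitates that idea on a hypothetical data frame. It is not the fill.NAs() implementation, and the column names and helper function are invented for illustration.

```r
# Hypothetical data with one missing value per column.
df <- data.frame(x = c(1, NA, 3), y = c(2, 4, NA))

# Replace NAs in a numeric vector with the mean of its observed values.
fill_with_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# Record missingness before filling, one indicator column per variable.
indicators <- as.data.frame(lapply(df, is.na))
names(indicators) <- paste0(names(df), ".NA")

filled <- cbind(as.data.frame(lapply(df, fill_with_mean)), indicators)
filled
```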
New tools to find minimum feasible constraints: Large matching problems could
exceed the upper limit for a matching problem. The functions minExactMatch()
and maxCaliper() find the smallest interaction of potential factors for
stratified matchings or the largest (most generous) caliper, respectively,
that make the problem small enough to fit under the maximum problem size
limit. See the help pages for these functions for more information.
1.NA or similar). This avoids some obscure bugs when feeding the results of
fullmatch() to other functions.
FOR A DETAILED CHANGELOG, SEE https://github.com/markmfredrickson/optmatch
pairmatch() has a new option, remove.unmatchables, that may be useful in
conjunction with caliper matching. With remove.unmatchables = TRUE, prior to
matching any units with no counterparts within caliper distance are removed.
Pair matching can still fail, if for example for two distinct treatment units
only a single control, the same one, is available for matching to them; but
remove.unmatchables eliminates one simple and common reason for pair matching
to fail.
Applying summary() to an optmatch object now creates a summary.optmatch
containing the summary information, in addition to reporting it to the console
(via a summary.optmatch() method for print()).
mdist.formula() no longer requires an explicit data argument. I.e., you can
get away with a call like mdist(Treat~X1+X2|S) if the variables Treat, X1, X2
and S are available in the environment you're working from (or in one of its
parent environments). Previously you would have had to do
mdist(Treat~X1+X2|S, data=mydata). (The latter formulation is still to be
preferred, however, in part because with it mdist() gets to use data's row
names, whereas otherwise it would have to make up row names.)
fill.NAs() replaces missing observations (i.e. NA values) with minimally
informative values (i.e. the mean of observed columns). fill.NAs() handles
functions in formulas intelligently and provides missing indicators for each
variable. See the help documentation for more information and examples.
mdist.function() method now properly returns an optmatch.dlist object for use
in summary.optmatch(), etc.
mdist.function() maintains label on grouping factor.
New mdist() method to extract propensity scores from models fitted using
bigglm() in package biglm.
mdist()'s formula method now understands grouping factors indicated with a
pipe (|)
informative error message for mdist() called on numeric vectors
updated mdist() documentation
There is a new generic function, mdist(), for creating matching distances. It
accepts: fitted glm's, which it uses to extract propensity distances;
formulas, which it uses to construct squared Mahalanobis distances; and
functions, with which a user can construct his or her own type of distance.
The function method is more intuitive to work with than the older makedist()
function.
A new function, caliper(), builds on the mdist() structure to provide a
convenient way to add calipers to a distance. In contrast to earlier ways of
adding calipers, caliper() has an optional argument specifying observations to
be excluded from the caliper requirement; this permits one to relax it for
just a few observations, for instance.
summary.optmatch() now removes strata in which matching failed (b/c the
matching problem was found to be infeasible) before summarizing. It also
indicates when such strata are present, and how many observations fall in
them.
Demo has been updated to reflect changes as of version 0.4, 0.5, 0.6.
subsetting of objects of class Optmatch now preserves matched.distances
attribute.
fixed bug in maxControlsCap()/minControlsCap() whereby they behaved unreliably
on subclasses within which some subjects had no permissible matches.
Removed unnecessary panic in fullmatch() when it was given a min.controls
argument with attributes other than names (as when it is created by tapply()).
fixed bug wherein summary.optmatch() fails to retrieve balance tests if given
a propensity model that had function calls in its formula.
Documentation pages for fullmatch(), pairmatch() filled out a bit.
summary.optmatch() completely revised. It now reports information about the
configuration of the matched sets and about matched distances. In addition, if
given a fitted propensity model as a second argument it summarizes covariate
balance.