Dear All:
About model validation. What is the objective of all this? Usually
one wants to examine the model one has made from a certain data set in order
to see whether it represents reality, and whether it predicts reality well
enough that it can be used to control therapy to achieve stated therapeutic
goals, such as serum concentration or effect profiles, with acceptable, and
possibly even optimal, precision [1,2].
Internal Model Validation
As I understand it, bootstrap, jackknife, and cross-validation are
all variants on the same general theme of internal validation. Let's say we
have a model with a set of estimated parameter values, and we wish to
"validate" this model.
For the bootstrap, we may take our estimated mean parameter values and
their standard deviations (SD's) and generate, for example, 100 or 1000 sets
of simulations from them, sampling from each parameter distribution in a
random way. One may give a certain simulated dosage regimen to all of them,
generating fictive simulated patient data sets. One may also add an error
model, with its own parameters, describing the assay error along with the
environmental error (more on this later). In the bootstrap, one then samples
randomly, with replacement, from this collection of, say, 100 fictive
patients. Some patients will be missed, and some will be sampled more than
once. That is the randomness in the bootstrap. From each set of 100 randomly
sampled fictive patients, one makes the model over again and compares the
results with the original ones. One does this many times. In this way one
gets an impression of the variability of the results, and sees how the
parameter means and SD's, for example, are distributed after 100 or 500 or
2000 runs of this type, along with the 95% confidence limits of the mean
parameter values and of the SD's. We do this because, with most methods,
there are few other good ways to calculate these confidence limits, short of
assuming that the number of subjects in the population becomes very large.
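As a rough illustration of the resampling step, here is a minimal sketch in
Python (not from any particular PK package; the per-subject clearance values
and all numbers are invented for illustration). It resamples subjects with
replacement, recomputes summary statistics each time, and reads off
percentile confidence limits. In a real population analysis the whole model
would be re-estimated on each resampled data set, not just summarized.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-subject clearance estimates (L/h), standing in for the
# 100 fictive patients described above.
cl = rng.lognormal(mean=np.log(5.0), sigma=0.3, size=100)

n_boot = 1000
boot_means, boot_sds = [], []
for _ in range(n_boot):
    # Sample subjects with replacement: some are missed, some taken more than once.
    resample = rng.choice(cl, size=cl.size, replace=True)
    boot_means.append(resample.mean())      # in practice: re-fit the whole model here
    boot_sds.append(resample.std(ddof=1))

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean CL {np.mean(boot_means):.2f} L/h, 95% confidence limits {lo:.2f} - {hi:.2f}")
print(f"SD of CL {np.mean(boot_sds):.2f} L/h")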
The jackknife is similar, just a bit different. In this case, we
will generate, let's say, 100 fictive data sets just as above. Then we leave
out one subject at a time, and model the data set based on the other 99
fictive patients. We get the results and compare them with the original
ones. We do this many times, and in a similar way we see the distribution of
the parameter means and their SD's, and obtain their confidence limits.
Cross validation is another variant of the jackknife. Here, instead
of leaving out only one subject from the data set, one may leave out
several, for example one fifth of them, modeling the other 80%. One might
then do this five times, once for each fifth. All of these are ways of using
Monte Carlo simulations to generate fictive data sets which are similar to,
but just a bit different from, the original one, in order to see what the
distributions of the parameter values are, and especially to obtain
estimates of the population parameter means and their confidence limits.
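To make the leave-out schemes concrete, here is a minimal Python sketch of
the jackknife (leave one subject out at a time) and of five-fold cross
validation (leave one fifth out at a time). The refit() function is a
placeholder for re-estimating the population model on each subset, and the
data are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
cl = rng.lognormal(np.log(5.0), 0.3, size=100)   # hypothetical per-subject clearances

def refit(subset):
    # Placeholder for re-estimating the population model on a subset of subjects;
    # here it just returns the subset mean.
    return subset.mean()

# Jackknife: leave one subject out at a time, model the other 99.
jack_means = np.array([refit(np.delete(cl, i)) for i in range(cl.size)])
print("jackknife: mean of leave-one-out means =", round(jack_means.mean(), 2))

# Five-fold cross validation: leave out one fifth at a time, model the other 80%.
folds = np.array_split(rng.permutation(cl.size), 5)
cv_means = [refit(np.delete(cl, fold)) for fold in folds]
print("5-fold means:", np.round(cv_means, 2))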
External Model Validation
Then we can proceed to external model validation. This consists of
having two data sets - a model set, from which one makes the model, and a
separate validation set. It is usually best if this second set is truly a
separate and prospective one, and is not taken from some random splitting of
the original set. One then takes the results obtained from the original
model set and uses them to predict the data in the second set.
When one does this, one can start with a simple basic model. One
will observe a certain difference between the measured data in the model set
and the model's estimates of it. One will also use the parameter values from
this model to predict the measured values in the validation set. In each
case one obtains an error estimate: one for fitting the original data and
one for predicting the new data. If the error in predicting the data in the
second set is about equal to the error in fitting the original set, many in
the PK community will say that the model is adequately validated. However,
that may well not be so. Others may wish to compare the ability of several
candidate models to fit the original data set and to predict that of the
prospective set. That is a good thing to do if one can. Usually one sees
that a more complex model, with more parameters to estimate, does a better
job of fitting the original set and predicting the new set.
As one makes the model more and more complex, with more parameters
to estimate, one will usually find that the more complex model continues to
estimate the original data set better and better, and may well predict the
new set better also, for a while. However, there will come a point at which
this will not continue. The original data set may keep on being better
estimated, but the new one will be predicted less and less precisely. This
is because the model, in its complexity, has now started to model the random
noise in the data as if it were part of the model. Because of this, it
predicts the new data set less well. When this happens, the model is said to
be "overfitted". The most valid model is that which predicts the new data
set the best. This is often done in the aerospace community where they have
data that is much richer than we have in the PK community. This is how they
often validate their models.
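As a toy illustration of this overfitting point (not a PK model; polynomials
of increasing degree simply stand in for increasingly complex models, and
all numbers are invented), the following Python sketch fits two synthetic
data sets drawn from the same underlying curve. The fit error on the "model"
set keeps shrinking as complexity grows, while the prediction error on the
separate "validation" set eventually worsens.

import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def truth(t):                        # the "reality" both data sets come from
    return 10.0 * np.exp(-0.3 * t)

t = np.linspace(0.5, 12, 20)
y_model = truth(t) + rng.normal(0, 0.5, t.size)   # model (fitting) set
y_valid = truth(t) + rng.normal(0, 0.5, t.size)   # separate validation set

for degree in range(1, 9):
    p = Polynomial.fit(t, y_model, degree)        # complexity grows with degree
    fit_err = np.sqrt(np.mean((p(t) - y_model) ** 2))
    pred_err = np.sqrt(np.mean((p(t) - y_valid) ** 2))
    print(f"degree {degree}: fit RMSE {fit_err:.3f}   prediction RMSE {pred_err:.3f}")
# Fit RMSE keeps falling; prediction RMSE eventually starts rising - overfitting.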
Over the years, I have seen many examples of PK population modeling.
A common problem is a situation in which people study data from "routine"
therapeutic drug monitoring, usually obtained in the steady state from
trough serum concentrations. They will make a one-compartment model of the
drug, often digoxin, and estimate the only thing possible under those
circumstances, the clearance. They will often get the volume of distribution
from some other literature source. They make the model. They validate it
with bootstrapping, for example, and also do external validation to predict
a similar but prospective data set. Everything is just fine, because the
prospective data set is exactly the same type as the first.
How to Model
The only thing wrong may be that all that work can go for nothing, as the
data may be so poorly chosen that such a simple model is all that can be
made, and it can be totally incorrect. Digoxin, for example, requires at
least a two-compartment model, and its effect correlates not with serum
concentrations, but with concentrations in the totally unobservable
peripheral compartment [3,4]. So it is vitally important that data be chosen
that capture the reality of the behavior of the drug, and that the
experimental design be appropriate to permit an adequate description of the
model to be made. D-optimal experimental design [5] has been well known for
many years, and should be carefully employed in deciding when to obtain the
samples from which the model will be made. Indeed, many have started with
simple general impressions of model parameter values, used D-optimal (or
other) designs for several patients, five for example, and made an initial
population model from that data, calculated the D-optimal times based on
that initial model, and done this repeatedly for a number of iterations
until the sampling strategy becomes stable. Pharmaceutical companies have
done this, enhancing the information obtained from a limited number of
patients and a limited number of samples to get the most informative model
for the least cost.
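For a flavor of what such a design calculation looks like, here is a minimal
Python sketch (a hypothetical one-compartment IV bolus model; the prior
parameter values, dose, and assay error polynomial are all assumptions for
illustration, and real designs would use dedicated software). It searches
candidate pairs of sampling times for the pair that maximizes the
determinant of the Fisher information matrix built from the model's
parameter sensitivities, weighted by the assumed assay variance.

import numpy as np
from itertools import combinations

# Hypothetical prior: one-compartment IV bolus, C(t) = (dose/V) * exp(-k*t)
dose, V, k = 100.0, 20.0, 0.2           # mg, L, 1/h (illustrative values)

def assay_sd(c):
    return 0.1 + 0.1 * c                # assumed assay error polynomial

def sensitivities(t):
    c = (dose / V) * np.exp(-k * t)
    dc_dV = -c / V                      # partial derivative of C w.r.t. V
    dc_dk = -c * t                      # partial derivative of C w.r.t. k
    return c, np.array([dc_dV, dc_dk])

def d_criterion(times):
    # Fisher information: sum of s s^T / variance over the sampling times.
    F = np.zeros((2, 2))
    for t in times:
        c, s = sensitivities(t)
        F += np.outer(s, s) / assay_sd(c) ** 2
    return np.linalg.det(F)

candidates = np.arange(0.25, 12.25, 0.25)          # candidate times (h)
best = max(combinations(candidates, 2), key=d_criterion)
print("D-optimal sampling times (h):", [round(float(t), 2) for t in best])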
Also, to get the best models, one should consider how best to weight
the data one analyzes. Laboratory assay data are best weighted by
the reciprocal of the variance with which each measurement is made [6]. If
one fits a structural model to data, the maximum likelihood estimates of the
model parameter values are obtained when the data is weighted by the
reciprocal of the assay variance at each measured point. This is easily done
[7]. In addition, the environmental error due to the clinical uncertainties
in the data can be estimated as a separate additional error term [7,8]. This
will let the investigators know just how much uncertainty is present in the
assay and how much is from the clinical circumstances under which the study
was done. If the environmental error is small, that is good. If large, then
one should consider talking to the clinical personnel doing the study to see
how it can be tightened up.
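As a minimal sketch of such weighting (an illustrative one-compartment fit
with an assumed assay error polynomial, not any particular package's
method), each measured concentration below is weighted by the reciprocal of
its assay variance by passing the assay SD's to the fitting routine.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
DOSE = 100.0                                  # mg, illustrative

def model(t, V, k):
    # One-compartment IV bolus: C(t) = (dose/V) * exp(-k*t)
    return (DOSE / V) * np.exp(-k * t)

def assay_sd(c):
    # Assumed assay error polynomial: SD of the measurement vs. concentration.
    return 0.05 + 0.10 * c

t = np.array([0.5, 1, 2, 4, 6, 8, 12.0])      # sampling times (h)
true_c = model(t, 20.0, 0.2)
y = true_c + rng.normal(0, assay_sd(true_c))  # noisy "measured" concentrations

# Weight each point by 1/variance: sigma = assay SD at each measured value.
popt, pcov = curve_fit(model, t, y, p0=[15.0, 0.3],
                       sigma=assay_sd(y), absolute_sigma=True)
print("estimated V, k:", np.round(popt, 3))
print("approximate SE's:", np.round(np.sqrt(np.diag(pcov)), 3))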
What type of model is best to make - parametric or nonparametric? In
making parametric models, one assumes what the shape of the model parameter
distributions is, usually Gaussian or lognormal, and then estimates the
parameters in the equation that describes the assumed shape of those
distributions. For an assumed Gaussian distribution, these are the parameter
means, SD's, and covariances. This is why this approach is called
parametric. Some methods, such as NONMEM, use approximations (FO, FOCE) to
calculate the likelihood. Because of this, they are not statistically
consistent. They do not have the guarantee that as more subjects are
studied, the estimated parameter values more closely approach the true
values. Indeed, some FOCE parametric approaches yield results that actually
get worse as more subjects are studied! [6]. Further, the use of approximate
likelihoods also significantly compromises statistical efficiency and the
precision of parameter estimates. Moreover, the stochastic convergence of an
FOCE estimator does not perform as well as that of one with exact
likelihoods. In one example [9], 16 times as many subjects were needed to
reduce the SD's of the model parameter estimates by half, rather than the 4
required by theory (for a consistent estimator the SD shrinks as 1/sqrt(N),
so halving it requires four times as many subjects), which methods having
exact likelihoods did achieve. Because of this, much information that could
have been obtained from the data was lost.
The nonparametric (NP) approach does not have to make any assumption
at all about the shape of the parameter distributions. It is thus much more
flexible, and it obtains parameter distributions of higher likelihood, since
they are not constrained by parametric assumptions. Instead of estimating
only the parameter means and covariances, it estimates the entire joint
parameter distribution. Some illustrative examples are given in [9]. Also,
NP methods are more statistically efficient, with more precise parameter
estimates, and statistical convergence is up to what theory says it should
be. Bayesian methods for population modeling now also employ both parametric
and NP approaches.
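To show what such an estimated joint distribution looks like in practice,
here is a minimal sketch (the support points, their probabilities, and the
parameter names V and k are invented for illustration; a real NP analysis
estimates them from the data). The whole distribution is carried as discrete
support points with weights, from which means and covariances can still be
computed if desired.

import numpy as np

# Illustrative nonparametric joint density: support points in (V, k) space,
# each with a probability. Columns: V (L), k (1/h).
support = np.array([[15.0, 0.15],
                    [18.0, 0.22],
                    [22.0, 0.18],
                    [25.0, 0.30],
                    [30.0, 0.10]])
prob = np.array([0.10, 0.30, 0.25, 0.25, 0.10])     # weights sum to 1

mean = prob @ support                               # population mean of each parameter
centered = support - mean
cov = (prob[:, None] * centered).T @ centered       # probability-weighted covariance
print("mean (V, k):", mean.round(3))
print("covariance:\n", cov.round(4))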
Use of Models
What is the utility of parametric versus NP models? When one uses a
model to compute a drug dosage regimen for a patient, one would like this to
be as precise (and safe) as possible, to hit a desired target therapeutic
goal with the greatest precision. Parametric models cannot develop maximally
precise dosage regimens, as they are limited by the separation principle
[10]. This principle states that whenever one attempts to control a system
first by obtaining single point parameter estimates, and then using these
values to control the system, the task is done suboptimally, as no
performance criterion is optimized. One simply designs a regimen to hit a
desired target, but we all know that the target is not hit exactly. Instead,
with NP
models, one has the entire parameter distribution with which to develop the
regimen. Having the entire discrete joint parameter density, one gives a
candidate regimen to each support point in the distribution, and each point
gives a prediction of future serum concentrations, for example, at future
times, each weighted by the probability of that point in the population. At
the time when the target is to be hit, one can easily calculate the weighted
squared error of any candidate regimen hitting the target at that time, and
can optimize this to find the regimen which hits the target with minimum
expected weighted squared error [1,2]. In this way the most precise regimen
will be obtained.
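The following minimal Python sketch illustrates that weighting idea (a
hypothetical one-compartment IV bolus model with invented support points,
probabilities, target, and dose grid; it is not the published MM algorithm,
just the expected weighted squared error calculation it rests on). Each
candidate dose is scored by the probability-weighted squared error of its
predicted concentration at the target time, and the dose with the smallest
score is chosen.

import numpy as np

# Hypothetical discrete NP joint density: (V in L, k in 1/h) support points.
support = np.array([[15.0, 0.15], [18.0, 0.22], [22.0, 0.18],
                    [25.0, 0.30], [30.0, 0.10]])
prob = np.array([0.10, 0.30, 0.25, 0.25, 0.10])

target_conc = 10.0    # desired serum concentration (mg/L) ...
t_target = 6.0        # ... at this time after the dose (h)

def predicted_conc(dose, V, k, t):
    # One-compartment IV bolus prediction for each support point.
    return (dose / V) * np.exp(-k * t)

def expected_sq_error(dose):
    # Probability-weighted squared error of the candidate dose at the target time.
    pred = predicted_conc(dose, support[:, 0], support[:, 1], t_target)
    return np.sum(prob * (pred - target_conc) ** 2)

candidate_doses = np.arange(100, 1001, 10.0)        # mg
best = min(candidate_doses, key=expected_sq_error)
print("MM-optimal dose (mg):", best,
      " expected weighted squared error:", round(expected_sq_error(best), 3))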
If one uses only parametric models, one will never be aware of this problem,
and will never know that, after covariates, the dosage regimen itself
becomes a very significant way to reduce the expected variability of a
patient's response. Such multiple model (MM) control
approaches are widely used in the aerospace community for flight control and
spacecraft guidance systems, and now are available in the PK community
[1,2,8].
In doing all this, one should be guided by the science. To paraphrase
Lincoln, devoutly do we hope, fervently do we pray, that the FDA will be
persuaded to come along in the not too distant future.
References.
1. Jelliffe R, Bayard D, Milman M, Van Guilder M, and Schumitzky A: Achieving Target Goals most Precisely using Nonparametric Compartmental Models and "Multiple Model" Design of Dosage Regimens. Ther. Drug Monit. 2000; 22: 346-353.
2. Bayard D, Jelliffe R, Schumitzky A, Milman M, and Van Guilder M: Precision Drug Dosage Regimens Using Multiple Model Adaptive Control: Theory, and Application to Simulated Vancomycin Therapy. In: Selected Topics in Mathematical Physics, Professor R. Vasudevan Memorial Volume. Madras, India: Allied Publishers Ltd., 1995; pp. 407-426.
3. Reuning R, Sams R, and Notari R: Role of Pharmacokinetics in Drug Dosage Adjustment. 1. Pharmacologic Effects, Kinetics, and Apparent Volume of Distribution of Digoxin. J. Clin. Pharmacol. 1973; 13: 127-141.
4. Jelliffe R: Some Comments and Suggestions Concerning Population Pharmacokinetic Modeling, Especially of Digoxin, and its Relation to Clinical Therapy. Ther. Drug Monit. 2012; 34: 368-377.
5. D'Argenio D: Optimal Sampling Times for Pharmacokinetic Experiments. J. Pharmacokin. Biopharmaceut. 1981; 9: 739-756.
6. Seber GAF and Wild CJ: Nonlinear Regression. New York: Wiley, 1989; pp. 536-537.
7. Jelliffe RW, Schumitzky A, Van Guilder M, Liu M, Hu L, Maire P, Gomis P, Barbaut X, and Tahani B: Individualizing Drug Dosage Regimens: Roles of Population Pharmacokinetic and Dynamic Models, Bayesian Fitting, and Adaptive Control. Ther. Drug Monit. 1993; 15: 380-393.
8. Neely M, van Guilder M, Yamada W, Schumitzky A, and Jelliffe R: Accurate Detection of Outliers and Subpopulations with Pmetrics, a Nonparametric and Parametric Pharmacometric Modeling and Simulation Package for R. Ther. Drug Monit. 2012; 34: 467-476.
9. Bustad A, Terziivanov D, Leary R, Port R, Schumitzky A, and Jelliffe R: Parametric and Nonparametric Population Methods: Their Comparative Performance in Analysing a Clinical Data Set and Two Monte Carlo Simulation Studies. Clin. Pharmacokinet. 2006; 45: 365-383.
10. Bertsekas D: Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice-Hall, 1987; pp. 144-146.
Roger W. Jelliffe, M.D., F.C.P., F.A.A.P.S.
Professor of Medicine,
Founder and Co-Director, Laboratory of Applied Pharmacokinetics
www.lapk.org
USC Keck School of Medicine
2250 Alcazar St, Room 134-B
Los Angeles CA 90033
email = jelliffe.-at-.usc.edu