Dear All,
When doing some compartmental modelling with WinNonlin using a weight
equal to the inverse of the predicted or of the squared predicted
value, the correlation coefficient is very poor (while the fit is
quite good) when concentrations below the limit of quantification
(BLQ) are set to "missing" (see first WinNonlin output below) but is
good (as expected) when the BLQ are discarded from the original
dataset (see second WinNonlin output).
Another similar situation is the specification of dummy time-points
(i.e. no observation) for prediction purposes.
Can anyone explain the differences observed for the correlation
coefficient (and for the weighted corrected sum of squared
observations)?
Thank you in advance for your consideration,
Fabrice Nollevaux
SGS Biopharma - Wavre - Belgium
www.sgsbiopharma.com
-
*** BLQMISSING ***
-
X           OBSERVED    PREDICTED   RESIDUAL      WEIGHT       SE-PRED       STANDARDIZED
            Y           Y                                                    RESIDUAL
0.000       .           47.23       .             0.4483E-03   -0.6400+152   .
0.8000E-01  42.34       39.31       3.038         0.6472E-03   6.356         1.073
0.3300      19.66       23.61       -3.943        0.1794E-02   2.418         -1.157
0.5000      18.88       17.79       1.094         0.3158E-02   2.071         0.4609
1.000       11.51       10.42       1.094         0.9207E-02   1.107         0.7409
1.500       8.080       8.154       -0.7434E-01   0.1502E-01   0.6096        -0.5679E-01
2.000       8.323       7.113       1.210         0.1974E-01   0.5469        1.067
3.000       6.571       5.803       0.7680        0.2967E-01   0.4616        0.8365
4.000       4.502       4.794       -0.2920       0.4348E-01   0.3620        -0.3804
6.000       2.628       3.278       -0.6497       0.9307E-01   0.2238        -1.214
8.000       1.543       2.241       -0.6985       0.1992       0.1466        -1.895
12.00       1.166       1.048       0.1178        0.9122       0.7811E-01    0.7007
24.00       0.1127      0.1071      0.5572E-02    87.64        0.1717E-01    0.7041
36.00       .           0.1095E-01  .             8420.        -0.6400+152   .
48.00       .           0.1119E-02  .             0.8090E+06   -0.6400+152   .
60.00       .           0.1143E-03  .             0.7772E+08   -0.6400+152   .
CORRECTED SUM OF SQUARED OBSERVATIONS 1568.57
WEIGHTED CORRECTED SUM OF SQUARED OBSERVATIONS 12.1807
SUM OF SQUARED RESIDUALS 30.2384
SUM OF WEIGHTED SQUARED RESIDUALS 0.250689
S 0.177020 WITH 8 DEGREES OF FREEDOM
CORRELATION (OBSERVED,PREDICTED) 0.5023
AIC criteria -8.60251
SBC criteria -6.66288
-
*** Without BLQ ***
-
X           OBSERVED    PREDICTED   RESIDUAL      WEIGHT       SE-PRED       STANDARDIZED
            Y           Y                                                    RESIDUAL
0.8000E-01  42.34       39.32       3.023         0.6467E-03   6.357         1.064
0.3300      19.66       23.63       -3.965        0.1791E-02   2.419         -1.161
0.5000      18.88       17.81       1.077         0.3152E-02   2.071         0.4529
1.000       11.51       10.42       1.093         0.9205E-02   1.109         0.7407
1.500       8.080       8.148       -0.6824E-01   0.1505E-01   0.6099        -0.5215E-01
2.000       8.323       7.105       1.218         0.1979E-01   0.5464        1.074
3.000       6.571       5.795       0.7753        0.2975E-01   0.4615        0.8454
4.000       4.502       4.788       -0.2860       0.4359E-01   0.3620        -0.3730
6.000       2.628       3.274       -0.6458       0.9329E-01   0.2237        -1.207
8.000       1.543       2.239       -0.6959       0.1996       0.1466        -1.890
12.00       1.166       1.047       0.1188        0.9140       0.7807E-01    0.7073
24.00       0.1127      0.1070      0.5637E-02    87.75        0.1717E-01    0.7129
CORRECTED SUM OF SQUARED OBSERVATIONS 1568.57
WEIGHTED CORRECTED SUM OF SQUARED OBSERVATIONS 10.4632
SUM OF SQUARED RESIDUALS 30.2943
SUM OF WEIGHTED SQUARED RESIDUALS 0.250867
S 0.177083 WITH 8 DEGREES OF FREEDOM
CORRELATION (OBSERVED,PREDICTED) 0.9908
AIC criteria -8.59401
SBC criteria -6.65438
[Interesting, I've had problems calculating the R^2 and correlation
coefficient for weighted data sets with Boomer. Does anybody have a
good formula that I can read, i.e. code, especially for multi-line
data sets? Rereading, I see that the missing data are weighted; that
shouldn't happen, should it? - db]
Back to the Top
Fabrice,
The correlation coefficient is an artifact of how you have arranged
your dataset. On the other hand, the correlation coefficient should be
the last thing we look at. From the data you have provided, I don't
see any noticeable difference between the two fits. If you plot
observed concentrations versus predicted concentrations for the first
fit [where data are "missing"] and perform linear regression, you
should see that your fit is leveraged by the concentration at time=0.
You have observed=0 and predicted=47. This is where the major problem
is! But when you delete the "missing" values, that discrepancy is gone
and the fit is ideal (R^2>0.9). So comparing fits based on R^2 does
not make sense here. (But why did you even try to delete these data?)
David, the weights appearing in the column are due to the fact that
the data are weighted by the inverse of the predicted or squared
predicted value (as mentioned in the question), and finite predictions
are available for weighting even though the data are "missing". If
data are weighted by the inverse of the observed value, that weight
should be equal to zero for "missing" values. I think WinNonlin has a
built-in adjustment to take care of division by zero (for the compiled
models). Thus, even if a weight of 1/"missing" is not possible, it is
adjusted to zero. But for user-defined models, we have to write a
discrete function to take care of the same.
Thanks,
Pravin
Pravin Jadhav
Graduate student
VCU/FDA
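The kind of guard Pravin describes for user-defined models can be sketched as follows. This is a minimal Python illustration, not WinNonlin code: the function name `inverse_weight` and the use of `None` or NaN to stand for a "missing" value are assumptions made for the example.

```python
import math

def inverse_weight(value, power=1):
    """Return 1/value**power, or 0.0 when the value is missing or zero.

    Hypothetical helper: None or NaN stands for a "missing" value here.
    """
    if value is None or math.isnan(value) or value == 0:
        return 0.0
    return 1.0 / value ** power

# Weighting by 1/observed: the missing and zero entries get weight 0,
# so they drop out of any weighted sum instead of raising an error.
observed = [None, 42.34, 0.0, 0.1127]
weights = [inverse_weight(y) for y in observed]
print(weights)
```

With a weight of zero, a missing point contributes nothing to weighted sums, which is the behaviour Pravin attributes to the compiled models.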
Back to the Top
Dear Jadhav,
Thanks for sharing your thoughts.
First, I totally agree that correlation coefficients are not very
useful for measuring goodness-of-fit or for comparing fits, but I find
it disturbing to have to report a good fit with such a poor
correlation coefficient.
However, I still do not understand where the bias comes from, since
the observed value at time 0 is "missing" (and not zero as mentioned
in your response), and so the corresponding residual (and weighted
residual) is also "missing", independently of any applied weight.
Moreover, my example is for an i.v. bolus model, so the predicted
value at time 0 is not zero (actually Pred(0) = D/V1), so it should
not be a problem to include time 0 when using a weighting scheme based
on the predicted concentration.
With regard to the discussion on the weight of missing values, it
seems that your arguments hold for observed = 0 when data are weighted
by the inverse of the OBSERVED (or squared OBSERVED) concentrations.
In that case, WinNonlin cannot compute the weight (1/0) and actually
sets it to 0.
Any further input is welcome,
Regards,
Fabrice
Back to the Top
Dear Fabrice,
With respect to the very limited value of correlation coefficients as a
measure of goodness-of-fit, I totally agree with you and Pravin Jadhav.
The output of WinNonlin is not only disturbing, it seems to be
incorrect, or at least misleading. If values below LOQ are treated as
'missing', these values should not be taken into account when
calculating any statistical or other value. WinNonlin seems to include
these values in the correlation coefficient, as well as in the
'weighted corrected sum of squared observations'. IMHO, including
'missing values' makes no sense. A missing value is missing, and the
only thing you can do is to calculate the predicted value at that
point. This is useful to check whether it is indeed below LOQ; if the
predicted value at that point is well above LOQ, you actually have a
problem!
David Bourne has pointed to a good question: how are R^2 and
correlation coefficient calculated for weighted data? There does not
seem to be a single answer to this question.
The most direct way is to calculate R in a manner similar to the usual
procedure:
R = COVAR(Yobs,Ypred) / SQRT(VAR(Yobs) * VAR(Ypred))
where COVAR(Yobs,Ypred) is the covariance between Yobs and Ypred,
VAR(Yobs) is the variance of the observed values, SQRT is square root.
In case of weighted least squares, it seems more appropriate to use the
weights also in the calculation of COVAR and VAR:
COVAR(Yobs,Ypred) = SUM { Wi * (Yobsi - Yobsmean) * (Ypredi - Ypredmean) }
VAR(Yobs) = SUM { Wi * (Yobsi - Yobsmean) ^2 }
VAR(Ypred) = SUM { Wi * (Ypredi - Ypredmean) ^2 }
Yobsmean = SUM { Wi * Yobsi } / SUM { Wi }
Ypredmean = SUM { Wi * Ypredi } / SUM { Wi }
where Yobsi is the observed data point i, Ypredi is the predicted data
point i, Wi is the weighting factor for data point i, and SUM implies
summation for all data points (i=1, 2, ..., n).
In Pascal or Delphi code:
{ Accumulate the weighted sums over the N data points }
Sx:=0; Sxx:=0; Sy:=0; Syy:=0; Sxy:=0; Sw:=0;
FOR I:=1 TO N DO
BEGIN
  Sx:=Sx+W[I]*Y[I];   Sxx:=Sxx+W[I]*Sqr(Y[I]);
  Sy:=Sy+W[I]*Yc[I];  Syy:=Syy+W[I]*Sqr(Yc[I]);
  Sxy:=Sxy+W[I]*Y[I]*Yc[I];
  Sw:=Sw+W[I];
END;
{ Weighted correlation coefficient }
R:=(Sxy-Sx*Sy/Sw)/Sqrt((Sxx-Sqr(Sx)/Sw)*(Syy-Sqr(Sy)/Sw));
where Y=Yobserved and Yc=Ypredicted=Ycalculated, and W[I] is the
weighting factor for point I.
R^2 is the square of R.
Alternatively, the equation for calculation of R^2 can be used:
R^2 = 1 - SUM { Wi * (Yobsi - Ypredi)^2 } / VAR(Yobs)
where VAR(Yobs) is calculated as described above, including the
weighting factor.
In Pascal or Delphi code:
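{ SSq = weighted sum of squared residuals }
SSq:=0; Sy:=0; Syy:=0; Sw:=0;
FOR I:=1 TO N DO
BEGIN
  SSq:=SSq+W[I]*Sqr(Y[I]-Yc[I]);
  Sy:=Sy+W[I]*Y[I]; Syy:=Syy+W[I]*Sqr(Y[I]); Sw:=Sw+W[I];
END;
{ Divide by the weighted corrected sum of squared observations }
RR:=1-SSq/(Syy-Sqr(Sy)/Sw);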
where RR is R^2.
Both equations yield comparable, but not identical values (although I
would expect them to be identical). In the example of Fabrice Nollevaux
'Without BLQ', I get R^2 = 0.9763 for the first equation, and 0.9760
for the second equation. Values for R are 0.9881 and 0.9879,
respectively.
According to my calculations, WinNonlin calculates R as the correlation
coefficient between Yobs and Ypred, without weighting factors. For the
case 'Without BLQ' this indeed yields 0.9908. If the missing values are
interpreted as zero, the value is about 0.55, so close to, but not
identical to, the value given by WinNonlin.
Any comments are welcome.
Best regards,
Hans Proost
Johannes H. Proost
Dept. of Pharmacokinetics and Drug Delivery
University Centre for Pharmacy
Antonius Deusinglaan 1
9713 AV Groningen, The Netherlands
tel. 31-50 363 3292
fax 31-50 363 3247
Email: j.h.proost.-a-.farm.rug.nl
[Thank you Hans for the equations. I will try them out. I also
appreciate your comments regarding the inclusion of missing values in
the statistical calculations - db]
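For readers who want to try the two formulas in Hans's message without a Pascal compiler, here is a minimal Python sketch of the same calculations. The function names (`weighted_sums`, `weighted_r`, `weighted_r2`) are made up for this illustration; the data are the OBSERVED, PREDICTED and WEIGHT columns of the 'Without BLQ' output, typed in as printed (four significant digits), so the results can only approximate the figures quoted in the thread.

```python
import math

def weighted_sums(obs, pred, w):
    """Weighted COVAR(Yobs,Ypred), VAR(Yobs) and VAR(Ypred) as defined above."""
    sw = sum(w)
    mean_obs = sum(wi * o for wi, o in zip(w, obs)) / sw
    mean_pred = sum(wi * p for wi, p in zip(w, pred)) / sw
    covar = sum(wi * (o - mean_obs) * (p - mean_pred)
                for wi, o, p in zip(w, obs, pred))
    var_obs = sum(wi * (o - mean_obs) ** 2 for wi, o in zip(w, obs))
    var_pred = sum(wi * (p - mean_pred) ** 2 for wi, p in zip(w, pred))
    return covar, var_obs, var_pred

def weighted_r(obs, pred, w):
    """First formula: R = COVAR / SQRT(VAR(Yobs) * VAR(Ypred))."""
    covar, var_obs, var_pred = weighted_sums(obs, pred, w)
    return covar / math.sqrt(var_obs * var_pred)

def weighted_r2(obs, pred, w):
    """Second formula: R^2 = 1 - weighted SS of residuals / VAR(Yobs)."""
    _, var_obs, _ = weighted_sums(obs, pred, w)
    ss_res = sum(wi * (o - p) ** 2 for wi, o, p in zip(w, obs, pred))
    return 1.0 - ss_res / var_obs

# 'Without BLQ' output: OBSERVED, PREDICTED and WEIGHT columns.
obs = [42.34, 19.66, 18.88, 11.51, 8.080, 8.323,
       6.571, 4.502, 2.628, 1.543, 1.166, 0.1127]
pred = [39.32, 23.63, 17.81, 10.42, 8.148, 7.105,
        5.795, 4.788, 3.274, 2.239, 1.047, 0.1070]
w = [0.6467e-03, 0.1791e-02, 0.3152e-02, 0.9205e-02, 0.1505e-01, 0.1979e-01,
     0.2975e-01, 0.4359e-01, 0.9329e-01, 0.1996, 0.9140, 87.75]
unit = [1.0] * len(obs)  # equal weights give the ordinary correlation

print("unweighted R       :", round(weighted_r(obs, pred, unit), 4))
print("weighted R (eq. 1) :", round(weighted_r(obs, pred, w), 4))
print("weighted R^2 (eq.2):", round(weighted_r2(obs, pred, w), 4))
```

Run on these rounded table values, the three numbers should land close to the 0.9908 reported by WinNonlin and the 0.9881 and 0.9760 Hans quotes, which supports his reading that WinNonlin's CORRELATION (OBSERVED,PREDICTED) is the unweighted coefficient.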
Back to the Top
Hi there.
Can anyone explain to me why there is so much difference between the
AUC calculated by the linear trapezoidal rule (using the NCA model) and
the AUC from the default algorithm in 1-compartmental modeling (when
uniform weighting is used)? Is AUC0-inf by the linear trapezoidal rule
a more accurate secondary parameter for calculating IV bolus CL values?
Or, alternatively, should compartmental modeling with a certain
weighting be used instead for correct CL determination? I need to know
which is preferable in order to input correct model parameters for
compartmental IV infusion modeling (NCA modeling does not exist).
Thanks in advance,
Oleg Khatsenko, Ph.D.
Scientist II
CELGENE
4550 Towne Centre Court
San Diego, CA 92121
Phone: 858-7954714
FAX: 858-5528775
E.mail: okhatsen.aaa.celgene.com
WEB: www.celgene.com
Back to the Top
Dear Fabrice, Pravin, and Hans,
Thank you for your discussion about our WinNonlin product.
I hope the following points will resolve your issues.
First, the value -0.640E0+152 that appears in the text output is really
a missing value code that should have been changed to "Missing".
You will notice that in the workbook output, it has been changed to
"Missing".
The problem in the text output has been fixed for the next WinNonlin
release.
Second, the weights that appear in the summary table next to the
missing values are not actually used in the model fitting.
Perhaps it is confusing that they are shown in the table.
Third, the weights that appear in the summary table next to the missing
values were incorrectly being used in summing the weights when
computing the correlation and weighted corrected sum of squared
observations.
This has been fixed for the next release.
Fourth, for the next release, we have added a Weighted Correlation
(Obs, Pred), that will appear in addition to the Correlation (Obs,
Pred).
We appreciate your comments and suggestions for WinNonlin.
Regards,
Linda Hughes, M.S.,
Software Engineer
Pharsight Corporation
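Linda's third point can be checked against the numbers posted at the start of the thread. The Python sketch below is an illustration under stated assumptions, not WinNonlin's actual code: it assumes the weighted corrected sum of squared observations is SUM(W*Y^2) - SUM(W*Y)^2 / SUM(W), and that the reported problem amounts to letting SUM(W) run over every row, including the rows whose observation is missing. The function name `weighted_corrected_ss` is hypothetical; the data are the OBSERVED and WEIGHT columns of the BLQMISSING output, with `None` standing for a missing observation.

```python
def weighted_corrected_ss(rows, count_missing_weights):
    """SUM(W*Y^2) - SUM(W*Y)^2 / SUM(W) over (observed, weight) rows.

    If count_missing_weights is True, SUM(W) also includes the weights
    printed next to missing observations (the behaviour assumed here
    for the reported problem).
    """
    present = [(y, w) for y, w in rows if y is not None]
    swy = sum(w * y for y, w in present)
    swyy = sum(w * y * y for y, w in present)
    if count_missing_weights:
        sw = sum(w for _, w in rows)
    else:
        sw = sum(w for _, w in present)
    return swyy - swy ** 2 / sw

# BLQMISSING output: (OBSERVED, WEIGHT); None marks a missing observation.
rows = [
    (None, 0.4483e-03), (42.34, 0.6472e-03), (19.66, 0.1794e-02),
    (18.88, 0.3158e-02), (11.51, 0.9207e-02), (8.080, 0.1502e-01),
    (8.323, 0.1974e-01), (6.571, 0.2967e-01), (4.502, 0.4348e-01),
    (2.628, 0.9307e-01), (1.543, 0.1992), (1.166, 0.9122),
    (0.1127, 87.64), (None, 8420.0), (None, 0.8090e+06), (None, 0.7772e+08),
]

with_missing = weighted_corrected_ss(rows, count_missing_weights=True)
observed_only = weighted_corrected_ss(rows, count_missing_weights=False)
print("SUM(W) over all rows     :", round(with_missing, 4))
print("SUM(W) over observed rows:", round(observed_only, 4))
```

Under these assumptions the first value comes out near the 12.1807 printed in the BLQMISSING output and the second near the 10.4632 printed without BLQ (the small remaining gap is expected, since the two fits have slightly different weights). The very large weights at 36, 48 and 60 hours inflate SUM(W) so much that the mean-correction term almost vanishes.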
PharmPK Discussion List Archive Index page
Copyright 1995-2010 David W. A. Bourne (david@boomer.org)