- On 31 Jul 2003 at 01:52:30, "Sarah Anne Marston" (smarston.aaa.pharmastatsci.com) sent the message

Dear Group,

I have antibody titer data, about 7 to 12 subjects per group. We are

reporting the geometric mean, and we would like to estimate the std dev.

I tried taking the anti-log of the 95% CI of the log-transformed data, but

that doesn't work - apparently the data are too skewed and the sample sizes

are small. Any suggestions for estimating a std dev under these

circumstances?

Thanks.

Sincerely,

Sarah

- On 31 Jul 2003 at 10:39:42, "Gobburu, Jogarao V" (GOBBURUJ.-a-.cder.fda.gov) sent the message

If all you want to convey in your report is a feel for the spread of

the data, is it not reasonable to simply report the range, since you

think the data are too skewed and the sample size is small?

Joga Gobburu

Pharmacometrics

CDER/FDA

- On 31 Jul 2003 at 16:37:20, "Kayode Ogungbenro" (mbpssko3.aaa.man.ac.uk) sent the message

The following message was posted to: PharmPK

Hi,

The following references will help you to understand better the

relationship between the mean, standard deviation and confidence

interval of skewed and normal data:

1. Zhou et al (1997) Statistics in Medicine 16, 783-790.

2. Taylor et al (2002) Statistics in Medicine 21, 1443-1459.

Kayode Ogungbenro

Ph.D Student

- On 31 Jul 2003 at 11:37:09, (Scott.D.Patterson.aaa.gsk.com) sent the message

Interquartile range - reflecting spread in the middle half of the data

- is a good measure of spread for highly skewed dist'ns (cf. Schulmann,

Statistics in Plain English, VNR, 1992, p. 29-30).
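
As a minimal sketch of the interquartile range mentioned above, the Python below computes the quartiles with the standard library. The titer values are invented for illustration, and the "inclusive" quartile method is an assumption; other quartile conventions give slightly different numbers on samples this small.

```python
import statistics


def interquartile_range(values):
    """Return (q1, q3, iqr) for a sample, using the 'inclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Hypothetical titers for one group (not data from this thread)
titers = [20, 40, 40, 80, 80, 160, 320, 640, 2560]

q1, q3, iqr = interquartile_range(titers)
print(f"median={statistics.median(titers)}, Q1={q1}, Q3={q3}, IQR={iqr}")
```

Reporting the median with Q1 and Q3 (rather than only their difference) keeps the asymmetry of a skewed sample visible.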

-Scott

Scott Patterson

GlaxoSmithKline Pharmaceuticals

2301 Renaissance Blvd.

King of Prussia, PA 19406-2772

Email: scott.d.patterson.-at-.gsk.com

Phone: 610-787-3865

- On 31 Jul 2003 at 10:30:10, Paul Johnson (p.johnson.aaa.prodigy.net) sent the message

Dear Sarah,

Here is a reference that may be of

interest and use:

Shumway, R.H., Azari, A.S. and Johnson, P. (1989),

“Estimating Mean Concentrations Under Transformation

for Environmental Data with Detection Limits,”

Technometrics, 31(3): 347-356.

I have a set of variability factor programs for

estimating the mean, standard deviation and other

quantities (e.g., 99th percentile) for the lognormal

with and without non-detects [single and/or multiple

detection limits].

Let me know if they would be of use. I can e-mail them

to you.

Paul Johnson

http://www.biostatsoftware.com

- On 1 Aug 2003 at 07:56:52, Nick Holford (n.holford.at.auckland.ac.nz) sent the message

Hi,

The responses to this question remind me of the parable of the blind

men and the elephant.

http://www.wordfocus.com/word-act-blindmen.html

Various parameters of this small sample have been mentioned: skewness,

geometric mean, standard deviation, range (plus some siblings from

Brian Smith). What is the purpose of estimating and reporting these

parameters?

Is it to simply describe the observations? In which case why attempt to

summarize them with statistics? You could display all of them

graphically or list them in a table. With a larger sample size

histograms of the frequency distribution would be nice.

Is it to test some null hypothesis? The uncertainty about the assumed

distribution for the test statistic may make it more sensible to

consider a randomization test comparing a statistic of interest across

the groups.

http://wfn.sourceforge.net/rtmethod.htm
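
A randomization test of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily the procedure described at the link above: the function name, the choice of the mean of log titers as the statistic, and the example values are all hypothetical.

```python
import math
import random
import statistics


def randomization_test(group_a, group_b, statistic=statistics.mean,
                       n_shuffles=5000, seed=0):
    """Two-sided Monte Carlo randomization test for a difference in `statistic`."""
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Reassign subjects to groups at random, keeping the group sizes fixed
        rng.shuffle(pooled)
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical titers for two groups (invented for illustration)
group_1 = [20, 40, 40, 80, 160, 320, 640]
group_2 = [80, 160, 320, 640, 640, 1280, 2560, 5120]

# Comparing means of log titers is equivalent to comparing geometric means
p_value = randomization_test([math.log(t) for t in group_1],
                             [math.log(t) for t in group_2])
print(f"randomization p-value: {p_value:.4f}")
```

The test makes no assumption about the shape of the distribution; it only asks how often a random relabelling of subjects produces a difference at least as large as the one observed.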

I suggest to Sarah that she consider whether she wants to show people

the elephants she has discovered or just tell them about bits (weight?

colour?) of the beast and/or how the beasts differ from each other.

Nick

Nick Holford, Dept Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New

Zealand

email:n.holford.-a-.auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556

http://www.health.auckland.ac.nz/pharmacology/staff/nholford/

- On 31 Jul 2003 at 21:40:34, "Sarah Anne Marston" (smarston.-a-.pharmastatsci.com) sent the message

Dear Group,

Many thanks to all who responded to my question; your replies are much

appreciated. Please find below a helpful reply from Brian Smith, a

Statistician at Lilly, sent to the Clin Phar Stat list. Because of the

interest on the PharmPK list, I thought I would forward it.

BTW, if you are interested in the Clin Phar Stat list, you can register

at the topica.com website. This list includes a mix of Statisticians and

Scientists and everything in between.

Sincerely,

Sarah

-----Original Message-----

From: Brian Smith [mailto:smith_brian_p.-a-.lilly.com]

Sent: Thursday, July 31, 2003 11:01 AM

To: cps.-a-.topica.com

Subject: Re: CLIN PHAR STAT: estimating std deviation for skewed data,

small sample size

Sarah,

Let xbar and s2 be the sample mean and variance of the

log-transformed data, and assume that the data have a log-normal

distribution.

Assume that m and t2 are the true mean and variance of the

log-transformed data that you are estimating with xbar and s2. The

following are facts about the log-normal distribution.

Geometric mean = exp(m)

The expected value (or mean) = exp(m + t2/2)

The variance = (exp(t2) - 1)*exp(2*m + t2)

The standard deviation = sqrt (variance) = exp(m + t2/2)*sqrt(exp(t2)-1)

The coefficient of variation = (standard deviation)/(the expected value) =

sqrt(exp(t2)-1)

Thus, maximum likelihood estimates for these values are obtained by

substituting xbar for m and s2 for t2.

A maximum likelihood estimate of the standard deviation is given by

exp(xbar + s2/2)*sqrt(exp(s2)-1)

Note: One is often told that s2 = (sum of squares)/(n-1) is an unbiased

estimate of the variance. It is, but it is not the maximum likelihood

estimate of the variance. s2'=(sum of squares)/n is the maximum likelihood

estimate of the variance. Thus, you probably want the following to estimate

the standard deviation

exp(xbar + s2'/2)*sqrt(exp(s2')-1).

With that all said, a log-normal distribution can be completely described

with the geometric mean for an estimate of central tendency and the

coefficient of variation for an estimate of variability. It is my opinion

that these are the two things that you should report. A maximum likelihood

estimate of the coefficient of variation is given by sqrt(exp(s2')-1). This

is what I think the preferred estimate of this quantity is. Another way to

estimate CV is to find the sample standard deviation of the values (not log

transformed) and divide this by the arithmetic mean of the values (not log

transformed). You can check to see that these estimates will be

similar, if you like. But, again, I recommend the maximum likelihood

estimate.
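
The estimates in this message can be sketched in Python as follows. This is a minimal illustration assuming the data are log-normal; the function name, the dictionary keys, and the example titers are hypothetical and not from the original post.

```python
import math


def lognormal_summary(values):
    """Geometric mean plus ML estimates of mean, SD and CV, assuming log-normal data."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    xbar = sum(logs) / n
    sum_sq = sum((x - xbar) ** 2 for x in logs)
    s2_ml = sum_sq / n  # s2' in the message: divide by n, not n-1
    geometric_mean = math.exp(xbar)
    mean = math.exp(xbar + s2_ml / 2)
    cv = math.sqrt(math.exp(s2_ml) - 1)
    sd = mean * cv  # exp(xbar + s2'/2) * sqrt(exp(s2') - 1)
    return {"geometric_mean": geometric_mean, "mean": mean,
            "sd": sd, "cv": cv, "s2_ml": s2_ml}


# Hypothetical titers (invented for illustration)
summary = lognormal_summary([20, 40, 40, 80, 160, 320, 640])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With only 7 to 12 subjects per group, the choice between dividing by n and by n-1 changes the result noticeably, so it is worth stating which one was used.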

Sincerely,

Brian Smith

- On 4 Aug 2003 at 11:44:54, SMITH_BRIAN_P.-a-.Lilly.com sent the message

I never try and disagree with Nick if I can help it. There are always

pearls of wisdom in what he says even when he is trying to be

controversial.

One thing I would like to add, however, with pharmacokinetic data there

is often useful information that is imparted with a good measure of

center and a good measure of variability. First, however, an

underappreciated fact: if your data comes from a log-normal distribution,

then the geometric mean is the maximum likelihood estimate of the

median. With that said, I have heard pharmacokineticists describe, for

instance, how widely the drug circulates based on the central tendency

estimate of the volume of distribution. Additionally, use of

coefficient of variation becomes a way that the variability in one drug

can be compared to other drugs. I see these measures as useful in our

ability to describe the drug. Displaying the raw values in a table does not

immediately allow for understanding of variability. I would imagine

that by the time you get to say a dozen observations, looking at the

values becomes hard to decipher. At least it is for me.

In theoretical statistics a lot of time is spent with the notion of

sufficient statistics. Given a distribution a set of sufficient

statistics allows one to estimate that distribution. The raw

observations are indeed sufficient regardless of the distribution.

Now, if you are willing to assume, for instance, that a set of AUC's

has a log-normal distribution (in my experience this is usually a

reasonable assumption), then the geometric mean and coefficient of

variation are sufficient. That is, I can completely estimate the

distribution based on these two estimates. For a log-normal

distribution we would say that these estimates are minimally

sufficient, since you cannot completely describe a log-normal

distribution with one statistic.
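
To illustrate how the geometric mean and CV pin down a log-normal distribution, here is a small Python sketch. It inverts the relation CV = sqrt(exp(t2) - 1) to recover the log-scale variance and then computes quantiles. The function names and the example numbers are hypothetical, and the whole thing assumes the log-normal model holds.

```python
import math
from statistics import NormalDist


def lognormal_from_gm_cv(geo_mean, cv):
    """Recover the log-scale mean m and variance t2 from geometric mean and CV."""
    m = math.log(geo_mean)
    t2 = math.log(1 + cv ** 2)  # inverts CV = sqrt(exp(t2) - 1)
    return m, t2


def lognormal_quantile(geo_mean, cv, p):
    """Quantile of the log-normal distribution implied by geo_mean and cv."""
    m, t2 = lognormal_from_gm_cv(geo_mean, cv)
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return math.exp(m + z * math.sqrt(t2))


# Hypothetical summary values: geometric mean 100, CV 50%
for p in (0.025, 0.5, 0.975):
    print(f"p={p}: {lognormal_quantile(100.0, 0.5, p):.1f}")
```

The median of the implied distribution equals the geometric mean, which matches the point made earlier in this message.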

Practically speaking it all comes down to usefulness. Tables of raw

values, histograms, and estimates of parameters, all have a useful

place in helping people understand the data. In the case of the

elephant and the blind men we see that the set of sufficient statistics

may very well be (a wall, a spear, a snake, a tree, a fan, and a rope).

We need all of these statistics (and probably more) to understand the

distribution of an elephant. On the other hand, if the data is

log-normal, the geometric mean and coefficient of variation will

suffice.

Sincerely,

Brian Smith

- On 5 Aug 2003 at 07:36:20, Nick Holford (n.holford.at.auckland.ac.nz) sent the message

Brian,

Your point about sufficient statistics is well made. However, in the

specific case that Sarah asked about I think it was clear to her that a

log-normal distribution was not a good description of her data. In the

absence of an adequate description for the distribution there are no

minimally sufficient statistics. That is why I suggested an empirical

approach (show the data itself) for a more honest description.

Sarah did not say what her main goal was for this data. IMHO the

arithmetic mean and standard deviation are as good as anything for

descriptive statistics if there is no clear goal.

Nick

Nick Holford, Dept Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New

Zealand

email:n.holford.-at-.auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556

http://www.health.auckland.ac.nz/pharmacology/staff/nholford/

Copyright 1995-2010 David W. A. Bourne (david@boomer.org)