Those “sneaky adverse events”

Well, the embargo has lifted (funny thing when you are publishing a paper — before you submit it you tell everyone about it, then once the journal accepts it you need to keep it all quiet, but you’re like “yeah, but I was just… well, okay”) and now I can finally talk about my latest research paper!

This time the paper is coming out in Science Translational Medicine (one of my favorite up-and-coming journals). The paper describes how we developed methods to mine adverse drug events on a very large scale. The trick is to account for all the nasty bias in the database that has, historically, inhibited such data mining. We adjust for that bias by identifying “better controls” for the drugs we are studying.

Once Sci TM puts the paper up, I’ll post a link to it right here.

There has already been a little bit of news coverage: one article describes the algorithm and its implications very nicely.

Medical Express and Science Codex picked up the Stanford Press release here (thanks Krista!).

The Scope blog also picked it up.

Health Canal made use of the photo shoot Russ and I did the other day.

March 15th articles:

FierceBiotechIT rewrote the press release a bit. As did HealthJockey and ScienceDaily.

Ooh, and this one titled “Good Apart, Bad Together” — which is either a review of my paper or a new sitcom airing this fall on NBC.

WHOA! Looks like my paper turned up on GIZMODO! Totally cool and if you’re interested, you can read about it here. Here is a screen shot of it:


And Y Combinator, too.

Looks like local news is starting to pick it up and even put together a video on the topic. Although the message they took away from the paper wasn’t really what I intended, it’s still cool. Curante and something called Mother Nature News covered it as well.

Here’s an embedded version of that video:


One Curante commenter says, “And how many ‘side effects’ are there from eating an apple? Or drinking a cup of tea?” Good question, I say. Good question indeed! :)

In Spanish, too!


March 16th:

Discover Magazine’s blog put together a quick piece on it here.

Smart Planet and iHealthBeat picked it up today.





Here’s a screen shot of the Sci TM front page. Looks like they used the “wormhole” plot for the icon.

Filed under: Science

BCSRank – Week 13

Week 13 was rivalry week with lots of potentially exciting games. However, there weren’t too many surprises as the favored team won in nearly all the top 25 match-ups.

A quality win over Notre Dame pushed Stanford up a few notches, while OSU fell a notch on a bye week.

At this point BCSRank would put LSU vs Houston in the BCS Championship game — a scenario that is incredibly unlikely. Also, while Bama sits at a solid #2 on nearly all ranking systems, according to BCSRank they are down at #7. Again, I suspect there is a bias against teams in good conferences, and I need a way to address this. I would like to do this without sacrificing the generality of the method.

RANK                     TEAM LOSSES      SCORE
  1                       LSU      0      0.01554
  2                   Houston      0      0.01212
  3              Oklahoma St.      1      0.01153
  4                 Boise St.      1      0.01118
  5                  Stanford      1      0.01097
  6             Virginia Tech      1      0.01094
  7                   Alabama      1      0.01086
  8                       TCU      2      0.01059
  9                  Oklahoma      2      0.01054
 10       Southern California      2      0.01047
 11                    Oregon      2      0.01042
 12                  Michigan      2      0.01041
 13              Michigan St.      2      0.01029
 14            South Carolina      2      0.01012
 15                   Clemson      3      0.01007
 16                  Nebraska      3      0.01000
 17                 Wisconsin      2      0.00999
 18                    Baylor      3      0.00986
 19                  Arkansas      2      0.00973
 20                Kansas St.      2      0.00957
 21            Southern Miss.      2      0.00935
 22                   Georgia      2      0.00932
 23                Notre Dame      4      0.00916
 24                  Iowa St.      5      0.00901
 25              Arkansas St.      2      0.00899

BCSRank – An open source BCS ranking algorithm based on Google’s PageRank

I’ve been having a great time following the Stanford Cardinal this year. There has been quite a bit of talk about how the ranking system is flawed, how the computer rankings “don’t like Stanford,” and how the team is better than its rank would indicate (currently #6).

As a student of statistics/computer science and a fan of college football, I got pretty interested in just how these computer rankings work. This page lists the different algorithms that are used, but the sites don’t really describe how the teams are ranked (probably trade secrets or something).

It occurred to me that Larry Page and Sergey Brin invented this little algorithm to rank web pages on the internet based on their credibility. The idea behind their ranking algorithm is that more credible pages will have more links to them AND that links from more credible pages are worth more. Well, this is quite similar to college football. Teams that win lots of games are better teams (i.e. more credible). So instead of links between pages we have links between teams. A link is established between each pair of teams that has completed a game. The link is directional and goes from the team that lost to the team that won.

This is quite intuitive in that if you beat a team that loses to a lot of teams (e.g. a webpage that promiscuously links to all other pages) you are going to get less credit than if you beat a team that has never lost (e.g. a page that all other pages point to, but where it only points to you). Unless two teams play twice in a year (which typically won’t happen in the regular season), there will be at most one link between any two teams.

My implementation is far from finished and there are definitely some limitations, but I figure the internet may be able to help me improve the method so here it is.

As an example, let’s say we have three teams (A, B, and C). We define our link matrix according to the results of those teams playing each other.

L = ((0, 0, 0),
     (1, 0, 0),
     (1, 1, 0))

We call the matrix L, for links or losses, you pick. When we try to invert this matrix as the algorithm requires, we can’t, so to make the matrix non-singular I set the diagonal equal to 1.

L = ((1, 0, 0),
     (1, 1, 0),
     (1, 1, 1))

This makes the matrix invertible, and we can implement the algorithm as described on Wikipedia. Reading the original matrix: team A has never lost, team B lost only to team A, and team C lost to both A and B.

The other parameter we need is the damping factor, which for the PageRank algorithm is the probability that a random surfer follows one of the links on a page rather than jumping to a random page on the internet. When this value is set to 1, the surfer always uses the links and will end up on a terminally linked page. For our football ranking this means the score will pool in the top one or two teams (analogous to the surfer ending up at a terminally linked page). We want to spread out the scores a bit more than that, so we set the damping factor quite low (~0.3).
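To make the mechanics concrete, here is a minimal power-iteration sketch of this idea in Python on the three-team example above. The function name, the uniform handling of undefeated (link-less) teams, and the convergence tolerance are my own choices for the sketch, not details from the post:

```python
def rank_teams(losses, d=0.3, iters=1000, tol=1e-12):
    """PageRank-style rating. losses[i][j] = 1 if team i lost to team j.
    d is the damping factor: the probability that credit flows along a
    loss link to the winner; with probability 1 - d the credit is spread
    evenly across all teams."""
    n = len(losses)
    r = [1.0 / n] * n  # start with all teams equal
    for _ in range(iters):
        new = [(1.0 - d) / n] * n
        for i, row in enumerate(losses):
            total = sum(row)
            if total == 0:
                # Undefeated team has no outgoing links: spread its vote evenly.
                for j in range(n):
                    new[j] += d * r[i] / n
            else:
                # A loser splits its vote among the teams that beat it.
                for j, lost in enumerate(row):
                    if lost:
                        new[j] += d * r[i] / total
        converged = sum(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r

# A beat B and C; B beat C.  Same matrix as the worked example.
losses = [
    [0, 0, 0],  # A never lost
    [1, 0, 0],  # B lost to A
    [1, 1, 0],  # C lost to A and B
]
scores = rank_teams(losses, d=0.3)
print(scores)  # A ranks highest, then B, then C
```

With d = 0.3, A ends up with the highest score, then B, then C, which matches the intuition that beating an unbeaten team is worth the most.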


So, in week 13 of the 2011 season, what does PageRank say about the way the teams should be ranked? Here are the top 25.

Rank                     Team  Losses  RankScore
  1                       LSU   0      0.01474
  2              Oklahoma St.   1      0.01192
  3                   Houston   0      0.01179
  4                   Alabama   1      0.01141
  5                 Boise St.   1      0.01094
  6             Virginia Tech   1      0.01082
  7                       TCU   2      0.01070
  8                  Stanford   1      0.01055
  9                   Clemson   2      0.01046
 10                  Oklahoma   2      0.01040
 11       Southern California   2      0.01039
 12                  Michigan   2      0.01037
 13                  Arkansas   1      0.01036
 14                    Oregon   2      0.01029
 15              Michigan St.   2      0.01009
 16                  Nebraska   3      0.01004
 17                Kansas St.   2      0.00979
 18                    Baylor   3      0.00972
 19            South Carolina   2      0.00952
 20                  Penn St.   2      0.00951
 21                 Wisconsin   2      0.00950
 22            Southern Miss.   2      0.00941
 23                Notre Dame   3      0.00937
 24                  Iowa St.   4      0.00927
 25                   Rutgers   3      0.00917


Now this is pretty good for a first pass at ranking. LSU is clearly the best team going into week 13, and indeed they come out on top of our rankings. One interesting feature of our rankings is that Houston is right there at #3, above Alabama. This is the result of our algorithm assuming all teams are equal at the start. It’s a little disappointing that Stanford is sitting at #8 behind a two-loss TCU, but TCU has beaten “higher quality” opponents (i.e. Boise State) and that’s contributing to their score.

Another interesting/curious ranking is Arkansas sitting very low at #13. This makes me wonder about biases against conferences that are filled with good teams (i.e. the SEC West). Because a team plays teams in its own conference more often than teams from other conferences, this may bias against teams in good conferences. I have not figured out a way to visualize or estimate this bias yet. Once we understand it, we can invent methods for removing it.

If you have ideas for improving this method, please feel free to comment. This may ultimately prove a futile effort and may not perform as well as the established ranking systems, but at least it’s an open solution based on some sound CS models.


JAMA Coverage of Paroxetine + Pravastatin

Dr. Hampton just wrote a very nice article in JAMA covering our recent discovery of a drug-drug interaction between paroxetine and pravastatin. You can check it out here.

Filed under: Science

My baby’s graduating from college…

I once was told that submitting a scientific paper is like sending your child off to college. You’ve done what you can to prepare them — you’ve tried to work out all the kinks and polish off the edges. But, ultimately, they have to go out there and stand on their own. I won’t pretend to know what sending your grown baby away to start their life is like, but getting this paper published has been one crazy ride. And I’m very happy to report that today my paper is graduating from college and I’m so proud of the little guy. I mean, of course, that the paper has finally been accepted and is being published today. And by one of my very favorite journals, Clinical Pharmacology & Therapeutics!

Once I get the PMID and all that I’ll post it here for your viewing pleasure. The proofs were quite aesthetically pleasing, so I expect it to make a good over-the-mantel piece.


9:10am You can view the article here.

9:11am And here is a little bit of press at US News.

9:12am And the Scope Blog at Stanford quoted me!

3:29pm ABC Action News report and another

4:02pm NPR Health Blog (Shots) wrote about my MacBook Pro! Read the comments, a heated debate ensues :)

4:14pm Another outlet seems to like it, and so does the Stanford SCOPE blog.

5:04pm My first time on NPR! Local Radio (KQED) coverage of our study. Here is the audio:

5:39pm A little bit of hyperbole, but I’ll take it.

9:39pm Someone submitted the paper and it hit the front page! I took a screen shot to preserve the awesomesauce. Wow!

[UPDATE 5/26]

6:33am KQED picked up the story again, this time with a quote from Hank Greely.

7:49am And The Stanford Daily covered it too!


[UPDATE 5/27]

Chelsea Conaboy wrote a great article covering the study at the Boston Globe.

Peter Aldhous also describes the paper very well in his article at the New Scientist.

May 31st, 2011

CNN covered the work as part of a bigger story.


JAMA covered it as did DrugTopics!


Filed under: Science

Submitting to “The Journal”

Science is stupid-fun. I mean, honestly, as a graduate student in informatics I get to sit in front of my computer all day, musing about problems, programming scripts, and testing hypotheses. It’s the greatest job in the world, and every so often you hit on something really cool and get to publish a paper. Sometimes you figure out a new way to analyze the data or you apply an old technique to a new field. Either way, sharing your discovery is both fun and gratifying.

The papers you publish are the lifeblood of a scientist’s career, and mine will form the basis of my PhD thesis. I’ve been lucky so far to have worked with some very talented people and to publish a couple of papers in my first two years. However, recently I stumbled upon a discovery which could get me a paper in, by most metrics, the most prestigious journal in the world, The New England Journal of Medicine. Just typing this sentence blows my mind. Anyway, because I don’t think about much else, I’m going to blog about my experience submitting a paper to NEJM and will share all the ups and downs…. cross your fingers for ups!

Update: April 28, 2010

The goal was to submit the paper today. But right before my advisor and I were about to submit the final draft, I realized that I left out a whole set of analyses! Doh, now I have to go back to the database to extract more patient records and analyze another set. Looks like it won’t be today.

Update: April 29, 2010

Okay, made a whole bunch of new figures today, modified the text where needed, and sat down with my advisor again. We made some final edits (some of which we should have caught in the previous 16 iterations of the manuscript), and now we feel great about the paper. Personally, I think it’s a work of art and couldn’t imagine it getting rejected :D . Don’t worry, I’m not allowing my hopes to get too high. My advisor let me submit the paper and click on all the “double-check-your-submission” links. In his words, “you’re not going to be doing this that often, so you might as well click around.” It was fun, and it’s amazing I even get to submit to the NEJM, really.

Paper status is now “Submitted” and we got a confirmation email that it will be forwarded to the editors. Time to order some flowers and wine for our collaborators!

Update: April 30, 2010

I have been neurotically checking the author website at NEJM, and today our paper was “Assigned to an Editor.” I have no idea exactly what that means, but it’s movement!

Update: May 2nd, 2010

I went on the website today (yes, and yesterday too), not expecting any updates over the weekend, but much to my surprise the status of the paper has changed. It looks like it made it past the editor and they are now looking for peer reviewers. According to a close friend, getting past the editors is a big deal and a lot of papers don’t make it past that point. Very exciting! I just read on the NEJM website about the editorial process, and it seems the paper has been read by the editor-in-chief and also another expert editor and has passed both of their filters. Here’s to the reviewers liking it! [crosses fingers]

Paper status is now “Searching for Reviewers.”

Update: May 10th, 2010

It’s been one week and one day since the last update, and I was starting to think that maybe the next update would come when the reviews came back. That is not the case, however. Looks like they found some reviewers for the paper! Not sure how long they have to review it, probably a few weeks at least, I imagine. I’ll check with my PI and add that info here. Looks like it’s 2 to 3 weeks.

Paper status is now “Out for Review.”

Update: May 20th, 2010

The reviews of the paper have been returned to the Journal, but a decision has not yet been made. The next step is for the editorial board to discuss the paper and make a decision. I’m not sure how long this process takes, but every other time the status included the word “editor” it only took a few days to get another update. Perhaps I will hear something by week’s end. YIKES! This is getting real.

Paper status is now “With the Editor.”

Update: May 25th, 2010

It’s been a serious roller-coaster ride, and I have had quite the experience submitting to the world’s most prestigious journal. The reviews came back a few days ago and we just got a chance to read them today. As far as reviews for The Journal go, they are quite benevolent and really quite positive. They made some great suggestions on how to improve the paper, and I have already made most of the changes they suggested (they were quite minor). Unfortunately for us, the editors were not as favorable toward us as the reviewers were, and they rejected our paper today. Perhaps the paper is too bold for the New England Journal, perhaps it’s ahead of its time. Perhaps they just don’t like me or my parentage — I simply can’t be sure. But what I do know is that this is some of the best work I have ever done, and there are many other journals out there. I will get the good word out about this discovery one way or another.

Paper status is now “Rejected.”


Filed under: Science

Custom iTerm Color Themes

I just found this blog post and it has changed my iTerm life. I used to be a fanboy of another terminal, but between TextMate’s iTerm integration and this color theme, I am a full iTerm man now. Because this has changed my life so much, I’m going to repost the script that sets the default iTerm color scheme to this beautiful pastel-on-dark style.

I’m only posting this for posterity’s sake; you should really go to the original blog post (link above) for more detailed information.

Close down iTerm and run this bash script from Terminal. Open up iTerm and you’ll see the changes.


PASTEL='"Pastel" = {
"Ansi 0 Color" = {
"Blue Component" = 0.3097887;
"Green Component" = 0.3097887;
"Red Component" = 0.3097887;
};
"Ansi 1 Color" = {
"Blue Component" = 0.3764706;
"Green Component" = 0.4235294;
"Red Component" = 1;
};
"Ansi 10 Color" = {
"Blue Component" = 0.6727703;
"Green Component" = 1;
"Red Component" = 0.8094148;
};
"Ansi 11 Color" = {
"Blue Component" = 0.7996491;
"Green Component" = 1;
"Red Component" = 1;
};
"Ansi 12 Color" = {
"Blue Component" = 0.9982605;
"Green Component" = 0.8627756;
"Red Component" = 0.7116503;
};
"Ansi 13 Color" = {
"Blue Component" = 0.9965209;
"Green Component" = 0.6133059;
"Red Component" = 1;
};
"Ansi 14 Color" = {
"Blue Component" = 0.9970397;
"Green Component" = 0.8763103;
"Red Component" = 0.8759136;
};
"Ansi 15 Color" = {
"Blue Component" = 1;
"Green Component" = 1;
"Red Component" = 1;
};
"Ansi 2 Color" = {
"Blue Component" = 0.3764706;
"Green Component" = 1;
"Red Component" = 0.6588235;
};
"Ansi 3 Color" = {
"Blue Component" = 0.7137255;
"Green Component" = 1;
"Red Component" = 1;
};
"Ansi 4 Color" = {
"Blue Component" = 0.9960784;
"Green Component" = 0.7960784;
"Red Component" = 0.5882353;
};
"Ansi 5 Color" = {
"Blue Component" = 0.9921569;
"Green Component" = 0.4509804;
"Red Component" = 1;
};
"Ansi 6 Color" = {
"Blue Component" = 0.9960784;
"Green Component" = 0.772549;
"Red Component" = 0.7764706;
};
"Ansi 7 Color" = {
"Blue Component" = 0.9335317;
"Green Component" = 0.9335317;
"Red Component" = 0.9335317;
};
"Ansi 8 Color" = {
"Blue Component" = 0.4862745;
"Green Component" = 0.4862745;
"Red Component" = 0.4862745;
};
"Ansi 9 Color" = {
"Blue Component" = 0.6901961;
"Green Component" = 0.7137255;
"Red Component" = 1;
};
"Anti Alias" = 1;
"Background Color" = {
"Blue Component" = 0;
"Green Component" = 0;
"Red Component" = 0;
};
Blur = 1;
"Bold Color" = {
"Blue Component" = 0.5067359;
"Green Component" = 0.5067359;
"Red Component" = 0.9909502;
};
Columns = 120;
"Cursor Color" = {
"Blue Component" = 0.3764706;
"Green Component" = 0.6470588;
"Red Component" = 1;
};
"Cursor Text Color" = {
"Blue Component" = 1;
"Green Component" = 1;
"Red Component" = 1;
};
"Disable Bold" = 0;
Font = "Monaco 14";
"Foreground Color" = {
"Blue Component" = 1;
"Green Component" = 1;
"Red Component" = 1;
};
"Horizontal Character Spacing" = 1;
NAFont = "Monaco 14";
Rows = 24;
"Selected Text Color" = {
"Blue Component" = 0.9476005;
"Green Component" = 0.9476005;
"Red Component" = 0.9476005;
};
"Selection Color" = {
"Blue Component" = 0.5153061;
"Green Component" = 0.2224857;
"Red Component" = 0.2099074;
};
Transparency = 0.1;
"Vertical Character Spacing" = 1;
};'

DISPLAYS=`defaults read net.sourceforge.iTerm Displays | sed "s/}$//"`
defaults write net.sourceforge.iTerm Displays "$DISPLAYS $PASTEL }"
echo "Pastel display profile added"

BOOKMARKS=`defaults read net.sourceforge.iTerm Bookmarks | sed 's/"Display Profile" = "[^"]*";/"Display Profile" = "Pastel";/'`
defaults write net.sourceforge.iTerm Bookmarks "$BOOKMARKS"
echo "Pastel display profile installed as default"


Filed under: Development, General

R Tip: Fitting Sigmoidal Data

Sigmoid functions are our friends and sometimes you have data which you would like to fit with a sigmoid function. We can use R to find such a fit. First let us look at a sigmoid function.

y = 1 / (1 + exp( a*x + b) )

Now let’s say you are given vectors x and y, say:

x = c(0.00,0.02,0.04,0.06,0.08,0.10,0.12,0.14,0.16,0.18,0.20,0.24,0.26,


y = c(0.409742,0.319277,0.530120,0.377778,0.357143,0.608696,0.315789,




In this case the y values represent probabilities, and one thing you’ll notice is that we have probabilities of exactly 0 and 1. Both of these are bad, because log(1/y - 1) is undefined at those values. So we apply a little “Laplace smoothing” to them:

y[y==0] = 0.001

y[y==1] = 0.999

Now let’s look at what the data looks like.

plot(x, y)

Well, it may be sigmoidal, maybe not. For now let’s assume it is, which we mostly believe anyway.

Okay, now let’s solve for a line in our sigmoid function:

y = 1 / (1 + exp( a*x + b) )

1 + exp( a*x + b) = 1/ y

a*x + b = log( (1/y) - 1 )

Now the left hand side of the equation is a line and the right hand side is some logarithm of the y data. We can plot x versus this right hand side:

new_y = log( 1/y - 1 )

plot(x, new_y)

Looks pretty interesting and hopefully at this point it also looks kinda linear, which it kinda does.

Now let’s fit it with a line:

lm.res <- lm( new_y ~ x )
lm.res

Which produces this output:

Call:
lm(formula = new_y ~ x)

Coefficients:
(Intercept)            x
      1.122      -11.647

We can also test the significance of the fit with an ANOVA:

anova(lm.res)

Which produces this output:

Analysis of Variance Table

Response: new_y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1 172.80 172.802  14.641 0.0009834 ***

And we can plot the resulting fit in linear space:

Now let’s see how our fit looks back in normal space using our formula with our derived a and b values.

a = -11.647

b = 1.122

plot(x, y)

sim_x = (1:101-1)/100

points(sim_x, 1/(1+exp(a*sim_x+b)), type="l")

Voila! We have fit a sigmoid function to our data.
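For readers outside R, the same linearize-then-least-squares trick is easy to port. Here is a plain-Python sketch (my own port, not code from the post; the function name, the `eps` smoothing value, and the noise-free test data are all choices I made for illustration):

```python
import math

def fit_sigmoid(xs, ys, eps=1e-3):
    """Fit y = 1 / (1 + exp(a*x + b)) by linearizing:
    log(1/y - 1) = a*x + b, then ordinary least squares for a and b."""
    # Smooth exact 0s and 1s so the log is defined (the "Laplace smoothing" step).
    ys = [min(max(y, eps), 1.0 - eps) for y in ys]
    zs = [math.log(1.0 / y - 1.0) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    # Closed-form simple linear regression: slope a, intercept b.
    a = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / \
        sum((x - mx) ** 2 for x in xs)
    b = mz - a * mx
    return a, b

# Noise-free data generated from a known sigmoid; the fit should recover it.
# (x kept in [0, 0.6] so no point gets clipped by the smoothing.)
true_a, true_b = -11.647, 1.122
xs = [i / 100 for i in range(61)]
ys = [1.0 / (1.0 + math.exp(true_a * x + true_b)) for x in xs]
a, b = fit_sigmoid(xs, ys)
print(a, b)  # close to -11.647 and 1.122
```

On real (noisy) data the recovered a and b will only be approximate, just as in the R example, but the mechanics are identical.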


Filed under: Development, Science

Snow Leopard Super FAIL

Just got Snow Leopard in the mail and was so excited to install it. Then I got on Twitter to see what my friends were up to:

I think I’m waiting now.

Jason posted details describing his troubles and how he worked around it here:


Filed under: Development, General

Earmarks and the house that approves them…

Just found a new data set that I couldn’t help running some stats on: a published table listing all of the members of the House, the number of earmarks they requested, and the total dollar amounts.

The columns of the data are:

  • Representative Name
  • State
  • Number of Earmarks
  • Total Cost
  • Solo Earmarks
  • Solo Cost

The solo columns are for earmarks where that representative was the only representative who requested the earmark.

Republicans vs. Democrats

The first obvious division is to split the data on party lines and see if their behavior is any different.

Column           Mean (Democrats)   Mean (Republicans)   P-value
Total Earmarks   26.8               22.3                 3.26E-4 *
Total Cost       $37,402,953        $30,683,681          0.02873 *
Solo Earmarks    10.3               9.2                  0.0925
Solo Cost        $7,606,210         $7,746,574           0.7782

Significant p-values (< 0.05) are marked with an asterisk.

Table 1. The table above shows that the average number of earmarks approved is significantly higher for Democrats than for Republicans, and also that Democrats get significantly more money for their earmarks than Republicans do. Please note that when I say “significantly” I mean it in the statistical sense. The p-values (roughly, the probability of seeing a difference this large between Democrats and Republicans purely by chance) for the first two rows of the table are significant (less than 0.05). You can interpret this as a less than 5% chance of the difference occurring completely by chance. However, when looking at solo earmarks there is not a significant difference in the number of earmarks granted or their cost.

For the statisticians: to calculate the p-values I used the Wilcoxon rank-sum test, as the distributions are not normally distributed.
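For concreteness, here is a rough pure-Python sketch of that test using the normal approximation. It handles only untied data (no tie correction, unlike R’s wilcox.test), and the numbers below are made-up toy values, not the earmark data:

```python
import math

def rank_sum_pvalue(xs, ys):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.
    Assumes no tied values across the two samples (no tie correction)."""
    pooled = sorted(xs + ys)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based ranks
    n1, n2 = len(xs), len(ys)
    W = sum(rank[v] for v in xs)                 # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0                # mean of W under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (W - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value

# Interleaved samples: no real difference, so a large p-value.
same = rank_sum_pvalue([1, 3, 5, 7, 9, 11, 13, 15],
                       [2, 4, 6, 8, 10, 12, 14, 16])
# Clearly shifted samples: a tiny p-value.
shifted = rank_sum_pvalue(list(range(1, 11)), list(range(20, 30)))
print(same, shifted)
```

The rank-sum test is a good default here precisely because dollar amounts like these are heavily skewed, so a t-test’s normality assumption would be shaky.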

However, I feel obligated to point out that because the House has a Democratic majority (237 to 163), it may be easier for Democrats to get their earmarks passed, which could explain why they have more. For comparison, we would need data from a period when the GOP controlled the House.

Filed under: Uncategorized