A Corollary to ExperimentCalculator.com (with examples)

Dan McKinley recently put together a very useful tool for estimating how long to run your A/B tests.

The obvious corollary here being, “your experiments will take much longer than you think”.

Let’s dive into some real-world numbers.

Adwords campaign optimization

The scenario. You’re buying clicks from Google Adwords to get people to sign up for your startup’s new service. You just made some copy changes to the landing page which you’re hoping will improve signup conversion. Your base signup rate is 10%, and you expect your new changes to increase signup rate to 15% (a +50% increase!). You spend $0.50 per click with a budget of $100 per day, so your landing pages see a total of 200 visits each day.

The statistics. You’d have to run this campaign for 8 days and spend $800 to verify the changes. Alternately, if conversion rate increased to only 11% (a +10% change), then you’d have to spend $15,000 to verify the change.
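To make the arithmetic behind these numbers concrete, here is a minimal sketch of a standard sample-size estimate for comparing two conversion rates. It is not necessarily the method ExperimentCalculator.com uses: the normal-approximation formula, the two-sided significance level of 0.05, the 80% power, and the even 50/50 traffic split are all assumptions, and the function names are made up for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, p_new, alpha=0.05, power=0.80):
    """Visits needed in EACH arm to detect p_base -> p_new.

    Uses the usual normal approximation for a two-proportion test.
    alpha (two-sided) and power are assumed defaults, not values from the post.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2)

def days_and_cost(p_base, p_new, visits_per_day, cost_per_visit):
    """Days and ad spend needed when traffic is split evenly across two arms."""
    total_visits = 2 * sample_size_per_arm(p_base, p_new)
    return ceil(total_visits / visits_per_day), total_visits * cost_per_visit

# The Adwords scenario: 200 visits per day at $0.50 per click.
print(days_and_cost(0.10, 0.15, visits_per_day=200, cost_per_visit=0.50))
print(days_and_cost(0.10, 0.11, visits_per_day=200, cost_per_visit=0.50))
```

Under these assumptions the 10% to 15% case should come out at roughly a week of traffic, and the 10% to 11% case at a spend in the neighborhood of $15,000. The required sample grows with the inverse square of the difference you are trying to detect, which is why the smaller lift is so much more expensive to verify.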

Ecommerce optimization (Etsy)

The scenario. During a company hack week, a designer makes several changes to the cart page and wants to run a 1% experiment. The designer is quite bullish about the changes and thinks that it could in fact boost sales by 5% (!), or about $50 million from 2013’s expected sales of over $1 billion.

The statistics. According to their blog, Etsy sold over $100 million worth of goods in April with almost 1.5 billion page views. Assuming standard e-commerce conversion rates of 4% (along with some other assumptions about average order size), this experiment would need to be run for over 3 years! An experiment affecting 10% of users would require only two weeks.

My last startup (Adtuitive)

The scenario. We bought relatively cheap display ads on niche content sites and matched sku-level ads from our database of millions of products. Depending on placement and sites, click rates for us were sometimes around 0.1% (which believe it or not was a huge improvement over static banner ads). We were serving around 200 million ads a month, and we were releasing an algorithmic change that we thought might increase click rates (and our revenue) by 10% (!).

As the change was somewhat major, we didn’t want to roll it out to more than 10% of visits during our experiment.

The statistics. We would have had to run the experiment for 39 days. Our 200 million ads per month equated to 3 million per day, or about 500k visits per day (visitors view multiple ads). Running it at 50% would have required only 7 days.

Takeaways

Calling bullshit. Next time someone claims they increased their landing page conversion from 10% to 15%, you may want to question things. Exactly how many conversions are they dealing with? And how many separate changes did they make? Small changes are also harder to measure than larger ones.

Google’s famed 1% experiments really only work at Google scale. You’ll have to run your experiments at 10% or 50% levels. And of course, make sure you double check your statistics.

Opportunity cost. Experiments take more than just design and engineering time to build; they also take time to run and verify. So before restyling the checkout button, ask yourself if there are other parts of your core funnels or product that you’d be better off testing first.

Other reasons to test. Sometimes changes are necessary to accommodate future functionality or new strategic changes for the overall product. E.g. restyling the cart page to provide more whitespace for a future gift cards launch, or revamping the homepage to give attention to some fledgling social aspects of your site. In these cases, even when you expect a 0% change (or even a negative change), testing is still important to understand impact. And of course, statistics still apply.

So, next time you’re planning to run an experiment, you may want to spend some time with Mr. ExperimentCalculator.com first. Your intuition is most likely wrong.

Restart

Etsy acquired my startup Adtuitive in 2009. At the time, we had a pretty cool product that automated online advertising for small retailers, and we were operating at a modest scale of 200 million ads a month.

Deciding to sell the company was tough, but the last three years at Etsy were awesome. I had the privilege of working with very talented folks across a full stack of things, from Hadoop infrastructure to search ranking to search UI. And of course, search ads. During this time period, I saw Etsy grow from $180 million in 2009 sales, to over $80 million last October alone. My team grew from Adtuitive’s engineering team of only 4 to almost 30 in total.

I’ll miss my time at Etsy, but I’m an entrepreneur at heart, and it’s time to start over. I’ll be taking some time off in the upcoming months – time off from work, time off from management, time off from NYC life. I look forward to writing code again and thinking about real world problems at a fundamental and disruptive level. I’ll be back up and running at 110% sometime early to mid next year.

Stay or get back in touch with me at jvdavis ‘at’ gmail.com.

NYC Dining: The cost of a “B”

If you eat out in New York City, the image above should evoke some sort of visceral reaction. In July of 2010, the NYC Department of Health began rating each of the 24,000 restaurants throughout the five boroughs of the city. Each restaurant is given a grade of “A”, “B”, or “C” based on violations ranging from improper food temperature to sewage problems to the presence of vermin. You can browse the complete list here.

Fast forward 2 years, and the new system seems to be a win for consumers – Mayor Bloomberg credits the program with a 14% reduction in Salmonella, the lowest rate in 20 years. And according to this press release, NYC restaurant revenue is also up 9.3% since grading began. But many restaurateurs still disagree, expressing anger over these health inspections. Restaurants complain about the complexity of the grading system, fighting with the city over infraction points, and spending additional money to maintain their facilities to meet the city’s guidelines.

But clearly the biggest cost associated with the city’s program is the fear of a “B” rating, or, even worse, an unmentionable “C” rating.

Just how costly is a “B”?

To quantify these costs, I correlated NYC restaurant inspection rating changes with their restaurant ratings on the popular review site, Yelp. Starting with the most popular 1000 restaurants in Manhattan on Yelp, I crawled each of their review pages and extracted the ratings for each restaurant. NYC health inspection ratings are available via NYC’s OpenData initiative, and each of these top Yelp restaurants was then matched with its corresponding health code ratings. All code is available on GitHub under my Nyc Restaurant Inspection Project, along with a csv that contains joined Yelp restaurant reviews with their corresponding inspection ratings.

According to the Mayor’s argument, Salmonella cases have gone down because restaurant inspection ratings have, on average, increased since the start of the program. The Mayor’s report claims that the number of “A” ratings has increased from 65% to 72% of all restaurants since the start of the program. And within the set of top Manhattan restaurants analyzed here, trends are similar. The plot below shows average rating inspection value since July 2010 (5.0 represents “A”, 4.0 “B”, etc):

Looking at average Yelp reviews since 2005, we can see that the time period since August 2010 is relatively stable, hovering between 3.8 and 3.9.

To get a better sense of how ratings are impacted by inspection grades, let’s look at restaurant grade changes (“A” to “B”, “A” to “C”) and see how their Yelp ratings changed between the 60 days before and the 60 days after:

Change    Rating Before    Rating After    Delta
A -> C    3.94             3.68            -6.7%
B -> C    3.86             3.69            -4.6%
A -> B    3.77             3.76            -0.3%

Restaurants downgraded to a “C” rating received significantly lower Yelp ratings in the 60 days after the downgrade, but restaurants receiving a “B” rating were relatively unaffected in their review quality.

So restaurants with “C” ratings tend to have a lower review quality on Yelp, but do lower ratings deter people from dining at a restaurant in the first place? Looking at overall review counts for 60 day periods before and after rating changes:

Change    Count Before    Count After    Delta
A -> C    167             157            -6.0%
B -> C    214             230            +7.5%
A -> B    724             699            -3.5%

The increase in review counts in “B” to “C” downgrades is most likely due to the data being somewhat thin. Across all three downgrades, Yelp review counts as well as rating counts showed average decreases of almost 2%.
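The before/after comparison behind the two tables above can be sketched as follows. This is an illustration rather than the code from the GitHub project: the function names and the toy reviews are made up, and it assumes each review is a (date, stars) pair and that the grade change date is known.

```python
from datetime import date, timedelta

def window_reviews(reviews, start, end):
    """Star ratings of reviews dated in the half-open window [start, end)."""
    return [stars for day, stars in reviews if start <= day < end]

def before_after(reviews, change_date, days=60):
    """Mean rating and review count in the `days` before and after a grade change."""
    span = timedelta(days=days)
    before = window_reviews(reviews, change_date - span, change_date)
    after = window_reviews(reviews, change_date, change_date + span)
    if not before or not after:
        raise ValueError("need at least one review on each side of the change")
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return {
        "rating_before": mean_before,
        "rating_after": mean_after,
        "rating_delta": (mean_after - mean_before) / mean_before,
        "count_before": len(before),
        "count_after": len(after),
    }

# Toy data (hypothetical): two reviews on each side of an April 1 downgrade.
toy_reviews = [
    (date(2011, 3, 1), 4), (date(2011, 3, 20), 4),
    (date(2011, 4, 10), 3), (date(2011, 4, 25), 4),
]
print(before_after(toy_reviews, date(2011, 4, 1)))
```

The tables aggregate this kind of comparison across every restaurant with a given grade change, which is why thin groups such as “B” to “C” can produce noisy deltas.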

Takeaways

A recent study by Michael Luca found that increased Yelp review rating quality can lead to increased revenue. Among other things, the study also found that independently owned restaurants were much more affected by these reviews than ones with chain affiliations. Many of Manhattan’s top restaurants analyzed here are independent, and the decrease in Yelp ratings here undoubtedly also corresponds to lost revenue.

An interesting question to consider is one of causation: one goal of the inspection program is to improve sanitary conditions at restaurants in NYC. When a restaurant transitions from an “A” rating to a “C” rating, the only change we can say for certain is the letter grade posted outside the front door. In the days and weeks following a downgrade, one would expect restaurants to actually clean up their sanitary conditions. So, during the time period analyzed here, sanitary conditions before the downgrade are probably worse than after.

Of course, the other goal of NYC’s inspection process is to increase consumer awareness. And consumers seem to notice: when restaurants are downgraded, the costs are measurable.

An Insider’s Guide to Facebook’s IPO

The Wall Street Journal recently had a piece on investing in the Facebook IPO. They admit, “most retail investors will be shut out of the offering and won’t get the IPO price, meaning they likely will have to pay more in the days that follow if they want an early piece of the action”.

To see what’s going on here, let’s take a closer look at what happens on the day of the IPO. In pricing the IPO, there are two prices to consider: the offer price, and the open price. The offer price is set by the company and underwriters. This is the stock price that the company receives in its IPO sale. The open price is set by the publicly traded market on the day the company goes public.

As an example, Linkedin went public in May of 2011. Their offering price was $45 per share, and the stock’s open price was $83 per share. On opening day, the stock went up to over $120 per share, and had a low of $80 per share. Linkedin sold approximately 7.8 million shares of stock in its IPO, so on opening day the value of these shares ranged from $624 million (at $80 per share) to $957 million ($122.70 per share). Had Linkedin and their underwriters set the offer price to closer to $80 per share, Linkedin could have made an additional $273 million during its IPO sale.

Let’s consider the difference in offer price vs open price among the six tech companies mentioned in the WSJ article:


Company     Offer   Open (low)  Raised    Loss
Linkedin    $45     $80.00      $351M     $273M
Groupon     $20     $26.11      $700M     $214M
Yandex      $25     $30.55      $1.4B     $310M
Zynga       $10     $9.00       $1B       ($100M)
Renren      $14     $12.30      $743M     ($90M)
Pandora     $16     $17.35      $235M     $20M
TOTAL                           $4.4B     $627M

Had these six companies and their underwriters set the offer price closer to the open price, they could have raised an additional $627 million. This value was instead realized by institutional investors and select individuals who were able to participate in the IPO at the offering price. They bought a total of $4.43 billion in stock at the IPO, and this stock was worth $5.06 billion once it traded publicly that very same day.
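The “Loss” column in the table above can be reproduced with a few lines of arithmetic. This sketch assumes the number of shares sold equals the amount raised divided by the offer price (which matches the 7.8 million Linkedin shares mentioned earlier); the variable and function names are just for illustration.

```python
# (company, offer price, opening-day low, amount raised in $ millions),
# copied from the table above.
ipos = [
    ("Linkedin", 45.0, 80.00, 351),
    ("Groupon", 20.0, 26.11, 700),
    ("Yandex", 25.0, 30.55, 1400),
    ("Zynga", 10.0, 9.00, 1000),
    ("Renren", 14.0, 12.30, 743),
    ("Pandora", 16.0, 17.35, 235),
]

def left_on_table(offer, open_low, raised_millions):
    """Extra $ millions raised had the offer price matched the opening-day low.

    Assumes shares sold = amount raised / offer price.
    A negative result means the stock traded below its offer price.
    """
    shares_millions = raised_millions / offer
    return shares_millions * (open_low - offer)

losses = {name: left_on_table(offer, low, raised) for name, offer, low, raised in ipos}
total_raised = sum(raised for _, _, _, raised in ipos)
total_loss = sum(losses.values())

for name, loss in losses.items():
    print(f"{name:<10} {loss:8.0f}")
print(f"{'TOTAL':<10} {total_loss:8.0f}  (raised: {total_raised})")
```

Negative entries (Zynga and Renren) are the cases where IPO buyers, not the company, ended up on the losing side on day one.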

Unfortunately, as the WSJ says, most retail investors have no access to IPOs at the offer price. As for the company, they have no access to IPOs at the open price. So who’s really getting rich off of these IPOs? The underwriters and their insiders.

3 Day Startup NYC: Day 1 Mentoring

I love innovation. I love working with smart people. And I love working with limited resources around extremely tight deadlines.

3 Day Startup: 30 bright entrepreneurs. Ideas flow Friday night, business plan and product demo need to happen by Sunday night.

The program started out of UT Austin, and I attended as a participant at their first event. I actually pitched the idea that the group ended up building that weekend, which ended up spinning out into a company now called Moodfish. Nik has since taken Moodfish 1000x further, well beyond the simple idea I had. 3DS has grown tremendously since and is now a worldwide operation. They’ve held events in Germany, Spain, The Netherlands, France, Portugal, Israel, Chile, and China.

This weekend marks their first ever event in New York City. I had the privilege of working with a very talented team of individuals last night as a mentor. And Etsy also graciously sponsored the event.

As a mentor, I really enjoyed the discussions I had with these young entrepreneurs and technologists. Some common themes / feedback I gave:

  • For inexperienced / first time entrepreneurs, ideas motivated by a problem that they’ve experienced first hand are always the best. Once this problem has been identified, the next question to answer is whether building a product can solve this problem, and if that product can support a business.
  • Undervaluing current standards. Email, SMS, Craigslist, Post-it Notes: these are all established standards, and their simplicity and pervasiveness is what makes them awesome. When talking innovation, it’s easy to get excited; remember these standards.
  • Maintaining scope and focus. The minimum viable startup (or product) is critical on so many levels: conveying the true value / pitching the idea to others, understanding the business, maintaining focus on execution, and minimizing your technological needs. Every additional component to the business (“and, we’ll give 3% to charity”) adds a tremendous amount of complexity to the business. Start simple.

As a sponsor, I find 3DS a great way to meet people in a realistic and high-stress setting. As a 3DS participant, a successful weekend requires many real-world skills:

  • The ability to collaborate with people from totally different backgrounds. E.g. an engineer having a discussion with a marketing person about developing a landing page.
  • For engineers, comfort with a workbench, and the ability to rapidly prototype.
  • Execution. Having something to show by the end of the weekend. Finding a balance between working hard, working fast, and building simple and iterating.

Final presentations are on Sunday. I’ll be serving on the panel. Get tickets for the final presentations on EventBrite.

Lending Club Loan Analysis: Making Money with Logistic Regression

The Lending Club is an online marketplace for loans. As a borrower, you can apply for a loan, and if accepted, your loan gets listed in the marketplace. As an investor, you can browse loans in the marketplace, and invest in individual loans at your discretion. This peer to peer model has many advantages over traditional banking counterparts, for example, lower overhead costs, lower cost of capital, etc.

But what excites me the most about peer to peer lending is the democratization of data. As an investor, you can see each and every rejected, completed, ongoing, and available loan. While loan data excludes personally identifiable information, it does include attributes like credit rating, location, college education level, lines of credit, and descriptions of why the applicant needs the loan.

For your average investor who doesn’t have the sophistication (or time) to sift through tens of thousands of loans, the Lending Club provides tools to find loans based on one’s risk and diversification goals. Being a data geek, I of course immediately downloaded the full dataset.

One of the first things I noticed was that many loans have fairly long descriptions:

“Dear Lenders, I was involved in a sports injury approximately 18 months ago…..Thank you for taking time to read this letter.  Thank you”

While this borrower is clearly in an unfortunate situation (the full text was over 1500 characters in length), it appears as if borrowers who write longer descriptions actually have much higher default rates:

So, is it possible to aggregate across several attributes with the goal of improving upon Lending Club’s basic investment strategies?

The basic problem to be solved here is one of predicting loan default rate. Given a loan with an interest rate of 12% and another loan with an interest rate of 16%, the expected default rate of each loan will tell me my expected return. For example, if the first loan has an expected default rate of 25%, and the second a rate of 50%, then my expected interest rates from the two loans would be 9% and 8%, respectively. I’d be better off investing in the first loan.
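The arithmetic in this example can be written down directly. The sketch below uses the simplification implied by the numbers above, namely that the expected rate is the stated rate multiplied by the probability of not defaulting; the function name is made up for illustration.

```python
def expected_rate(interest_rate, default_prob):
    """Expected interest rate, treating a defaulted loan as paying no interest.

    This is the simplification implied by the example in the text;
    it ignores lost principal, fees, and the timing of defaults.
    """
    return interest_rate * (1.0 - default_prob)

loan_a = expected_rate(0.12, 0.25)  # 12% loan, 25% expected default rate
loan_b = expected_rate(0.16, 0.50)  # 16% loan, 50% expected default rate
print(f"loan A: {loan_a:.2%}, loan B: {loan_b:.2%}")
```

The lower-rate loan wins here because the difference in default risk outweighs the difference in stated interest.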

The Lending Club’s analysis tools model default risk solely as a function of a single attribute, credit grade. I built a logistic regression model that optimizes over twelve different attributes including loan size, interest rate, application date, debt to income ratio, home ownership status, and description length.

The model was trained over the earliest 50% of loans issued and evaluated over the other half. For each loan, I predict expected default rate and use this to predict the expected interest rate for the loan. Loans are then sorted by highest expected rate. The following shows actual interest rate for investments in the best 40 loans with highest predicted interest rates, up to investments in the best 1000 loans:

For investments over a smaller number of loans (fewer than 400), the logistic regression model clearly outperforms the others. Credit grade binning computes risk as the average default rate of each credit grade, and the final method assumes a default rate of zero for all loans (i.e. just invest in loans with the highest interest rate first).
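To illustrate the train, predict, and rank procedure described above, here is a self-contained sketch on synthetic data. It is not the code from the GitHub repository: the hand-rolled gradient descent, the two features, the synthetic default process, and every name below are assumptions made purely for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_default(w, x):
    """Predicted default probability; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_default(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

# Synthetic loans in (pretend) issue-date order. Hypothetical features:
# amount requested and description length, both scaled to [0, 1].
# In this made-up process, larger loans default more often.
rng = random.Random(0)
loans = []
for _ in range(400):
    amount, desc_len = rng.random(), rng.random()
    rate = 0.08 + 0.10 * rng.random()
    defaulted = 1 if rng.random() < sigmoid(-2.0 + 3.0 * amount) else 0
    loans.append(((amount, desc_len), rate, defaulted))

# Train on the earliest half, evaluate on the later half.
train, test = loans[:200], loans[200:]
w = train_logistic([f for f, _, _ in train], [d for _, _, d in train])

def expected_rate(loan):
    features, rate, _ = loan
    return rate * (1.0 - predict_default(w, features))

# Rank the held-out loans by highest expected rate.
ranked = sorted(test, key=expected_rate, reverse=True)

def realized_rate(portfolio):
    """Average rate actually earned, counting defaulted loans as zero."""
    return sum(rate * (1 - d) for _, rate, d in portfolio) / len(portfolio)

print("realized rate, top 40 loans:", round(realized_rate(ranked[:40]), 4))
print("realized rate, all test loans:", round(realized_rate(test), 4))
```

Splitting by issue date rather than at random mirrors the setup in the text, where the model is only evaluated on loans issued after the ones it was trained on.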

To get a better idea of sensitivity, for each of the twelve attributes used to train the model, I trained a new model that held out that one attribute and used the remaining eleven. I then computed expected interest rate for an investment of 80 loans. Resulting interest rate reductions for each attribute are as follows:

Attribute:            Interest rate reduction
amount requested      0.83%
fico range            0.39%
application date      0.35%
earliest credit line  0.31%
interest rate         0.26%
open credit lines     0.26%
total credit lines    0.06%
home ownership        0.04%
credit grade          0.04%
debt to income ratio  0.04%
description length    0.04%
monthly income        0.00%

According to this analysis, the amount requested for a loan is the most important single attribute in the logistic regression model; interest rate drops by 0.83% if this attribute is omitted. On the other hand, description length is relatively unimportant in terms of model sensitivity. This is due to the fact that most loans actually have relatively short descriptions.
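The hold-one-out procedure behind the table above can be sketched generically. In this illustration, `evaluate` is a hypothetical callback standing in for “train a model on these attributes and return the interest rate earned on the top 80 loans”; the toy scoring function and its numbers are made up and are not results from the analysis.

```python
def leave_one_out_reductions(attributes, evaluate):
    """Drop each attribute in turn and measure how much the score falls.

    `evaluate` takes a list of attribute names and returns a score
    (here, an interest rate in percent). Returns a dict ordered from
    the largest reduction to the smallest.
    """
    baseline = evaluate(attributes)
    reductions = {}
    for held_out in attributes:
        kept = [a for a in attributes if a != held_out]
        reductions[held_out] = baseline - evaluate(kept)
    return dict(sorted(reductions.items(), key=lambda kv: kv[1], reverse=True))

# Toy stand-in for model training: each attribute adds a fixed,
# made-up amount to the achieved rate.
toy_contribution = {"amount requested": 0.8, "fico range": 0.4, "monthly income": 0.0}

def toy_evaluate(attrs):
    return 10.0 + sum(toy_contribution[a] for a in attrs)

for name, drop in leave_one_out_reductions(list(toy_contribution), toy_evaluate).items():
    print(f"{name:<20} {drop:.2f}%")
```

Note that this measures each attribute’s marginal contribution given all the others, so two highly correlated attributes can each look unimportant even if the pair matters.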

Surprisingly, application date is actually quite important to the model. However, when investing in a loan, this isn’t a factor that you can really optimize over, e.g., you can’t invest in a loan issued in 2007, nor can you invest in a loan in the future that someone hasn’t yet applied for. It appears as if the Lending Club’s loan approvals have trended towards riskier loans with higher interest rates:

So, what’s the catch? Why am I blogging here instead of just quietly investing?

  • I do invest in Lending Club loans, and I will be incorporating my analysis here into my investment strategy.
  • There is of course much more complexity to this problem than I’m presenting. In particular, my model invests in loans with the highest expected return and doesn’t have any real risk model beyond this. I ignore all macroeconomic effects.
  • Perhaps the biggest risk of all is if the Lending Club were to go out of business.
  • There are lots of details about my analysis that I haven’t described here. All code can be found on github: https://github.com/drjasondavis/Lending-Club-Learning.
  • There’s a ton more work to be done here: incorporating semantic analysis of descriptions, education information about borrowers, etc.
  • The Lending Club assesses collection fees for loans that are past due. It’s not 100% clear how these fees are applied, but it probably makes investing in riskier loans less appealing than the models presented here suggest. See more information here under “Investor Fees”: http://www.lendingclub.com/public/rates-and-fees.action
  • As @jderick points out in the comments, this analysis doesn’t accurately account for the cost of capital, which is higher for riskier loans with larger default rates.
  • I’m generally very bullish when it comes to online marketplaces, so I’m excited to share my findings.

Disclaimer: I am not an investment professional. I do not warrant any information supplied here. Invest at your own risk!

Questions on Startups from Luke Carrière

When I was working on my PhD in Austin I got involved with a group called 3 Day Startup (3DS).  The idea is simple: get 40 bright, motivated, and entrepreneurial students in a room for 3 days, and have them build something.

3DS has grown since I attended their first 3 day event, and they now hold these events world wide.  They’ll be holding their first 3DS in NYC on April 20, and I’ll be helping out (as an advisor this time and a sponsor through Etsy). Luke Carrière is organizing the event, and he sent me a list of questions about my startup experience, answered here.

What is your advice to future entrepreneurs?

Do what you love, and start a company if that’s your passion. Entrepreneurship is about value creation, disruption of current standards, and ownership.

How did you recognize the opportunity/research the feasibility of the idea?
 

A successful startup is 99% about execution: there are bad ideas that of course will never go anywhere, but “good” ideas hold very little weight on their own, IMO.

My previous startup came about through a project I had been previously working on. I was trying to monetize a search engine I’d built and saw a market gap in online retail ads.
 

Recognizing a successful opportunity also involves understanding the strengths of the people who will be working on it (i.e. the founding team).

How did you finance your business?
 

Mostly bootstrapped. The internet is a unique place in that production costs are primarily just development time: if you don’t pay yourself, your production costs are zero. We built our initial system on a few machines on Amazon Web Services then scaled up to dozens as we brought on customers.

What was your growth strategy and why?

We were primarily a B2B service provider, so our growth strategy ultimately centered around sales.

What are you willing to give up?

Ownership was one of my three reasons for being an entrepreneur. Company ownership comes in the form of equity and control, and I would sooner give up equity than control.

What is your favorite aspect of being an entrepreneur?

Building cool stuff, re-thinking the way things work, and not having a boss. Basically a restatement of my reasons for being an entrepreneur.

What is your least favorite aspect of being an entrepreneur?

Stress. With ownership and control comes responsibility. If shit hits the fan, you have no one to blame but yourself.

What have you sacrificed?
 

Nothing.

What is the number of companies you have started?

I’ve worked on several large and substantial projects of my own, but have only started one company with employees, investors, cash flow, etc. 

What are lessons that you have learned from starting these companies? 

As your company grows, it’s important to stay focused on the core problems you’re trying to solve. Functions like raising money, HR, business operations, etc. require attention, but must be a second priority.

What are the challenges/obstacles you have faced? 

We were presented with an awesome early stage acquisition opportunity 9 months after our series A round. We ended up taking it. I now lead the search and data team at Etsy.com where I’m very happy and work with a great team on interesting problems.

I still think there are huge opportunities in the market we were tackling.

What are some regrets? Biggest mistakes?
 

No regrets.

What is your background information? (Education, previous job experience, etc.)

Ph.D. in data mining. Ex-Google. Lots of hacking on various projects and consulting.

What is your business structure?
 

S-Corp, I think. Whatever my lawyers recommended.

What resources did you use when starting your company?
 

Good advisors are critical, especially for first time entrepreneurs. We had a great advisor.

How long did it take your initial idea to actually launch it?
 

I worked on some of the technology for quite some time. It took maybe 6 weeks to build our first functional system prototype.

Where do original investors stand now, if at all in your company?
 

Everyone was happy with our acquisition, including our investors.

How do you manage your time?

These days I manage a fairly large team (much bigger than my entire startup). I wake up early and try to get all “real work” done before 11am.

What did you do with initial profits?

Left in the bank, paid for server bills.

How long did it take for your company to become profitable?

I think we were profitable for a brief period of time right before we raised our series A. Then we hired some awesome people and were no longer profitable :-)

How did your idea change throughout the process?

The core idea stayed intact, but we learned about how to position it, package it, sell it.

Did you ever think of giving up? If so why?

No.

What was your initial role? What is your current role in the company now?

I was the first founder and CEO.

What is the worst advice you have ever received and why?

Lawyers generally give very bad advice. They’re generally very smart people, but they don’t understand the business relationships behind the contracts and deals they work on. Business starts with people first and contracts second: legal advice only concerns the latter.

Which part of your job is actual work as opposed to passion?

I’m an engineering director now at a 300 person company. I try to spend as much time as possible working on things that are my passion, and I try to impress my entrepreneurial way of working on my team and larger company culture.

How is the economy affecting your business?

A bad economy is a great opportunity for disruption. We closed our series A during March of 2008 when the Dow was at its lowest point in 10 years.