Oct 22, 2019, 01:38 PM
How do you time your startup? Technological forecasts are often surprisingly prescient in predicting that something is possible & desirable, and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.

Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts are correct about the (eventual) outcome.

Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. Every major success story overshadows a long list of predecessors who did the same thing but got unlucky. So, ideas can be divided into the overly-optimistic & likely doomed, and the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.
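The moth-and-flame dynamic can be illustrated with a toy simulation. The model is a hypothetical sketch, not something the essay specifies: each entrant launches in a year drawn from an unbiased but noisy forecast of the ripe year; launches before the ripe year fail, the first launch at or after it takes the whole market, and everyone after that is too late. The function `simulate_entrants`, its parameters, and the example numbers are all illustrative assumptions.

```python
import random


def simulate_entrants(n_entrants, ripe_year, forecast_sd, seed=0):
    """Toy model (hypothetical): split entrants into too early, winner, too late.

    Each entrant launches in a year drawn from a normal distribution centred on
    the true ripe year, i.e. the forecasts are unbiased but imprecise.
    """
    rng = random.Random(seed)
    launches = sorted(rng.gauss(ripe_year, forecast_sd) for _ in range(n_entrants))
    too_early = [t for t in launches if t < ripe_year]  # immolated
    in_time = [t for t in launches if t >= ripe_year]
    winner = in_time[0] if in_time else None  # first launch once the time is ripe
    too_late = in_time[1:]  # the market is already taken
    return too_early, winner, too_late


early, winner, late = simulate_entrants(n_entrants=20, ripe_year=2007, forecast_sd=3)
if winner is None:
    print(f"all {len(early)} entrants launched too early")
else:
    print(f"{len(early)} too early, winner launched in {winner:.1f}, {len(late)} too late")
```

Lowering `forecast_sd` (more precise forecasts) packs the launches closer to the ripe year, but in this model roughly half of the entrants still launch too early and only one wins.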
This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on the societal level by serving as random exploration.

A major benefit of R&D, then, is that its results can lie fallow until the ‘ripe time’, when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.
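The strategy described in the last two paragraphs resembles Thompson sampling over Bernoulli ‘ideas’: each round, draw a plausible success rate for every idea from its Beta posterior and back whichever draw is highest. The sketch below adds one modification that is an assumption here, not something the essay specifies: posterior counts are discounted every round (a ‘discounted Thompson sampling’ variant), so evidence from old failures fades and a long-failed idea gets retried now and then, which is how it is noticed once its ripe time arrives. `discounted_thompson`, `ripening_payoff`, the discount factor `gamma=0.99`, and the ripening at round 2000 are all hypothetical.

```python
import random


def discounted_thompson(payoff_prob, n_arms, n_rounds, gamma=0.99, seed=0):
    """Thompson sampling with Beta(1 + s, 1 + f) posteriors whose counts decay by gamma.

    payoff_prob(arm, t) is the success probability of an arm at round t, hidden
    from the agent and free to change over time. Returns the arm chosen each round.
    """
    rng = random.Random(seed)
    successes = [0.0] * n_arms
    failures = [0.0] * n_arms
    choices = []
    for t in range(n_rounds):
        # Sample a plausible success rate for every arm; back the most promising draw.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: draws[i])
        won = rng.random() < payoff_prob(arm, t)
        # Forget a little of all past evidence, so old failures stop ruling arms out.
        for i in range(n_arms):
            successes[i] *= gamma
            failures[i] *= gamma
        if won:
            successes[arm] += 1
        else:
            failures[arm] += 1
        choices.append(arm)
    return choices


def ripening_payoff(arm, t):
    """Hypothetical: arm 0 is a steady, modest idea; arm 1 is useless until round 2000."""
    if arm == 0:
        return 0.3
    return 0.01 if t < 2000 else 0.9


choices = discounted_thompson(ripening_payoff, n_arms=2, n_rounds=4000)
print("arm 1 tries before ripening:", choices[:2000].count(1))
print("arm 1 share of the last 1000 rounds:", choices[-1000:].count(1) / 1000)
```

In runs like this, the late-ripening idea is sampled only occasionally at first, then takes over soon after round 2000. Setting `gamma=1.0` recovers ordinary Thompson sampling, which can be much slower to revisit an idea after piling up evidence against it.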
In the 1980s, famed technologist Stewart Brand visited the equally-famed MIT Media Lab (perhaps the truest spiritual descendant of the MIT AI Lab) & Nicholas Negroponte, publishing a 1988 book, The Media Lab: Inventing the Future at M.I.T. (TML). Brand summarized the projects he saw there and Lab members’ extrapolations into the future which guided their projects, and added his own forecasting thoughts.

Visiting the Media Lab
Three decades later, the book is highly dated, and the descriptions are mostly of historical interest for the development of various technologies (particularly in the 1990s). But enough time has passed since 1988 to enable us to judge the basic truthfulness of the predictions and expectations held by dreamers such as Nicholas Negroponte: they were remarkably accurate! And the Media Lab wasn’t the only one: General Magic (1989) had an almost identical vision of a networked future powered by small touchscreen devices. (And what about Douglas Engelbart, or Alan Kay/Xerox PARC, who explicitly aimed to ‘skate towards where the puck would be’?) If you aren’t struck by a sense of déjà vu or pity when you read this book, compare the claims by people at the Media Lab with contemporary—or later—works like Clifford Stoll’s Silicon Snake Oil, and you’ll see how right they were.

Déjà vu, because what was described in TML on every other page is recognizably ordinary life in the 1990s and 2000s, never mind the 2010s, from the spread of broadband to the eventual impact of smartphones.

And pity, because the sad thing is noting how few future millionaires or billionaires grace the pages of TML—one quickly realizes that yes, person X was 100% right about Y happening even when everyone thought it insane, except that X was just a bit off, by a few years, and either jumped the gun or was too late, and so some other Z who doesn’t even appear in TML was the person who wound up taking all the spoils. I read it constantly thinking ‘yes, yes, you were right—for all the good it did you!’, or ‘not quite, it’d actually take another decade for that to really work out’.

To Everything A Season
“I basically think all the ideas of the ’90s that everybody had about how this stuff was going to work, I think they were all right, they were all correct. I think they were just early.”

Marc Andreessen, 2014

The question constantly asked of anyone who claims to know a better way (as futurologists implicitly do): “If you’re so smart, why aren’t you rich?” The lesson I draw is: it is not enough to predict the future; one has to get the timing right to not be ruined, and then execute, and then get lucky in myriad ways.

Many ‘bubbles’ can be interpreted as people being 100% correct about the future—but missing the timing (Thiel’s article on China and bubbles¹, The Economist on obscure property booms, Garber’s Famous First Bubbles). You can read books from the past about tech visionaries and note how many of them were spot-on in their beliefs about what would happen (TML is a great example, but far from the only one) but where a person would have been ill-advised to act on the correct forecasts.

NOT TO THE SWIFT
“Whoever does not know how to hit the nail on the head should be entreated not to hit the nail at all.”

Friedrich Nietzsche²

Many startups have a long list of failed predecessors who tried to do much the same thing, often simultaneously with several other competitors (startups are just as susceptible to multiple discovery as science/technology in general³). What made them a success was that they happened to give the piñata a whack at the exact moment when some S-curves or events hit the right point. Consider the ill-fated Pets.com: were investors right to believe that Americans would spend a ton of money online on things like dog food? Absolutely: Amazon (which has rarely turned a profit and has sucked up far more investment than Pets.com ever did, a mere ~$300m [in 2002 dollars; ~$452m adjusted for inflation]) is a successful online retail business that stocks thousands of dog food varieties, to say nothing of all the other pet-related goods it sells, and Chewy, which primarily does pet food, filed for a multi-billion-dollar IPO in 2019 on the strength of its billions in revenue. But the value of Pets.com stock still went to ~$0. Facebook is the biggest archive of photographs there has ever been, with truly colossal storage requirements; could it have succeeded in the 1990s? No, and not even later, as demonstrated by Orkut and Friendster, and the lingering death of MySpace. One of the most notorious tech business failures of the 1990s was the Iridium satellite constellation, but that was brought down by bizarrely self-sabotaging decisions on the part of Motorola, and when Motorola was finally removed from the equation, Iridium found its market; 2017 saw the launch of the second Iridium satellite constellation, Iridium NEXT, with competition from other since-launched satellite constellations, including SpaceX’s own nascent Starlink (aiming at global broadband Internet), which launched no fewer than 60 satellites in May 2019. Or look at computers: imagine an early adopter of an Apple computer saying ‘everyone will use computers eventually!’ Yes, but not for another few decades, and ‘in the long run, we are all dead’. Early PC history is rife with examples of the prescient failing.

Smartphones are an even bigger example of this. How often did I read in the ’90s and early ’00s about how amazing Japanese cellphones were and how amazing a good smartphone would be, even though year after year the phones were jokes and used pretty much solely for voice? You can see smartphones come up again and again in TML, as the visionaries realize how transformative a mobile pocket-sized computer would be. Yet it took until the mid-’00s for the promise of smartphones to materialize overnight, as it were, a success which went primarily to latecomers Apple and Google, cutting out the previously highly-successful Nokia, never mind visionaries like General Magic. (You too can achieve overnight success in just a few decades of hard work…) A 2013 interview with Eric Jackson looks back on smartphone adoption rates:

Q: “What’s your take on how they’re [Apple] handling their expansion into China, India, and other emerging markets?”
A: “It’s depressing how slow things are moving on that front. We can draw lines on a graph but we don’t know the constraints. Again, the issue with adoption is that the timing is so damn hard. I was expecting smartphones to take off in mid 2004 and was disappointed over and over again. And then suddenly a catalyst took hold and the adoption skyrocketed. Cook calls this ‘cracking the nut’. I don’t know what they can do to move faster but I suspect it has to do with placement (distribution) and with networks which both depend on (corrupt) entities.”

In 2012, I watched, impressed, as my aunt used the iPhone application FaceTime to video chat with her daughter half a continent away. In other words, her smartphone is a videophone; videophones used to be one of the canonical examples of how technology failed, stemming from their appearance at the 1964 New York World’s Fair & in 2001: A Space Odyssey but their subsequent failure to usurp telephones. This was oft-cited as an example of how technoweenies failed to understand that people didn’t really want videophones at all—‘who wants to put on makeup before making a call?’, people offered as an explanation, in all seriousness—but really, it looks like the videophones back then simply weren’t good enough.

Or look at VR: I’ve noticed geeks express wonderment at the Oculus Rift (and Vive and PlayStation VR and Go and Quest…) bringing Virtual Reality to the masses, and won’t that be a kick in the teeth for the Cliff Stolls & Jaron Laniers (who gave up VR for dead decades ago)? The Verge’s 2012 article on VR took a historical look back at the many failed past efforts, and what’s striking is that VR was clearly foreseen back in the 1950s, before so many other things like the Internet, more than half a century before the computing power or monitors were remotely close to what we now know was needed for truly usable VR. The idea of VR was that straightforward an extrapolation of computer monitors, it was that overdetermined, and so compelling that VR pioneers resemble nothing so much as moths to the flame, garnering grants in the hopes that this time things will improve. And at some point, it does improve, and the first person to try at the right time may win the lottery; Palmer Luckey (founder of Oculus, sold to Facebook for $2.3 billion [~$2.75 billion adjusted for inflation] in March 2014⁴):

Here’s a secret: the thing stopping people from making good VR and solving these problems was not technical. Someone could have built the Rift in mid-to-late 2007 for a few thousand dollars, and they could have built it in mid-2008 for about $500 [in 2008 dollars; ~$628 adjusted for inflation]. It’s just nobody was paying attention to that.

GO TO THE ANT, THOU SLUGGARD
“Ummon addressed the assembly and said: ‘I am not asking you about the days before the fifteenth of the month. But what about after the fifteenth? Come and give me a word about those days.’ And he himself gave the answer for them: ‘Every day is a good day.’”

Case 6, Blue Cliff Record⁵

Any good idea can be made to sound like a bad idea & probably did sound like a bad idea then⁶, and Bessemer VC’s anti-portfolio is a list of good ideas which Bessemer declined to invest in. Michael Wolfe offers some examples of this:

Facebook: the world needs yet another MySpace or Friendster except several years late. We’ll only open it up to a few thousand overworked, anti-social, Ivy Leaguers. Everyone else will then join since Harvard students are so cool.
Dropbox: we are going to build a file sharing and syncing solution when the market has a dozen of them that no one uses, supported by big companies like Microsoft. It will only do one thing well, and you’ll have to move all of your content to use it.
Virgin Atlantic: airlines are cool. Let’s start one. How hard could it be? We’ll differentiate with a funny safety video and by not being a**holes.
…iOS: a brand new operating system that doesn’t run a single one of the millions of applications that have been developed for Mac OS, Windows, or Linux. Only Apple can build apps for it. It won’t have cut and paste.
Google: we are building the world’s 20th search engine at a time when most of the others have been abandoned as being commoditized money losers. We’ll strip out all of the ad-supported news and portal features so you won’t be distracted from using the free search stuff.
Tesla: instead of just building batteries and selling them to Detroit, we are going to build our own cars from scratch plus own the distribution network. During a recession and a cleantech backlash.⁷
…Firefox: we are going to build a better web browser, even though 90% of the world’s computers already have a free one built in. One guy will do most of the work.
We can play this game all day:

How about Netflix? “We’ll start off renting people a doomed format in a way inferior to our established competitor Blockbuster (which will choose to commit suicide by ignoring both mail order & Internet all the way until bankruptcy in 2010)⁸; this will (somehow) let us pivot to streaming, where we will license all our content from our worst enemies, who will destroy us the instant we are too successful & already intend to run streaming services of their own—but that’s OK because we’ll just convince Wall Street to wait decades while giving us hundreds of billions of dollars to replace Hollywood by making thousands of film & TV series ourselves (despite the fact that we’ve never done anything like that before and there is no reason to think we would be any better at it than they are).”

Or GitHub: “We’ll offer code hosting services like those of SourceForge or Google Code, but requiring developers to use one of the most user-hostile DVCSes, only to FLOSS developers who are notorious cheapskates, and charge them a few bucks for a private version.”

SpaceX: “Orbital Sciences Corporation has a multi-decade headstart but is fat and lazy; we’ll catch up by buying some spare Russian rockets while we invent our own futuristic reusable ones. It’s only rocket science.”

Uber/Lyft/DiDi: “Taxis & buses. You’ve invented taxis & buses. And rental bikes.”

Instacart/Ocado: “We’ll do Kozmo.com/Webvan again, minus the bankruptcy.”

PayPal: “Everyone else’s online payment systems have failed, so we’ll do it again, with anonymous cryptography! On phones! In 1998! End-users love cryptography, right? If the software doesn’t work out, I guess we’ll… do something else.”

Venmo: “TextPayMe worked out well, right?”

Patreon: “Online micropayments & patronage schemes have failed hundreds of times; might as well try again.”

Bitcoin: “Every online-only currency from DigiCash to Flooz.com to e-gold to [too many to list] has either failed or been shut down by governments; so, we’ll use a hilariously expensive ‘proof of work’ thing we just made up which has zero theoretical support for actually ensuring decentralization & censorproofing, and was roundly mocked by almost every e-currency enthusiast who bothered to read the whitepaper.”

Tether: “Mastercoin already exists so we’ll make another.”

Seamless/Grubhub/Uber Eats/DoorDash/Slice (!): “CyberSlice blew through $100m+ [in 2000 dollars; ~$162m adjusted for inflation] trying to sell pizza delivery, but this time will be different.”

FedEx: “The experienced & well-capitalized Emery Air Freight is already trying and failing to make the hub-and-spoke air delivery method work; I’ll blow my inheritance on trying to compete with them while being so undercapitalized I’ll have to do multiple illegal things to keep FedEx afloat.”

Lotus 1-2-3: “VisiCalc literally invented the spreadsheet, has owned the market for 4 years despite clones like Microsoft’s, and singlehandedly made the Apple II a success; we’ll write our own spreadsheet from scratch, fixing some of VisiCalc’s problems, and beat them to the IBM PC. Everyone will buy it simply because it’ll be somewhat better.”

Airbnb: “We’ll max out our credit cards to let people illegally rent out their air mattresses en route to eating the hotel industry.”

Stripe: “Banks & online payment processors like PayPal are heavily-regulated inefficient monopolies which really suck; we’ll make friends with some banks and make a payment processor which doesn’t suck. Our signature will be that it takes fewer lines of code to set up, so programmers will like us.”

LinkedIn: “We’ll do social networking, which SixDegrees.com has already patented, forcing us to buy their patent.”

Slack: “IRC+email but infinitely slower & more locked in. Businesses won’t be able to get enough of it; employees will love to hate it.”

You don’t have to be bipolar to be an entrepreneur, but it might help. (“The most successful people I know believe in themselves almost to the point of delusion…”)

BUT TIME AND CHANCE
“Yes, but when I discovered it, it stayed discovered.”

Lawrence Shepp (attributed; “Pity the Scientist Who Discovers the Discovered”)

Why so many failed predecessors?

Part of the explanation is survivorship bias causing hindsight bias. We remember the successes, and see only how they were sure to succeed, forgetting the failures, which vanish from memory and seem laughable and grotesque should we ever revisit them⁹ as they fumble towards what we can now see so clearly.

The origins of many startups are highly idiosyncratic & chancy; eg. why should a podcasting company, Odeo, have led to Twitter? Survival alone is highly chancy, and founders can often see times where it came down to a dice roll.¹⁰ Like historical events in general (Risi et al 2019), the importance of an event or change is often known only in retrospect. Overall, the odds of success are low, and the rewards are not great for most—despite the skewed distribution producing occasional eye-popping returns in a few cases, the risk-adjusted return of the technology sector or VC funds is not that much greater than the broader economy.

“Of course Google was always going to be a huge success because of PageRank and also (post hoc theorizing) X, Y, & Z”, except for the minor problem that Google was merely one of many search engines, great perhaps¹¹ but not profitable, and didn’t hit upon a profitable business model—much less a unicorn-worthy model—until 4 years later, when it copied Overture’s advertising auction, which was its salvation (In The Plex); in the meantime, Google had to sign potentially fatal deals or risk burning through the last of its capital when minor technical glitches derailed vital deals. (All of which was doubtless why Page & Brin tried & failed to sell Google to AltaVista & Excite & Yahoo early on, and negotiated a possible sale with Yahoo as late as 2002, which they ultimately rejected.) In a counterfactual world, Google went down in flames quite easily because it never hit upon the advertising innovations that saved it, no matter how much you liked PageRank, and anything else is hindsight bias. FedEx, early on, couldn’t make payroll, and the founder famously kept the planes flying only by gambling the last of the company’s money in Las Vegas, among other near-death experiences & crimes—just one of many startups doing highly questionable things.¹² Both SpaceX & Tesla have come within days (or hours) of bankruptcy, in 2008 and 2013; in the former case, Musk borrowed money from friends to pay his rent after 3 rocket failures in a row, and in the latter, Musk reportedly went as far as securing a pledge from Google to buy Tesla outright rather than let it go bankrupt (Vance 2015). Tesla’s struggles in general are too well known to mention. Mark Zuckerberg, in 2004, wanted nothing more than to sell Facebook for a few million dollars so he could work on his P2P filesharing program, Wirehog, commenting that the sale price just needed to be large enough “to propel Wirehog.” YouTube was a dating site. Stewart Butterfield wanted to make an MMORPG, which failed, and all he could salvage out of it was the photo-sharing part, which became Flickr; he still really wanted to make an MMORPG, so after Flickr, he founded a company to make the MMORPG Glitch which… also failed, so after trying to shut down his company and being told not to by his investors, he salvaged the chat part from it, which became Slack. And, consistent with the idea that there is a large ineradicable element of chance to it, surveys of startups suggest that while there are individual differences in odds of success (‘skill’), any founder learning curve (‘learning-by-doing’) is small & success probability remains low regardless of experience (Gompers et al 2006/Gompers 2010, Parker 2011, Gottschalk 2014), and experienced entrepreneurs still have low odds of forecasting startups achieving commercialization at all, approaching random predictions in “non-R&D-intensive sectors” (eg Scott et al 2019, McKenzie & Sansone 2019).

Thiel (Zero to One; original): “Every moment in business happens only once. The next Bill Gates will not build an operating system. The next Larry Page or Sergey Brin won’t make a search engine. And the next Mark Zuckerberg won’t create a social network. If you are copying these guys, you aren’t learning from them.” This is true but I would say it reverses the order (‘N to N+1’?): you will not be the next Bill Gates, because Bill Gates was not the first and only Bill Gates; he was, pace Stigler’s Law, the last Bill Gates¹³; many people made huge fortunes off OSes, both before and after Gates—you may have forgotten Wang, but hopefully you remember Steve Jobs (before, Mac) and Steve Jobs (after, NeXT). Similarly, Mark Zuckerberg was not the first and only Zuckerberg; he was the last Zuckerberg; many people made social networking fortunes before him—maybe Orkut didn’t make its Google inventor a fortune, but you can bet that MySpace’s DeWolfe and Anderson did well. And there were plenty of lucrative search engine founders (is Jerry Yang still a billionaire? Yes).

Gates, however, proved the market, and refined the Gates strategy to perfection, using up the trick; no one can get historically rich off shipping an OS plus some business productivity software because there are too many competitors and too many players interested in ensuring that no one becomes the next Gates, and so opportunity has moved on to the next area.

A successful company rewrites history and its precursors¹⁴; history must be lived forward, progressing to an obscure destination, but we always recall it backwards as progressing towards the clarity of the present.

THE WISE IN THEIR CRAFTINESS
“It is universally admitted that the unicorn is a supernatural being and one of good omen; thus it is declared in the Odes, in the Annals, in the biographies of illustrious men, and in other texts of unquestioned authority. Even the women and children of the common people know that the unicorn is a favorable portent. But this animal does not figure among the domestic animals, it is not easy to find, it does not lend itself to any classification. It is not like the horse or the bull, the wolf or the deer. Under such conditions, we could be in the presence of a unicorn and not know with certainty that it is one. We know that a given animal with a mane is a horse, and that one with horns is a bull. We do not know what a unicorn is like.”

Jorge Luis Borges, “Kafka And His Precursors” (1951)

Can you ask researchers if the time is ripe? Well: researchers have a slight conflict of interest in the matter, and are happy to spend arbitrary amounts of money on topics without anything to show for it. After all, why would they say no?

Scott Fisher:

I ended up doing more work in Japan than anything else because Japan in general is so tech-smitten and obsessed that they just love it [VR]. The Japanese government in general was funding research, building huge research complexes just to focus on this. There were huge initiatives while there was nothing happening in the US. I ended up moving to Japan and working there for many years.

Indeed, this would have been around the time of the Japanese boondoggle, the Fifth Generation Project (note that despite Japan’s reputed prowess at robotics, it is not Japan’s robots that went into Fukushima / are flying around the Middle East / are revolutionizing agriculture and construction). All those ‘huge initiatives’ and…? Don’t ask Fisher; he’s hardly going to say, “oh yeah, all the money was completely wasted, we were trying to do it too soon; our bad”. And Lanier implies that Japan alone spent a lot of money:

Jaron Lanier: “The components have finally gotten cheap enough that we can start to talk about them as being accessible in the way that everybody’s always wanted…Moore’s law is so interesting because it’s not just the same components getting cheaper, but it really changes the way you do things. For instance, in the old days, in order to tell where your head was so that you could position virtual content to be standing still relative to you, we used to have to use some kind of external reference point, which might be magnetic, ultrasonic, or optical. These days you put some kind of camera on the head and look around in the room and it just calculates where you are—the headsets are self-sufficient instead of relying on an external reference infrastructure. That was inconceivable before because it would have been just so expensive to do that calculation. Moore’s law really just changes again and again, it re-factors your options in really subtle and interesting ways.”
Kevin Kelly: “Our sense of history in this world is very dim and very short. We were talking about the past: VR wasn’t talked about for a long time, right? 35 years. Most people have no idea that this is 35 years old. 30 years later, it’s the same headlines. Was the technological power just not sufficient 30 years ago?”
…On the Nintendo Power Glove, based on a VPL dataglove design:

JL: “Both I and a lot of other people really, really wanted to get a consumerable version of this stuff out. We managed to get a taste of the experience with something called the Power Glove…Sony [sic: Nintendo] actually brought out a little near-eye display called Virtual Boy; not very good, but they gave it their best shot. And there were huge projects that have never been shown to the public to try to make a consumable [VR product], very expensive ones. Counting for inflation, probably more money was spent than Facebook just spent on Oculus. We just could never, never, never get it quite there.”
KK: “Because?”
JL: “The component cost. It’s Moore’s law. Sensors, displays… batteries! Batteries is a big one.”

Issues like component cost were not something that could be solved by a VR research project, no matter how ambitious. Those were hard binding limits, and solving them, by creating tiny high-resolution LED/LCD screens for smartphones, required the benefit of decades of Moore’s law and the experience curve effects of manufacturing billions of smartphones.
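The experience-curve point can be made quantitative with Wright’s law, a standard empirical model in which unit cost falls by a fixed fraction with every doubling of cumulative production. The 20% learning rate and the production volumes below are illustrative assumptions, not figures from the essay or from any display-industry data; `wright_cost` is a hypothetical helper.

```python
import math


def wright_cost(first_unit_cost, cumulative_units, learning_rate=0.2):
    """Cost of the n-th unit under Wright's law.

    Each doubling of cumulative production multiplies unit cost by
    (1 - learning_rate); learning_rate=0.2 is an illustrative assumption.
    """
    exponent = math.log2(1 - learning_rate)  # negative, e.g. log2(0.8) is about -0.32
    return first_unit_cost * cumulative_units ** exponent


# Hypothetical: how much cheaper is a component once the industry has made a
# billion of them rather than a million (roughly 10 doublings later)?
ratio = wright_cost(1.0, 1e9) / wright_cost(1.0, 1e6)
print(f"cost falls to {ratio:.1%} of its earlier level")
```

Under these assumptions, going from a million to a billion units cuts unit cost to roughly a tenth, the kind of decline that mass manufacturing can deliver and a research budget cannot.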
Researchers in general have no incentive to say, “this is not the right time, wait another 20 years for Moore’s law to make it doable”, even if everyone in the field is perfectly aware of this—Palmer Luckey:

I spent a huge amount of time reading…I think that there were a lot of people that were giving VR too much credit, because they were working as VR researchers. You don’t want to publish a paper that says, ‘After the study, we came to the conclusion that VR is useless right now and that we should just not have a job for 20 years.’ There were a few people that basically came to that conclusion. They said, ‘Current VR gear is low field of view, high lag, too expensive, too heavy, can’t be driven properly from consumer-grade computers, or even professional-grade computers.’ It turned out that I wasn’t the first person to realize these problems. They’d been known for decades.

AI researcher Donald Michie claimed in 1970, based on a 1969 poll, that a majority of AI researchers estimated 10–100 years for AGI (or 1979–2069) and that “There is also fair agreement that the chief obstacles are not hardware limitations.”¹⁵ While AI researcher surveys still suggest that wasn’t a bad range (Gruetzemacher et al 2019), the success of deep learning makes clear that hardware was a huge limitation, and resources 50 years ago fell short by at least 6 orders of magnitude. Michie went on to point out that in a previous case, that of Charles Babbage, the work was foredoomed by its being an “unripe time” due to hardware limitations, and represented a complete waste of time & money¹⁶. This, arguably, was the case for Michie’s own research.
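As a rough back-of-the-envelope check on the ‘at least 6 orders of magnitude’ figure, assume (for illustration only; this is not a figure from the essay) that compute per dollar doubled every 1.5 to 2.5 years over the roughly 50 years since Michie’s poll. `orders_of_magnitude` is a hypothetical helper:

```python
import math


def orders_of_magnitude(years, doubling_time_years):
    """Orders of magnitude of growth from steady doubling over `years`."""
    doublings = years / doubling_time_years
    return doublings * math.log10(2)


for doubling_time in (1.5, 2.0, 2.5):  # assumed doubling times, in years
    print(f"doubling every {doubling_time} years for 50 years: "
          f"~{orders_of_magnitude(50, doubling_time):.1f} orders of magnitude")
```

Even the slowest of these assumed doubling times gives about 6 orders of magnitude from hardware improvement alone, before counting any growth in research budgets.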
NOR RICHES TO MEN OF UNDERSTANDING
157“But to come very near to a true theory, and to grasp its precise application, are two very different things, as the history of science teaches us. Everything of importance has been said before by somebody who did not discover it.”
158
159Alfred North Whitehead, The Organization of Thought (1917)
160
161So you don’t know the timing well enough to reliably launch. You can’t imitate a successful entrepreneur, the time is past. You can’t foresee what will be successful based on what has been successful; you can’t even foresee what won’t be successful based on what was already unsuccessful; and you can’t ask researchers because they are incentivized to not know the timing any better than anyone else.
162
163Can you at least profit from your knowledge of the outcome? Here again we must be pessimistic.
164
165Certainty is irrelevant, you still have problems making use of this knowledge. Example: in retrospect, we know everyone wanted computers, OSes, social networks—but the history of them is strewn with flaming rubble. Suppose you somehow knew in 2000 that “in 2010, the founder of the most successful social network will be worth at least $10b”; this is a falsifiable belief at odds with all conventional wisdom and about a tech that blindsided everyone. Yet, how useful would this knowledge be, really? What would you do with it? Do you have the capital to start a VC fund of your own, and throw multi-million-dollar investments at every social media until finally in 2010 you knew for sure that Facebook was the winning ticket and could cash out in the IPO? I doubt it.
166
It’s difficult to invest in ‘computers’ or ‘AI’ or ‘social networking’ or ‘VR’; there is no index for these things, and it is hard to see how there even could be such a thing. (How do you force all relevant companies to sell tradable stakes? “If people don’t want to go to the ball game, how are you going to stop them?” as Yogi Berra asked.) There is no convenient CMPTR you can buy 100 shares of and hold indefinitely to capture gains from your optimism about computers. IBM and Apple both went nearly bankrupt at points, and Microsoft’s stock has been flat since 1999 or whenever (translating to huge real losses and opportunity costs to long-term holders of it). If you knew for certain that Facebook would be as huge as it was, what stocks, exactly, could you have invested in, pre-IPO, to capture gains from its growth? Remember, you don’t know anything else about the tech landscape in the 2000s, like that Google will go way up from its IPO; you don’t know about Apple’s revival under Jobs—all you know is that a social network will exist and will grow hugely. Why would anyone think that the future of smartphones would be won by “a has-been 1980s PC maker and an obscure search engine”? (The best I can think of would be to sell any Murdoch stock you owned when you heard they were buying MySpace, but offhand I’m not sure that Murdoch didn’t just stagnate rather than drop as MySpace increasingly turned out to be a writeoff.) In the hypothetical that you didn’t know the name of the company, you might’ve bought up a bunch of Google stock hoping that Orkut would be the winner, but while that would’ve been a decent investment (yay!) it would have had nothing to do with Orkut (oops)…

And even when there are stocks available to buy, you only benefit based on the specifics—like one of the existing stocks being a winner, rather than all the stocks being eaten by some new startup. Let’s imagine a different scenario, where instead you were confident that home robotics were about to experience a huge growth spurt. Is this even nonpublic knowledge at all? The world economy grows at something like 2% a year, labor costs generally seem to go up, prices of computers and robotics usually fall… Do industry projections expect to grow their sales by <25% a year?

But say that the market is wrongly pessimistic. If so, you might spend some of your hypothetical money on the best approximation to a robotics index fund you can find, as the best of a bunch of bad choices. (Checking a few random entries in Wikipedia, as of 2012, maybe a fifth of the companies are publicly traded, and the private ones include the ones you might’ve heard of like Boston Dynamics or Kiva so… that will be a small unrepresentative index.) Suppose the home robotics growth were concentrated in a single private company which exploded into billions of dollars of annual revenue and took away the market share of all the others, forcing them to go bankrupt or merge or shrink. Home robotics will have increased just as you believed—keikaku doori (‘all according to plan’)!—yet your ‘index fund’ has gone bankrupt (reindex when one of the robotics companies collapses? Reindex into what, another doomed firm?). Then, after your special knowledge has become public knowledge, the robotics company goes public, and by the EMH, its shares become a normal investment.

Morgan Housel:

There were 272 automobile companies in 1909. Through consolidation and failure, 3 emerged on top, 2 of which went bankrupt. Spotting a promising trend and a winning investment are two different things.

Is this impossibly rare? It sounds like Facebook! They grew fast, roflstomped other social networks, stayed private, and post-IPO, public investors have not profited all that much compared to even late investors.

Because of the winner-take-all dynamics, there’s no way to solve the coordination problem of holding off on an approach until the prerequisites are in place: entrepreneurs and founders will be hurling themselves at a common goal like social networks or VR constantly, just on the off chance that maybe the prerequisites just became adequate and they’ll be able to eat everyone’s lunch. A predictable waste of money, perhaps, but that’s how the incentives work out. It’s a weird perspective to take, but we can think of other technologies which may be like this.

Bitcoin is a topical example: it’s still in the early stages where it looks either like a genius stroke to invest in, or a fool’s paradise/Ponzi scheme. In my first draft of this essay in 2012, I noted that we see what looks like a Bitcoin bubble as the price inflates from ~$0 to $130 (2012 dollars; ≈$160 adjusted for inflation)—yet, if Bitcoin were the Real Deal, we would expect large price increases as people learn of it and it directly gains value from increased use, an ecosystem slowly unlocking the fancy cryptographic features, etc. And in 2019, with 2012 a distant memory, well, one could say something similar, just with larger numbers…

Or take niche visionary technologies: if cryonics were correct in principle, yet turned out to be worthless for everyone doing it before 2030 (because the wrong perfusion techniques or cryopreservatives were used and some critical bit of biology was not vitrified) while being practical post-2030, say, it would simply be yet another technology where visionaries were ultimately right despite all the naysaying and skepticism from normals, but nevertheless wrong in a practical sense because they jumped on it too early, and so they wasted their money.

Indeed, do many things come to pass.

Surfing Uncertainty
“Whatsoever thy hand findeth to do, do it with thy might; for there is no work, nor device, nor knowledge, nor wisdom, in the grave, whither thou goest.”

Qoheleth, Ecclesiastes

Where does this leave us? In what I would call, in a nod to Thiel’s ‘definite’ vs ‘indefinite optimism’, definitely-maybe optimism. Progress will happen and can be foreseen long before, but the details and exact timing are too difficult to get right, and the benefits of R&D lie in laying fallow until the ripe time and in their exploitation in unpredictable ways.

Returning to Donald Michie: one could make fun of his extremely overly-optimistic AI projections, and write him off as the stock figure of the biased AI researcher blinded by the ‘Maes-Garreau law’, where AI is always scheduled for right when a researcher will retire17, but while he was wrong, it is unclear this was a mistake, because in other cases, an apparently doomed research project—Marconi’s attempt to radio across the Atlantic Ocean—succeeded because of an unknown factor—the Kennelly–Heaviside layer18. We couldn’t know for sure that such projections were wrong, and the amount of money being spent back then on AI was truly trivial (and the commercial spinoffs likely paid for it all anyway).

Further, on the gripping hand, Michie suggests that research efforts such as Babbage’s should be thought of not as commercial R&D, expected to usually pay off right now, but as prototypes buying optionality, demonstrating that a particular technology was approaching its ‘ripe time’ & indicating what the bottlenecks are, so society can go after the bottlenecks and then has the option to scale up the prototype as soon as the bottlenecks are fixed19. Richard Hamming describes ripe time as finally enabling attacks on consequential problems20. Edward Boyden describes the development of both optogenetics & expansion microscopy as “failure rebooting”, revisiting (failed) past ideas which may now be workable in the light of progress in other areas21. As time passes, more options may open up, and any of them may bypass what was formerly a necessary or serial dependency which was fatal. Enough progress in one domain (particularly computing power) can sometimes make up for stasis in another domain.

So, what Babbage should have aimed for is not making a practical thinking machine which could churn out naval tables, but demonstrating that a programmable thinking machine is possible & useful, and currently limited by the slowness & size of its mechanical logic—so that transistors could be pursued with higher priority by governments, and programmable computers could be created with transistors as soon as possible, instead of the historical course of a meandering piecemeal development where Babbage’s work was forgotten & then repeatedly reinvented with delays (eg Konrad Zuse vs von Neumann). Similarly, the benefit of taking Moore’s law seriously is that one can plan ahead to take advantage of it22, even if one doesn’t know exactly when, if ever, it will happen.

Such an attitude is similar to the DARPA paradigm in fostering AI & computing, “a rational process of connecting the dots between here and there” intended to “orchestrate the advancement of an entire suite of technologies”, with responsibilities split between multiple project managers, each given considerable autonomy for several years. These project managers tend to pick polarizing projects rather than consensus projects (Goldstein & Kearney 2017): ones which generate disagreement among reviewers or critics. Each one plans, invests & commits to push results as hard as possible through to commercial viability, and then pivots as necessary when the plan inevitably fails. (DARPA indeed saw itself as much like a VC firm.)

The benefit for someone like DARPA of a forecast like Moore’s law is that it provides one fixed trend to gauge overall timing to within a decade or so, and to look for those dots which have lagged behind and become reverse salients.23 For an entrepreneur, the advantage of exponential thinking is more fatalistic: being able to launch in the window of time after technical feasibility but before someone else randomly gives it a try; if one is wrong because it was always impossible, it doesn’t matter when one launches, and if one is wrong because the timing is off, one’s choice is effectively random and little is lost by delay.

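To make ‘within a decade or so’ concrete: if a capability is assumed to wait only on a resource that improves exponentially, the wait is roughly the doubling time multiplied by log2 of the required improvement. The sketch below uses purely hypothetical inputs (a prototype needing 1,000× more compute than is affordable, doubling every 1.5 or 2 years, or a shortfall somewhere between 100× and 10,000×), not figures from any forecast, to show how modest disagreements about either input shift the answer by years.

```python
import math


def years_until(required_factor: float, doubling_time_years: float) -> float:
    """Years until an exponentially improving resource (doubling every
    `doubling_time_years`) has improved by `required_factor`, assuming the
    trend simply continues unchanged."""
    return doubling_time_years * math.log2(required_factor)


# Hypothetical: a prototype needing 1,000x more compute than is affordable now.
for doubling in (1.5, 2.0):
    print(f"1,000x at {doubling}-year doubling: {years_until(1_000, doubling):.1f} years")

# Hypothetical: uncertainty about the size of the shortfall itself.
for factor in (100, 10_000):
    print(f"{factor:,}x at 2-year doubling: {years_until(factor, 2.0):.1f} years")
```

With these made-up numbers the first pair comes out near 15 vs 20 years and the second near 13 vs 27 years: the trend pins down the decade, not the year.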
TRY & TRY AGAIN (BUT LESS & LESS)
“The road to wisdom?—Well, it’s plain
and simple to express:
Err
and err
and err again
but less
and less
and less.”

Piet Hein, Grooks

This presents a conflict between personal and social incentives. Socially, one wants people regularly tossing their bodies into the marketplace to be trampled by uncaring forces just on the off chance that this time it’ll finally work, and since the critical factors are unknown and constantly changing, one needs a sacrificial startup every once in a while to check (for a good idea, no amount of failures is enough to prove that it should never be tried—many failures just imply that there should be a backoff). Privately, given the skewed returns, diminishing utility, the oversized negative impacts (a bad startup can ruin one’s life and drive one to suicide), the limited number of startups any individual can engage in (yielding gambler’s ruin)24, and the fact that startups & VC will capture only a minute percentage of the total gains from any success (most of which will turn into consumer surplus/positive externalities), the only startups that make any rational sense, which you wouldn’t have to be crazy to try, are the overdetermined ones which anyone can see are a great idea. However, those are precisely the startups that crazy people will have done years before when they looked like bad ideas, avoiding the waste of delay. Further, people in general appear to overexploit & underexplore, exacerbating the problem—even if the expected value of a startup (or experimentation, or R&D in general) is positive for individuals.

So, it seems that rapid progress depends on crazy people.

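Part of the private-versus-social asymmetry is simple arithmetic. Assuming (for illustration only) independent attempts that each succeed with the same probability p, the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. The per-startup rate of 5% below is a hypothetical round number, not an estimate.

```python
def prob_at_least_one_success(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds,
    each with success probability `p` (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** attempts


p = 0.05  # hypothetical per-startup success rate, not an empirical estimate
for k in (1, 3, 100):
    print(f"{k:>3} attempts: {prob_at_least_one_success(p, k):.3f}")
```

An individual who can afford one to three attempts fails most of the time (about 5% and 14% success), while a society making 100 such attempts almost surely sees a success somewhere (about 99%): the same gamble is bad for the person and good for the group.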
There is a more than superficial analogy here, I think, to Thompson sampling25/posterior sampling (PSRL) Bayesian reinforcement learning. In RL’s multi-armed bandit (MAB) setting, each turn one has a set of ‘arms’ or options with unknown payoffs, and one wants to maximize the total long-term reward. The difficulty is in coping with failure: even good options may fail many times in a row, and bad options may succeed, so options cannot simply be ruled out after a failure or two; and if one is too hasty to write an option off, one may take a long time to realize the mistake, losing out for many turns.

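How often does a good option fail to look better than a bad one? Assuming independent trials and two hypothetical success rates (20% for the ‘good’ option, 5% for the ‘bad’ one), the exact binomial probability that after n tries of each the good option has no more successes than the bad one can be computed directly. Ties are counted, on the assumption that a tie gives no reason to prefer either option.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_good_not_ahead(p_good: float, p_bad: float, n: int) -> float:
    """Probability that, after n independent trials of each option, the better
    option has no more successes than the worse one (ties included)."""
    total = 0.0
    for bad_wins in range(n + 1):
        # P(good option's successes <= bad_wins)
        p_good_at_most = sum(binom_pmf(g, n, p_good) for g in range(bad_wins + 1))
        total += binom_pmf(bad_wins, n, p_bad) * p_good_at_most
    return total


# Hypothetical success rates: 'good' option at 20%, 'bad' option at 5%.
for n in (5, 20, 50):
    print(n, round(prob_good_not_ahead(0.20, 0.05, n), 3))
```

With five tries apiece there is roughly a 40% chance the good option does not look any better; it takes dozens of trials before the comparison becomes reliable, which is why writing options off after a few failures is hasty.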
One of the simplest & most efficient MAB solutions, which maximizes the total long-term reward and minimizes ‘regret’ (opportunity cost), is Thompson sampling & its generalization PSRL26: randomly select each option with a probability equal to the current estimated probability that it is the most profitable option. This explores all options initially but gradually homes in on the most profitable option to exploit most of the time, while still exploring all the other options once in a while, just in case; strictly speaking, Thompson sampling will never ban an option permanently, and the probability of selecting an option merely becomes vanishingly small. Bandit settings can further assume that options are ‘restless’ and the optimal option may ‘drift’ over time or ‘run out’ or ‘switch’, in which case one also estimates the probability that an option has switched, and when it does, one changes over to the new best option; instead of the regular Thompson sampling where bad options become ever more unlikely to be tried, a restless bandit results in constant low-level exploration, because one must constantly check lest one fail to notice a switch.

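A minimal sketch of Thompson sampling for options that either pay off or not (a Beta-Bernoulli bandit), assuming hypothetical payoff rates. Drawing one sample from each option’s posterior and playing the highest draw selects each option with exactly the posterior probability that it is best. The optional `discount` is one simple way (among several in the literature) to approximate the restless case: old evidence decays back toward the prior, so exploration never fully dies out.

```python
import random


def thompson_sampling(true_probs, n_rounds, discount=1.0, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled.

    true_probs: hypothetical success rate of each arm (unknown to the agent).
    discount: 1.0 is standard Thompson sampling; values below 1.0 decay old
        evidence toward the Beta(1, 1) prior, keeping some exploration alive.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # 1 + (discounted) successes
    beta = [1.0] * k   # 1 + (discounted) failures
    pulls = [0] * k
    for _ in range(n_rounds):
        # One plausible success rate per arm, drawn from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        success = rng.random() < true_probs[arm]
        if discount < 1.0:  # forget a little of everything seen so far
            for i in range(k):
                alpha[i] = 1.0 + discount * (alpha[i] - 1.0)
                beta[i] = 1.0 + discount * (beta[i] - 1.0)
        if success:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
        pulls[arm] += 1
    return pulls


arms = [0.05, 0.10, 0.30]  # hypothetical payoff rates
print(thompson_sampling(arms, 5_000))
print(thompson_sampling(arms, 5_000, discount=0.99))
```

The standard version ends up spending nearly all its pulls on the best arm while never quite abandoning the others; the discounted version keeps a visibly larger trickle of exploration going.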
This bears a resemblance to startup rates over time: an initial burst of enthusiasm for a new ‘option’, when it still has a high prior probability of being the most profitable option at the moment, triggers a bunch of startups selecting that option, but when they fail, the posterior probability drops substantially; however, even if something now looks like a bad idea, there will still be people every once in a while who insist on trying again anyway, and, because the probability is not 0, once in a while they succeed wildly and everyone is astonished that ‘so, X is a thing now!’

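The ‘drops substantially, but not to 0’ dynamic can be checked numerically. The sketch below estimates, by Monte Carlo, the probability that each option is best given Beta posteriors, which is the probability with which Thompson sampling would pick it. All numbers are hypothetical: an established option with about a 20% track record (Beta(20, 80)), a new idea with a flat prior, and the same idea after 10 straight failures.

```python
import random


def prob_best(posteriors, n_draws=20_000, seed=0):
    """Monte Carlo estimate of the probability that each option is the best,
    given independent Beta(alpha, beta) posteriors over success rates."""
    rng = random.Random(seed)
    wins = [0] * len(posteriors)
    for _ in range(n_draws):
        draws = [rng.betavariate(a, b) for a, b in posteriors]
        wins[max(range(len(draws)), key=draws.__getitem__)] += 1
    return [w / n_draws for w in wins]


established = (20, 80)    # hypothetical: existing option, ~20% success rate
new_idea = (1, 1)         # hypothetical: new idea, flat prior
after_failures = (1, 11)  # the same idea after 10 failed attempts

print(prob_best([established, new_idea]))
print(prob_best([established, after_failures]))
```

The new idea’s chance of being tried falls from about 0.8 to about 0.1 after the burst of failures, but it stays positive, so someone still tries it occasionally.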
DARPA research funding and VCs often aren’t looking for a plan which looks good on average to everyone, or which no one can find any particular problem with, but something closer to a plan which at least one person thinks could be awesome for some reason. An additional analogy from reinforcement learning is PSRL, which handles more complex problems by committing to a strategy and following it until the end, whether success or failure. A naive Thompson sampling would do badly in a long-term problem because at every step, it would ‘change its mind’ and be unable to follow any plan consistently for long enough to see what happens; what is necessary is ‘deep exploration’, following a single plan long enough to see how it works, even if one thinks that plan is almost certainly wrong: one must “Disagree and commit”. The average of multiple plans is often worse than any single plan. The most informative plan is the most polarizing one.27

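A toy illustration of why committing matters, under an invented environment: each episode lasts several steps, and a plan only pays off (and only reveals anything about itself) if the same plan is followed for the whole episode. The PSRL-style agent samples from its posterior once per episode and sticks with the result; the naive agent resamples every step. The three plan success rates are hypothetical, and this is a sketch of the idea, not a reproduction of any published experiment.

```python
import random


def run(true_probs, n_episodes, horizon, commit, seed=0):
    """Total reward in a toy problem where only a plan followed for all
    `horizon` steps of an episode pays off or yields information.

    commit=True: PSRL-style, sample once per episode and follow that plan.
    commit=False: naive Thompson sampling, resampling at every step.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha, beta = [1.0] * k, [1.0] * k
    total = 0
    for _ in range(n_episodes):
        plans = []
        for step in range(horizon):
            if commit and step > 0:
                plans.append(plans[0])  # stay with this episode's plan
            else:
                draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
                plans.append(max(range(k), key=draws.__getitem__))
        if len(set(plans)) == 1:  # the plan was seen through to the end
            plan = plans[0]
            if rng.random() < true_probs[plan]:
                alpha[plan] += 1
                total += 1
            else:
                beta[plan] += 1
        # Otherwise the agent switched plans midway: no payoff, no information.
    return total


plans = [0.2, 0.5, 0.8]  # hypothetical success rates of three long-term plans
print("commit (PSRL-style):", run(plans, 500, horizon=10, commit=True))
print("resample every step:", run(plans, 500, horizon=10, commit=False))
```

The naive agent almost never follows one plan for ten consecutive steps, so it never learns anything; the committed agent quickly finds the best plan.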
The system as a whole can be seen in RL terms. One theme I notice in many systems is that they follow a multi-level optimization structure where slow blackbox methods give rise to more efficient Bayesian inference. Ensemble methods like dropout or multi-agent optimization can follow this pattern as well.

A particularly germane example here is Krafft et al 2016/Krafft 2017 (discussion), which examines a large dataset of trades made by eToro online traders, who are able to clone financial trading strategies of more successful traders; as traders find successful strategies, others gradually imitate them, and so the system as a whole converges on better strategies in what they identify as a sort of particle filter-like implementation of “distributed Thompson sampling” which they dub “social sampling”. So for the most part, traders clone popular strategies, but with some probability, they’ll randomly explore rarer, apparently-unsuccessful strategies.

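A toy population model in the same spirit, with loud caveats: this is a hypothetical imitate-or-explore rule written for illustration, not Krafft et al’s actual model. Each round, every agent copies the strategy of another agent chosen in proportion to that agent’s latest payoff (plus a small floor), except that with a small probability it picks a strategy at random, playing the role of the ‘misfit’ explorer. The payoff rates, population size, and exploration rate are all invented.

```python
import random


def social_sampling(payoff_probs, n_agents=200, n_rounds=50, explore=0.02, seed=0):
    """Toy imitate-or-explore dynamics; returns how many agents end up
    using each strategy."""
    rng = random.Random(seed)
    k = len(payoff_probs)
    strategies = [rng.randrange(k) for _ in range(n_agents)]
    for _ in range(n_rounds):
        # Each agent's current strategy pays off (1) or not (0) this round.
        payoffs = [1 if rng.random() < payoff_probs[s] else 0 for s in strategies]
        # Recently successful agents are copied most often; the 0.1 floor
        # lets unlucky strategies still be copied occasionally.
        weights = [p + 0.1 for p in payoffs]
        copied = rng.choices(strategies, weights=weights, k=n_agents)
        # A small fraction ignore the crowd and pick a strategy at random.
        strategies = [rng.randrange(k) if rng.random() < explore else s for s in copied]
    return [strategies.count(s) for s in range(k)]


print(social_sampling([0.2, 0.4, 0.6]))  # hypothetical strategy payoff rates
```

Most of the population converges on the best strategy, while the exploration rate keeps a few agents trying the apparently inferior ones.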
This sounds a good deal like individuals pursuing standard careers & occasionally exploring unusual strategies like a startup; they will occasionally explore strategies which have performed badly (ie. previous similar startups failed). Entrepreneurs, with their speculations and optimistic biases, serve as randomization devices to sample a strategy regardless of the ‘conventional wisdom’, which at that point may be no more than an information cascade; information cascades, however, can be broken by the existence of outliers who are either informed or act at random (“misfits”). While each time a failed option is tried, it may seem irrational (“how many times must VR fail before people finally give up on it‽”), it was still rational in the big picture to give it a try, as this collective strategy minimizes regret & maximizes total long-term returns—as long as failed options aren’t tried too often.

Reducing Regret
What does this analogy suggest? The two failure modes of an MAB algorithm are investing too much in one option early on, and investing too little later on; in the former, you inefficiently buy too much information on an option which happened to have good luck but is not guaranteed to be the best, at the expense of others (which may in fact be the best), while in the latter, you buy too little & risk permanently making a mistake by prematurely rejecting an apparently-bad option (which simply had bad luck early on). To the extent that VCs/startups stampede into particular sectors, this leads to inefficiency of the first type—were so many ‘green energy’ startups necessary? When they began failing in a cluster, information-wise, that was highly redundant. And on the other hand, if a startup idea becomes ‘debunked’, and no one is willing to invest in it ever again, that idea may be starved of investment long past its ripe time, which means big regret.

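The second failure mode, premature rejection, is easy to reproduce in simulation. The sketch below compares a purely greedy agent (always play the option with the highest posterior mean, never explore) against Thompson sampling on a hypothetical two-option problem with success rates 40% and 60%. Greedy sometimes sees the better option fail early and then never returns to it, paying for that mistake on every remaining turn. All parameters are illustrative.

```python
import random


def simulate(policy, probs, n_rounds, rng):
    """Play a Bernoulli bandit with Beta(1, 1) priors; return pulls per arm.
    'greedy' plays the highest posterior mean (pure exploitation);
    'thompson' plays the arm with the highest posterior sample."""
    k = len(probs)
    alpha, beta, pulls = [1.0] * k, [1.0] * k, [0] * k
    for _ in range(n_rounds):
        if policy == "greedy":
            scores = [alpha[i] / (alpha[i] + beta[i]) for i in range(k)]
        else:
            scores = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=scores.__getitem__)
        if rng.random() < probs[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls


def average_regret(policy, probs=(0.4, 0.6), n_rounds=1000, n_runs=200, seed=0):
    """Mean regret: expected reward lost vs always playing the best arm."""
    rng = random.Random(seed)
    best = max(probs)
    total = 0.0
    for _ in range(n_runs):
        pulls = simulate(policy, probs, n_rounds, rng)
        total += sum(n * (best - p) for n, p in zip(pulls, probs))
    return total / n_runs


print("greedy:  ", round(average_regret("greedy"), 1))
print("thompson:", round(average_regret("thompson"), 1))
```

Greedy’s regret keeps growing in the runs where it locked onto the worse option, whereas Thompson sampling’s continued occasional exploration lets it recover from early bad luck.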
I think most people are aware of fads/stampedes in investing, but the latter error is not so commonly discussed. One idea is that a VC firm could explicitly track ideas that seem great but have had several failed startups, and try to schedule additional investments at ever-greater intervals (similar to DS-PRL), which bounds losses (if the idea turns out to be truly a bad idea after all) but ensures eventual success (if it is a good one). For example, even if online pizza delivery has failed every time it’s tried, it still seems like a good idea that people will want to order pizza online via their smartphones, so one could try to do a pizza startup 2.5 years later, then 5 years later, then 10 years, then 20 years, or perhaps every time computer costs drop an order of magnitude, or perhaps every time the relevant market doubles in size? Since someone wanting to try the business again might not pop up at the exact time desired, a VC might need to create one themselves by trying to inspire someone to do it.

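The 2.5/5/10/20-year schedule is a form of exponential backoff. A minimal sketch, assuming those figures are successive gaps between attempts (they could also be read as offsets from the original failure); `retry_times` is a hypothetical helper. Because the gaps double, the number of attempts grows only logarithmically with the time horizon, which is what bounds the losses on an idea that really is bad.

```python
def retry_times(first_gap: float, horizon: float, factor: float = 2.0):
    """Times (years after the latest failure) at which to fund another
    attempt, multiplying the gap between attempts by `factor` each time."""
    times = []
    t, gap = 0.0, first_gap
    while t + gap <= horizon:
        t += gap
        times.append(t)
        gap *= factor
    return times


print(retry_times(2.5, 40))          # [2.5, 7.5, 17.5, 37.5]
print(len(retry_times(2.5, 1_000)))  # 8
```

Four attempts cover 40 years, and only 8 cover a millennium, yet a good idea is never permanently abandoned. The alternative triggers in the text (an order-of-magnitude drop in computer costs, a doubling of the market) would replace calendar time with those quantities.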
What other lessons could we draw if we thought about technology this way? The use of lottery grants is one idea which has been proposed, to help break the over-exploitation fostered by peer review; the randomization gives disfavored low-probability proposals (and people) a chance. If we think about multi-level optimization systems & population-based training, and optimization of evolution like strong amplifiers (which resemble small but networked communities: Pavlogiannis et al 2018), that would suggest we should have a bias against both large and small groups/institutes/granters, because small ones are buffeted by random noise/drift and can’t afford well-powered experiments, but large ones are too narrow-minded.28 But a network of medium ones can both explore well and then efficiently replicate the best findings across the network to exploit them.

See also
Origins of Innovation: Bakewell & Breeding
Evolution as Backstop for Reinforcement Learning
Littlewood’s Law and the Global Media (It only takes one—for good & ill)
“Guess 2/3 of the average”
External links
“The Myth of The Infrastructure Phase”
“Book Review: Zero To One”, SSC
“Resistant protocols: How decentralization evolves”, John Backus
“Technological convergence in drug discovery and other endeavors”
“Explicit and Tacit Rationality”
“The Milo Criterion”
“Quantifying the evolution of individual scientific impact”, Sinatra et al 2016
“Large teams develop and small teams disrupt science and technology”, Wu et al 2019
“Why did we wait so long for the bicycle?”
Discussion: HN, Twitter: 1/2/3, GoodReads
Appendix
ARPA AND SCI: SURFING AI (REVIEW OF ROLAND & SHIMAN 2002)

Review of the DARPA history book Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002, which reviews a large-scale DARPA effort to jumpstart real-world uses of AI in the 1980s by a multi-pronged research effort into more efficient computer chip R&D, supercomputing, robotics/self-driving cars, & expert system software. Roland & Shiman 2002 particularly focus on the various ‘philosophies’ of technological forecasting & development which guided DARPA’s strategy in different periods, ultimately endorsing a weak technological determinism in which the bottlenecks are too large for a small (in comparison to the global economy & global R&D) organization to break, so the best a DARPA can hope for is a largely agnostic & reactive strategy in which granters ‘surf’ technological changes, rapidly exploiting new technology while investing their limited funds into targeted research patching up any gaps or lags that accidentally open up and block broader applications.

REVERSE SALIENTS

Excerpts from The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006, describing Heinrich Hörlein’s drug development programs & Thomas Edison’s electrical programs as strategically aimed at “reverse salients”: lagging but necessary steps which hold back the practical application of progress in an area, and where research efforts have disproportionate payoffs by removing a bottleneck.

“INVESTING IN GOOD IDEAS THAT LOOK LIKE BAD IDEAS”

Summary by one VC of a16z investment strategy.

Thiel uses the example of ‘New France’/the Louisiana Territory, in which the projections of John Law et al that it (and thus the Mississippi Company) would be as valuable as France itself turned out to be correct—just centuries later, with the benefits redounding to the British colonies. Even the Mississippi Company worked out: “The ships that went abroad on behalf of his great company began to turn a profit. The auditor who went through the company’s books concluded that it was entirely solvent—which isn’t surprising, when you consider that the lands it owned in America now produce trillions of dollars in economic value.” One could also say the same thing of China: countless European observers forecast that China was a ‘sleeping giant’ which, once it industrialized & modernized, would again be a global power. They were correct, but many of them would be surprised & disappointed how long it took.↩︎︎

#326, “Part II. The Wanderer And His Shadow”, Human, All Too Human.↩︎︎

Nathan Myhrvold’s patent troll company Intellectual Ventures is also featured in Malcolm Gladwell’s essay on multiple invention, “In the Air: Who says big ideas are rare?”; IV’s business model is to spew out patents for speculations which other people will then actually invent; those inventors can then be extorted for license fees when they make the inventions work in the real world & produce value. (This is assisted by the fact that patents no longer require even the pretense of a working model.) As Bill Gates says, “I can give you fifty examples of ideas they’ve had where, if you take just one of them, you’d have a startup company right there.” Indeed—that this model works demonstrates the commonness of ‘multiples’, the worthlessness of ideas, and the moral bankruptcy of the current patent system.↩︎︎

At the margin, compared to other competitors in the VR space, like Valve’s concurrent efforts, and everything that the Rift built on, did Luckey and co really create ~$2.3b (2014 dollars; ≈$2.75b adjusted for inflation) of new value? Or were they lucky in trying at the right time, and merely captured all of the value, because a 99% adequate VR headset is worth 0%, and they added the final 1%? If the latter, how could IP or economics be fixed to tie intermediate contributions more closely to the final result, approaching a fairer distribution like the Shapley value, rather than contributions being commoditized and yielding last-mover winner-take-all dynamics?↩︎︎

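To see what the Shapley value would do in the ‘99% adequate is worth 0%’ case from the footnote above, here is an exact computation for a small all-or-nothing game: the product is worth 1 (normalized) only when every contribution is present. The contributor names are hypothetical. Shapley averages each contributor’s marginal contribution over every order in which they could have joined, so in this game the value is split equally, whereas last-mover winner-take-all hands everything to whoever happened to arrive last.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution averaged
    over every order in which the players could have joined."""
    orders = list(permutations(players))
    totals = {p: Fraction(0) for p in players}
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical contributors to a headset worth 1 only when complete.
contributors = ["optics", "tracking", "display", "final integrator"]
everyone = frozenset(contributors)


def headset_value(coalition):
    return Fraction(1) if coalition == everyone else Fraction(0)


print(shapley_values(contributors, headset_value))
```

Each contributor receives 1/4. Enumerating permutations is only practical for a handful of players; it is used here because it is the most transparent form of the definition.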
Translation, Katsuki Sekida, Two Zen Classics: The Gateless Gate and The Blue Cliff Records, 2005.↩︎︎

Benedict Evans (“In Praise of Failure”) summarizes the problem:

It’s possible for a few people to take an idea and create a real company worth billions of dollars in less than a decade—to go from an idea and a few notes to Google or Facebook, or for that matter Dollar Shave Club or Nervana. It’s possible for entrepreneurs to create something with huge impact.

But equally, anything with that much potential has a high likelihood of failure—if it was obviously a good idea with no risks, everyone would be doing it. Indeed, it’s inherent in really transformative ideas that they look like bad ideas—Google, Apple, Facebook and Amazon all did, sometimes several times over. In hindsight the things that worked look like good ideas and the ones that failed look stupid, but sadly it’s not that obvious at the time. Rather, this is how the process of invention and creation works. We try things—we try to create companies, products and ideas, and sometimes they work, and sometimes they change the world. And so, we see, in our world around half such attempts fail completely, and 5% or so go to the moon.

It’s worth noting that ‘looks like a bad idea’ is flexible here: I emphasize that many good ideas look like bad ideas because they’ve been tried before & failed, but many others look bad because a necessary change hasn’t yet happened or people underestimate existing technology.↩︎︎

Where there is, as Musk describes it, a “graveyard of companies” like Coda Automotive or Fisker Automotive. It may be relevant to note that Musk did not found Tesla; the two co-founders ultimately quit the company.↩︎︎

As late as 2007–2008, Blockbuster could still have beaten Netflix, as its “Total Access” program demonstrated, but CEO changes scuppered its last chance. This also offers an example of why stock markets are fine with paying executives so much: a good executive can create—or destroy—the entire company. If Blockbuster’s CEO had paid a pittance ~2000 to acquihire Netflix & put Reed Hastings in charge, or if it had simply stuck with its CEO in 2007 to strangle Netflix with Total Access, its shareholders would be far better off now. But it didn’t.↩︎︎

“After solving a problem, humanity imagines that it finds in analogous solutions the key to all problems. Every authentic solution brings in its wake a train of grotesque solutions.” —Nicolás Gómez Dávila, Nicolás Gómez Davila: An Anthology (original: Escolios a un Texto Implícito: Selección, p. 430)↩︎︎

Finding out these tidbits is one reason I enjoyed reading Founders at Work: Stories of Startups’ Early Days (ed Livingston 2009; “Introduction”), because the challenges are not always what you think they are. PayPal’s major challenge, for example, was not finding a market like eBay power sellers, but coping with fraud as they scaled, which apparently was the undoing of any number of rivals.↩︎︎

Personally, I was still using Dogpile until at least 2000.↩︎︎

From Frock 2006, Changing How the World Does Business: FedEx’s Incredible Journey to Success, in 1973:

On several occasions, we came within an inch of failure, because of dwindling financial resources, regulatory roadblocks, or unforeseen events like the Arab oil embargo. Once, Fred’s luck at the gaming tables of Las Vegas helped to save the company from financial disaster. Another time, we had to ask our employees to hold their paychecks while we waited for the next wave of financing…Fred dumped his entire inheritance into the company and was full speed ahead without concern for his personal finances.

…The loan guarantee from General Dynamics raised our hopes and increased our spirits, but also increased the pressure to finalize the private placement. We continued to be in desperate financial trouble, particularly with our suppliers. The most demanding suppliers when it came to payments were the oil companies. Every Monday, they required Federal Express to prepay for the anticipated weekly usage of jet fuel. By mid-July our funds were so meager that on Friday we were down to about $5,000 [≈$19,705 adjusted for inflation] in the checking account, while we needed $24,000 [≈$94,584] for the jet fuel payment. I was still commuting to Connecticut on the weekends and really did not know what was going to transpire on my return.

However, when I arrived back in Memphis on Monday morning, much to my surprise, the bank balance stood at nearly $32,000 [≈$126,112]. I asked Fred where the funds had come from, and he responded, “The meeting with the General Dynamics board was a bust and I knew we needed money for Monday, so I took a plane to Las Vegas and won $27,000 [≈$106,407].” I said, “You mean you took our last $5,000 [≈$19,705]—how could you do that?” He shrugged his shoulders and said, “What difference did it make? Without the funds for the fuel companies, we couldn’t have flown anyway.” Fred’s luck held again. It was not much but it came at a critical time and kept us in business for another week.

This also illustrates how fine, and how ex post, the line is between ‘visionary founder’ & ‘criminal con artist’; had Frederick W. Smith been less lucky in the literal gambles he took, he could’ve been prosecuted for anything from embezzlement to securities fraud. As a matter of fact, Smith was prosecuted—for something else entirely:

Fred now revealed that a year earlier [also in 1973] he had forged documents indicating approval of a loan guarantee by the Enterprise Company without consent of the other board members, specifically his two sisters and Bobby Cox, the Enterprise secretary. Our respected leader admitted his culpability to the Federal Express board of directors and to the investors and lenders we were counting on to support the second round of the private placement financing. While it is possible to understand that, under extreme pressure, Fred was acting to save Federal Express from almost certain bankruptcy, and even to empathize with what he did, it nevertheless appeared to be a serious breach of conduct…December 1975 was also the month that settled the matter of the forged loan guarantee documents for the Union Bank. At his trial, Fred testified that as president of the Enterprise board and with supporting letters from his sisters, he had authority to commit the board. After 10 hours of deliberation, he was acquitted. If convicted, he would have faced a prison term of up to five years.

Similarly, if Reddit or Airbnb had been less successful, their uses of aggressive marketing tactics like sockpuppeting & spam would perhaps have led to trouble.↩︎︎

To borrow a phrase from Kelly:

The electric incandescent lightbulb was invented, reinvented, coinvented, or “first invented” dozens of times. In their book Edison’s Electric Light: Biography of an Invention, Robert Friedel, Paul Israel, and Bernard Finn list 23 inventors of incandescent bulbs prior to Edison. It might be fairer to say that Edison was the very last “first” inventor of the electric light. These 23 bulbs (each an original in its inventor’s eyes) varied tremendously in how they fleshed out the abstraction of “electric lightbulb.” Different inventors employed various shapes for the filament, different materials for the wires, different strengths of electricity, different plans for the bases. Yet they all seemed to be independently aiming for the same archetypal design. We can think of the prototypes as 23 different attempts to describe the inevitable generic lightbulb.

This happens even in literature: Doyle’s Sherlock Holmes stories weren’t the first to invent “clues”, but the last (Moretti 2000, Moretti 2005, Batuman 2005), with other detective fiction writers doing things that can only be called ‘grotesque’; Moretti, baffled, recounts that “one detective, having deduced that ‘the drug is in the third cup of coffee’, proceeds to drink the coffee.”

To give a personal example: while researching “Registered Reports”, supposedly invented in 2013, I discovered that they had been invented at least 10 times dating back to 1966.↩︎︎

“Kafka And His Precursors”, Borges 1951:

At one time I considered writing a study of Kafka’s precursors. I had thought, at first, that he was as unique as the phoenix of rhetorical praise; after spending a little time with him, I felt I could recognize his voice, or his habits, in the texts of various literatures and various ages…If I am not mistaken, the heterogeneous pieces I have listed resemble Kafka; if I am not mistaken, not all of them resemble each other. This last fact is what is most significant. Kafka’s idiosyncrasy is present in each of these writings, to a greater or lesser degree, but if Kafka had not written, we would not perceive it; that is to say, it would not exist. The poem “Fears and Scruples” by Robert Browning prophesies the work of Kafka, but our reading of Kafka noticeably refines and diverts our reading of the poem. Browning did not read it as we read it now. The word “precursor” is indispensable to the vocabulary of criticism, but one must try to purify it from any connotation of polemic or rivalry. The fact is that each writer creates his precursors. His work modifies our conception of the past, as it will modify the future. In this correlation, the identity or plurality of men doesn’t matter. The first Kafka of “Betrachtung” is less a precursor of the Kafka of the gloomy myths and terrifying institutions than is Browning or Lord Dunsany.

↩︎︎
“Integrated Cognitive Systems”, Michie 1970 (pg93–96 of Michie, On Machine Intelligence):

How long is it likely to be before a machine can be developed approximating to adult human standards of intellectual performance? In a recent poll [8], thirty-five out of forty-two people engaged in this sort of research gave estimates between ten and one hundred years. [8: European AISB Newsletter, no. 9, 4 (1969)] There is also fair agreement that the chief obstacles are not hardware limitations. The speed of light imposes theoretical bounds on rates of information transfer, so that it was once reasonable to wonder whether these limits, in conjunction with physical limits to microminiaturization of switching and conducting elements, might give the biological system an irreducible advantage. But recent estimates [9, 10], which are summarized in Tables 7.1 and 7.2, indicate that this is not so, and that the balance of advantage in terms of sheer information-handling power may eventually lie with the computer rather than the brain. It seems a reasonable guess that the bottleneck will never again lie in hardware speeds and storage capacities, as opposed to purely logical and programming problems. Granted that an ICS can be developed, is now the right time to mount the effort?

↩︎︎
Michie 1970:

Yet the principle of ‘unripe time’, distilled by F. M. Cornford [15] more than half a century ago from the changeless stream of Cambridge academic life, has provided the epitaph of more than one premature technology. The aeroplane industry cannot now redeem Daedalus nor can the computer industry recover the money spent by the British Admiralty more than a hundred years ago in support of Charles Babbage and his calculating machine. Although Babbage was one of Britain’s great innovative geniuses, support of his work was wasted money in terms of tangible return on investment. It is now appreciated that of the factors needed to make the stored-program digital computer a technological reality only one was missing: the means to construct fast switching elements. The greater part of a century had to elapse before the vacuum tube arrived on the scene.

↩︎︎
Which, as a side note, is wrong; compiled predictions actually indicate that AI researcher forecasts, while varying anywhere from a decade to centuries, typically cluster around 20 years in the future regardless of researcher age. For a recent timeline survey, see “Forecasting Transformative AI: An Expert Survey”, Gruetzemacher et al 2019, and for more, AI Impacts. (One wonders if a 20-year forecast might be driven by anthropics: in an exponentially-growing field, most researchers will be present in the final ‘generation’, and so a priori one could predict accurately that it will be 20 years to AI. In this regard, it is amusing to note the exponential growth of conferences like NIPS or ICML over 2010–2019.)↩︎︎
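The anthropic point can be made concrete with a little arithmetic. A minimal sketch, assuming (hypothetically) that the field's headcount has grown exponentially with a fixed doubling time, and reading "present in the final generation" as "working within the most recent 20 years": the share of all researcher-years ever worked that fall in that window is 1 − 2^(−W/T), so with a 5-year doubling time it is 15/16. The function name and the doubling times below are illustrative, not taken from any survey.

```python
def fraction_in_recent_window(window_years: float, doubling_time_years: float) -> float:
    """Fraction of all researcher-years ever worked that fall within the most
    recent `window_years`, assuming headcount has grown exponentially forever
    with the given doubling time (a hypothetical, idealized model).

    Integrating N(t) = N0 * 2**(t / T) from -infinity to now gives N0*T/ln 2;
    the last W years contribute N0*T/ln 2 * (1 - 2**(-W / T)), so the
    fraction is 1 - 2**(-W / T), independent of N0.
    """
    if window_years < 0 or doubling_time_years <= 0:
        raise ValueError("window must be >= 0 and doubling time must be > 0")
    return 1.0 - 2.0 ** (-window_years / doubling_time_years)


if __name__ == "__main__":
    # Hypothetical doubling times; the footnote gives no specific growth rate.
    for doubling in (2.5, 5, 10):
        share = fraction_in_recent_window(20, doubling)
        print(f"doubling every {doubling} years: {share:.1%} of researcher-years in the last 20 years")
```

Under these assumptions, a randomly chosen researcher-year is very likely to sit near the present, whatever the true date of AI turns out to be.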
Michie 1970:

…A further application of criterion 4 arises if theoretical infeasibility is demonstrated…But it is well to look on such negative proofs with caution. The possibility of broadcasting radio waves across the Atlantic was convincingly excluded by theoretical analysis. This did not deter Marconi from the attempt, even though he was as unaware of the existence of the Heaviside layer as everyone else.

↩︎︎
Michie 1970:

It can reasonably be said that time was unripe for digital computing as an industrial technology. But it is by no means obvious that it was unripe for Babbage’s research and development effort, if only it had been conceived in terms of a more severely delimited objective: the construction of a working model. Such a device would not have been aimed at the then unattainable goal of economic viability; but its successful demonstration might, just conceivably, have greatly accelerated matters when the time was finally ripe. Vacuum tube technology was first exploited for high-speed digital computing in Britain during the Second World War [16]. But it was left to Eckert and Mauchly [16] several years later to rediscover and implement the conceptions of stored programs and conditional jumps, which had already been present in Babbage’s analytical engine [17]. Only then could the new technology claim to have drawn level with Babbage’s design ideas of a hundred years earlier.

↩︎︎
A kind of definition of Value of Information, from Richard Hamming’s “You and Your Research”:

If you do not work on an important problem, it’s unlikely you’ll do important work. It’s perfectly obvious. Great scientists have thought through, in a careful way, a number of important problems in their field, and they keep an eye on wondering how to attack them. Let me warn you, ‘important problem’ must be phrased carefully. The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs. By important I mean guaranteed a Nobel Prize and any sum of money you want to mention. We didn’t work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack. It’s not the consequence that makes a problem important, it is that you have a reasonable attack. That is what makes a problem important.

↩︎︎
“Ed Boyden on Minding your Brain (Ep. 64)”:

BOYDEN: …One idea is, how do we find the diamonds in the rough, the big ideas but they’re kind of hidden in plain sight? I think we see this a lot. Machine learning, deep learning, is one of the hot topics of our time, but a lot of the math was worked out decades ago—backpropagation, for example, in the 1980s and 1990s. What has changed since then is, no doubt, some improvements in the mathematics, but largely, I think we’d all agree, better compute power and a lot more data.

So how could we find the treasure that’s hiding in plain sight? One of the ideas is to have sort of a SWAT team of people who go around looking for how to connect the dots all day long in these serendipitous ways.

…COWEN: Two last questions. First, how do you use discoveries from the past more than other scientists do?

BOYDEN: One way to think of it is that, if a scientific topic is really popular and everybody’s doing it, then I don’t need to be part of that. What’s the benefit of being the 100,000th person working on something?

So I read a lot of old papers. I read a lot of things that might be forgotten because I think that there’s a lot of treasure hiding in plain sight. As we discussed earlier, optogenetics and expansion microscopy both begin from papers from other fields, some of which are quite old and which mostly had been ignored by other people.

I sometimes practice what I call ‘failure rebooting’. We tried something, or somebody else tried something, and it didn’t work. But you know what? Something happened that made the world different. Maybe somebody found a new gene. Maybe computers are faster. Maybe some other discovery from left field has changed how we think about things. And you know what? That old failed idea might be ready for prime time.

With optogenetics, people were trying to control brain cells with light going back to 1971. I was actually reading some earlier papers. There were people playing around with controlling brain cells with light going back to the 1940s. What is different? Well, this class of molecules that we put into neurons hadn’t been discovered yet.

↩︎︎
“Was Moore’s Law Inevitable?”, Kevin Kelly again:

Listen to the technology, Carver Mead says. What do the curves say? Imagine it is 1965. You’ve seen the curves Gordon Moore discovered. What if you believed the story they were trying to tell us: that each year, as sure as winter follows summer, and day follows night, computers would get half again better, and half again smaller, and half again cheaper, year after year, and that in 5 decades they would be 30 million times more powerful than they were then, and cheap. If you were sure of that back then, or even mostly persuaded, and if a lot of others were as well, what good fortune you could have harvested. You would have needed no other prophecies, no other predictions, no other details. Just knowing that single trajectory of Moore’s, and none other, we would have educated differently, invested differently, prepared more wisely to grasp the amazing powers it would sprout.
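Kelly’s thought experiment is easy to make concrete. A minimal sketch, taking the quote’s “30 million times more powerful” over “5 decades” at face value: the implied constant rate is about ×1.41 per year, a doubling roughly every two years. For comparison, compounding a literal ×1.5 (“half again”) every year for 50 years would give roughly 6×10⁸, so the 30-million figure corresponds to a somewhat slower rate. The function names are illustrative.

```python
import math


def implied_annual_factor(total_gain: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `total_gain` over `years`."""
    return total_gain ** (1.0 / years)


def doubling_time(annual_factor: float) -> float:
    """Years needed to double at a constant yearly multiplier (> 1)."""
    return math.log(2) / math.log(annual_factor)


if __name__ == "__main__":
    factor = implied_annual_factor(30e6, 50)  # "30 million times" over "5 decades"
    print(f"implied yearly improvement: x{factor:.3f}")
    print(f"implied doubling time: {doubling_time(factor):.2f} years")
    # For comparison: improving by "half again" (x1.5) every year for 50 years.
    print(f"x1.5 per year for 50 years: {1.5 ** 50:.2e}")
```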
↩︎︎
It’s not enough to theorize about the possibility or prototype something in the lab if there is then no follow-up. The motivation to take something into the ‘real world’, which necessarily requires attacking the reverse salients, may be part of why corporate & academic research are both necessary; too little of either creates a bottleneck. A place like Bell Labs benefits from remaining in contact with the needs of commerce, as that contact provides a check on l’art pour l’art pathologies and a fertile source of problems, and lets it feed back the benefits of mass production/experience curves. (Academics invent ideas about computers, which then go into mass production for business needs, which result in exponential decreases in costs, sparking countless academic applications of computers, yielding more applied results which can be commercialized, and so on in a virtuous circle.) In recent times, corporate research has diminished, and that may be a bad thing: “The changing structure of American innovation: Some cautionary remarks for economic growth”, Arora et al 2019.↩︎︎

One might appeal to the Kelly criterion as a guide to how much individuals should wager on experiments, since the Kelly criterion gives optimal growth of wealth over the long term while avoiding gambler’s ruin; but given the extremely small number of ‘wagers’ an individual engages in, over a highly finite horizon, the Kelly criterion’s assumptions are far from satisfied, and the true optimal strategy can be radically different from a naive Kelly criterion. I explore this difference more in “The Kelly Coin-flipping Game”, which is motivated by stock-market investing.↩︎︎
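A minimal sketch of why a handful of wagers sits poorly with the Kelly criterion’s long-run guarantee (this is an illustration, not the analysis in “The Kelly Coin-flipping Game”). For a hypothetical even-money bet won 60% of the time, the Kelly stake is f* = p − (1 − p)/b = 20% of wealth; the code enumerates the exact binomial outcomes after n such bets. After only 3 bets the chance of ending behind is 35.2%, far higher than after hundreds of bets. All function names and parameters are hypothetical.

```python
from math import comb


def kelly_fraction(p: float, b: float) -> float:
    """Kelly stake for a wager won with probability p that pays b-to-1.

    Maximizes expected log-wealth per bet: f* = p - (1 - p) / b.
    A non-positive value means the bet has no edge, so stake nothing.
    """
    return max(0.0, p - (1.0 - p) / b)


def outcome_distribution(p: float, b: float, f: float, n: int):
    """Exact (final_wealth, probability) pairs after n independent bets,
    each staking fraction f of current wealth, starting from wealth 1."""
    return [
        ((1 + f * b) ** k * (1 - f) ** (n - k),
         comb(n, k) * p**k * (1 - p) ** (n - k))
        for k in range(n + 1)  # k = number of bets won
    ]


def prob_losing_money(p: float, b: float, f: float, n: int) -> float:
    """Probability of finishing with less than the starting wealth."""
    return sum(prob for wealth, prob in outcome_distribution(p, b, f, n) if wealth < 1)


if __name__ == "__main__":
    p, b = 0.6, 1.0  # hypothetical: even-money bet won 60% of the time
    f = kelly_fraction(p, b)  # about 0.2 of wealth per bet
    for n in (3, 30, 300):
        print(f"{n:>4} bets at f={f:.2f}: P(end behind) = {prob_losing_money(p, b, f, n):.3f}")
```

The Kelly stake maximizes the long-run growth rate, but it says little about the spread of outcomes over a few bets, which is the regime an individual researcher or founder actually lives in.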
Thompson sampling, incidentally, has been rediscovered↩︎︎

PSRL (posterior sampling for reinforcement learning; see also Ghavamzadeh et al 2016) generalizes Thompson sampling to more complex problems, MDPs or POMDPs in general. At each iteration, it assumes an entire collection or distribution of possible environments (each more complex than a single-step bandit), picks one environment at random according to its probability of being the real environment, finds the optimal actions for that environment, and then acts on that solution; like Thompson sampling, this smoothly balances exploration with exploitation. Normal PSRL requires ‘episodes’, which don’t really have a real-world equivalent, but PSRL can be extended to handle continuous, non-episodic action. A nice example is deterministic schedule posterior sampling reinforcement learning (DS-PSRL), which does ‘back off’ by periodically stopping and re-evaluating the optimal strategy based on accumulated evidence, but less & less often, so it does PSRL over increasingly large time windows.↩︎︎
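A minimal sketch of the base case that PSRL generalizes: Thompson sampling on a Bernoulli multi-armed bandit with Beta posteriors. Each round it draws one plausible success rate per arm from that arm’s posterior and pulls the arm whose draw is highest, so arms that might still be best keep receiving occasional pulls. PSRL would replace “sample one success rate per arm” with “sample one whole environment model” and then plan in it. The arm probabilities and function name are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior (uniform prior).
    Every round: sample once from each posterior, pull the arm with the largest
    sample, observe a Bernoulli reward, and update only that arm's counts.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    pulls = [0] * len(true_probs)
    for _ in range(n_rounds):
        samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_probs[arm]:  # simulated reward from the hidden truth
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical long-shot 'ideas' with different (unknown to the agent) payoff rates.
    print(thompson_sampling([0.02, 0.05, 0.10], 5000, seed=1))
```

Most pulls drift to the best arm over time, but the worse arms are never abandoned entirely; that residual randomized exploration is the behavior the essay compares to individually-suboptimal entrepreneurs.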
Polarizing here could reflect a wide posterior distribution over value or, if the posterior is being approximated by something like a mixture of experts or an ensemble of multiple models (such as multiple passes through a dropout-trained neural network, or a bootstrapped neural network ensemble), strong disagreement among those models. In a human setting, a proposal might be polarizing in the sense of human peer-reviewers arguing the most about it, or having the least inter-rater agreement or the highest variance of ratings.

As Goldstein & Kearney 2017 describe their analysis of the numerical peer-reviewer ratings of ARPA-E proposals:

In other words, ARPA-E PDs tend to fund proposals on which reviewers disagree, given the same mean overall score. When minimum and maximum score are included in the same model, the coefficient on minimum score disappears. This suggests that ARPA-E PDs are more likely to select proposals that were highly-rated by at least one reviewer, but they are not deterred by the presence of a low rating. This trend persists when median score is included (Model 7 in Table 3). ARPA-E PDs tend to agree with the bulk of reviewers, and they also tend to agree with scores in the upper tail of the distribution. They use their discretion to surface proposals that have at least one champion, regardless of whether there are any detractors…The results show that there is greater ex ante uncertainty in the ARPA-E research portfolio compared to proposals with the highest mean scores (Model 1).
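A minimal sketch of the distinction Goldstein & Kearney draw, using hypothetical reviewer scores: three proposals with identical mean scores are indistinguishable to a mean-based rule, while ranking by maximum score surfaces those with at least one champion, and ranking by score spread (a crude stand-in for ‘polarizing’) surfaces the most disputed. This illustrates the summary statistics only; it is not ARPA-E’s actual selection procedure.

```python
from statistics import mean, pstdev


def rank_proposals(scores_by_proposal, key):
    """Proposal names sorted best-first by a summary of their reviewer scores.

    Python's sort is stable (also with reverse=True), so ties keep their
    original order.
    """
    return sorted(scores_by_proposal,
                  key=lambda name: key(scores_by_proposal[name]),
                  reverse=True)


# Hypothetical reviewer scores (1-5 scale); every proposal has a mean of 3.
reviews = {
    "consensus": [3, 3, 3, 3],
    "polarizing": [5, 5, 1, 1],
    "one_champion": [5, 3, 2, 2],
}

if __name__ == "__main__":
    for label, summary in (("mean", mean), ("max", max), ("spread", pstdev)):
        print(f"{label:>6}: {rank_proposals(reviews, summary)}")
```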
↩︎︎
The different pathologies might be: small ones will collectively try lots of strange or novel ideas but will fail by running underpowered, poorly-done experiments (for lack of funding & expertise) which convince no one, suffer from small-study biases, and merely pollute the literature, giving meta-analysts migraines. Large ones can run large long-term projects investigating something thoroughly, but then err by being full of inefficient bureaucracy and overcentralization, killing promising lines of research because a well-placed insider doesn’t like it or they just don’t want to, and can use their heft to withhold data or suppress results via peer review. A collection of medium-sized institutes might avoid these by being small enough to still be open to new ideas, while being numerous enough that any attempt to squash promising research can be evaded by relocating to another institute, and any research requiring large-scale resources can be done by a consortium of medium institutes.

Modern genomics strikes me as a bit like this. Candidate-gene studies were done by every Tom, Dick, and Harry, but the methodology failed completely because sample sizes many orders of magnitude larger were necessary. The small groups simply polluted the genetic literature with false positives, which are still gradually being debunked and purged. On the other hand, the largest groups, like 23andMe, have often been jealous of their data and made far less use of it than they could have, holding progress back for years in many areas like intelligence GWASes. The UK Biobank has produced an amazing amount of research for a large group, but is the exception that proves the rule: its openness to researchers is (sadly) extraordinarily unusual. Much progress has come from groups like SSGAC or PGC, which are consortia of groups of all sizes (with some highly conditional participation from 23andMe).↩︎︎
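A rough sketch of why small samples could not work, assuming (hypothetically) a single variant that correlates r = 0.02 with the trait (about 0.04% of variance explained) and a target of 80% power. It uses the standard Fisher-z approximation for testing a correlation, n ≈ ((z₁₋α/₂ + z_power) / atanh r)² + 3, and compares a conventional α = 0.05 with the usual genome-wide threshold of 5×10⁻⁸. The effect size is illustrative, not an estimate for any real gene.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_sample_size(r: float, alpha: float, power: float = 0.8) -> int:
    """Approximate n needed to detect a true correlation r with a two-sided
    test at significance level alpha and the given power, via Fisher's z:
    n = ((z_{1-alpha/2} + z_{power}) / atanh(r))**2 + 3."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(((z(1 - alpha / 2) + z(power)) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    r = 0.02  # hypothetical per-variant effect
    for alpha in (0.05, 5e-8):
        print(f"alpha={alpha:g}: n ~ {required_sample_size(r, alpha):,}")
```

Under these assumptions the answer is tens of thousands of subjects even at α = 0.05, and roughly 100,000 at the genome-wide threshold, while the same formula gives well under 100 subjects for a large effect like r = 0.3.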
Ironically, as I write this in 2018, DARPA has recently announced another attempt at “silicon compilers”, presumably sparked by commodity chips topping out and ASICs being required, which I can only summarize as “Verilog but let’s do it sanely this time and with FLOSS rather than a crazy tragedy-of-the-anticommons proprietary ecosystem of crap”.↩︎︎

Specifically, contemporary computers don’t use the dense grid of 1-bit processors with local memory which characterized the CM. They increasingly feature thousands of ‘processor’ equivalents in the form of CPU cores and GPU cores, but those are all far more powerful than a CM processing node. But we might yet see some convergence with the CM thanks to neural networks: neural networks are typically trained with wastefully precise floating point operations, slowing them down, hence the rise of ‘tensor cores’ and ‘TPUs’ using lower precision, like 8-bit integers, and it is possible to discretize neural nets all the way down to binary weights. This offers a lot of potential electricity savings, and if you have binary weights, why not binary computing elements as well…?↩︎︎
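A minimal sketch of what ‘binary weights’ can mean in practice, in plain Python: one common scheme (the per-layer scaling used in XNOR-Net-style binarization) stores each weight as a sign bit plus a single shared scale α, which turns a multiply-accumulate into additions and subtractions followed by one multiplication. The function names are hypothetical, and real accelerators implement this with packed bits, not Python lists.

```python
def binarize(weights):
    """Approximate real-valued weights as alpha * sign(w).

    For fixed B = sign(W), the alpha minimizing ||W - alpha * B||^2 is the mean
    absolute weight, so only one float per layer plus one bit per weight is kept.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(alpha, signs, inputs):
    """Dot product with binarized weights: add or subtract each input
    according to its weight's sign, then scale once by alpha."""
    total = 0.0
    for s, x in zip(signs, inputs):
        total += x if s > 0 else -x
    return alpha * total


if __name__ == "__main__":
    w = [0.5, -0.25, 0.75, -1.0]  # hypothetical layer weights
    x = [1.0, 2.0, 3.0, 4.0]
    alpha, signs = binarize(w)
    exact = sum(wi * xi for wi, xi in zip(w, x))
    print(f"alpha={alpha}, signs={signs}")
    print(f"full-precision dot: {exact}, binarized dot: {binary_dot(alpha, signs, x)}")
```

Here the binarized result (−1.25) only approximates the full-precision one (−1.75), which is one reason binary networks are usually trained with the binarization in the loop rather than binarized after the fact.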
People tend to ignore this, but CNNs can work with a few hundred or even just one or two images, using transfer learning, few-shot learning, and aggressive regularization like data augmentation.↩︎︎
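A toy sketch of the kind of data augmentation mentioned, in plain Python (real pipelines use library transforms and many more distortions, such as color jitter and rotations): random crops plus random horizontal flips turn one labeled image into many slightly different training views that all keep the same label. The function and its parameters are illustrative.

```python
import random


def augment(image, n_variants, crop_size, seed=0):
    """Generate n_variants random views of one image.

    `image` is a list of equal-length rows (e.g. pixel intensities). Each view
    is a crop_size x crop_size window at a random offset, mirrored left-to-right
    with probability 0.5.
    """
    height, width = len(image), len(image[0])
    if crop_size > min(height, width):
        raise ValueError("crop_size is larger than the image")
    rng = random.Random(seed)
    views = []
    for _ in range(n_variants):
        top = rng.randrange(height - crop_size + 1)
        left = rng.randrange(width - crop_size + 1)
        crop = [row[left:left + crop_size] for row in image[top:top + crop_size]]
        if rng.random() < 0.5:
            crop = [list(reversed(row)) for row in crop]  # horizontal flip
        views.append(crop)
    return views


if __name__ == "__main__":
    img = [[10 * r + c for c in range(5)] for r in range(5)]  # hypothetical 5x5 "image"
    for view in augment(img, 3, 3, seed=2):
        print(view)
```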
While the accuracy rates may increase by what looks like a tiny amount, and one might ask how important a change from 99% to 99.9% accuracy is, the large-scale training papers demonstrate that neural nets continue to learn hidden knowledge from the additional data, which provides ever-better semantic features that can be reused elsewhere.↩︎︎
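As a worked check on why the last fraction of a percentage point is not tiny: going from 99% to 99.9% accuracy cuts the error rate tenfold. On a hypothetical evaluation set of one million images:

```latex
\text{errors}_{99\%} = (1 - 0.99) \times 10^6 = 10^4, \qquad
\text{errors}_{99.9\%} = (1 - 0.999) \times 10^6 = 10^3, \qquad
\frac{1 - 0.99}{1 - 0.999} = \frac{0.01}{0.001} = 10.
```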