Transcript

Hey. Again, welcome, and I’m very honored to be here as well. I had a great time in Paris last year, the first time we met. I’m very, very excited to be able to come back and talk. But I’m also kind of nervous because what I have to say may or may not be very popular here. I usually give very uplifting talks, very positive talks, very, like, "Let’s go get them," kind of talks about the future. This one, I’ve got a little bit of worry in this one. Let’s see how this goes. I made sure we leave some room for a little discussion as well.

First, today I come as a messenger. I come as a messenger very much like the goddess Iris, the Greek goddess who is associated with communication and, more specifically, with new endeavors. So I want to talk about something new here. But I bring some rather hard truths, I think. Some of it may be unpalatable information to some. Hard like iridium, named after the goddess Iris, the second-densest and most corrosion-resistant element on the planet. One that, I find really curious, is found very often in impact craters because it comes from meteors, and, in a poetic turnaround, we use it very often in deep space satellites because of its strength at high heat.

It actually works really well for holding the nuclear power centers for these deep space probes. So we’re actually sending iridium back out into space. Which I find really, really cool. Some of the news I have is not so good today. I love this character. This was just like a performance piece that somebody put together. They would leave it out and see if people would react to it.

What I’m here to say is that the most common approach to building internet based web APIs runs very much counter to the way the web works. And that worries me about us. When we talk about scaling the API economy, I’m worried about whether or not we’re really going to be able to scale it to the level that we need to. Because we’re looking at tens of billions of devices in the next 10 years. There’s going to be a huge explosion. We heard Adam talk about this idea of sort of the human API, the devices that we’re going to use on our bodies. It’s not just Google Glass and it’s not just Fitbit but things to read our heart, and things to read our blood pressure, and all sorts of other information inside our bodies, as well as things around us.

Every light sensor, every motion sensor, every heat detector, every video camera, everything monitoring pipelines for oil systems, automobiles, all sorts of utility systems, all sorts of factory processes, all sorts of people moving everywhere. There are going to be billions upon billions of these devices. I’m pretty sure the way we’re thinking about building APIs today is going to be very difficult to handle these tens of billions of devices.

So I think we’re headed for a scalepocalypse! I love this cover. It has some great headlines. If you just look at these headlines - like the Mayan secrets revealed, and then epic fail! It’s very cool stuff. Okay, but I’m getting a little ahead of myself. I’m going to back up a little bit. I’m going to talk about another scalepocalypse that we avoided. It was a big one. It was one that fundamentally changed the way the web works. If we hadn’t solved this problem we wouldn’t be here today, and it came from a relatively unlikely source, I think. I’m going to go all the way back to 1973. Anybody recognize this picture, by the way?

In 1973, I was actually starting my first year of college at Michigan State University. However, I was starting as a Music major, not a Computer major. I was working in music theory and composition. Little did I know, in my first year at Michigan State University, in that same city, across town, there was a baby born - Larry Page. Anybody know Larry Page? We recognize Larry Page’s name? Right? So here we are, 1973. Two paths cross. One of them is going to be an incredibly influential person in all of our lives. Another one is going to be a billionaire.

I’ll jump to 1994. Does anybody recognize these pictures? Jerry Yang. David Filo. They worked on a project at Stanford as students, as engineering students. It turns out that Jerry had this little project on the side he called, "Jerry’s Guide to the World Wide Web." This was 1994, by the way. HTTP comes in 1992. We didn’t get the URI spec until 1995. We didn’t get XML until 1998. Nineteen ninety-four, Jerry’s kind of collecting up all these links and kind of curating them. He and Dave eventually developed a site that becomes rather popular, rather fast. It’s based on this notion of being a curated, aggregated, hierarchical index of all the things that are on the web. And it takes off. But it immediately starts to run into problems. We all know what those problems are. Right?

Any of us who remember this period remember that it was difficult to find things, and it was not kept up to date. That was the problem. Also, at the same time, in the '90s - we don’t really think about this, but in the early '90s - there were lots of competitors to the HTTP specification. We had all sorts of indexing servers. We had Archie, and Gopher, and WAIS, and Veronica, and all these cool names. All these other ways that we were indexing information. So lots of people were competing for this space.

Now HTTP was doing very well because it had a very low bar of entry and it was open - we talked about open earlier, Anna was talking about open. It was basically given away, right? So we just gave it away. A little bit different than the way Gopher was treated, by the way, at the University of Minnesota. They wanted to copyright it or brand it or something, and it kind of freaked everybody out. They just wanted to make sure nobody could duplicate it or sort of claim that they were the Gopher or something like that. It turns out we ran away in droves from all of these index servers. And these centralized index servers became a real bottleneck. A real bottleneck to the system.

Jump ahead a few more years. Same place, Stanford University. There’s Larry again. Larry and his pal Sergey write a paper called "The Anatomy of a Large-Scale Hypertextual Web Search Engine," in 1998. In it, there’s just a tiny, tiny little bit: "Due to rapid advance in web proliferation, creating a search engine today is very different from three years ago." Who do you think they were talking about? Their fellow students. That’s a little jab right there in the paper. It’s very different now. And it is very, very different. And we sort of assumed this difference was just sort of always there.

This was a fundamental change. This was a fundamental rethinking of how to treat the web. So what is it that they saw that no one else had yet figured out? How did they avoid the scaling problem? How did they completely blow away the speed at which things could get done, and the way in which they were done? I borrowed a little bit from our open source friends. Free, as in scale free. Has anybody heard this phrase? Scale free networking. Scale free networks. "Scale free networks - a network whose degree distribution follows a power law." Follows a power law. To put it in simple terms, there is not equal distribution along this network. The connections are not equally distributed. So, nice little math there. Since, again - remember, I was a music major - this means nothing to me.

But the power law thing, that’s interesting. What is a power law? Power law. This is an illustration of a power law. It might look familiar. It’s a classic hockey stick. It happens to be reversed in this particular case. But we see lots of hockey sticks. Dave showed us a hockey stick today. We sort of know, any of us who talk about the internet, or talk about anything that’s sort of modern, we need lots of hockey stick graphs. Al Gore used hockey stick graphs. I used one earlier when we were talking about the billions of devices. We don’t usually call it a power law. We use another name. We know this name very well: this idea of the long tail. Although most of us apply the long tail in economic terms - the early ones or the biggest ones get most of the fish and the other ones have to fight over their leavings, that kind of thing. We always think about it in economic terms. But actually this is the power law. And this is what Sergey and Larry figured out. This is what made the difference.
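For the curious, that "nice little math" fits on one line. The exponent range shown below is the value commonly quoted for web-like graphs, not a number from this talk:

```latex
% Degree distribution of a scale-free network:
% P(k) is the fraction of nodes that have exactly k links.
P(k) \sim k^{-\gamma}, \qquad \text{typically } 2 < \gamma < 3 \text{ for web-like graphs}
```

In plain terms, every time you double the number of links you ask for, the number of nodes that popular drops by roughly a factor of 2^γ - which is exactly the "a few nodes get most of the links" shape of the long tail.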

They understood that in a network, in a connected network, in a linked network, in a bunch of nodes, there is not equal distribution, and that unequal distribution is actually the important thing. A few nodes have many of the links in the network. There are some really popular pages, right? This seems really kind of logical to us. But many nodes have very few links. They figured out this pattern. Actually, there was a paper written by Barabási and Albert in 1999 that talked about preferential attachment. But this idea of unequal distribution goes all the way back to the '60s in some early information theory.

This idea that these networks have this sort of odd property: nobody organizes them, yet they’re not just random; there’s some kind of information built into the network itself. It turns out this notion of preferential attachment works in our brains as well; it works in neurons. Ant colonies, immune systems, all use this notion of connections. There are some other things we won’t get a chance to talk about regarding this notion of preferential attachment. You can look this up; there’s a very good paper, a very readable paper, by Barabási and Albert. And they published some books, I think in like 2003, 2005, on the same subject.
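As a rough illustration only - not code from the talk or from the Barabási-Albert paper - here is a minimal sketch of preferential attachment: each new node links to an existing node with probability proportional to how many links that node already has.

```python
import random

def grow_network(steps: int, seed_nodes: int = 2) -> dict[int, int]:
    """Grow a network by preferential attachment.

    Returns a mapping of node id -> degree (number of links).
    Each new node attaches to one existing node, chosen with
    probability proportional to that node's current degree.
    """
    degrees = {i: 1 for i in range(seed_nodes)}  # tiny seed network
    for new_node in range(seed_nodes, seed_nodes + steps):
        # "Rich get richer": well-linked nodes are more likely to be chosen.
        targets = list(degrees)
        weights = [degrees[t] for t in targets]
        chosen = random.choices(targets, weights=weights, k=1)[0]
        degrees[chosen] += 1
        degrees[new_node] = 1
    return degrees

if __name__ == "__main__":
    degs = grow_network(10_000)
    print("Top 5 node degrees:", sorted(degs.values(), reverse=True)[:5])
    print("Nodes with a single link:", sum(1 for d in degs.values() if d == 1))
```

Run it and you get a few nodes with links in the hundreds and thousands of nodes with exactly one - the same hockey-stick shape, emerging from nothing but the attachment rule.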

But this long tail, this unequal distribution, becomes very, very important, because what they recognized is that curation and aggregation doesn’t scale on the web. It runs counter to the way the web works. Yahoo and all those before had thought that what we want to do is create a situation where there’s an easy hierarchy for us to find things. As a matter of fact, I remember in this period working with organizations, working with people on the web, and they were saying, "Well, we need to create a hierarchy." And I would say, "No, no, no. What you want to create is this sort of loose tagging experience."

Remember, we got this idea of tagging in the early oughts. But no, they said, "No, no, that’s bad, because then it favors the wrong things and you get more of these connections here than you do other places." That’s the way the system works. So not only did Brin and Page figure this out; what they decided to do was let the users themselves, in the links that they create, inform the search engine. They found the information already in the network itself. The paper that I mentioned earlier - you’ll get these slides - is a great, very, very readable paper.

So what’s so cool about what these two did is, not only did they write this paper and not only did they notice this, but they actually built the first engine, and they called it Google. They built that first engine right away and proved that this was true. And proved that they could get so much more information with so much less effort by letting the users inform them, by noticing what was in the network, the information in the network itself. And, of course, it made them lots and lots of money. They figured they could advertise on the pages that have lots of links. They could lead people to those pages. So not only did they gain the leading role in search, not only did they blow away Yahoo and AltaVista and all these other places, but they actually built a multibillion dollar company by recognizing this property of the web. That was what? Ten years ago? Twelve years ago?
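The talk doesn’t walk through the algorithm, but to make "the information is already in the links" concrete, here is a minimal sketch of link-based ranking in the spirit of what the 1998 paper describes. The tiny example graph and the damping value are my own illustrative assumptions, not data from the talk.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Rank pages using only the link structure of the network.

    links maps each page to the pages it links to. No curation,
    no hierarchy: the users' own links do the work.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                 # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Tiny hypothetical web: everyone links to "home", so it ranks highest.
    web = {"home": ["blog"], "blog": ["home"], "about": ["home"], "faq": ["home"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:5s} {score:.3f}")
```

Run it and "home" comes out on top simply because the other pages choose to link to it - nobody curated anything.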

We experience - and I’ve seen more of this recently, both on a small scale and a large scale - a lot of what I call hub vulnerability on the web. Unexpected disasters in complex systems cause them to fail rather dramatically. We know the butterfly effect, this idea that a small change can result in a very large difference somewhere along the way. This is sort of like the reverse of that. Just a small break somewhere - someone misconfiguring a DNS - and boom, the backbone is in trouble. Somebody screwing up a power station that doesn’t feed back correctly, and the whole northeast coast of the United States goes down. Complex systems can be affected by very small changes. We had a series of DNS failures in the early oughts before we figured out how to stabilize that network.

Even though we have multiple services on the backbone, it was actually pretty easy for somebody to screw something up. We went through this period where we had these vulnerabilities. We all recognize the AWS problems. So many of us now depend on this one service, on this one network of services, that if something goes bad somewhere, a lot of people can be down for quite a bit. It happens with Amazon, it happens with Google, it happens with Microsoft’s Azure systems. We’re building these vulnerabilities in, and they’re getting worse.

Of course, we all remember the big one. Right? When our entire financial system did the same thing, because we built all sorts of vulnerabilities into that complex system as well. That generated the phrase we all recognize now - too big to fail. One of the concerns I have, the thing I’m telling you today, is that we’re also creating some too-big-to-fails in our business as well. I think it’s worrisome, and I don’t think it’s necessary. Because one of the features of this kind of complex system, one of the features of power-law-based systems, is not just hub vulnerability but node resiliency. Losing a node or two doesn’t hurt that much.

Highly distributed systems that depend mostly on nodes reduce the risk of system-wide failures. Now, the power law solved the search problem, got us past a key scaling problem and its risk, and created lots and lots of millionaires. So I want to believe that the power law can help us too. Think about the difference between these two services. When the internet goes down, which one provides you your files even though the internet is down? Which one is based on node resiliency? And which one relies on a vulnerable hub?

Vulnerable hubs. Resilient nodes. Vulnerable hubs. Multiple resilient nodes. We have the possibility. We have all of the bits in place to solve a lot of these problems. But most of us don’t think this way, right? Most of us don’t build these kinds of services. We build another kind of service. We create a hub. Many of us here are responsible for helping people create lots of hubs. We make money when you create hubs. But is that the way we want to scale APIs? Is that going to work? Do you want to be adding more vulnerability to the web? Even to our little corner of it. Or even to our own private version of it inside our organizations, or inside our own little private cloud thing, or whatever we’re going to call that today. Even if we wanted to do that, does the notion of creating curation and aggregation points really scale over the long term? If I’m only worried about the next few years, I think I’m cool. I think I can pull this off. I can make some money and split. Five years from now, 10 years from now, 20, 30, 40, 50 billion devices from now, I’m not so sure.
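As a back-of-the-envelope illustration of hub vulnerability versus node resiliency - my own toy model, with a made-up failure rate - consider a hub design where every client depends on one central service, versus a node design where each client keeps working on its own.

```python
import random

def surviving_fraction_hub(clients: int, failure_rate: float) -> float:
    """Hub topology: every client depends on one central service."""
    hub_up = random.random() > failure_rate
    up = sum(1 for _ in range(clients)
             if hub_up and random.random() > failure_rate)
    return up / clients

def surviving_fraction_nodes(clients: int, failure_rate: float) -> float:
    """Node topology: each client works from its own local copy."""
    up = sum(1 for _ in range(clients) if random.random() > failure_rate)
    return up / clients

if __name__ == "__main__":
    random.seed(42)
    trials, clients, p_fail = 1_000, 100, 0.05
    hub = sum(surviving_fraction_hub(clients, p_fail) for _ in range(trials)) / trials
    nodes = sum(surviving_fraction_nodes(clients, p_fail) for _ in range(trials)) / trials
    print(f"avg clients still working, hub design : {hub:.1%}")
    print(f"avg clients still working, node design: {nodes:.1%}")
```

With a 5% failure rate the node design keeps about 95% of clients working on average, while the hub design loses everyone outright in the roughly one-in-twenty trials where the one hub itself goes down.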

Do you recognize this quote? Right? I had to change it actually - "it" with "him" - and the start of it is actually, "Gentlemen, we can rebuild him." Because it was always a male-dominated universe when we had our astronauts in the - is that 1980? 1990? The Million Dollar Man.

Audience member: Seventies.

Speaker: Seventies. Oh, I’m heartbroken. I was probably watching the Million Dollar Man when Sergey was born. Was it six, was it five? It was inflation, I don’t know. So what if we change the way we think about this? When we think about creating a service that other people will use, what if we changed what we were thinking? What if we said, "How can I create a node-based service rather than create a hub?" What are the pluses to this? What are the minuses? What are the costs involved? What are the gains that we can get? Let’s just talk about this a little bit. What are the possible business models that can be based on node systems rather than hubs? Think about building powerful clients that act as my agent, that go out and do lots of things, that understand lots of different parts of the internet and can solve problems for me. I’ll tell you what, I’d pay for that.

What about creating user-centric systems? Do the same thing that Sergey and Larry did. Flip it around. Rather than tell users what they want, rather than give users what you think they need, ask users to create for you and for everyone else what they want. Ask users to discover the things that are really cool, that are really great, the APIs that really work, the services they really want put together. Adam showed several examples where users - not geeks, not techs, users - were saying, "No, no, no. I want these two things to work together." What if we can make that possible? What if we can make that easier? What if we can actually empower individuals to start to put them together?

The same way as when John was working at IF - the same way, but make that possible in lots and lots of ways from the user’s perspective. What if the user could create links between things? The user could start telling us how these two things relate and inform the rest of the network. What if they could share that information, share those links, share the things that they build with other people? What if we allowed the users to identify the services themselves rather than us identifying them?

So there are lots of possible advantages here. I get the increased processing power of all of these smart devices. This is what the SETI@home service does, right? The US Space Administration tries to get us all to search through the stars and look for information just by sharing bits of information on our machines. All of a sudden we treat the web as a machine. We treat the web as an application. Richard Taylor, who teaches at UC Irvine, was the dissertation advisor for Roy Fielding, Justin Erenkrantz, and Rohit Khare. I mean, these are like the highlights of distributed software architecture for event-based systems, REST-based systems, and computational portability systems.

Richard Taylor actually calls the World Wide Web a single hypermedia application. Not a transport. Not a communications system. Not a storage bucket, but an application that we contribute to. What if we treated it that way? What if we started to have access to lots and lots of metadata? One of the reasons Dropbox works, and some of these other systems work, is because of all of the EXIF information in our phones, all of the location and other information that every time we upload a picture we upload that stuff as well.

What if we took advantage of that? What if we could get people to share all sorts of bits of that with us? How much more could we learn? And we can do that by allowing them to put them in their Dropbox, or whatever the equivalent is, and share that information with us. What if we flip the process around? Rather than worry about lots of users draining the system, what about lots of users actually adding power to the system, adding more functionality to the system? So the scale problem gets reversed, it goes away. We want more people to participate because we get more processing power. What if we reverse that scalability vulnerability challenge?
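To make that flip concrete with some toy arithmetic of my own - none of these numbers come from the talk - if every participant contributes even a little processing, the shared capacity grows with the audience instead of being drained by it.

```python
def hours_to_finish(work_units: int, participating_nodes: int,
                    units_per_node_per_hour: int = 10) -> float:
    """Node model: every participant adds capacity, so more users
    means the shared job finishes sooner, not later."""
    total_capacity = participating_nodes * units_per_node_per_hour
    return work_units / total_capacity

if __name__ == "__main__":
    job = 1_000_000  # hypothetical shared workload, in work units
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} nodes -> {hours_to_finish(job, n):8.1f} hours")
```

Same workload, but every additional node shortens the job; in the hub model that same growth in users would only add load to the center.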

Now the vulnerability gets reduced, because the individual machines work even when the internet is knocked down. Isn’t that what we’re doing with our mobile phones today? Most of us have great apps, and those apps work, at least in some sort of way, even when I don’t have a signal. We’re learning to build smart applications that work even when I’m not connected, and that then catch up and share when I am connected. What if we did that not just on our little handheld devices, and not just in little pipes, little islands, little stovepipes that don’t share with each other, but actually start to share with each other?
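Here is one generic way to sketch that "work offline, catch up and share when connected" pattern. This is my own outline of the idea, not any particular app’s code.

```python
import json
import os

class OfflineFirstStore:
    """Keep every change locally first; push the backlog when a connection
    appears. The node keeps working even when the network does not."""

    def __init__(self, path: str = "local_changes.json"):
        self.path = path
        self.pending: list[dict] = []
        if os.path.exists(path):
            with open(path) as f:
                self.pending = json.load(f)

    def record(self, change: dict) -> None:
        """Always succeed locally, connected or not."""
        self.pending.append(change)
        with open(self.path, "w") as f:
            json.dump(self.pending, f)

    def sync(self, send) -> None:
        """When we're back online, replay the backlog through `send`
        (any callable that pushes one change to another node)."""
        still_pending = []
        for change in self.pending:
            try:
                send(change)
            except OSError:          # connection dropped again; keep the change
                still_pending.append(change)
        self.pending = still_pending
        with open(self.path, "w") as f:
            json.dump(self.pending, f)

if __name__ == "__main__":
    store = OfflineFirstStore()
    store.record({"note": "written while offline"})
    store.sync(send=lambda change: print("synced:", change))
```

The point of the sketch is only the shape: local writes never fail, and syncing is a catch-up step between peers rather than a dependency on one hub.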

We’ll increase network intelligence. The overall network will start to get smarter. We’ll have more information in it, the same way that Larry and Sergey found out the information is in the network itself and we can find it. There’s a whole bunch of stuff in that paper about dealing with the added information that hypertext linking provides. That was the other thing that they did really well in that search engine, right? We won’t have a chance to talk about it today. We’re heading right for this. This internet of things, this industrial internet, whatever we’re going to call it - this is going to be full of all sorts of mixtures of smart and dumb devices: very, very small devices that I just plug in and that need to communicate immediately, and devices that have rich, rich information and can aggregate that, and churn that data, and do lots of things with it.

But can we make money at this, right? This seems pretty weird. This seems pretty odd. You recognize a few of these, right? IBM was happy to let Microsoft take on the operating system because they knew hardware was where you were going to get the profits. Yahoo knew curated indexes were what people really wanted. You’re never going to make money in open source. So I think it’s important for us not to forget there may be great opportunities here. The power law may apply to more than just search. So my call-to-action to you is: be a node, not a hub. Think of yourself as a node. As we put more and more things on our bodies, as we carry more and more GPS devices around, I’m a node. As I drive in my car, I’m a node. My house is a node. We’re all nodes.

Think about each one of these services. Wouldn’t it be possible to implement them as node based? Sure, it would. Sure, it would. I would post information onto my own machine, onto my handheld device, or some other device, and I would decide how those devices communicate. Just like Dropbox does today, I can create as many nodes in my own little world as I want to. Then those nodes can send information back and forth to each other. I can select the service. I can decide who’s going to publish my blog. I can decide who’s going to publish my little micro-blogging feed, or all these other things, or pictures of my family. All of a sudden, publishers can compete over who’s going to carry my content, or how many places I’m going to carry that content. Or maybe publishers are going to charge me to carry that content or connect it in some way. Now we have lots and lots of possible opportunities.

And publishers can distribute as well. Those nodes, they all become nodes in the system. There are lots of possibilities here. And there are even recent attempts, right? Whether it’s Diaspora, or what’s the micro-blogging that’s an open source project? What’s that?

Audience member 2: App dot net.

Speaker: App dot net. There’s a bunch of things. There’s a bunch of attempts at this. But so much of what we’re doing right now is still based on this idea of a central processor. We’re trying to create one computer, right, where there’s one CPU, and one set of storage, and all these other things. We don’t need to do it that way. We can avoid that. So, API providers, I’m telling you now to consider the hub vulnerability problem when you create your system. Am I creating more vulnerability? What are those vulnerabilities going to be? You cannot get rid of them. You can mitigate them, but they’ll always be there. Explore the advantages of a node-based system, not just technically, but also financially. What happens when I empower people? What happens when I empower individuals to own not just their data, but maybe their privacy, and maybe all these other things? Adam pointed out that sometimes if you build it, that doesn’t mean they’ll show up. Maybe we ought to try it. Maybe we ought to try building a few. Empower users and you may reap the rewards.

Same thing for consumers. If you’re a consumer of an API, think about it, is there a node-based provider that gives you the service you want? Think about using them. Think about the possibilities. Think about what that’s going to do to change the way you write clients. Build your clients as aggregators. Build your clients to take on all sorts of information. Yes, we’ve got some technical details on how we can make that easy, how we can make that possible, but it’s not impossible. In fact, we may also get lots and lots of benefit from this idea of flipping this and changing the way we think about it.
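One way to read "build your clients as aggregators" in code: the client fans out to several independent nodes, skips any that are unreachable, and merges whatever comes back, so no single node becomes a required hub. A minimal sketch, with made-up node addresses:

```python
import json
import urllib.request
from urllib.error import URLError

# Hypothetical node addresses for illustration only; in practice the user
# would choose and share these themselves.
NODES = [
    "https://node-a.example/items",
    "https://node-b.example/items",
    "https://node-c.example/items",
]

def fetch_items(url: str, timeout: float = 2.0) -> list[dict]:
    """Pull items from one node; an unreachable node is simply skipped."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (URLError, OSError, ValueError):
        return []   # node down or garbled: the client keeps working

def aggregate() -> list[dict]:
    """Merge results from every reachable node, newest first."""
    items = [item for url in NODES for item in fetch_items(url)]
    return sorted(items, key=lambda i: i.get("updated", ""), reverse=True)

if __name__ == "__main__":
    for item in aggregate():
        print(item)
```

The design choice worth noticing is the failure path: losing a node degrades the result a little instead of taking the whole client down, which is the node resiliency argument in miniature.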

So it turns out maybe my message is a positive one, if we think about this. We have lots of opportunity for some new endeavors. We are standing at just the beginning. The internet is, what, 20, 25 years old, something like that? It’s cheap. It’s nothing. We’ve barely started. We can build strong, vulnerability-resilient systems that work really, really well. There is another way to build systems, one that takes advantage of the way the web works and doesn’t run counter to it. We don’t have to build hierarchies, we don’t have to build aggregators, we don’t have to be the one that everyone talks to in order to get the same thing done. We can actually use the power law to our advantage. We can probably even profit from it. If we do that, we can meet this growing demand. It won’t be a big deal; we can avoid this scalepocalypse. There might be another one in 10 or 20 years, but we can avoid this one. So "Be a node!" is my call-to-action. Thank you.