9 Oct

Internet Technologies – Computer Science for Business Leaders 2016


DAVID J. MALAN: All right, so
the overarching question now, and we started down this road with
our look at Dropbox, is the internet. So let me try to ask a
loaded question deliberately. What is the internet?>>Surely you all use it.>>AUDIENCE: Network? DAVID J. MALAN: A network? OK, what is a network?>>AUDIENCE: A connectivity
between different systems. DAVID J. MALAN: OK, connectivity
between different people and systems. All right, and what makes
the internet an internet as opposed to just a network as we might
have in just a building or a classroom? AUDIENCE: It’s global. DAVID J. MALAN: It’s global. All right, so it’s a network
of networks, if you will. Internet denoting connections
across individual networks. And of course, there’s
different services that the internet provides these days.>>There’s, of course, the world wide
web with which all of us are familiar. There’s services like email. There’s services like
chat or Google Chat. Or there’s things like voice over IP. There’s things like Skype, and Google
Hangouts, and FaceTime, and the like.>>And so there’s this layering
concept in the internet. And indeed, this too is
a fundamental concept in computer science of
layering, or abstraction, where you build one thing down here. Then, you build something
else on top of it, and then, something else on top
of it, on top of it, on top of it. And so we’ll see some manifestations of
that in this discussion and, perhaps, others moving forward.>>So let’s start to paint a picture
of some of the technologies all around us by considering what
is, perhaps, in most everyone’s home here, and use that as a point of
departure for a conversation more generally about how all of this stuff
works, and what some of the issues underlying design decisions have
to be when building networks and when using the internet. So back at home, we’ll go
back to my little laptop here. You probably have one or more
computers, and maybe one or more phones, that are connected these days via Wi-Fi. Maybe once upon a time, you had a cable. Maybe you do still have a desktop
computer at home that has a cable. But our story’s not really
going to change that much there.>>Here is the so-called
cloud, or internet. And there are bunches of other things
on the internet like Amazon.com, and Facebook, and Google, and
Microsoft, and other such companies on the internet, and
certainly people as well. But there’s a whole lot of stuff that
goes on between you and the internet.>>So let’s first tease apart that. What is your computer, if
wirelessly, connected to at home? What kind of device gets you
on the internet these days?>>AUDIENCE: Router.>>DAVID J. MALAN: A router. So you have this home device called
a router, whose purpose in life, ultimately, is to route
information at the simplest form. If this is the internet over here, your
computer has connectivity to the router. And the router, meanwhile,
somehow has connectivity to the rest of the internet.>>But there’s even more
going on inside of here. So let’s dive in a little deeper. You go home. You open your laptop’s lid or turn on
your desktop for the first time ever, the first time in a while. What happens?>>What kinds of steps have to
happen before you can actually get on the internet? Well, it turns out– oh, yeah? Nakissa? Sorry?>>AUDIENCE: User ID.>>DAVID J. MALAN: A user ID. So you might have to
log in to something. Although, typically at
home, most typically this would just work these days.>>But as we just saw, in environments
like universities, companies, you have to log in. So let’s avoid the
login scenario for now. Keep it simple.>>AUDIENCE: Open up a browser.>>DAVID J. MALAN: You
might open a web browser. Or what, Pat?>>AUDIENCE: Number or passcode. DAVID J. MALAN: Ah,
a number or passcode. So let’s go with number, not
so much passcode just yet. Let’s not worry about security
for this particular discussion. But a number.>>So, yeah, in fact, much like all
of our homes, or a building like this, has a physical address. This building is One Brattle Square in
Cambridge, Massachusetts, 02138, USA. That address uniquely identifies
us, in theory, in the whole world.>>AUDIENCE: An IP.>>DAVID J. MALAN: An IP address, exactly,
is the analog in the computer world that uniquely addresses a computer. So an IP address, or internet protocol
address, is just a numeric address. Computers prefer things that
are a little simpler, that are easier to read than long phrases
like One Brattle Square, Cambridge, Mass., and so forth.>>And so an IP address is a
number of the form something dot something dot
something dot something. And each of these somethings, as
denoted by the pound sign here, is a number between 0 and 255. And so it’s a four-dotted
decimal number– something dot something dot
something dot something.>>And this numeric address,
in theory, uniquely identifies a computer on the internet. So at the risk of
oversimplifying, let’s now assume that when I connect to
Wi-Fi or via cable, at home, my home router is what is
somehow giving me an IP address. Because gone are the
days for the most part, at least locally here,
where when you sign up for Comcast, or RCN, or your
local internet service provider, no longer does a technician have to
come to your house with a printout, and then have you, or him, or her type
your IP address into your computer.>>Rather, this is all
discovered dynamically. When you open your laptop’s
lid or turn on your computer, your computer just starts
broadcasting a message, essentially. It says, hello. I’m awake. What should my IP address be?>>And one purpose in life of a home
router these days, among others, is to give you exactly
one of these addresses. And the mechanism by which it does
it, just to tease apart some jargon, is called a DHCP server. Fancy way of saying Dynamic
Host Configuration Protocol. It’s just a really
fancy way of saying it is a piece of software running
inside of our home router that, upon hearing your request– hello. I’m online. Please give me an IP address–
responds with exactly that. And it tells you to use something dot
something dot something dot something. And then, your Mac or
PC does exactly that. And just to make this
a little more concrete before we take your question,
on Mac OS, and there’s a comparable window in
Windows, if I go to Network, I can actually see here
that my laptop is connected to Harvard University,
which is the Wi-Fi, and has the IP address 10.254.25.237.>>If I’m more curious, I can
click Advanced on my Mac. I can go up to TCP/IP. And notice what is
now familiar, perhaps. What protocol, what
feature is my laptop using to do exactly what we’ve just described? DHCP. I can’t even change it. Because I’m already
configured right now. It’s locked, this setting. But my computer’s configured using DHCP. And it looks like what
Harvard’s DHCP server has given me is an IP address–
10.254.25.237– a subnet mask, which we won’t go into today.>>But a subnet mask is
just an additional number that specifies what network you’re on. Maybe it’s this room’s. Maybe it’s a different building. Maybe it’s a different part of Harvard. It’s a way of segmenting
a local network.>>Router, that word sounds familiar. Because we were just
talking about it here. And even though I’m on Harvard’s
network, not like a home network, the principles are still the same here.>>Harvard has also told me the IP
address of a router– 10.254.16.1. And as an aside, generally as a
convention, but it’s not required, a router’s IP address does tend to
end with .1, which is a useful signal, just to know this. So what do these things do?>>The IPv4 address, version 4, which
is sort of the older but most popular version of internet protocol
these days, is that address. I’ve got a router address. So why do I need to
know a router’s address?>>Isn’t it sufficient to know where I am? AUDIENCE: That’s [INAUDIBLE]
related to my question. So if you have two
routers in the same room so we can get connected
to each other, then you will get a separate IP
address because it’s going to be associated with a network.>>DAVID J. MALAN: Ah, so
this is where we actually have to start teasing apart
what we really mean by router. Because the term, certainly in
the consumer market, is overused. So in this room alone, we
have what most people would call two routers, these
things with antennas and the blue lights on
either side of the wall.>>But router, in this case, they’re not. These aren’t quite home routers. But let’s just suppose, for simplicity,
we do have two such things here. If you had two access points,
as they’re more properly called because of the antennas– a
wireless access point or AP– they should be configured in a
way that they, in turn, connect to one central device, whose purpose in
life is to do what you’re describing, to give out the IP address.>>If you did have two of these
kinds of devices at home, maybe two Linksys devices, two D-Link
devices, two AirPort Extremes at home, or AirPort Expresses, you can configure all
of those products, even if you have two identical
models, to make one the primary, and then the other the secondary. So that you run a wire
between them, typically, or you have someone come do
it for you behind the walls.>>And then, one is the primary. One is in charge of
giving out IP addresses. And the other one is just
responsible for extending the range of your wireless signal. In fact, at home I have two such things.>>We have in our office five
such things, all of which are physically wired together. But it’s just to give us
more wireless coverage. But one of them is in charge.>>OK, so with that said, why does
my Mac, in this room right now, need to know what the IP
address of the router is? Isn’t it sufficient just to
be told what my address is? AUDIENCE: But it can change. If you get connected to the
VPN, it’s going to be different. DAVID J. MALAN: Oh, now you’re using
another word I don’t know yet– VPN. So let’s not go there. Because VPN’s going to complicate it. I just want to get, little old me
wants to get on the internet right now. Well, this really invites the
question, how does the internet work?>>All right, I might have an address. That’s all fine and good. But why do I have an address?>>Well, let’s consider what really
is going on on the internet. I’ll use a different
picture for the moment. And in the actual internet, we might
have me over here on my laptop. We might have the internet over here. And then, we might have, let’s
say, Amazon.com this time.>>And this is me. And, somehow, I want to connect to
Amazon.com, through the internet, and get my data from point A to
point B. Or I guess, from point A to point
Z in Amazon’s case.>>So what is inside of this internet? It turns out, there’s a whole
bunch of things called routers. And now, we’re mixing terms. But we’ll see how even home
routers relate to the dots that I’ve just drawn on the screen.>>A router on the internet is
generally like a medium-sized device. It’s not like an old mainframe. But it’s a device that’s probably this
wide, maybe this tall, maybe this tall, maybe this tall. Depends on how expensive
a model you have.>>And it’s got a lot of cables coming into
it and a lot of cables going out of it. And at the risk of oversimplifying, you
can think of a router’s purpose in life as being to take in data from this cable
here, look at the information that’s come in, and look at its address. Where is this information being sent? And then say, OK, I’m going
to send this along this way. If I get another piece
of information over here, it’s destined for a different address. I’m going to send it this
way, instead, up this cable. And if I see another piece
of information destined for yet a different address, I’m
going to send it out this cable, over in this way.>>So a router’s purpose in life
is to truly route information. And in its simplest form, a router
just has a big Excel file inside of it that says any IP address starting
with the number 1, send it this way. Any IP address starting with
the number 2, send it this way. Number 3, send it this way. Number 4, send it that way.>>Oversimplifying, but it uses
those numbers and, specifically, prefixes of numbers, typically,
to decide to go left, right, back, forward. Because a router, typically, has
multiple connections to other routers. In fact, I’ve not drawn them here.>>But you can imagine this being a web,
not to be confused with the web we use, but a web of devices, all of which are
interconnected very deliberately so. In fact, the origins of the
internet are militaristic in design. And one of the designing principles
was that if a router, or worse, a city were taken out in a military
sense, you want the data to be able to route around that problem.>>And so when I send a
request to Amazon.com for their home page, my data might leave my
computer, go to my default router, or default gateway as it’s often called. Then, maybe that router will decide to
send it here, here, here, here, here, here, here, and then
on its way to Amazon.>>And that was an arbitrary path I drew. But what’s noteworthy about
the red line I just drew? How would you describe it?>>AUDIENCE: It’s not direct.>>DAVID J. MALAN: It’s not direct. So contrary to the popular saying, “The
shortest distance between two points is a straight line,” it’s not
necessarily true on the internet when it comes to routing information. Because geographic distance
isn’t necessarily the only metric you care about. Rather, what else might govern what
direction the data should take in order to get from point A to point B? AUDIENCE: Speed? DAVID J. MALAN: Speed. So it turns out you might configure a
router to favor a faster connection. Even if you might have to go
a few hundred extra miles, maybe it’s just faster to go
this way than over, maybe, an old school satellite connection
this way just to get from one point to another. It doesn’t even have to be
physical devices on the ground. It can be physical devices in
the sky, for instance, or even underwater these days, or so forth.>>So that’s true. What else might dictate that a company,
an internet service provider, or ISP, want to send data this way instead of
that way, even though it’s farther? >>Well, it turns out the way the internet
itself is governed commercially is that there’s a lot of big
players out here on the internet, whether it’s Comcast, or Verizon, or
Level 3, or more arcane names that you might not have heard of but that
are fairly big infrastructure companies that compose the internet’s
backbone– the wiring, the routers, the cabling that you just
don’t really see or care about. Because it’s all on the
inside, run commercially.>>Well, there are things
called peering points whereby a big ISP
might have some server, might have some routers and
some cables in a data center. And other ISPs might have the same. And other ISPs might have the same
all inside the same data center.>>And they interconnect. It’s a peering point in so
far as they all connect. That’s where peers connect.>>And by nature of
financial arrangements, it might be the case that Comcast has
agreed to send as much of its data as it can this way instead of this way. Because, maybe, the
vendor over here is going to charge them more per gigabyte to
send their data over in that direction. So it might be financial decisions
that govern which direction things go.>>It might just be performance
implications, even more commonly. Routers get overloaded. If a lot of
people get home at 5:00 PM and start getting on the internet, maybe
there’s congestion on the internet. And the algorithms, the
software running on routers, generally will say, if I
start to get overloaded, I should provide some feedback
to other routers near me so that they, hopefully,
go in another direction, much like you would avoid a traffic jam.>>So this is not all that unlikely of a
path that data might take from point A to point B. And in
fact, you can generally assume that your data is going to take
30 or fewer such hops from point A to point B. That is there might be as
many as 30 or so routers between you and point B.>>And we can, sometimes, see this. Let me see if the
network here cooperates. Otherwise, I’ll try a different example. Let me see if I can
do it on this network. And I can.>>So I have just run, let me
simplify my outputs slightly. I’m going to do not that. Here, OK.>>So I’m going to do the following
command called traceroute. So right now, I’m just on my Mac. I’m in an old school black and
white interface, not unlike DOS from yesteryear. But I just want to see
some textual output.>>And I, literally, here
at Harvard University want to trace the route
between me and www.cnn.com. So let’s see what happens
now when I hit Enter. A whole bunch of stuff starts
flashing up on the screen.>>And let’s see if we can’t
make some sense of this. So 1, 2, 3, 4, 5, 6, 7, and
it’s kind of hanging right now. We’ll see if it completes
this process or not. It turns out that each of the
lines of output, on the screen, represents something. And based on our leading
discussion thus far, what does each of these lines of output,
numbered 1 through 11 at the moment, represent?>>AUDIENCE: Different routers. DAVID J. MALAN: Different routers,
different dots on the screen. And so what this program,
traceroute, is doing is it’s literally tracing the
route between me and CNN.com. So in this case, step 1 is, apparently,
a router whose IP address is what?>>AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah, but
specifically, its IP address. Remember, its IP address is numeric. So to just make sure we’re
all on the same page, what’s the IP address of the first
router between me and Harvard? I mean, sorry, between me and CNN?>>AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Perfect. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Exactly. We’re just inferring
this from the reality that this first hop, so to
speak, just has that address. It doesn’t have a name for some reason. But that’s just because the humans
decided not to give it a name. And so be it.>>Step 2 is another router. But again, I said it was convention. It’s not required that
routers’ IPs end in .1. This one does not. The second router’s IP is this.>>Now, it looks like the humans
got a little more organized and have started naming
their routers with what look like URLs or portions of URLs. But they’re not. They’re just the names
that humans give to things.>>And it, apparently, is the case
that this router, not surprisingly, is owned by whom probably? It’s probably Harvard, right? Because the name of the
thing ends in harvard.edu. What is the name? coregw1, core just means
important, in the middle. gw is– I said it earlier.>>AUDIENCE: Gateway.>>DAVID J. MALAN: Gateway,
just a synonym for router. So this is the very important
core gateway number 1. I don’t know what te means. 3-5, don’t know. core, probably means the same thing.>>.net.harvard.edu, doesn’t
necessarily look clean. But it’s useful to some system
administrator somewhere at Harvard. Step 4, I’m inferring from convention. What do you think 4 represents? It’s still a router.>>What does bdr probably,
what does it sound like? Border. So this is probably a router that’s
physically on the border of Harvard and the rest of the world, so on
the edge of the campus somewhere.>>Step 5 is interesting. Step 5 still says harvard. But NoX tends to stand for
Northern Crossroads, which is a very popular peering point– as I
described earlier, a data center where lots of different people, Harvard
and other big ISPs, come together and interconnect their cabling
so that data can go out elsewhere on the internet.>>And now, things get a
little more interesting. I don’t know where this is just yet. Apparently, rtr, I’m
guessing, is router. Equinix in New York is
possibly the origin of that. But internet2 is a super fast internet
connectivity among universities, especially. So that seems to be what
we’re connected to there. For whatever reason, the
routers in steps 7, 8, and 9 are just not answering us. That’s probably because
of either misconfiguration or conscious configuration. Whoever runs those routers doesn’t
care to disclose information.>>But step 10 is interesting enough. Because I can guess from
this, with some probability, that my data, the data
leaving my laptop, by step 10– 10 steps later– has
entered what geography? New York.>>And how long did it take my data,
from my laptop, to get to New York on its way to CNN would you guess? 28 milliseconds. And this tool not only traces the route. It also times things.>>And things can get congested. So the numbers could sometimes jump
up or down a little unexpectedly. But if you think, now, how long
it takes to get to New York from here, which is probably about
four or so hours by car or train, it’s much faster to send
yourself electronically if it takes just 28 milliseconds
to get from here to there. Now unfortunately, the other
routers don’t seem to be disclosing. Let’s try another one. Just for kicks, let’s
try Amazon.com and see if the routers are a little more
cooperative, knowing that it could take a completely different path. So maybe we won’t hit
as many blockages there.>>It looks a little different here. I don’t think we saw aws sum1 net. And in fact, aws is Amazon Web Services. Harvard has a service called
Direct Connect with Amazon, where we pay a little
bit of money to Amazon to get faster connectivity
to Amazon’s network. So we use a lot of their
cloud services, some of which we might talk about a little later.>>Seems the routers here,
too, are being a little shy. So we don’t see all that much more. But let’s see if we can
glean a little something more by going a different
direction altogether.>>Let’s try our friends at Stanford.edu. See if we get any farther. No, still being a little private. Seems this same path is
hiding itself a little bit. So we’ll try one more if this
doesn’t yield juicy results. But you can kind of see those
IPs, I can make an inference here. What might you conclude, even if
you’re not a network engineer, is true based on the numbers you’re
seeing in step 7 through 9 and 12 through 15?>>What’s an educated guess here? What’s a true statement?>>AUDIENCE: Something around
the 205 [INAUDIBLE].>>DAVID J. MALAN: True, and I’m
looking at the numbers to the right. Where are these routers, even though
they don’t seem to have names?>>AUDIENCE: Somewhere further
away than [INAUDIBLE].>>DAVID J. MALAN: Yeah. And I don’t know where. But notice step 7 says 123 milliseconds. But just three hops prior,
it only took 3 milliseconds.>>AUDIENCE: So [INAUDIBLE] DAVID J. MALAN: Not here, yeah. So maybe it is the middle of the country. Maybe it’s the West Coast already. I really don’t know,
completely guessing.>>But given that every other hop
thereafter also took more time, feels reasonable to
conclude that there’s just physical geography between us and them. And to be clear, each of
these numbers isn’t pairwise. It doesn’t mean each hop
takes 100 milliseconds.>>Each of these numbers represents the time from
point A to that intermediate hop. So in general, they should just
be incrementing ever so slightly. So the fact that all of these,
now, are roughly 100 milliseconds, feels like it’s got to be farther away. And I’ll try one last one.>>But I’m guessing we’re going
to see a bunch of stars. Let’s try the Japanese
version of CNN’s website. Oh, OK, now it’s getting juicy. Because apparently it really has
taken a different path through the US.>>Let’s take a look at, oh, this is great. This one finished. So this is powerful. In steps 1 through 4, what
town are we probably in?>>AUDIENCE: Cambridge. DAVID J. MALAN: Cambridge. And why do you say that? It’s all harvard.edu. In step 5, where might we be? Boston. In step 6, where might we be?>>AUDIENCE: Number 6.>>DAVID J. MALAN: And where is San Jose?>>AUDIENCE: It’s in California.>>DAVID J. MALAN: California? It’s probably the San Jose,
California, which is kind of amazing. Now, why do we say that? So one, San Jose– that’s
the only San Jose I know of. But I’m sure there are others. But corroborating that hunch
is what other piece of data?>>AUDIENCE: The geographical. DAVID J. MALAN: The
geographical path feels like that’s the direction
we probably are going to go to get to Japan over the Pacific Ocean. And what further piece
of data corroborates that, yeah, we just took a
left turn to California? The time really jumps.>>Notice we go from 1.989 milliseconds,
in row 5, to 74 milliseconds in row 6, which suggests there’s
probably some big body of land. So there’s also some really expensive,
powerful cable, it would seem, going across the entire country leading
from Boston to San Jose in this case. Don’t know where step 7 is.>>But it gets really cool when we
look, now, at step 8 and 9 onward. Where are those routers? Probably Japan. So what is between step
7 and 8 most likely?>>AUDIENCE: London.>>DAVID J. MALAN: Yeah, so
there’s also trans-Pacific, transatlantic, transoceanic
cabling that really big ships just roll out and put on the bottom
of the ocean, that carries all of this internet connectivity. And that’s why our
network connection gets so much slower, relatively speaking. And I mentioned earlier,
generally, and well, this is something a web developer
might want to keep in mind.>>We won’t go into too
much detail tomorrow. But generally, a human will start
to notice delays on a web page if something takes 200 or
more milliseconds to load. I mean, that’s still super
fast– a fifth of a second. But this is one of the
metrics that a web developer should keep in mind when designing
a page, when he or she is creating graphics, or adding in
third-party software– advertisements, perhaps.>>You don’t want to slow
down the page load. You, ideally, want to keep
it as fast as possible. And if you start having page load
times of 200 plus milliseconds, the human’s going to notice
that it’s not truly instant. And so these numbers aren’t
all that unfamiliar to us.>>So this, then, captures a little more
quantitatively what’s going on here. And it truly is, even
though I’m sort of bemoaning how slow it is to get to Japan. I mean, it’s still
less than half a second to get your data halfway
around the world, whether that’s an email, a web page,
or anything else along these lines.>>All right, so how does this, then,
relate to where we were going earlier. We were talking about an IP address. And every computer, on the internet, has
a unique address, we’ll say for now– but a bit of a white lie–
called an IP address. And that IP address is used how?>>It’s used by these routers to decide
whether the data should go here, here, here, or here. And I simplified things by saying
it just looks at the first digit. But that’s not really true. It looks at more of the digits,
typically, to figure this out.>>And either humans have
decided or computer algorithms have decided what the best
route is for that data. So that, hopefully,
within 30 or so hops, it eventually gets to its destination. Once I’ve requested Amazon’s
home page, how does Amazon know to whom to send the home page?>>Right, in old school
form, I send a postcard to Amazon saying, please
send me your home page. Amazon’s going to respond with some
kind of message, some kind of postcard, some kind of envelope of its own. So let’s do exactly this just
to visualize this for a moment.>>So the internet these days,
as you may have heard, seems to be filled with
cats and pictures of cats. So suppose that someone’s trying to
visit not Amazon.com, but some website to download a picture of a cat. So my laptop wants to send a request,
via the web, to some website saying, give me today’s picture of a cat.>>And this cat, hopefully, has to
then get downloaded to my computer. So what’s really happening? Well, let me go ahead and do this. I’ve got four old school envelopes here. And this is a useful metaphor. Because this is, essentially,
electronically what happens underneath the
hood when I send a message.>>So for the sake of discussion,
let’s say this is no longer Amazon. This is cats.com or something. And my IP address, I’m going to
say for simplicity, is 1.2.3.4. And the cat website will be 5.6.7.8.>>And what this means for
me is the following. I am going to put 1.2.3.4, 1.2.3.4. And I’ll hold these up in a second. 1.2.3.4. I’m going to put my return
address on all of these envelopes, in the top left-hand
corner as you typically would when mailing an envelope. And now, just take a guess what needs
to go in the main part of the envelope. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah, yeah. That’s all. So 5.6.7.8. So 5.6.7.8, 5.6.7.8, 5.6.7.8, 5.6.7.8.>>And now, this cat here,
by design, is going to be chomped up into multiple
pieces after I request it. So let’s say, for the
sake of this story, I’ve already sent out
an envelope of my own to cats.com saying, please
give me today’s cat. So what we’re talking about,
now, is the latter half of the transaction, when the reply comes
back from cats.com to little old me.>>So it turns out that the protocol,
that these computers speak, is generally something called TCP/IP,
which you probably have seen somewhere or other on your Mac, or
PC, or media, or on a movie, or a TV show, or the like. So what does this all mean? This is actually a
combination of two protocols.>>And a protocol is just a language
that two computers speak. In fact, a protocol in
the human world, hello. My name’s David.>>AUDIENCE: Hello.>>DAVID J. MALAN: Nice to meet you. So this is a fairly stupid human
protocol, where I extend my hand. And Arwa extends her hand. And we meet and greet. And then, the transaction is complete.>>But it’s a protocol in so
far as it’s a set of steps, a script that both
of us know how to act out. And there’s a beginning. And there’s an end to it. Similarly, when it
comes to computers, they have protocols– sets of
conventions that, in fairness, have been decided by humans. But they’re used by computers, and they
dictate how computers intercommunicate.>>IP is the half of this pair of protocols
that governs how you address computers. How do you address computers? Exactly like this.>>So IP is a set of
conventions that says make sure you have an IP
address of the recipient and an IP address of the sender. And use it in dotted,
something dot something dot something dot something format, for instance. TCP is a different
protocol, used in conjunction with IP, that generally guarantees delivery. IP just tells computers
how to address each other.>>It’s just when I said
David, you said Arwa. That was our IP equivalent, our
steps for addressing each other. But to confirm delivery,
computers use a protocol called TCP, Transmission
Control Protocol, which is just a fancy way of saying there
are additional features used by computers to ensure that all of these
envelopes I keep holding up actually get to their destination.>>And one mechanism for
that is as follows. I seem to have how many
envelopes here at the moment?>>AUDIENCE: Four. DAVID J. MALAN: OK, four. So feels like, just to be a little tidy
about this all, I’m going to number them in the bottom left-hand
corner, like the memo field. And I’m just going to say 1, 2, 3, 4. But now, start thinking a
bit more like an engineer.>>Have I jotted down as much
information as I actually have? Can I be even more uptight
than this when it comes to specifying these numbers? What more could I put on the
envelope that just maybe is useful?>>AUDIENCE: [INAUDIBLE].>>DAVID J. MALAN: What’s that?>>AUDIENCE: The number of total
envelopes that you have.>>DAVID J. MALAN: Yeah, the total number. I feel like I’m not capturing as
much available information as I have. So, you know, I probably should do that. So 1 out of 4, 2 out of
4, 3 out of 4, 4 out of 4.>>And now, why is that? What’s the intuition behind also jotting
down the total number of envelopes I’m about to send? AUDIENCE: Find out if
something’s missing. DAVID J. MALAN: Exactly. So TCP leverages this. It uses something called a
sequence number, very similar in spirit to what we’re drawing here. But it needs to know how many packets,
or envelopes, there’re supposed to be. Because otherwise, how
do you know, when you get 1, 2, and 3, whether
there should have been a 4?>>You can infer if you get
1, 2, and 4, wait a minute. There probably was a number 3. And in fact, that’s
closer to how TCP works. But for our purposes now, let’s just be
super precise and say this is 1 of 4, 2 of 4, 3 of 4, 4 of 4 so that we
know at the end of the process, the end of the handshake if you will,
if the whole thing is actually complete.>>Now, it turns out TCP
does one other thing. TCP also allows a computer
to provide multiple services. And by services I mean web,
email, chats, voice over IP. There’s bunches of different things the
internet and servers on the internet can do these days.>>So for instance, just thinking
hypothetically, if I hand this to Arwa, how do you know what’s going to
be inside of these envelopes? Is it going to be a
request for a web page? Is it an email? Is it an instant message?>>You don’t know based
on this information. All you know is who it’s from, who
it’s to, and what number of envelope this is. So we need one more
piece of information. And we’re talking about
the web in this case, just because it’s pictures of cats. But it could be anything.>>So I could write web on it. Or more properly, I
could write HTTP, which is the protocol used by web
browsers and servers to communicate. More on that in a moment. But I’m going to be even more
computer-oriented than that.>>It turns out that humans,
some time ago, decided to assign unique numbers to
popular internet services. HTTP happens to use the number
80, or as we’ll see, 443. But 80 is fine for now.>>SMTP, which is a fancy way
of saying outbound email. This is Simple Mail Transfer Protocol. Just the set of conventions that
governs how computers send email from one computer to another. Happens to use the number 25.>>FTP, with which some of you might
be familiar, what does FTP do?>>AUDIENCE: File transfer. DAVID J. MALAN: Yeah, File Transfer
Protocol should not be used anymore. If your company still
uses it, you’re probably using it without encryption,
which means you’ve been sending your username and password
across the internet all of this time. Probably shouldn’t use it. Because secure versions exist. It uses port 21. And there’s bunches of
other examples like this. So in other words,
humans, some time ago, decided that, hey, let’s just
assign numbers to all these services to keep everything nice and tidy. But what that really means,
even though this envelope’s starting to look a little arcane,
I can now put on the end of it, for instance, colon 80. And I’m just going to
use a colon here just because that’s computer convention. I’m going to add a colon 80
to the end of the address just to arcanely capture the fact that
this is destined for 5.6.7.8 port 80. So now, when I hand it to Arwa, assuming
she is running an email server, a web server, an instant
message server, she now knows that upon seeing the number 80,
oh, this should go into this bucket. Or this should go into this mailbox. Or this should be handed
off to this service that’s running on her particular server.>>So now, the last piece
of it, this is the cat. And why do I have four envelopes? Well, one of the features offered
by IP, in addition to addressing, is also the ability
to fragment requests.>>This is a pretty big cat. And in fact, for efficiency and to
maximize throughput, so to speak, what fragmentation is good for
is taking big files like this and tearing them up into
smaller pieces, or fragments, we’ll say in this case,
the upside of which is that just because one
person is monopolizing your network by downloading
really big video files, those video files are still going to
be chopped up into super small pieces and transmitted one or more at a time. So that little old me
with my cat, or my email, or my instant message, or something
more important than any of those things can also have an opportunity to go
out from your computer or your home to the rest of the internet.>>And it’s up to the
software and the routers to decide how to send these things out. But eventually, they will all
get to their destinations. As an aside, if you’ve ever thought
about the issue of, or read about, the issue of net neutrality? Net neutrality, this was in vogue
for quite some time, in this country, where politically it
became a hotbed issue. Because some companies, for instance,
wanted to prioritize certain traffic over others. For instance, people
were worried that maybe Microsoft with Skype, or Google with
Hangouts, or maybe Netflix with videos would, maybe, be willing
to pay Comcast, or Verizon, or who knows, even the government more
money to prioritize their traffic. Now, what does that actually
mean technologically? That might mean that an ISP,
upon seeing certain IP addresses, might give those packets,
those envelopes, priority. Upon seeing certain port numbers, might
give those packets priority and, then, slow down my e-mail, or
slow down my service. And it really just boils down to
prioritizing or quality of service for these various different services. So and that’s how it would
be done on a technical level.>>So in any case, we now
have these four envelopes. I’m going to put one quarter of
the cat in this envelope, one quarter of the cat in this envelope,
one quarter in this envelope. And now, suppose my goal is to
send these, let’s say, to Jeffery. Recall that just like the
picture up here suggests, they don’t all necessarily
have to take the same route.>>So if I am the cats.com server,
I’m responding to Jeffery’s request in this story. I’m going to pass one off here. They probably start
in the same location. So Arwa, if you want to decide
whom to route this to next, you can go ahead and send it that way. And don’t send it to the
same router every time.>>[CHUCKLING]>>So Dan’s getting a little congested. There you go. All right. And so those need to make
their way around the room. And again, you as a router
generally know Jeffery’s that way. So just keep sending it that way. And now, suppose Dan
didn’t quite make it. And so this packet got dropped along the
way, if I can steal that away from you forcefully, sorry. >>Very nice. It’s not necessarily the
most geographic direct route. Still trying to get to Jeffery. And complete. Now, this was deliberate. I didn’t mean to hit
your hand when I did it. But packet 4 of 4 did
get lost or dropped. And maybe that happened because
there was a hardware error. Maybe that’s because Dan got
overloaded or Andrew got overloaded. But it happened. So if, Jeffery, you’d
like to reassemble that. What picture do you have
in front of you right now? If you’d like to take the
messages out of the envelopes.>>AUDIENCE: 1, 2, 3.>>DAVID J. MALAN: OK, go ahead and open
them up and take the pieces of cat out.>>AUDIENCE: [INAUDIBLE].>>DAVID J. MALAN: All right, so
we have the top left of the cat, the bottom right, and the bottom left. So we’re missing the
top right of the cat. So TCP, again, is this
protocol that kicks in here. So Jeffery, upon receiving 1, and
2, and 3 of 4, in this scenario, somehow sends a message
back to me, via some route– could be any number of
different hops here– that says, hey, but wait a minute. Resend 4 out of 4.>>And so what I have to go and do
is– it’s all electronic data. So I can very easily copy the cat
inside of my own RAM or memory. I can come up with another
envelope, put another copy of just this fragment for efficiency. I don’t have to resend the whole cat. I can put it in a new
envelope, send it all around. And some number of milliseconds
later, Jeffery, hopefully, has the entirety of the packet. So it took a little
time to tell this story. And that’s not unreasonable.>>Because there is a lot of
complexity going on here. These protocols aren’t simple. But if you want to guarantee
delivery in this way, you need to have those extra measures,
that extra metadata, if you will.>>And just to toss a term out
there, data that we care about is like the cat inside the envelope. Metadata, which is data that’s
useful but not what I actually care about at the end of
the day, is all the stuff that I wrote on the
outside of the envelope– the address, the destination, the
port number, the sequence numbers. All of that is metadata. It’s useful. But it’s not what I ultimately
want out of that whole transaction. Now, this seems pretty
compelling that no matter what, Jeffery will get a copy
of that cat, assuming we have a physical connection
to him at the end of the day. But are there certain
types of applications where guaranteeing delivery
would be a bad design decision and an undesirable feature? Do you always want to retransmit
like I proposed just now? >>AUDIENCE: Pay for it, I guess.>>DAVID J. MALAN: If you
pay, what might you mean?>>AUDIENCE: [INAUDIBLE].>>DAVID J. MALAN: Oh, OK, good question. Might you get double charged
if it’s like checking out of Amazon or something? Short answer, no. Because these fragments
are, so to speak, at a lower level. And they need to be reassembled
before you could actually be charged. So good thought but not
worrisome in this case. >>Let’s reason backwards. So retransmitting required
a little more effort. That didn’t feel like a huge deal. But it does require a little more time.>>Because now, Jeffery has to
wait a few more milliseconds to get that fourth piece of data again. Minor blip, but it
will slow things down. And maybe the internet’s super crowded.>>And maybe Andrew keeps
dropping packets on the floor. So these delays start to accumulate. So after a while, this cat doesn’t
take 74 milliseconds to get there. It takes 1.5 seconds.>>And maybe the next picture of a cat
takes half a second, two seconds. In other words, we start
bogging things down. What applications might be
annoying to bog down in this way?>>AUDIENCE: Video streams or voice.>>DAVID J. MALAN: Yeah, so what if
you’re watching a baseball game online, or what if you’re Skyping
with someone, or FaceTime, especially in the case of video
conferencing, kind of not acceptable, at some point, to start hearing
your human response a second late. Wouldn’t it be better to just
leave that packet on the ground, only show 3/4 of the cat, or in
the case of video conferencing, show 3/4 of my face with my
mouth moving as I’m talking, and just let the audio, at
least, go through, for instance? So there’s this notion
of quality of service here, more generally,
where you know what, for real-time applications– whether
it’s streaming a sporting event or streaming video conferencing–
maybe you don’t need all of the bits. And maybe it’s actually better
to just bite your tongue and just keep plowing forward with
more and more data, never looking back. Because the human will figure
it out in his or her own mind what they actually missed.>>And it would be more
annoying to buffer, buffer. Right? There’s this thing, with
which we’re all familiar, where I just start talking while being,
that’s just annoying to actually have that, to wait for me to catch up.>>Maybe it’s better if you just
miss a few seconds of what I say. But then, it comes back strong. So it’s again, it’s a trade-off. And in fact, the protocol that allows
you to do that would not be TCP, but something called UDP, which is
simply a different protocol used sometimes for those contexts. Yeah, question.>>AUDIENCE: [INAUDIBLE] certain
[INAUDIBLE] protocol slow [INAUDIBLE]?>>DAVID J. MALAN: To stop
slow in what sense?>>AUDIENCE: I want to send my
data as fast as possible.>>DAVID J. MALAN: OK.>>AUDIENCE: If somebody
doesn’t want [INAUDIBLE] transfer to stop [INAUDIBLE].>>DAVID J. MALAN: Oh, you absolutely
can interfere with any of this data. For instance, between all of
the hops, between point A and B, all of these hops here can decide
just to blacklist all UDP data. They could just stop. They could copy it knowing that
this is video data that they might want to look at. So in short, anyone with access to
the wireless or wired connectivity between two points could
absolutely stop it if they want.>>And in fact, even
our home routers, which is the story we’ll
come back to now, might have settings where you can enable or
disable certain services whether it’s for parental reasons,
or just not wanting your kids to watch online videos,
or for corporate reasons as well. So in fact, let’s rein things back in.>>Because we’ve allowed
ourselves to look, now, at all of the servers
inside of the internet here. But if, at the end of the day,
I’m just trying to get to Amazon, what is that little home
router actually doing for me? Well, it turns out that the home router,
that we described earlier, that I’ll draw disproportionately large here,
has a whole bunch of services built in.>>It has, typically, a
DHCP server built in. It often has an access point built in. And that’s often because it has these
antennas, like these things here. It often has a firewall built in.>>It often has a router, which is its
own distinct piece of functionality, built in. It might have something
called a DNS server built in, if not even other functions. So let’s tease apart just the
couple of remaining ones here. DHCP, just to recap, does what?>>AUDIENCE: Assigns the IP.>>DAVID J. MALAN: Exactly. Assigns an IP address and a few other things. It will also tell my Mac or
PC what my default router is and a few other details,
like we saw on my Mac screen. Access point just means, these
days, that it supports Wi-Fi. And it wirelessly will allow
people to connect, just like a physical cable from yesteryear.>>Firewall between two buildings
or two stores in a building, it’s a physical device
that, ideally, prevents fire from spreading from
one store to another. In the virtual world, it prevents data
from getting from one place to another. So in fact, if your
home network, or even your corporate or university
network, have somehow blacklisted, let’s say,
all access to Facebook.com, deeming it a waste of time, how
might your university, or home, or company do that in the
context of envelopes like these?>>In other words, if all of my computers
here– my laptop and any other– is somehow talking to the
internet through this home router, or this corporate router,
or this university router, what information would a firewall use
in order to stop traffic from flowing?>>AUDIENCE: [INAUDIBLE].>>DAVID J. MALAN: Yeah, so if
they know that Facebook’s web server, on the internet,
has the IP address 5.6.7.8, it is trivial for a system administrator
to configure a firewall to just deny and drop all envelopes
destined for that IP address. In reality, Facebook has a few different
IPs, maybe dozens, maybe hundreds. But so long as those are
publicly known, an administrator can actually blacklist all of those.>>Or if that’s not possible, just because
Facebook, maybe, has too many IPs or they change too frequently,
well, it turns out, as we’ll see, any time you make a
request for a web page, like Facebook.com, instead of
there being a cat in the envelope, there’s going to be a mention. Oh, this user wants
Facebook.com/MarkZuckerberg.php or whatever the file may be.>>So you can just look inside the envelope
and see, oh, this is for Facebook. I’m going to drop it now. You can look inside of the
envelope as a firewall as well.>>So a firewall, in short,
can look at the IP address. It can look at the port number. It can look at the
inside of the envelope.>>And by port number, this
is an interesting one too. A firewall, therefore, could block,
it seems, all web access, if it wants, just by blacklisting any envelopes
that have the number 80 on them, or all email by blacklisting port 25,
or blocking FTP, by blocking port 21. And the list goes on and on.>>As an aside, do any of you use
Google’s DNS server– 8.8.8.8? Does this sound familiar? No?>>So turns out you can configure your
computer to use custom addresses. And we’ll come back to
this in just a moment. And it’s very common for corporate
networks and hotel networks to block that kind of
thing, as we’ll soon see.>>So the last bit of functionality,
then, here is a router and DNS. A router, again, very simple idea. It just routes data
left, right, up, and down based on the wires and the
connectivity that it has, whether it’s a small network at home
or a bigger one on the internet itself. So DNS is the last of
the big acronyms here.>>What does a DNS server do? It’s very useful functionality
often built into a home router. Well, we haven’t quite
connected two dots here. When I type out Amazon.com or cats.com
into my browser, somehow or other that ends up on an envelope,
maybe, with Amazon or cats.com on the inside of the envelope,
as I proposed with Facebook.>>But what has to go on the
outside, have we been saying? AUDIENCE: The IP address– DAVID J. MALAN: The IP address. AUDIENCE: [INAUDIBLE]
named to the IP address. DAVID J. MALAN: Exactly. A DNS server, Domain Name System
server, its sole purpose in life is to translate domain names
to IP addresses and vice versa. And so it, too, you can think of like
a big Excel file with two columns– domain names in one and
IP addresses in the other. But it’s a particularly big file.>>And it turns out that when I turn
on my AirPort Extreme, or my Linksys device, or my D-Link device,
or whatever you have at home, surely, that little device does
not know about, in advance, all possible IP addresses and all
possible domain names in the world. Because it can’t. Because what if someone buys a domain
name tomorrow, puts it on the internet?>>It’d be nice if your home
router could still access it. And surely, it can. So it turns out there’s a whole
hierarchy of DNS servers in the world.>>Your home router, typically, has one. But it just is a caching DNS server. And by cache I mean C-A-C-H-E, where
it just stores copies of information temporarily. But if I have internet service
through Comcast, or Verizon, or RCN, very popular vendors locally in
the US, or any other company, or even Harvard University,
Harvard, and Comcast, and Verizon, and your local ISP all
have their own DNS servers.>>And they, too, cache information. But there’s also some special big DNS
servers in the world, at least 13, so-called root servers that know where
all the dot coms are, and know where all the dot nets are,
and all the dot orgs, and all of the dozens and dozens of
other top level domains these days. And so there’s this
whole hierarchical system to DNS such that if you don’t know
and your higher up doesn’t, hopefully, your higher up’s higher up knows. Because the buck
ultimately stops up here. And so, as we’ll see, when
you buy a domain name, you’re essentially informing
one of these top folks. And the information trickles down to
all other computers on the internet. But there’s a danger here.>>Suppose that Comcast is suddenly taken
over by someone who wants to put Facebook out of business. How does Comcast go about
putting Facebook out of business for quite a few people? What does it configure
its DNS server to do? What would you do?>>AUDIENCE: Just block it. Just block it.>>DAVID J. MALAN: Just block, right? So if I’m Comcast, and maybe
I’m the nontechnical CEO, I have just announced a decree, don’t
let our customers go to Facebook.com. Because for whatever
business reason, we’re not playing nicely with them right now.>>Well, what do you do? It’s a pretty trivial implementation. You just have to ask
some system administrator to tweak the DNS server to say, if
you receive requests for Facebook.com, don’t respond with an IP address, or
respond with a bogus one– 1.2.3.4, which is meaningless. Because it doesn’t belong to Facebook.>>And in fact, certain
countries have been known to do this, where if
they’ve wanted to blacklist certain sites– this sort of
great firewall of China, which can be implemented in
any number of ways– might do exactly this
just based on DNS alone. So if you tweak your user’s
DNS server to just respond with no or bogus DNS responses,
you can very easily block access.>>Now, as I alluded to
earlier, and this is only how a naive network would
do this, I can actually go in my Mac, click DNS, which
notice now is, hopefully, another familiar tab. Perhaps a bit ago, you only
knew what the term Wi-Fi meant. Now, hopefully, we know
a bit more about TCP/IP. Now, we have DNS.>>These, it seems, are the DNS servers
that Harvard has automatically assigned to my computer. When I said earlier that DHCP gives
me more than just an IP address, it gives me my router’s address. It also gives me one or more DNS
servers that I’m supposed to use when here on Harvard’s network.>>I can actually override this
by clicking, oh, I can’t. Because I’m on the guest account. OK, so if I could actually
physically click this plus sign, I could type in any DNS server I want.>>A popular one to use is 8.8.8.8,
which Google bought some time ago. And if my Mac let me, I could
then tell my own Mac here, don’t use Harvard’s DNS servers. Use Google’s instead.>>So this is a common way of avoiding
either one’s system restrictions, like the ones we just described. If they’re poorly implemented, you
can just use a different DNS server. Very much in vogue on home
ISPs, and perhaps you too, if you’ve ever made a typo
when typing out a domain name, you should just get an error
message from your browser. That’s what they’re designed to do. 404 or, actually in this
case, something different, you could get an invalid response page. But some of you, do you ever see
advertisements if you make a typo and mistype a domain name? If so, it’s possible, and Comcast
has been known to do this. They, very obnoxiously, will
intercept incorrect DNS lookups.>>If you type Facebook.com
but make a typo, they’ll return an IP address
to you, not Facebook’s but one of Comcast’s advertising
servers’ IP addresses so that you, then, suddenly
see ads, and maybe suggested misspellings, and the like. So some people might use
Google to work around that. Sometimes it’s very common in
hotels, and airports, and the like where the DNS servers are just bad. Or they’re just broken. Or they’re dysfunctional.>>So very often, if I’m not
getting internet connectivity but my icon suggests I
should be on the network, I’ll manually change my
DNS server to Google’s just to see if it starts working. And two times out of 10, that
seems to solve the problem. And the takeaway here is not so much
all these silly little work-arounds but why they actually work.>>You’re just telling your computer to
talk to some other device instead. So this home router, that you might
have paid 0 or more dollars for to put in your home, is doing all
of this functionality and even more all just in this tiny little box. But when we explode this
story to the whole internet, it tends to be dedicated
servers and computers doing each of those individual services. But our homes are just little
microcosms of the whole story. >>Any questions? Yeah. Yeah, Dan?>>AUDIENCE: Earlier, you talked about
the ports, the specific ports, but it’s specific services. So for instance, you said if I
don’t block a certain service, I say don’t log that port? Is it possible for a service to
be completed through the port?>>DAVID J. MALAN: Absolutely. Yes, in fact, you will
often find on a network that the only ports that are
allowed are, for instance, port 80 and 443– web traffic. This is very common
in hotels or airports where they presumptuously think,
eh, 90 plus percent of our users only need these services anyway. Let’s block everything else.>>And that leaves people like me out
cold, out to dry, hung out to dry. Because I can’t access certain servers
at Harvard, which use different ports. I could, preemptively
before leaving campus, change my special server
to use port 80 or 443. Even though humanity has decided
that should be for web traffic, it doesn’t have to be. I can send my email
through that or the like.>>AUDIENCE: So that was my
second question to it. So humanity decided. Is there a published list somewhere that
says these are best practices before? DAVID J. MALAN: Indeed. And in fact, if I go here,
common TCP port, here we go. On Wikipedia itself is the first hit. Here’s well-known ports.>>So the list, up to essentially
1,024, is very standardized, and even some beyond that. So there’s a lot of services that–>>AUDIENCE: So if you were
developing a service, in theory, you should go there and decide
what port lines for that service? DAVID J. MALAN: Correct. And if you’ve come up with some
new application, like Napster back in the day or like WhatsApp
more modernly, you would generally, if you’re a good designer, you would
take a look at a list like this and make sure you’re choosing
a number that is within a range that you should be
choosing from, essentially a big enough number that
no one else has chosen.>>AUDIENCE: That would be
about port designs, correct? DAVID J. MALAN: Correct, correct. And there’s a lot. I mean, a port number is
generally a 16-bit number, which gives you 65,536 possibilities. And only a few of those
are actually standardized.>>And the reality is there’s only so
many popular services these days. So there really isn’t
that much contention. So it’s not such a big deal.>>But from a clever undergraduate’s
perspective or dissident within a country, you might indeed,
if a country, or a corporate entity, or university is blocking certain
traffic, what’s very commonly done, by sophisticated enough people,
would be to tunnel, so to speak, to route all of their
traffic with envelopes that don’t say what they should say, but
instead just using 80 for everything. Even if it is FaceTime, or Skype, or
financial transactions, or whatever, you just make it look like
it’s actually web traffic. And better still is another
solution that Victoria alluded to earlier, which is a VPN.>>And quite often, VPN
traffic is allowed on a network. In fact, I found myself commonly in
airports, and hotels, and on planes where I can’t access certain
secure servers at Harvard. Because they’re running on fairly
unusual port numbers– 555 or whatever the number might be.>>But if I first connect via VPN from
that airplane or hotel to Harvard University, what a VPN does is what? Do you know what it does for you
underneath the hood, Victoria?>>AUDIENCE: Well, it will presumably
change the server [INAUDIBLE]. DAVID J. MALAN: It does. It does. It makes it look, to someone else,
like you’re coming from another place. It looks like you’re coming
from your corporate headquarters when visiting some sites. And what it also does is it tunnels,
so to speak, all of your traffic, whether it is email, or web,
or printing, or the like all through this encrypted
channel between you and your corporate
headquarters, typically, so that no one– including the
local country, or airline, or cafe– knows what’s inside of
your encrypted tunnel. And so it looks like random noise. And so very often, a VPN
will work around those kinds of port restrictions, too,
if the VPN port itself is not blocked, which is sometimes the case. And Dacosta, you were about to say?>>AUDIENCE: What time
[INAUDIBLE] jump especially using the [INAUDIBLE] can jump group
of [INAUDIBLE] Is this cloud different? What [INAUDIBLE] to jump?
[INAUDIBLE] value [INAUDIBLE] DAVID J. MALAN: And by jump,
what do you mean exactly? AUDIENCE: That they
would block, [INAUDIBLE]. DAVID J. MALAN: Oh, and it’s
broken within a given country? AUDIENCE: Yes, it’s blocked. DAVID J. MALAN: Oh, blocked. So it can be implemented
in any number of ways. The simplest, again, would be that
the country and anyone in it, via DNS, they just don’t return the IP address
to you when you visit Facebook.com. Two, they can actually look
inside everyone’s envelopes and see if those requests
are headed to Facebook.com. In which case, they would similarly
block the traffic as well. AUDIENCE: You can block the [INAUDIBLE]. DAVID J. MALAN: Indeed. And it depends. I mean, so long as there are
relatively few internet connections coming into the country–
so dozens, or hundreds, not thousands or tens
of thousands– then yes, so long as they have control
over all wires, wireless, or otherwise coming into the country,
absolutely, they can block everything.>>So and worse yet, and
a very possible attack is if, for instance, we’re
all here on Harvard’s network. And therefore, your computers,
by the story we’ve been telling, are all using Harvard’s DHCP server. Some of you might have,
in a tab right now, Facebook.com open, or Gmail.com,
or some other random website. Do you necessarily know you’re
at the real Facebook.com?>>I mean, maybe you’re subjects in
a Harvard psychology experiment here, where we’re feeding you
fake Facebook information. Or we’re telling you you’ve been
poked by someone you haven’t been. Or we’re changing messages to sound
angrier than they actually are.>>I mean, really when you have
control over the network, you have control over quite a few
aspects of the user’s experience. Now, thankfully, it’s not
as frightening as that. Because most of you, in your
URL bars, of any such tabs, probably start with what? HTTPS, hopefully. Because the S does designate secure.>>And in theory, what that
means is that you do actually have an encrypted connection between
you and Facebook, you and Amazon, you and Gmail.com, or wherever you are. And that’s a good thing. Because there’s this
whole system of trust.>>And this is actually a good segue
to web traffic specifically. There’s this whole system of
trust, in the world, that allows us with some reassurance to trust
that if I go to Facebook.com, and I see a little padlock
icon in my browser, I am very, very, very likely
to be actually connected to the real Facebook.com. Now, why is that?>>So it turns out that when you put
a website on the world wide web, you need an IP address, it would seem. Your server needs an IP address. And you probably need a domain name. So what does that involve?>>Well, have any of you ever
bought a domain name before? Yes? Yeah? OK. And what websites have you used or
looked at for buying domain names?>>Any in particular come to mind? OK, GoDaddy is pretty popular. And there’s others– Namecheap,
Network Solutions, others.>>And so if I want to
go to something like, if I want to buy a domain like
ComputerScienceforBusinessLeaders.com– awful name because
it’s atrocious to type. It doesn’t even fit on
one line, apparently. For $11.99, I can buy that domain name.>>Now, what does that mean? If I click Select and put this into my
Shopping Cart, let me first caution. GoDaddy is atrocious about
trying to upsell you. So you will be asked if
you want email, if you want web hosting, if you want a
phone call for all this stuff. It’s hard to check out at GoDaddy.>>But when you finally get there,
you will own that domain name for a period of one year,
typically, or two, or three years. You have to renew these things. So it’s more like renting a domain name.>>But once you own that
domain name, you need to tell GoDaddy something, typically. You need to tell GoDaddy what your
web servers, DNS servers shall be. How do you know what your servers,
DNS servers are going to be?>>Well, typically, in
another tab, you have to buy, or pay, for web
hosting if you don’t actually physically own your own servers, in
your own company or in your own data center. So you’d go to a web hosting company. And it could be GoDaddy. They offer the same service
as one of their upsells.>>But there’s hundreds,
thousands of web hosting companies of varying quality out there. And when you pay someone
else for web hosting, you get a username, and a
password, and some amount of space in the cloud, so to speak, to
which you can upload your files, and create your web pages,
and put your website online. So essentially, you have
to tell GoDaddy what the DNS servers are that that web
hosting company has provided to you. Probably in an e-mail or in
a web page, they inform you.>>And then GoDaddy’s responsibility
is to tell the rest of the world by way of those root servers
and other DNS servers. So that, the next day,
when someone tries to visit
ComputerScienceforBusinessLeaders.com, their DNS server probably
doesn’t know the answer. Because it’s a brand new website. So their DNS server asks
this one, asks this one. This one knows. And then, the information propagates
back down to the rest of the world. So this is how, too, if you don’t pay the
bill for renewing your domain name, all of this can just kind of stop.>>Because GoDaddy, for instance,
can delete those DNS records so that no one in the world knows
whom to ask where is your website. What is your IP address? And so that’s how they
enforce this kind of control.>>But what GoDaddy also sells, I want to
see here if we can chat with them here. They want our business. If we go to All Products,
this is overwhelming.>>I want to buy SSL. Here we go, Web Security. So, oh, it’s on sale. Nice.>>OK. So here, too, this is kind of
overwhelming at first glance for folks. So there’s different types of SSL
certificates as they’re called. So it’s not just enough to have a domain
name or have a web hosting account. If you want to have encryption, which,
frankly, is just a given nowadays. And this is becoming de facto practice.>>You should also buy an SSL certificate. Unfortunately, it can be
hard to navigate all of this. But let’s see where this leads
to this sort of system of trust. So if I just have one domain
name, www.ComputerScienceforBusinessLeaders.com, I’m going
to go ahead and just buy the $62.99 version here. However, even this is expensive. You can go on other websites, like
Namecheap.com and a few others, with varying degrees of reputation. But you can spend even less than this. Beware.>>And in fact, let’s go somewhere
we shouldn’t– Verisign.com. This is a global leader in domain
names and internet security apparently. And you know it’s expensive when
they don’t just say what they sell. Verisign SSL certificate, you can
see how many competitors they have, who are advertising for that same query.>>All right, so via Google,
I found this page I wanted. So let’s see. Oh, here we go.>>So it looks like if
I want a Secure Site, their SSL certificates start at $399. If I want more security, with EV,
which I think is extended validation or enhanced validation,
that’s $995, point 00. Or Secure Site Pro with EV, $1,500. Almost all of this is atrocious
and, also, unnecessary.>>But let’s understand what the tradeoffs
here are and how it all works. At the end of the day, the math
and the fundamental cryptography underlying your website security is
all the same, for the most part. All of this is upsells and,
largely, marketing things.>>Oh, and please, don’t ever put
something like this on your website, even if the consultant
proposes that you do. It means absolutely nothing. You’ll see, later today or
tomorrow, it is absolutely trivial to add an image to a website and
simply saying you are Norton secured means absolutely nothing.>>And all you’re doing is
training your customers, or humanity more generally,
to look for that symbol, which surely a bad guy could put on his or
her own website and just claim they, too, are Norton secured. So we’ve gotten into some bad habits,
as humans, as embodied even right here. So just as an aside, the reason there
are different styles of certificates, they keep wanting to talk to us.>>You can buy a SSL certificate
for just one domain name, dub dub dub dot
ComputerScienceforBusinessLeaders.com. Multiple websites, suppose
I had dub dub dub dot ComputerScienceforBusinessLeaders.com. But I also wanted users
to be able to visit ComputerScienceforBusinessLeaders.com
without the www. Or, maybe, I have a third
domain, like email.ComputerScienceforBusinessLeaders.com. So if I have multiple domain
names, they actually each need a different type of
certificate, potentially. So I might as well get this
version, which allows exactly that.>>Or all subdomains, if you just want to
have, and this is for fancier setups, if you want to have 10 or 20
different websites or servers that start with something, dot
ComputerScienceforBusinessLeaders.com, then you get what’s called
a wildcard certificate. And it supports all of those variations.>>Now, once you buy this, you install. It’s a file that you download. And that file,
essentially, just contains a really big, random number that
has some mathematical relationship to some other number that
you’ve already generated. We’ll call it a public key and a
private key, as I did just before.>>And the idea here is that you
install into your web server by just using FTP or
some other protocol, dragging and dropping
or copying and pasting these really big numbers
into your own web server. And you follow the instructions
consistent with your server software to do this. And your web server,
henceforth, any time someone visits your business’ website–
www.ComputerScienceforBusinessLeaders.com– your web server
automatically, because this is built-in functionality
these days, will just tell the world what its public key is. And remember that the public key
has this mathematical relationship with a so-called private key. And so when users, customers
talk securely to your server, their envelopes, like the ones
we’ve been passing around, have seeming nonsense inside of them. Because the contents are encrypted.>>And only your business’
private key, which you generated as part of this
process of buying an SSL certificate, can actually decrypt. And all of that happens transparently. But you can only buy these
certificates from a finite number of companies in the world.>>Because Microsoft, who makes IE and
Edge, and Google, who makes Chrome, and Mozilla, who makes Firefox,
and a few other players have all decided to ship their browsers. When you install any of those browsers–
IE, Edge, Firefox, Mozilla, Opera, or any others, Chrome– they come
with a finite number of certificates, so to speak, built into them. A finite list of, let’s call them,
companies whose SSL certificates should be allowed and considered secure.>>So this means that I, David Malan,
can’t just go on DavidMalan.com and start selling SSL certificates. Because if I don’t have
some kind of relationship with Google, and Microsoft, and
Mozilla, or contractors of theirs, no one’s browsers will trust
David Malan’s certificates, even if I sell them at a
discount versus everyone else. I can make them mathematically. But I can’t trick the
browsers into trusting them.>>And what do I mean by trust? Well, notice. We are on GoDaddy.com. And as is the case with many websites,
notice the padlock up at top right. What does that padlock presumably
indicate, either prior to today’s discussion or as of now?>>AUDIENCE: It’s secure.>>DAVID J. MALAN: That it’s secure. That just means that I am using
some kind of cryptography, encryption between me and GoDaddy.com. And it doesn’t have to be a GoDaddy. Let’s go somewhere else. Let’s go to Facebook.com. And notice I end up at
HTTPS colon slash slash. So even if you don’t type HTTPS,
increasingly are websites today redirecting you to the
secure version of the website. This was often true when you typed in
your passwords for quite some time. But then, you would often get the
insecure version of the website after you logged in or after you checked
out with your shopping cart and credit card.>>Nowadays, increasingly, are websites–
because it’s getting easier and cheaper to use this kind of encryption, and
it’s becoming expected– are just using it for absolutely every web page. And this is a good thing. Because this means,
for instance, when you go to Google, who also has
started enabling SSL by default, this means when you search
for something on Google, it’s absolutely true that
Google knows everything you’re searching for on the
internet, for all time unless you delete your history. And even then, hopefully,
it actually deletes.>>But no one in between you
and Google, in theory, knows what you’re searching for. So if you’re searching for something
private, or medical, or whatnot, so long as that bar is green, and you
see the padlock, and the URL is HTTPS, and you’re connected to Google,
hopefully, your employer can’t see what you’re doing. Your university can’t
see what you’re doing.>>Now, if someone looks over your
shoulder, they might still. And if it ends up in your browser’s
history, people might still know. But at least that tunnel between you
and Google, in this case, is secure. And we can see this a little more. And you can do this at home, too. If I click on the padlock,
on Chrome at least, there’s a bunch of
technical information here. If I click Connection, notice that,
“Chrome verified that the DigiCert SHA2 High Assurance Server
CA,” certificate authority, “Issued this website’s certificate.”>>Let’s click on the
Certificate Information. And we can see that Facebook, someone
at Facebook bought this certificate. And notice the star. That’s the wildcard that
I alluded to earlier, the something dot Facebook.com. Notice that their
certificate expires when?>>December, so Facebook better pay the
SSL bill over the next few months. And they’re going to have to install
new certificates on their servers. And if I really want to get
curious, I can click on Details. And this is going to be
more arcane than I want.>>But you can see that
this is, apparently, bought by Facebook, Inc. in Menlo Park. This is some technical information,
where they bought it from. SHA-256 refers to something
similar to encryption. It’s called a hash. RSA is the encryption
if you’ve heard of RSA.>>And then, there’s even
more fancy stuff in here. Elliptic Curve Public, this
refers to a type of cryptography. Most of this is way more
information than you actually need. But you can see that this is
the technical detail underlying Facebook certificate.>>Now, unfortunately, just to
speak to social engineering, this now is a pretty useful
indicator of the fact that someone, one, has a
secure connection and, in turn, that the server you visited
paid for that certificate. But it wasn’t that long ago that
websites could have default icons. In fact, do you notice these
icons in Chrome’s tabs right now? And browsers have kind
of learned their lesson and put these icons up there,
the logo for a website?>>It wasn’t that long ago
that these fav icons, or favorite icons as they’re called,
were right there next to the address. In fact, I did a search
during our break. For instance, not that long
ago, let me open this one. Just on Google Images.>>Let me zoom out. Come on. So not that long ago,
browsers were doing this. Not only did they put the
favorite icon up here in the tab, they also put it right
next to the address bar. Why? Just, eh, it looked good.>>It was kind of nice. You see the company’s logo
right next to its URL. So now, think from the perspective
of an adversary, a bad guy. If you were a bad guy and
the browsers were dumb enough to allow you to put a custom icon
right next to the browser’s URL, what icon would you choose
for your fake website that’s trying to phish for people’s
credit card information and such? AUDIENCE: The original website. DAVID J. MALAN: The
original website, certainly, if you’re mimicking one website. What else might you put there
that’s even more deceitful? A padlock icon, which looks like a
padlock and semantically suggests this site is secure, but has no
technical meaning whatsoever, and which is to say you’re
conditioning people.>>We, as a society, are conditioning
people when you see padlock, assume site is secure. And that same logic can
be completely reversed and manipulated so that
people, now, are tricked into thinking something’s secure. And the worst offenders,
frankly, are people like banks, who idiotically, to this
day– let’s see if Bank of America, a popular local one or national
one, is doing the same.>>OK. So what is this? What do you see here? This is the login
form for their website. They’ve done the exact same thing. You’re training humans
to think when you see a button on a website
with a padlock that that means the connection is secure.>>That means only that there
is a graphic designer who knows how to make a picture of a
padlock and put it on a website. Now, in this case, it is true,
that the website is secure. Because notice the
green padlock up here. And I’m using a new
enough version of Chrome that I can’t just put an
arbitrary logo next to the URL. Now, only the secure
icon goes there or not.>>But this is absolutely meaningless here. And we humans continue to
make these kinds of mistakes. Because we condition people
to look for certain cues and infer meaning from them. But again, that same
meaning can be abused.>>So when building one’s
own corporate website, these signals are generally a bad thing. And even in emails, too,
we have, as a society, conditioned people to
click links on emails. And so it’s not surprising that bad
guys send out fake emails from PayPal, from Bank of America with links. Because we’ve trained people
to click links in email.>>A far better practice would
be for Bank of America, when emailing its customers, to say only,
please visit Bank of America’s website at your earliest convenience. And don’t give people the URL. Because otherwise, they’re
just going to click it. Let it go. Let them search for it or,
actually, go to it manually.>>All right, so a bit
of a sidetrack there. But the goal here was to paint the
picture of this system of trust. With browsers, there are
these things in the world called certificate authorities–
companies, a finite number of them, that are allowed to
issue SSL certificates. Or, in turn, they are allowed to
validate other third-party contractors to issue SSL certificates. If you’re not on that list,
though, you can mathematically create these big, random numbers
that work for cryptography.>>But the browser is, generally,
going to yell at you. In fact, can I go to a website? Let me see. This site is not secure. If we just look for a Google image
here, you might see screens like this. Browser manufacturers
keep changing them.>>This is typically what you would see. You see a red line in the URL,
where HTTPS is crossed out. Because it’s trying to be secure. But something’s going on. And here it says, “This is probably
not the site you’re looking for!”>>And this is either malicious, or
it’s because of a misconfiguration. Someone’s using the wrong SSL
certificate on the server for the site that the user is
actually trying to visit. Any questions?>>Well, let’s take, before we break
for lunch, one last look at what can be inside of these envelopes. I’m going to go into a
clean browser tab here. And this is a feature. If you use Chrome, or
most any other browser, you actually have this feature.>>I’m going to go to the Menu. I’m going to go to More
Tools and Developer Tools. Though you sometimes have
to enable this special menu. And we’ll see more of
this in a little bit.>>And I’m going to go down
here to the bottom left. And I’m going to click on Network. So this is just something
an engineer would use when he or she wants to look
underneath the hood at what’s going on between a browser and a server.>>And let’s go ahead and do this. I’m going to click Preserve Log. In other words, I want to
save everything that’s going on, what we’re about to do. And I’m going to type in HTTP
colon slash slash www.Stanford.edu for Stanford University. I’m going to clear again
just so we can start fresh.>>And here we go. So here is Stanford’s
home page– whole bunch of text, whole bunch of pictures, maybe
some videos, and some other stuff. And this web page– here,
I’m going to reload now. Because I broke it by heading back.>>This web page is written
in a language called HTML that we’ll take a brief look at later. And HTML is not a programming language. It’s what’s called a markup language. So we’ll see it’s just
English-like syntax that tells the web page what to look like,
what colors to use, what text to use, and the like.>>But juicier is in this
special Developer tab, I can actually see everything that
just went on underneath the hood. For instance, in this web page,
about how many images are there? I see 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, on the right, 11. So there’s a dozen or more
images on this web page.>>Each of those images is a
file on Stanford’s web server. And this home page, written
in this language called HTML, is also a file on Stanford’s web server. So it turns out that a browser
is smart enough to know, and we’ll see this afternoon, when you
receive the home page for a website, look at that HTML language,
as we’ll soon see.>>And if you notice the names of images
inside of it, go get those as well. Send additional requests,
additional envelopes. So we might have gotten back, now,
one, maybe 13 or more envelopes containing text, and images, maybe
some other stuff that we, then, assemble inside of my browser
to present this entire web page.>>And notice down here
the very first of those was a request just for HTTP colon
slash slash www.Stanford.edu itself. And if I click on this row, I’m going
to see some pretty arcane information. But let me scroll down and
see if I can understand exactly what’s going on here.>>Let me make this a little bigger
so we can see more at a time. And notice this. If I click on View Source, this
text here, that I just highlighted, when I send, my browser sends that
first envelope from here in Cambridge to Stanford, saying give me your home
page, what is inside this envelope is exactly what I’ve highlighted there.>>HTTP, Hypertext Transfer Protocol,
is the set of conventions that a web browser uses when
requesting web pages of a server. So just as I reached out
with my hand to Arwa earlier, this is the digital equivalent of
my browser reaching out digitally to Stanford’s web server, putting
this message inside this envelope. The most important
line is the very first.>>GET is a standard verb,
used in this convention, that literally just
means get the following. Get slash. Slash is just the default home page. It’s nothing more specific than that. And use the version
of HTTP known as 1.1. It’s got some newer
features than 1.0 had.>>And the second most important
line is this one– Host colon dub dub dub dot Stanford.edu. When I mentioned earlier that a firewall
could look inside of an envelope and figure out what website is being
requested– maybe it’s Facebook. And we want to blacklist it.>>The reason is the browser is very kindly
telling us, inside the envelope, what it is requesting. And then, there’s some less interesting
stuff that’s more technical. But slightly interesting, if
not a little unnerving at first, is that also inside this envelope
is apparently what information?>>AUDIENCE: [INAUDIBLE]. DAVID J. MALAN: Yeah, what
kind of computer I have. So I have a Mac. It’s running Mac OS 10.11.2, it seems. And if I read farther
down, it tells the server that I’m using a certain
version of Chrome, in fact.>>So that’s mildly disconcerting. But slightly more disconcerting
should be the fact that I already told Stanford what my IP address is. So they can already figure out, perhaps,
a little bit more about me from that. And then, there’s some
other stuff there too. Now, let me scroll up slightly. Here is what Stanford responded with. Inside of this envelope
was, first and foremost, the web page itself, the HTML that
we’ll see later this afternoon. But also inside Stanford’s envelope to
me is everything I’ve highlighted here.>>The juiciest of lines of
which is the top, which says, OK, yep, I speak HTTP 1.1. 200 is my status code, OK. Now, you might not have ever seen the
number 200 before, which makes sense. Because 200, indeed,
means OK, all is well.>>But you probably have seen a
number, on your web browser, that was sent to you from some server
inside of an envelope that’s not the number 200. What numbers have you
seen that spring to mind?>>AUDIENCE: 404.>>DAVID J. MALAN: 404. So if you’ve ever wondered where is
this 404 convention coming from, of all the arcane things to tell
me, 404 file not found, that simply means that if you request
a page that doesn’t exist on a web server, it’s not there, the file’s
not found, and this message in blue is going to say HTTP
1.1 space 404 not found. And your browser notices
that and, then, presents it to you, maybe in a bigger
font, bigger, bold information with some explanatory text. But that’s all.>>And then, the rest of the information
is more arcane information, from the server to you, just telling
your browser where it came from. Every single request you
make over the internet contains information like this. This is both useful
for technical reasons.>>It’s also useful for
logging reasons, to know who’s visiting your website,
what browser they’re using, maybe what browser you
should be optimizing your website for if everyone’s
using Chrome these days. Maybe you don’t need to support
Internet Explorer anymore. How do you know that? You can just log all of the information
that’s coming in these requests.>>Conversely, this clearly
means that every time you visit any website on the internet,
not only do they know your IP address, because you gave it to them in the
top left corner of the envelope, they also know what your browser
is, what time of day it is, what pages you’re requesting.>>And increasingly, especially on
websites that have advertisements, more worrisome here is
if you’ve got a company, and this is super common these
days, that sells advertisements for this website, let’s
call it A.com, and also on this website, B.com,
and this website, C.com, A and B and C.com might not know
that they have a customer in common.>>But if this third-party
advertising company is seeing requests from the same IP
address visiting A.com, B.com, and C.com, why? Because the advertising server’s being
asked to serve up ads to all three of these websites. And therefore, it will be
provided with your IP address so that your web page,
your browser sees the ad.>>There are these middlemen, so
to speak, on the internet that know even more about you than
the websites you’re visiting. And Google is certainly among the
biggest offenders, or featurerers, along those lines. And in fact, when I
mentioned their DNS server before, you might think at first
glance, oh, this is a handy feature. Google provides the world
with a free DNS server that sometimes helps me solve problems. Mm-mm. Now, you’re telling Google not only
every page you’re searching for, but every page you’re going to directly. Because you’re saying, hey,
Google, I want to go to Z.com. What’s its IP address?>>And this all boils down to these
very simple requests and responses that we’ve now seen from top to bottom. So why don’t we pause here for an hour for lunch. Return at 1:30. I’m going to disappear for a bit. And we’ll resume with a hands-on
look and some more concepts. And happy to stick around, for a few
minutes, with questions individually.
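
The request and response envelopes described in this session can be sketched in a few lines of Python. This is only an illustration, not what Chrome actually does: the function names build_request and parse_status_line and the user-agent string ExampleBrowser/1.0 are made up for the example, and nothing is sent over the network. The sketch just shows the text a browser would put inside the envelope (the GET line, the Host line, a line describing the browser) and how the first line of a server's reply breaks down into a version, a status code like 200 or 404, and a short explanation.

```python
from typing import Tuple


def build_request(host: str, path: str = "/",
                  user_agent: str = "ExampleBrowser/1.0") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",       # verb, what to get, which HTTP version
        f"Host: {host}",              # which website is being requested
        f"User-Agent: {user_agent}",  # what browser claims to be asking
        "Connection: close",
        "",                           # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("www.stanford.edu"))
    print(parse_status_line("HTTP/1.1 200 OK"))
    print(parse_status_line("HTTP/1.1 404 Not Found"))
```

Everything a server logs about a visitor, apart from the IP address on the outside of the envelope, comes from lines like these, which is why the Host and User-Agent lines are both useful and mildly disconcerting.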

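The point about advertising middlemen can also be sketched in code. The following is a toy model under stated assumptions, not how any real advertising company works: the class name AdServerLog is hypothetical, the sites are the A.com, B.com, and C.com placeholders from the discussion, and the IP addresses are from a range reserved for documentation. It shows only that a single server asked to serve ads on several websites can, just by logging the IP address on each request, learn which of those websites the same visitor went to, even though the websites themselves never compare notes.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AdServerLog:
    """Toy log kept by a hypothetical third-party ad server."""

    def __init__(self) -> None:
        # Maps each visitor IP address to the set of sites it was seen on.
        self.sites_by_ip: Dict[str, Set[str]] = defaultdict(set)

    def record(self, visitor_ip: str, site: str) -> None:
        """Called whenever a page on `site` asks this server for an ad."""
        self.sites_by_ip[visitor_ip].add(site)

    def sites_seen(self, visitor_ip: str) -> List[str]:
        """Return the sites on which this IP address requested ads."""
        return sorted(self.sites_by_ip.get(visitor_ip, set()))


if __name__ == "__main__":
    log = AdServerLog()
    log.record("203.0.113.7", "A.com")
    log.record("203.0.113.7", "B.com")
    log.record("203.0.113.7", "C.com")
    log.record("203.0.113.99", "B.com")
    print(log.sites_seen("203.0.113.7"))
    print(log.sites_seen("203.0.113.99"))
```

A.com, B.com, and C.com each see only their own visitors, but the ad server's log links all three visits to one address.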