Understand the Flow of an HTTP Request

What happens when you enter a URL into your browser?

Aakash Yadav
Better Programming


Ever wondered what happens when you hit Enter in your browser? This article answers that from a developer’s perspective, covering the things a dev should know about what goes into fulfilling an HTTP request. Though HTTP is just a protocol for transferring documents, it’s essentially the backbone of the internet.

A request starts either manually, when you enter a URL in the address bar of your browser, or programmatically, when an app, a website (JavaScript), or another program sends one. It ends when the response is received, and everything in between is where the magic happens.

This is how we typically understand an HTTP request (an oversimplified representation).

Let’s say you open app.mydomain.com/me on your system. The browser will automatically assume http:// at the beginning of your URL.

Let’s break down the URL http://app.mydomain.com/me:

  • http://: Protocol used for communication
  • mydomain.com: Domain of the server
  • app: Subdomain of the server
  • /me: Path of the requested resource on the server
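
As a rough illustration of the breakdown above, here is a minimal Python sketch that uses the standard library's urllib.parse. The helper name break_down_url is made up for this example, and treating the last two labels of the host as the domain is a simplifying assumption that does not hold for suffixes such as .co.uk.

```python
from urllib.parse import urlsplit


def break_down_url(url):
    """Split a URL into the four parts discussed above (hypothetical helper)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,
        # Assumption: the registered domain is the last two labels.
        "domain": ".".join(labels[-2:]),
        "subdomain": ".".join(labels[:-2]),
        "path": parts.path,
    }


if __name__ == "__main__":
    for name, value in break_down_url("http://app.mydomain.com/me").items():
        print(f"{name}: {value}")
```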

HTTP has standard, predefined rules that are shared universally, which enables meaningful communication, just as English has a standard grammar that lets us communicate.
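
To make those rules a little more concrete, the sketch below builds the text of a minimal HTTP/1.1 GET request and reads the status line of a response. It is only an illustration: the function names are invented for this example, and real clients send more headers than this.

```python
def build_get_request(host, path):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, version
        f"Host: {host}",          # required header in HTTP/1.1
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response):
    """Return (version, status code, reason) from raw response bytes."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_get_request("app.mydomain.com", "/me").decode("ascii"))
    print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))
```

Both sides can understand each other only because they agree on this layout: a request line, headers, a blank line, and then an optional body.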

So Once You Hit Enter, How Does Our Request Reach the Server?

The internet is simply a global network of cables that allows any device connected to it to communicate with any other connected device.

This is how the world is connected through submarine cables that carry all of the data around the globe. (This image only shows submarine cables. Land networks aren’t shown but are there).

Submarine cable network across the globe

If this is new to you, don’t worry: you’re not alone. For a long time, I thought the internet was routed through satellites (yeah, I know that’s silly!).

Let’s say you’re in London, and you want to connect to a server in the U.S. One of these cables will carry your request from London to the U.S.

But there’s a catch. Packets are routed using IP addresses, not domain names, just as your mail is delivered using your street name and house number and not just your name.

So How Do We Find the IP address of app.mydomain.com?

That’s where the Domain Name System (DNS) comes into play. It’s the address book of the internet. You connect to a DNS server and ask for app.mydomain.com’s IP address, and the DNS server returns it.

How does your client know the address of the DNS server?

Since it’s essential for one device to know the IP address of another one to be able to communicate, the client has to know the IP address of the DNS server in advance. When you connect to a network, the client is assigned an IP address. It’s also told the DNS server’s address. Also, you can manually configure your device to use a specific DNS server.
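
From a program's point of view, this lookup is usually a single call that hands the name to whatever DNS server the operating system has been configured with. A minimal sketch, assuming Python's socket module (the helper name lookup_addresses is made up for this example):

```python
import socket


def lookup_addresses(hostname, port=80):
    """Ask the OS resolver (and its configured DNS server) for IP addresses."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:  # drop duplicates, keep the original order
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # example.com is a placeholder; any resolvable name works here.
    print(lookup_addresses("example.com"))
```

The code never mentions a DNS server address, because the operating system already knows it from the network configuration described above.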

How does a DNS server actually resolve a domain to an IP address?

The client sends a request to a DNS recursor (a recursive resolver), which queries multiple NameServers until it resolves the query.

Whenever you buy a domain, you configure a NameServer that’s responsible for keeping all of your DNS records.

Let’s say you’ve configured Cloudflare (ns1.cloudflare.com) as the NameServer for your domain, mydomain.com. Whatever query comes in for your domain, ns1.cloudflare.com has the authority to answer it. That’s why this NameServer is called an authoritative NameServer.

So how does your DNS recursor reach your authoritative NameServer?

  • It queries a root NameServer to get the location of the TLD NameServers for the domain’s TLD (in our case, .com).
  • It then queries the TLD NameServer, which stores the locations of the authoritative NameServers for all domains under that TLD (e.g., the .com TLD NameServer stores the locations of the authoritative NameServers for all .com domains).
  • Finally, it queries the authoritative NameServer, which answers based on the records you’ve configured. For example, I’ve configured DNS records like this for mydomain.com:

Since we asked for app.mydomain.com, it’ll look for a record with the name app. So it’ll return an IP address based on that.
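
The walk from the root to the authoritative NameServer can be sketched with a few in-memory tables. This is a toy simulation, not a real resolver: the server names other than ns1.cloudflare.com, the record values, and the IP addresses (taken from the 203.0.113.0/24 documentation range) are all invented for this example.

```python
# Root NameServer: knows which server handles each TLD (hypothetical name).
ROOT = {"com": "tld-server.example"}

# TLD NameServer: knows the authoritative NameServer for each domain.
TLD_SERVERS = {"tld-server.example": {"mydomain.com": "ns1.cloudflare.com"}}

# Authoritative NameServer: holds the records configured for the domain.
AUTHORITATIVE = {
    "ns1.cloudflare.com": {
        "app.mydomain.com": "203.0.113.10",   # made-up record
        "www.mydomain.com": "203.0.113.20",   # made-up record
    }
}


def resolve_via_hierarchy(hostname):
    """Return (ip, steps) by walking root -> TLD -> authoritative."""
    labels = hostname.rstrip(".").split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])  # assumption: two-label registered domain

    tld_server = ROOT[tld]
    authoritative_server = TLD_SERVERS[tld_server][domain]
    ip = AUTHORITATIVE[authoritative_server][hostname]
    steps = ["root", tld_server, authoritative_server]
    return ip, steps


if __name__ == "__main__":
    print(resolve_via_hierarchy("app.mydomain.com"))
```

A real recursor also caches the answers it gets along the way, which this sketch leaves out.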

Now the client can connect to the server, as it obtained the IP address of app.mydomain.com.

How Does the Server Respond to the HTTP Request?

We run a web server that’s responsible for handling these incoming HTTP requests. The most commonly used web servers are Apache and nginx. These web servers keep listening for incoming requests and respond to them.

Wait!

There must be multiple processes running on a server, so how does our request reach Apache?

A server, like any device, may be running many processes, and that’s where network ports are helpful. Processes like Apache bind to network ports and listen on them. In general, a port can’t be used by more than one process at a time, though one process can use multiple ports.
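
The one-listener-per-port rule is easy to observe with plain sockets. The sketch below is an illustration under default socket options (no SO_REUSEADDR or SO_REUSEPORT), and try_listen is a name invented for this example.

```python
import socket


def try_listen(port, host="127.0.0.1"):
    """Return a listening socket on (host, port), or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock
    except OSError:
        # Typically "address already in use": someone else holds the port.
        sock.close()
        return None


if __name__ == "__main__":
    first = try_listen(0)  # port 0 asks the OS for any free port
    port = first.getsockname()[1]
    second = try_listen(port)
    print(f"first listener got port {port}; second attempt succeeded: {second is not None}")
    first.close()
```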

So we also have to specify the port for communication. But we didn’t provide the port, did we?

As I said, HTTP has some predefined rules that are shared universally. One of those is the default port: 80 for HTTP and 443 for HTTPS (similarly, DNS uses 53). So our Apache must be listening on ports 80 and 443 of our server.

Having a default port doesn’t mean we can’t change it. We can run Apache on, let’s say, port 3000. But since we’re then using HTTP on a nonstandard port, we have to specify the port explicitly, like http://app.mydomain.com:3000/me.
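
The rule a client follows here is short enough to write down. A minimal sketch, with effective_port as a made-up helper name:

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port              # explicit port in the URL wins
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the default


if __name__ == "__main__":
    for url in (
        "http://app.mydomain.com/me",
        "https://app.mydomain.com/me",
        "http://app.mydomain.com:3000/me",
    ):
        print(url, "->", effective_port(url))
```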

How Do Apache/nginx Serve the HTTP Request?

These are the two most common use cases for web server applications:

  • Serving static files from folders based on the path
  • Handing over the request to some other process, like a Node server, and letting that process handle it and provide the response

Static files

Apache VirtualHost configuration file for static site

You requested app.mydomain.com/me, so Apache looks for a file or folder named me in /var/www/my-static-website. If a file exists, it returns that file. If a folder exists, it looks for index.html (this can be customized) in that folder and returns that.
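
The lookup described above can be approximated in a few lines. This is a simplified sketch of the idea, not Apache's actual algorithm; resolve_static and its parameters are names invented for this example, and the check against escaping the document root is an extra precaution added here.

```python
from pathlib import Path


def resolve_static(document_root, url_path, index_name="index.html"):
    """Map a URL path to a file under document_root, or return None."""
    root = Path(document_root).resolve()
    candidate = (root / url_path.lstrip("/")).resolve()

    # Refuse paths such as /../secret that point outside the document root.
    if candidate != root and root not in candidate.parents:
        return None

    if candidate.is_file():
        return candidate
    if candidate.is_dir():
        index = candidate / index_name  # folder: fall back to its index file
        if index.is_file():
            return index
    return None


if __name__ == "__main__":
    print(resolve_static("/var/www/my-static-website", "/me"))
```

When nothing matches, a web server would normally answer with a 404 response, which this sketch represents as None.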

Handing over that request to some other process

Let’s say you have Node, Java, Python, or some other server running on port 4000, and you want that process to handle the request and generate a response.

In this case, Apache can act as a proxy for that server.

Apache VirtualHost for a proxy

It’ll forward any request coming in for app.mydomain.com to http://127.0.0.1:4000, and once it receives a response, it returns that response to the client. The target doesn’t need to be localhost (127.0.0.1); Apache can also act as a proxy for some other server.
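
To show the forwarding idea in isolation, here is a toy reverse proxy built from Python's standard library. It is a sketch under heavy assumptions: it only handles GET, ignores request headers, and all class and function names are invented for this example. It says nothing about how Apache implements proxying.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BackendHandler(BaseHTTPRequestHandler):
    """Stand-in for the Node/Java/Python application server."""

    def do_GET(self):
        body = f"backend handled {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_proxy_handler(upstream_host, upstream_port):
    """Build a handler class that forwards GET requests to the upstream server."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            conn = http.client.HTTPConnection(upstream_host, upstream_port, timeout=5)
            try:
                conn.request("GET", self.path)  # forward the same path
                upstream = conn.getresponse()
                status = upstream.status
                body = upstream.read()
            finally:
                conn.close()
            # Relay the upstream answer back to the original client.
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass

    return ProxyHandler


def start_server(handler_class):
    """Start a server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler_class)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Starting a backend with start_server(BackendHandler) and then a proxy with start_server(make_proxy_handler("127.0.0.1", backend.server_port)) gives two local servers, and a request sent to the proxy's port is answered by the backend.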

Why Not Run Our Node/Java/Python Server Directly on Port 80 and Avoid the Overhead of Apache?

A typical scenario is that you have more than one website running on a single server. Since no more than one process can occupy a single port, Apache takes all the requests and, based on ServerName, routes each one to the right site.
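
The routing decision boils down to comparing the request's Host header with the configured names. A minimal sketch of that idea follows; the site table, the blog.mydomain.com entry, and the function name are hypothetical, and IPv6 literals in the Host header are not handled.

```python
# Hypothetical table: configured server name -> where requests should go.
SITES = {
    "app.mydomain.com": "proxy to http://127.0.0.1:4000",
    "blog.mydomain.com": "static files in /var/www/blog",
}


def pick_site(host_header, sites=SITES, default=None):
    """Choose a site from the Host header, ignoring any port and letter case."""
    hostname = host_header.split(":", 1)[0].strip().lower()
    return sites.get(hostname, default)


if __name__ == "__main__":
    print(pick_site("app.mydomain.com"))
    print(pick_site("BLOG.mydomain.com:80"))
    print(pick_site("unknown.example"))
```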

But if the machine hosts a single site served by a standalone server like Node, you can get rid of Apache and run that server directly on port 80.

So once the response is determined, it’s sent back to the client.

Well That’s HTTP, but What’s HTTPS?

HTTPS is an upgrade from HTTP. Everything in HTTP is transferred as plaintext. We know that all this data travels through that network of optical cables, which are owned by a variety of different companies and pass through various political jurisdictions. Anyone with access to a link along the way can tap into it and read all the messages between the client and the server.

Imagine entering the password for your Gmail account while everyone between you and the Gmail server is able to read your email address and password. That’s horrific, right? That’s the internet without HTTPS. HTTPS encrypts the traffic between the client and the server so no one in between can read those messages.
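
The difference is easy to see by looking at the bytes a plain HTTP login would put on the wire. In the sketch below, the host, the /login path, and the credentials are invented for illustration. The second function only prepares the TLS settings a Python client would use for HTTPS and does not open a connection.

```python
import ssl
from urllib.parse import urlencode


def plain_login_request(host, email, password):
    """Return the exact bytes a plain-HTTP form login would send."""
    body = urlencode({"email": email, "password": password})
    head = (
        "POST /login HTTP/1.1\r\n"   # hypothetical endpoint
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return (head + body).encode("ascii")


def make_tls_context():
    """Return default client-side TLS settings for an HTTPS connection."""
    # The defaults verify the server certificate and its hostname.
    return ssl.create_default_context()


if __name__ == "__main__":
    raw = plain_login_request("mail.example.com", "me@example.com", "hunter2")
    print(raw.decode("ascii"))  # the password is readable by anyone on the path
    context = make_tls_context()
    print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
```

With HTTPS, the same request bytes are sent through a TLS-wrapped socket, so an observer on the cable sees only encrypted data.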

The wide-scale modern internet wouldn’t have been possible without the security offered by HTTPS.

Conclusion

All these things work to serve up a simple HTTP request.
