What happens when you type a URL in the browser and press Enter

Mohammad Humayun Khan
Nov 20, 2021
Photo by Ragnar Vorel on Unsplash

This is one of the most frequently asked questions in technical interviews. In this post, we’ll walk through what actually happens, from start to finish, between typing a valid URL and pressing Enter and getting the correct response displayed in the browser.

As you type the URL in the address bar, you may notice that your browser offers suggestions. These are based on data from your browser’s search history, bookmarks, and cookies, and even on the most popular searches on the internet.

If you don't know the full URL, you can type just the domain, something like medium.com. Browsers can generally tell whether you typed a complete URL, an incomplete URL (like the one we just entered), or simply a search term. If the input has no valid protocol or domain, it is treated as a search and passed to the default web search engine.
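To make that decision concrete, here is a minimal sketch of how address-bar text could be sorted into the three cases. The function name classify_input and the rules it applies are assumptions made for illustration; real browsers use far more elaborate heuristics.

```python
from urllib.parse import urlparse


def classify_input(text: str) -> str:
    """Return 'url', 'bare-domain', or 'search' for address-bar text (toy heuristic)."""
    text = text.strip()
    parsed = urlparse(text)
    # A recognised scheme plus a host means the user typed a complete URL.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Otherwise look at the part before the first slash and see if it looks like a domain.
    host = text.split("/", 1)[0]
    labels = host.split(".")
    if " " not in text and len(labels) >= 2 and all(labels):
        return "bare-domain"
    # Anything else is handed to the default search engine.
    return "search"


for sample in ("https://medium.com", "medium.com", "what is dns"):
    print(sample, "->", classify_input(sample))
```

The bare-domain case is the one this post follows: the browser still has to decide which protocol to put in front of it.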

The first thing the browser does on seeing the domain name is construct the actual URL. The browser checks whether the website that we want to access is in the preloaded HSTS (HTTP Strict Transport Security) list. This list comprises websites that have asked to be contacted via HTTPS instead of HTTP. In our case, our website is on this list, so the request is sent via the HTTPS protocol. An https:// is prepended to the domain, and the actual URL becomes https://medium.com
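The check can be sketched in a few lines. The set HSTS_PRELOAD below is a hypothetical stand-in for the list that ships inside the browser, and the rule that an entry also covers its subdomains is an assumption made to keep the example short.

```python
# Hypothetical stand-in for the preload list that ships inside the browser.
HSTS_PRELOAD = frozenset({"medium.com"})


def build_url(typed: str, preload=HSTS_PRELOAD) -> str:
    """Prepend https:// when the host (or a parent domain) is preloaded, else http://."""
    host = typed.strip().lower().rstrip(".")
    labels = host.split(".")
    # Assumption: a preloaded entry also covers its subdomains.
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    scheme = "https" if candidates & preload else "http"
    return f"{scheme}://{host}"


print(build_url("medium.com"))
```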

HTTPS (Hypertext Transfer Protocol Secure) is the secure version of HTTP. It uses the TLS protocol (the successor to SSL) for encryption and authentication.

Before it can send the request, the browser has to perform a DNS lookup. It first checks its own DNS cache to see whether the domain was resolved recently (in Google Chrome you can view and clear this cache by navigating to chrome://net-internals/#dns). If the domain is not found there, the browser calls a library function such as gethostbyname (or its modern replacement, getaddrinfo) to do the lookup; the exact function varies by OS.

gethostbyname first checks whether the hostname can be resolved from the local hosts file (whose location varies by OS), so that locally provided information is used before any DNS query is made. If the name is neither cached nor present in the hosts file, a request is sent to the DNS server configured in the network stack. This is generally the local router or the ISP's caching DNS server.
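The lookup order described above (cache, then hosts file, then the configured DNS server) can be sketched as follows. The helper names parse_hosts and resolve_locally are invented for this example, and the hosts content is passed in as a string so that nothing depends on where the file lives on a particular OS.

```python
def parse_hosts(text: str) -> dict:
    """Parse hosts-file text into a {hostname: ip} table."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def resolve_locally(name: str, cache: dict, hosts: dict):
    """Try the cache, then the hosts table; report where the answer came from."""
    name = name.lower()
    if name in cache:
        return cache[name], "cache"
    if name in hosts:
        return hosts[name], "hosts file"
    return None, "ask DNS server"


hosts = parse_hosts("127.0.0.1 localhost\n# a comment\n192.0.2.10 intranet.test\n")
print(resolve_locally("localhost", {}, hosts))
print(resolve_locally("medium.com", {}, hosts))
```

Only when both local sources miss does a query actually leave the machine.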

The address of the DNS server is stored in the system's network settings. Popular public DNS services, with their primary addresses, include Google (8.8.8.8), Quad9 (9.9.9.9), and Cloudflare (1.1.1.1).

DNS servers translate the friendly domain name you enter into a browser (like medium.com) into the public IP address that’s needed by your browser to actually communicate with that website.

If the DNS server is on the same subnet, the network library uses ARP (Address Resolution Protocol) to find the hardware address of the DNS server itself. If the DNS server is on a different subnet, the network library uses ARP to find the hardware address of the default gateway instead.
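The subnet decision is easy to express with Python's ipaddress module. This is only a sketch of the rule; the function name arp_target and the sample addresses are assumptions for illustration.

```python
import ipaddress


def arp_target(local_interface: str, dns_server: str, gateway: str) -> str:
    """Return the IP address whose hardware address must be found via ARP."""
    network = ipaddress.ip_interface(local_interface).network
    if ipaddress.ip_address(dns_server) in network:
        return dns_server  # same subnet: talk to the DNS server directly
    return gateway         # different subnet: hand the packet to the default gateway


# A machine at 192.168.1.20/24 using a public resolver goes through its gateway.
print(arp_target("192.168.1.20/24", "8.8.8.8", "192.168.1.1"))
```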

The DNS request is usually sent over the UDP protocol. The DNS server may have the domain's IP address cached locally. If not, it asks a root DNS server. The root server system consists of 13 named servers (identified by the letters A through M), each backed by many physical machines distributed across the planet. The root zone they serve is coordinated by a nonprofit called the Internet Corporation for Assigned Names and Numbers (ICANN), which also coordinates the domain name system as a whole. Quite a huge thing, isn’t it?
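To show what such a request looks like on the wire, here is a minimal sketch that builds the bytes of a DNS query for an A record. The function name build_dns_query and the fixed query ID are assumptions for the example; a real resolver picks a random ID and handles many more record types.

```python
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message asking for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


query = build_dns_query("medium.com")
print(len(query), query.hex())
```

Sending these bytes in a UDP datagram to port 53 of the configured DNS server is what starts the lookup.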

A root DNS server does not know the address of every domain name on the planet. It only knows the location of the DNS servers for each top-level domain. A top-level domain (TLD) is the domain extension: .com, .in, .eu and so on.

Once the root DNS server receives the request, it extracts the top-level domain (TLD) from the query and replies with a referral to the DNS servers for that TLD. It does not resolve the full name itself.

For example, suppose you are looking for medium.com. The root DNS server returns the IP address of the .com TLD server. Our DNS resolver caches the IP of that TLD server, so that it does not have to ask the root DNS server for it again. The TLD DNS server has the IP addresses of the authoritative name servers for the domain we are looking for. (An authoritative name server answers from its own records and does not provide cached answers obtained from another name server.)

When you change the name servers for a domain (for example, when you switch hosting providers), your domain registrar automatically updates this information with the TLD.

A hosting provider typically runs more than one name server, so that if one goes down, the others can serve as backups. The DNS resolver starts with the first and asks it for the IP address of the domain you are looking for (including the subdomain, if any). This is where we finally receive the destination IP address. Now the browser initiates a TCP connection with the server and, because we are using HTTPS, a TLS handshake follows.
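The whole chain (root, then TLD, then authoritative name server, with caching and a backup server) can be simulated with in-memory tables. Everything in this sketch is made up for illustration: the table names, the resolve function, and every IP address, which come from ranges reserved for documentation.

```python
# Toy data only. All addresses are from documentation ranges, not real servers.
ROOT_REFERRALS = {"com": "192.0.2.1"}  # root: TLD -> TLD server
TLD_REFERRALS = {"192.0.2.1": {"medium.com": ["192.0.2.53", "192.0.2.54"]}}
AUTHORITATIVE = {
    "192.0.2.53": {"medium.com": "203.0.113.10"},
    "192.0.2.54": {"medium.com": "203.0.113.10"},
}


def resolve(name: str, cache: dict, unreachable=frozenset()) -> str:
    """Walk root -> TLD -> authoritative servers, caching what is learned."""
    if name in cache:
        return cache[name]
    labels = name.split(".")
    tld, zone = labels[-1], ".".join(labels[-2:])
    tld_key = "tld:" + tld
    if tld_key not in cache:  # ask the root only once per TLD
        cache[tld_key] = ROOT_REFERRALS[tld]
    for name_server in TLD_REFERRALS[cache[tld_key]][zone]:
        if name_server in unreachable:  # try the next one as a backup
            continue
        answer = AUTHORITATIVE[name_server].get(name)
        if answer is not None:
            cache[name] = answer
            return answer
    raise LookupError(f"could not resolve {name}")


cache = {}
print(resolve("medium.com", cache))
print(resolve("medium.com", {}, unreachable={"192.0.2.53"}))
```

The second call shows the backup behaviour: with the first name server marked unreachable, the answer still arrives from the second.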

The client computer sends a ClientHello message to the server with the Transport Layer Security (TLS) versions it supports and the lists of cipher suites and compression methods available.

The server replies with a ServerHello message containing the chosen TLS version, the selected cipher suite, the selected compression method, and the server's public certificate signed by a CA (Certificate Authority). The certificate contains a public key that the client uses to protect the key exchange until a symmetric key can be agreed upon.

The client verifies the server's digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts it with the server’s public key. These random bytes are used to derive the symmetric key. (This describes the RSA key exchange used up to TLS 1.2; TLS 1.3 instead agrees on the key with an ephemeral Diffie-Hellman exchange.)

The server decrypts the random bytes using its own private key and uses these bytes to generate its own copy of the symmetric master key. The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.

The server generates its own hash and then decrypts the client-sent hash to verify that it matches. If it does, it sends its own Finished message to the client, also encrypted with the symmetric key.

From now on, the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key.
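In practice, application code rarely performs these steps by hand. As a rough sketch, Python's standard ssl module runs the entire handshake inside one call and then reports what was negotiated. The function names below, the port 443, and the timeout are assumptions for the example, and the individual handshake messages are not visible at this level.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a context that trusts the system CAs and checks the hostname."""
    return ssl.create_default_context()


def tls_handshake_info(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect over TCP, run the TLS handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket performs the handshake, including certificate verification.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return {
                "version": tls.version(),      # negotiated protocol version
                "cipher": tls.cipher()[0],     # negotiated cipher suite name
                "subject": tls.getpeercert().get("subject"),
            }
```

Calling tls_handshake_info("medium.com") on a machine with network access would return the negotiated version and cipher. If the certificate cannot be traced back to a trusted CA, wrap_socket raises an error instead.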

Sometimes, due to network congestion or flaky hardware connections, packets carrying this data are dropped before reaching their final destination. The sender then has to decide how to react: TCP retransmits the lost data, and the algorithm that governs how fast it keeps sending is called TCP congestion control.
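One classic strategy behind congestion control is additive increase, multiplicative decrease (AIMD): grow the sending window a little every round trip and cut it in half when a loss is detected. The sketch below is a deliberately simplified model, not how any real TCP stack is implemented; the function name aimd and the idea of naming the lossy rounds up front are assumptions for illustration.

```python
def aimd(loss_rounds, rounds: int, start: float = 1.0) -> list:
    """Simulate the congestion window over `rounds` round trips."""
    window = start
    history = []
    for current in range(rounds):
        if current in loss_rounds:
            window = max(1.0, window / 2)  # loss detected: back off sharply
        else:
            window += 1.0                  # no loss: probe for more bandwidth
        history.append(window)
    return history


# A single loss in round 3 halves the window, after which growth resumes.
print(aimd({3}, 6))
```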

Now that a secure connection is established, we can send a request. In HTTP/1.1, the request is a plain-text message structured precisely as the communication protocol dictates. It consists of 3 parts: a request line, the request headers, and a request body.

The request line sets, on a single line, the HTTP method, resource location, and protocol version: GET / HTTP/1.1

The request headers are a set of field: value pairs. In HTTP/1.1 the only mandatory field is Host; all the other fields, including Connection, are optional:

Host: medium.com
Connection: close

Host indicates the domain name that we want to target. Connection: close asks the server to close the connection after the response; without it, HTTP/1.1 keeps the connection open by default so that it can be reused. Some of the most used header fields are: Origin, Accept, Accept-Encoding, Cookie, Cache-Control, DNT, etc. The header part is terminated by a blank line.

The request body is optional. It is not used in GET requests but is very common in POST requests and sometimes in other verbs too, and it can carry data in formats such as JSON. Since we’re analyzing a GET request here, the body is empty.
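Putting the three parts together, an HTTP/1.1 request can be assembled as plain bytes. This is a minimal sketch; the helper name build_request is an assumption for the example, and real clients add many more headers.

```python
def build_request(method: str, path: str, host: str, headers=None, body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # A body needs its length announced so the server knows where it ends.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line terminates the headers
    return head.encode("ascii") + body


print(build_request("GET", "/", "medium.com", {"Connection": "close"}).decode())
```

For the GET request in this post, the output is just the request line, the two header lines, and the terminating blank line.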

Once the request is sent, the server processes it and sends back a response. The response starts with a status line containing the protocol version, the status code, and the status message. If the request is successful and returns a 200, it will start with: HTTP/1.1 200 OK

The response might carry a different status code and message, like one of these:

301 Moved Permanently, 304 Not Modified, 401 Unauthorized, 403 Forbidden, 404 Not Found, 500 Internal Server Error

The response then contains a list of HTTP headers and the response body (which, since we’re making the request in the browser, is going to be HTML).
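The structure of a response can be seen by splitting a raw one into its parts. This is a sketch that assumes a complete HTTP/1.1 response already held in memory; the function name parse_response and the sample response are made up for the example.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line separates head and body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)           # e.g. ["HTTP/1.1", "200", "OK"]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body


sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_response(sample))
```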

The browser has now received the HTML and starts to parse it. As it does, it fetches the other resources the page requires (CSS, images, favicons, and JavaScript files), repeating the request process for each of them.

Then, it finally starts rendering the page: construct the DOM tree → build the render tree → lay out the render tree → paint the render tree

The output of the parser (the “parse tree”) is a tree of DOM element and attribute nodes. The DOM (Document Object Model) is the object representation of the HTML document and the interface that exposes HTML elements to the outside world, such as JavaScript.
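To get a feel for what such a tree looks like, here is a toy tree builder on top of Python's standard html.parser module. The Node and TreeBuilder classes are invented for this sketch, and it ignores most of the error recovery that a real browser parser performs.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text += data.strip()


def build_tree(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented tag names."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


page = '<html><body><h1 class="t">Hi</h1><img src="a.png"><p>Text</p></body></html>'
print("\n".join(outline(build_tree(page))))
```

Each nested element becomes a child node, which is the same parent and child structure that JavaScript later navigates through the DOM.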

🚀 And here we go… the homepage of Medium is displayed on your screen.

Wrapping up, we saw what happens behind the scenes when we send a single request over the internet. The page is displayed in the blink of an eye, but it actually takes a lot of work to do that seamlessly.

I really appreciate that you continued reading until the end. 😸

I hope you liked it. Thanks a lot for reading, and have a great week ahead!
