Learn how to join URLs in Python

When you need to join multiple URL segments into a single full URL, you can use the urljoin() function from the urllib.parse module.

If you have more than two URL segments, then you can use the posixpath.join() function instead.

Here’s how you can join two URL segments using urljoin():

from urllib.parse import urljoin

base_url = "https://sebhastian.com"
url_2 = "images/feature.jpg"

full_url = urljoin(base_url, url_2)

print(full_url)  # https://sebhastian.com/images/feature.jpg

Keep in mind that if base_url contains a path, the string must end with a forward slash /. Otherwise, urljoin() treats the last path segment as a file name and replaces it with the segment you are joining.

Consider the example below:

from urllib.parse import urljoin

base_url = "https://sebhastian.com/assets"
url_2 = "images/feature.jpg"

full_url = urljoin(base_url, url_2)

print(full_url)  # https://sebhastian.com/images/feature.jpg

As you can see, the assets segment is missing from the full_url returned by urljoin(). To prevent this, add a forward slash / at the end of the base_url string:

from urllib.parse import urljoin

base_url = "https://sebhastian.com/assets/"
url_2 = "images/feature.jpg"

full_url = urljoin(base_url, url_2)

print(full_url)  
# https://sebhastian.com/assets/images/feature.jpg
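
If you can't control whether base_url ends with a slash, one option is to add the slash yourself before joining. The sketch below uses a hypothetical helper named join_keeping_base_path() (not part of urllib), and it assumes base_url always points at a directory-like path with no query string or fragment:

```python
from urllib.parse import urljoin


def join_keeping_base_path(base_url, relative_url):
    """Join two URL parts without losing the last path segment of base_url.

    Hypothetical helper: it assumes base_url is a directory-like path,
    so it appends a trailing slash when one is missing.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, relative_url)


without_slash = join_keeping_base_path("https://sebhastian.com/assets", "images/feature.jpg")
with_slash = join_keeping_base_path("https://sebhastian.com/assets/", "images/feature.jpg")

print(without_slash)  # https://sebhastian.com/assets/images/feature.jpg
print(with_slash)     # https://sebhastian.com/assets/images/feature.jpg
```

Both calls return the same URL, so the caller no longer needs to remember the trailing slash.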

Using the urljoin() function, you can join only two URLs at a time.

If you have a base URL and more than one URL segment, then you need to use the urljoin() function twice. The second call can be placed inside the first as follows:

from urllib.parse import urljoin

base_url = "https://sebhastian.com/"
url_2 = "assets/"
url_3 = "images/feature.jpg"

full_url = urljoin(base_url, urljoin(url_2, url_3))

print(full_url)  
# https://sebhastian.com/assets/images/feature.jpg
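
If nesting feels awkward, another way to chain urljoin() is to fold a list of segments with functools.reduce(). This is only a sketch: the segments list is an assumption made for illustration, and every segment except the last still needs its trailing slash, for the same reason described earlier:

```python
from functools import reduce
from urllib.parse import urljoin

# Every segment except the last ends with "/" so urljoin() keeps it.
segments = ["https://sebhastian.com/", "assets/", "images/feature.jpg"]

# reduce() calls urljoin(result_so_far, next_segment) from left to right.
full_url = reduce(urljoin, segments)

print(full_url)  # https://sebhastian.com/assets/images/feature.jpg
```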

Nesting urljoin() calls makes the code harder to read. As an alternative, you can use the join() function from the posixpath module to join multiple URL segments.

Here’s an example of joining many URL segments with the base URL:

import posixpath

base_url = "https://sebhastian.com/"
url_2 = "assets/"
url_3 = "images/"
url_4 = "people/user_01.jpg"

full_url = posixpath.join(base_url, url_2, url_3, url_4)

print(full_url)  
# https://sebhastian.com/assets/images/people/user_01.jpg
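
One caveat: posixpath.join() is built for file paths, so a segment that starts with a forward slash is treated as an absolute path and everything before it is dropped. The sketch below shows that behavior, followed by a hypothetical helper named join_url_segments() (an assumption for illustration, not a standard library function) that strips the extra slashes and joins the segments manually:

```python
import posixpath

base_url = "https://sebhastian.com/"

# A leading slash makes posixpath.join() discard the earlier segments.
print(posixpath.join(base_url, "/assets", "images"))  # /assets/images


def join_url_segments(base, *segments):
    """Join URL segments with single forward slashes (hypothetical helper)."""
    parts = [base.rstrip("/")]
    # Strip slashes on both sides and skip segments that end up empty.
    parts += [s.strip("/") for s in segments if s.strip("/")]
    return "/".join(parts)


safe_url = join_url_segments(base_url, "/assets/", "/images/", "people/user_01.jpg")

print(safe_url)  # https://sebhastian.com/assets/images/people/user_01.jpg
```

Note that this helper also removes a trailing slash from the last segment, which may or may not be what you want for your URLs.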

Why use posixpath.join() instead of os.path.join()? Because os.path.join() gives the wrong result on Windows.

On Linux and macOS, os.path is the posixpath module, so os.path.join() joins path segments with a forward slash /. On Windows, os.path is the ntpath module, whose join() function uses a backslash \ instead.

Because we’re working with URLs, we always want a forward slash as the segment separator, regardless of the OS we’re using.
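
You can see the difference on any OS, because both modules can be imported directly. The following is a small sketch where the segments list is made up for illustration:

```python
import ntpath
import posixpath

segments = ["https://sebhastian.com", "assets", "images"]

# ntpath.join() is what os.path.join() resolves to on Windows.
windows_style = ntpath.join(*segments)

# posixpath.join() is what os.path.join() resolves to on Linux and macOS.
posix_style = posixpath.join(*segments)

print(windows_style)  # https://sebhastian.com\assets\images
print(posix_style)    # https://sebhastian.com/assets/images
```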

Now you’ve learned how to join URLs in Python! If you have two URL segments, then you can use the urljoin() function.

When you have more than two URL segments, you can use the posixpath.join() function for an easy win.

I hope this tutorial helps. See you in other tutorials! 👍
