When you need to join URL segments into a single full URL, you can use the urljoin() function from the urllib.parse module. If you have more than two URL segments, you can use the posixpath.join() function instead.
Here’s how you can join two URL segments using urljoin():
from urllib.parse import urljoin
base_url = "https://sebhastian.com"
url_2 = "images/feature.jpg"
full_url = urljoin(base_url, url_2)
print(full_url) # https://sebhastian.com/images/feature.jpg
Keep in mind that if base_url contains a path (such as /assets), you need to end the string with a forward slash /. Otherwise, urljoin() treats the last path segment like a file name and replaces it, so that segment is dropped from the result.
Consider the example below:
from urllib.parse import urljoin
base_url = "https://sebhastian.com/assets"
url_2 = "images/feature.jpg"
full_url = urljoin(base_url, url_2)
print(full_url) # https://sebhastian.com/images/feature.jpg
As you can see, the assets segment is dropped from the full_url returned by urljoin(). To prevent this, add a forward slash / at the end of the base_url string:
from urllib.parse import urljoin
base_url = "https://sebhastian.com/assets/"
url_2 = "images/feature.jpg"
full_url = urljoin(base_url, url_2)
print(full_url)
# https://sebhastian.com/assets/images/feature.jpg
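If you build URLs from variables, you might not control whether base_url ends with a slash. Here’s a minimal sketch of a helper, join_under_base() (a hypothetical name, not part of urllib.parse), that adds the missing slash before calling urljoin(). It also strips a leading slash from the segment, because urljoin() resolves a segment like /images/feature.jpg from the host root and would replace the whole path of the base. This assumes you always want the segment placed beneath the base path.

```python
from urllib.parse import urljoin


def join_under_base(base_url: str, segment: str) -> str:
    """Hypothetical helper: join segment beneath the path of base_url."""
    # Without a trailing slash, urljoin() replaces the last path segment of the base
    if not base_url.endswith("/"):
        base_url += "/"
    # A leading slash would make urljoin() resolve the segment from the host root
    return urljoin(base_url, segment.lstrip("/"))


print(join_under_base("https://sebhastian.com/assets", "images/feature.jpg"))
# https://sebhastian.com/assets/images/feature.jpg
print(join_under_base("https://sebhastian.com/assets/", "/images/feature.jpg"))
# https://sebhastian.com/assets/images/feature.jpg
```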
The urljoin() function joins only two URLs at a time. If you have a base URL and more than one URL segment, you need to call urljoin() more than once. For two segments, you can nest the second call inside the first:
from urllib.parse import urljoin
base_url = "https://sebhastian.com/"
url_2 = "assets/"
url_3 = "images/feature.jpg"
full_url = urljoin(base_url, urljoin(url_2, url_3))
print(full_url)
# https://sebhastian.com/assets/images/feature.jpg
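Instead of nesting calls by hand, you could apply urljoin() left to right over a list of segments with functools.reduce(). The sketch below uses a hypothetical helper named join_url_parts(). As with a single urljoin() call, every segment except the last must end with a forward slash, or the following segment replaces it.

```python
from functools import reduce
from urllib.parse import urljoin


def join_url_parts(base_url: str, *segments: str) -> str:
    """Hypothetical helper: call urljoin() once per segment, left to right."""
    # reduce() computes urljoin(urljoin(base_url, segments[0]), segments[1]), and so on
    return reduce(urljoin, segments, base_url)


full_url = join_url_parts("https://sebhastian.com/", "assets/", "images/feature.jpg")
print(full_url)  # https://sebhastian.com/assets/images/feature.jpg
```

Note that a segment without a trailing slash, such as "assets", would be replaced by the next segment, exactly like the truncated base_url shown earlier.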
Nested urljoin() calls are a bit confusing to read. As an alternative, you can use the join() function from the posixpath module to join multiple URL segments in a single call.
Here’s an example of joining many URL segments with the base URL:
import posixpath
base_url = "https://sebhastian.com/"
url_2 = "assets/"
url_3 = "images/"
url_4 = "people/user_01.jpg"
full_url = posixpath.join(base_url, url_2, url_3, url_4)
print(full_url)
# https://sebhastian.com/assets/images/people/user_01.jpg
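One caveat: posixpath.join() treats its arguments as file system paths, not URLs. If any segment starts with a forward slash, everything before it is discarded, including the scheme and domain. The sketch below shows this, along with one possible workaround that strips leading slashes from every segment after the base (safe_url_join() is a hypothetical name).

```python
import posixpath

base_url = "https://sebhastian.com/"

# A leading slash makes posixpath.join() start over from that segment
print(posixpath.join(base_url, "/assets/", "images/feature.jpg"))
# /assets/images/feature.jpg


def safe_url_join(base_url: str, *segments: str) -> str:
    """Hypothetical helper: strip leading slashes so no segment resets the join."""
    return posixpath.join(base_url, *(s.lstrip("/") for s in segments))


print(safe_url_join(base_url, "/assets/", "images/feature.jpg"))
# https://sebhastian.com/assets/images/feature.jpg
```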
Why use posixpath.join instead of os.path.join? Because os.path.join gives the wrong result on Windows. On Linux and macOS, os.path refers to the posixpath module, so os.path.join is posixpath.join, which joins path segments with a forward slash /. But on Windows, os.path.join is ntpath.join, which uses a backslash \.
Because we’re working with URLs, we always want a forward slash as the segment separator, regardless of the operating system.
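You can see the difference on any operating system, because Python lets you import both ntpath and posixpath directly. This sketch joins the same segments with each module.

```python
import ntpath
import posixpath

segments = ("https://sebhastian.com/", "assets", "images", "feature.jpg")

# posixpath.join() inserts forward slashes (what os.path.join does on Linux/macOS)
print(posixpath.join(*segments))
# https://sebhastian.com/assets/images/feature.jpg

# ntpath.join() inserts backslashes (what os.path.join does on Windows)
print(ntpath.join(*segments))
# https://sebhastian.com/assets\images\feature.jpg
```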
Now you’ve learned how to join URLs in Python! If you have two URL segments, you can use the urljoin() function. When you have more than two URL segments, you can use the posixpath.join() function for an easy win.
I hope this tutorial helps. See you in other tutorials! 👍