To get the base URL of a full URL string in Python, you can use the `urlsplit()` function from the `urllib.parse` module.

The `urlsplit()` function returns the components of your URL as a `SplitResult` object, which is a named tuple:
```python
from urllib.parse import urlsplit

full_url = "https://sebhastian.com/images/feature.jpg"
url_parts = urlsplit(full_url)
print(url_parts)
```
Output:
```
SplitResult(scheme='https', netloc='sebhastian.com', path='/images/feature.jpg', query='', fragment='')
```
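Because `SplitResult` is a named tuple, you can read each component either by attribute name or by position. The sketch below uses a hypothetical variant of the example URL that adds a query string and a fragment, so those two fields are no longer empty:

```python
from urllib.parse import urlsplit

# Hypothetical URL that also carries a query string and a fragment
url_parts = urlsplit("https://sebhastian.com/images/feature.jpg?size=large#top")

# SplitResult is a named tuple, so fields work by name or by index
print(url_parts.scheme)    # https
print(url_parts[1])        # sebhastian.com (index 1 is netloc)
print(url_parts.path)      # /images/feature.jpg
print(url_parts.query)     # size=large
print(url_parts.fragment)  # top
```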
To get the base URL from the `url_parts` result, concatenate the `scheme` and `netloc` attributes with `://` between them.

If you only need the domain name, you can read the `netloc` attribute on its own:
```python
from urllib.parse import urlsplit

full_url = "https://sebhastian.com/images/feature.jpg"
url_parts = urlsplit(full_url)

base_url = url_parts.scheme + "://" + url_parts.netloc
domain_only = url_parts.netloc

print("Base URL:", base_url)
print("URL domain:", domain_only)
```
Output:
```
Base URL: https://sebhastian.com
URL domain: sebhastian.com
```
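Keep in mind that `netloc` holds the whole network location, so it also includes any port number or login credentials present in the URL. Here is a sketch with a hypothetical URL that contains both. The `hostname` and `port` attributes return those pieces separately, and `urlunsplit()` can rebuild the base URL from a tuple instead of concatenating strings:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL whose netloc includes credentials and a port
url_parts = urlsplit("https://user:secret@sebhastian.com:8080/images/feature.jpg")

print(url_parts.netloc)    # user:secret@sebhastian.com:8080
print(url_parts.hostname)  # sebhastian.com
print(url_parts.port)      # 8080 (an int, or None when no port is given)

# Rebuild the base URL from scheme and netloc; path, query, and fragment stay empty
base_url = urlunsplit((url_parts.scheme, url_parts.netloc, "", "", ""))
print(base_url)            # https://user:secret@sebhastian.com:8080
```

Use `netloc` when the port should stay part of the base URL. Use `hostname` when you only want the domain name.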
Next, suppose you want the URL up to, but not including, its last segment. You can call the `str.rsplit()` method with a `maxsplit` of `1` to split off just the last part:
```python
full_url = "https://sebhastian.com/assets/images/feature.jpg"

# Split once, starting from the right, at the last '/'
url_parts = full_url.rsplit('/', 1)
print(url_parts)
# ['https://sebhastian.com/assets/images', 'feature.jpg']

base_url = url_parts[0]
print(base_url)
# https://sebhastian.com/assets/images
```
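Calling `rsplit()` on the whole string can misbehave when the URL has no path. For example, `"https://sebhastian.com".rsplit('/', 1)[0]` returns `'https:/'`. It also breaks when a `/` appears inside the query string. The sketch below assumes you also want to drop any query and fragment, and applies `rsplit()` to the path component only. `parent_url` is a hypothetical helper name, not part of `urllib.parse`:

```python
from urllib.parse import urlsplit, urlunsplit


def parent_url(full_url: str) -> str:
    """Hypothetical helper: drop the last path segment plus any query and fragment."""
    parts = urlsplit(full_url)
    # Split only the path, so the slashes in "https://" or inside the query are never touched
    parent_path = parts.path.rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


print(parent_url("https://sebhastian.com/assets/images/feature.jpg"))
# https://sebhastian.com/assets/images
print(parent_url("https://sebhastian.com"))
# https://sebhastian.com
print(parent_url("https://sebhastian.com/search?next=/a/b"))
# https://sebhastian.com
```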
And that’s how you get the base URL using Python. To get the base URL or just the domain, use the `urlsplit()` function. To get the URL up to, but not including, its last segment, use the `rsplit()` method.
I’ve also written additional tutorials that explain how to work with URLs in Python:

- How to join URLs in Python
- How to download a file from a URL in Python
I hope these tutorials are helpful in your Python programming journey. See you in other tutorials! 👋