Python Requests - Handling Redirection

The requests library in Python is a powerful and popular tool for making HTTP requests. One of its useful features is handling redirection automatically. Redirection is common on the web: a server responds to a client's request by directing it to another URL. This can happen for several reasons, such as URL restructuring, load balancing, or content relocation. Understanding how to handle these redirects with the requests library is essential for building robust and efficient web scraping or web interaction tools.

Understanding HTTP Redirection

Before diving into the specifics of handling redirection with the requests library, it's important to understand the basics of HTTP redirection. HTTP redirection is indicated by 3xx status codes in the server's response. Common 3xx status codes include:

301 Moved Permanently: The resource has been permanently moved to a new URL. Future requests should use the new URL.
302 Found: The resource is temporarily available at a different URL. Future requests should continue to use the original URL.
303 See Other: The response to the request can be found at another URL using a GET method.
307 Temporary Redirect: The resource resides temporarily under a different URL, and the request method should not change.
308 Permanent Redirect: Similar to 301, but the request method and the body will not change.

When a client receives one of these status codes, it needs to follow the redirect to the new URL provided in the Location header of the response.
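The differences between these codes can be captured in a small lookup. The following is a minimal sketch using only the standard library's http.HTTPStatus; the REDIRECT_NOTES table and the describe_redirect helper are illustrative names introduced here, not part of requests.

```python
from http import HTTPStatus

# Illustrative lookup (not part of requests): for each redirect code,
# a short note on permanence and on what happens to the request method.
REDIRECT_NOTES = {
    HTTPStatus.MOVED_PERMANENTLY: "permanent; method may change to GET",
    HTTPStatus.FOUND: "temporary; method may change to GET",
    HTTPStatus.SEE_OTHER: "result available elsewhere; follow-up request uses GET",
    HTTPStatus.TEMPORARY_REDIRECT: "temporary; method preserved",
    HTTPStatus.PERMANENT_REDIRECT: "permanent; method preserved",
}


def describe_redirect(status_code):
    """Return a one-line description of a 3xx redirect status code."""
    status = HTTPStatus(status_code)
    return f"{status.value} {status.phrase}: {REDIRECT_NOTES[status]}"


for code in (301, 302, 303, 307, 308):
    print(describe_redirect(code))
```

A table like this is handy when logging redirects, because it makes clear whether a POST is guaranteed to remain a POST after the hop.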

Handling Redirection with requests

The requests library simplifies handling redirection. By default, it follows redirects automatically. Here's a basic example of how it works:

import requests
response = requests.get('http://github.com')
print(response.url)
print(response.status_code)

Output:

https://github.com/
200

In this example, if you visit http://github.com, you'll notice it redirects to https://github.com. The requests library follows this redirect automatically, and the final URL is printed out.

Controlling Redirection Behavior

While the default behavior of following redirects is convenient, there are scenarios where you might want more control over the redirection process. The requests library provides several ways to manage this.

Disabling Redirection

To disable automatic redirection, you can use the allow_redirects parameter:

response = requests.get('http://github.com', allow_redirects=False)
print(response.status_code)
print(response.headers['Location'])

Output:

301
https://github.com/

In this case, requests will not follow the redirect. The response will contain the original status code (e.g., 301) and the Location header with the URL to which the request would have been redirected.
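With automatic redirection disabled, you can follow the chain yourself one hop at a time. The sketch below is one possible approach, not a built-in feature: next_url, follow_manually, and the max_hops parameter are hypothetical names. It relies on Response.is_redirect and resolves the Location header with urljoin, since that header may hold a relative URL.

```python
from urllib.parse import urljoin

import requests


def next_url(response):
    """Return the absolute redirect target of a response, or None."""
    if not response.is_redirect:
        return None
    # Location may be relative, so resolve it against the current URL.
    return urljoin(response.url, response.headers["Location"])


def follow_manually(url, max_hops=5, timeout=5.0):
    """Follow redirects one hop at a time, printing each step."""
    with requests.Session() as session:
        for _ in range(max_hops):
            response = session.get(url, allow_redirects=False, timeout=timeout)
            target = next_url(response)
            if target is None:
                return response  # not a redirect: this is the final response
            print(response.status_code, url, "->", target)
            url = target
    raise RuntimeError(f"gave up after {max_hops} hops")


# Example (needs network access):
# final = follow_manually("http://github.com")
# print(final.status_code, final.url)
```

Following hops by hand lets you log, filter, or reject individual redirects. Note also that requests.head() does not follow redirects unless you pass allow_redirects=True.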

Limiting the Number of Redirects

By default, requests will follow up to 30 redirects. You can change this limit using a custom session and the max_redirects attribute:

import requests

session = requests.Session()
session.max_redirects = 5  # the default is 30

try:
    response = session.get('http://example.com')
    print(response.url)
except requests.exceptions.TooManyRedirects as e:
    print('Too many redirects:', e)

Output:

http://example.com/

In this example, the session follows at most 5 redirects. Here the request completes within that limit, so the final URL is printed. If a redirection chain exceeds the limit, a TooManyRedirects exception is raised instead.
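A redirect loop is hard to reproduce against a real site, so the following sketch simulates one without any network traffic. LoopingAdapter and the loop.test host are made up for illustration. The pieces taken from requests are Session.mount, BaseAdapter, max_redirects, and TooManyRedirects.

```python
import io

import requests
from requests.adapters import BaseAdapter


class LoopingAdapter(BaseAdapter):
    """Hypothetical in-memory adapter: every request is answered with a 302."""

    def __init__(self):
        super().__init__()
        self.calls = 0  # number of requests the session actually sent

    def send(self, request, **kwargs):
        self.calls += 1
        response = requests.Response()
        response.status_code = 302
        response.url = request.url
        response.request = request
        response.raw = io.BytesIO(b"")  # empty body
        response.headers["Location"] = "/again"
        return response

    def close(self):
        pass


def hits_redirect_limit(limit):
    """Return (limit_was_hit, requests_sent) for a session capped at `limit`."""
    session = requests.Session()
    session.max_redirects = limit
    adapter = LoopingAdapter()
    session.mount("http://", adapter)  # replaces the real HTTP transport
    try:
        session.get("http://loop.test/start")
    except requests.exceptions.TooManyRedirects:
        return True, adapter.calls
    return False, adapter.calls


limit_was_hit, requests_sent = hits_redirect_limit(3)
print(limit_was_hit, requests_sent)
```

Lowering max_redirects in this way makes a misconfigured server fail fast instead of consuming up to 30 round trips.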

Inspecting the Redirection History

The requests library allows you to inspect the history of redirects that occurred during the request. This is available via the history attribute of the response object:

response = requests.get('http://github.com')
print(response.history)
for resp in response.history:
    print(resp.status_code, resp.url)

Output:

[<Response [301]>]
301 http://github.com/

The history attribute is a list of response objects that were created during the redirection process. You can iterate over this list to see each intermediate step.
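As a small illustration, the history can be combined with the final response to describe the whole chain. The helper names redirect_chain and was_permanently_moved are assumptions made for this sketch. Response.history and Response.is_permanent_redirect are provided by requests.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_permanently_moved(response):
    """True if any hop in the chain was a permanent redirect (301 or 308)."""
    return any(r.is_permanent_redirect for r in response.history)


# Example (needs network access):
# response = requests.get("http://github.com")
# for status, url in redirect_chain(response):
#     print(status, url)
```

If was_permanently_moved returns True, it is usually worth updating any stored URL to the final one so that later requests skip the redirect.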

Practical Applications

Web Scraping

When scraping websites, handling redirects is crucial because many sites use redirection to manage their content. For example, a website might redirect users to a mobile version of the site if accessed from a mobile device. Here's how you can handle such scenarios:

import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)

if response.history:
    print('Redirected History:')
    for resp in response.history:
        print(resp.status_code, resp.url)

soup = BeautifulSoup(response.content, 'html.parser')
print(soup.title.string)

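Output (illustrative, for a URL that redirects once from HTTP to HTTPS):

Redirected History:
301 http://example.com/
Example Domain

In this example, the script handles redirection automatically and then uses BeautifulSoup to parse the final content. Note that history contains only the intermediate redirect responses. The final 200 response is the response object itself, so it never appears in the history list.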
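Because a scraper may be redirected somewhere it did not expect (a mobile site, a login page, another domain), it can help to check where the request landed before parsing. The sketch below shows one way to do that. The helpers final_host, landed_on, and fetch_page and the allowed_hosts parameter are hypothetical names, not part of requests.

```python
from urllib.parse import urlparse

import requests


def final_host(response):
    """Return the lowercase host name the request finally landed on."""
    return (urlparse(response.url).hostname or "").lower()


def landed_on(response, allowed_hosts):
    """True if the final URL's host is one of the expected hosts."""
    return final_host(response) in {host.lower() for host in allowed_hosts}


def fetch_page(url, allowed_hosts, timeout=5.0):
    """Fetch a page, refusing content served from an unexpected host."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    if not landed_on(response, allowed_hosts):
        raise ValueError(f"redirected to unexpected host: {response.url}")
    return response.text


# Example (needs network access):
# html = fetch_page("http://example.com", ["example.com", "www.example.com"])
```

The returned HTML can then be passed to BeautifulSoup as in the example above.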

API Requests

Some APIs use redirection to balance load across different servers. When working with such APIs, it's important to handle redirects to ensure that your requests reach the correct server:

url = 'http://api.example.com/data'
response = requests.get(url)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f'Request failed with status code: {response.status_code}')

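Output (the URL above is a placeholder, so the actual result depends on the API you call):

Request failed with status code: 404

In this case, requests follows any redirects before returning, so the status code being checked belongs to the final response rather than to an intermediate redirect.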
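One detail matters for APIs that accept POST requests: when following a 301, 302, or 303 redirect, requests switches a POST to a GET, while 307 and 308 keep the original method. The sketch below demonstrates this without network access by mounting an in-memory adapter. RedirectOnceAdapter, methods_used, and the api.test host are invented for the illustration.

```python
import io

import requests
from requests.adapters import BaseAdapter


class RedirectOnceAdapter(BaseAdapter):
    """Hypothetical in-memory adapter: /old redirects to /new, /new returns 200."""

    def __init__(self, redirect_status):
        super().__init__()
        self.redirect_status = redirect_status
        self.methods_seen = []  # HTTP method of every request received

    def send(self, request, **kwargs):
        self.methods_seen.append(request.method)
        response = requests.Response()
        response.url = request.url
        response.request = request
        response.raw = io.BytesIO(b"")  # empty body
        if request.url.endswith("/old"):
            response.status_code = self.redirect_status
            response.headers["Location"] = "/new"
        else:
            response.status_code = 200
        return response

    def close(self):
        pass


def methods_used(redirect_status):
    """POST to /old and return the methods sent before and after the redirect."""
    session = requests.Session()
    adapter = RedirectOnceAdapter(redirect_status)
    session.mount("http://", adapter)  # replaces the real HTTP transport
    session.post("http://api.test/old", data={"key": "value"})
    return adapter.methods_seen


print(methods_used(302))  # the POST becomes a GET after the redirect
print(methods_used(307))  # the POST is repeated as a POST
```

If an API endpoint answers a POST with a 301 or 302, the redirected request therefore arrives as a GET without the original body, which is a common cause of confusing API errors.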

Conclusion

Handling redirection with the requests library in Python is straightforward thanks to its built-in support for automatic redirection. By understanding the basics of HTTP redirection and the options requests provides, you can manage redirection effectively in your web scraping and web interaction tasks. Whether you need to follow redirects automatically, disable them, limit their number, or inspect the redirect history, the requests library offers the flexibility to meet your needs.
