Why can Weibo images be downloaded with wget's User-Agent but not with a browser's User-Agent?

rabbitcoder

Posted on 2024-06-13, Zhejiang

Updated on 2024-10-08

The test image comes from this post: https://weibo.com/3209519182/OixHyvyva

The following code fails to download the image:

import requests
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    parsed_url = urlparse(url)
    return parsed_url.netloc

# URL of the image
url = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# Headers to be included in the request
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
    "Host": extract_domain(url),
    "Connection": "Keep-Alive"
}

# Send a GET request to the URL with the headers
print(headers)
response = requests.get(url, headers=headers, timeout=10)

# Save the image to a file
if response.status_code == 200:
    with open("G18LCI_2023.png", "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
else:
    print(f"Failed to download image. Status code: {response.status_code}")

It fails with the following output:

╰─➤  python -u "/home/pon/code/work/pon/pon-it/crawler_console/dev/download_image.py"
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0', 'Accept': '*/*', 'Accept-Encoding': 'identity', 'Host': 'wx1.sinaimg.cn', 'Connection': 'Keep-Alive'}
Failed to download image. Status code: 40

However, the following code downloads the image successfully (the only change is that the User-Agent is set to wget's):

import requests
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    parsed_url = urlparse(url)
    return parsed_url.netloc

# URL of the image
url = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# Headers to be included in the request
headers = {
    "User-Agent": "Wget/1.21.2",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
    "Host": extract_domain(url),
    "Connection": "Keep-Alive"
}

# Send a GET request to the URL with the headers
print(headers)
response = requests.get(url, headers=headers, timeout=10)

# Save the image to a file
if response.status_code == 200:
    with open("G18LCI_2023.png", "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
else:
    print(f"Failed to download image. Status code: {response.status_code}")

The output is:

╰─➤  python -u "/home/pon/code/work/pon/pon-it/crawler_console/dev/download_image.py"
{'User-Agent': 'Wget/1.21.2', 'Accept': '*/*', 'Accept-Encoding': 'identity', 'Host': 'wx1.sinaimg.cn', 'Connection': 'Keep-Alive'}
Image downloaded successfully.
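
To narrow this down, the two scripts can be folded into one probe that requests the same URL under each header combination and records the status codes. The sketch below is only a diagnostic aid, not an explanation of the server's behavior. The names `build_headers`, `probe`, and `fetch_status` are made up for illustration. The extra `Referer` variant is an untested guess that the server might treat browser-like requests differently depending on that header; nothing in the outputs above confirms it.

```python
URL = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# The two User-Agent strings taken from the scripts above.
USER_AGENTS = {
    "firefox": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0",
    "wget": "Wget/1.21.2",
}

# Assumption: a Referer pointing at weibo.com is worth testing; None means "send no Referer".
REFERERS = (None, "https://weibo.com/")


def build_headers(user_agent, referer=None):
    """Build headers like the ones above (Host is left for requests to fill in)."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Accept-Encoding": "identity",
        "Connection": "Keep-Alive",
    }
    if referer is not None:
        headers["Referer"] = referer
    return headers


def fetch_status(url, headers):
    """Send one GET request and return only its HTTP status code."""
    import requests  # imported here so the helpers above work without requests installed

    return requests.get(url, headers=headers, timeout=10).status_code


def probe(url, fetch=fetch_status):
    """Return {(ua_name, referer): status_code} for every header combination."""
    results = {}
    for name, user_agent in USER_AGENTS.items():
        for referer in REFERERS:
            results[(name, referer)] = fetch(url, build_headers(user_agent, referer))
    return results
```

Calling `probe(URL)` and printing the returned dictionary shows at a glance which combination of User-Agent and Referer the server accepts. If the Referer turns out to make no difference, the next step would be to inspect the failing response's headers and body.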
