Why can Weibo images be downloaded with wget's User-Agent but not with a browser's User-Agent?

The test image comes from: https://weibo.com/3209519182/OixHyvyva

The following code fails to download the image:

import requests
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    parsed_url = urlparse(url)
    return parsed_url.netloc

# URL of the image
url = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# Headers to be included in the request
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
    "Host":extract_domain(url),
    "Connection":"Keep-Alive"
}

# Send a GET request to the URL with the headers
print(headers)
response = requests.get(url, headers=headers,timeout=10)

# Save the image to a file
if response.status_code == 200:
    with open("G18LCI_2023.png", "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
else:
    print(f"Failed to download image. Status code: {response.status_code}")

It fails with:

╰─➤  python -u "/home/pon/code/work/pon/pon-it/crawler_console/dev/download_image.py"
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0', 'Accept': '*/*', 'Accept-Encoding': 'identity', 'Host': 'wx1.sinaimg.cn', 'Connection': 'Keep-Alive'}
Failed to download image. Status code: 40

But the following code downloads the image successfully (the only change is setting the User-Agent to wget's):

import requests
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    parsed_url = urlparse(url)
    return parsed_url.netloc

# URL of the image
url = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# Headers to be included in the request
headers = {
    "User-Agent": "Wget/1.21.2",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
    "Host":extract_domain(url),
    "Connection":"Keep-Alive"
}

# Send a GET request to the URL with the headers
print(headers)
response = requests.get(url, headers=headers,timeout=10)

# Save the image to a file
if response.status_code == 200:
    with open("G18LCI_2023.png", "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
else:
    print(f"Failed to download image. Status code: {response.status_code}")

The output is:

╰─➤  python -u "/home/pon/code/work/pon/pon-it/crawler_console/dev/download_image.py"
{'User-Agent': 'Wget/1.21.2', 'Accept': '*/*', 'Accept-Encoding': 'identity', 'Host': 'wx1.sinaimg.cn', 'Connection': 'Keep-Alive'}
Image downloaded successfully.

1 answer

Add "Referer": "https://weibo.com/" to the headers and both User-Agents work.

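So my guess is that this is hotlink protection: the server stops other websites from using the image in a browser, so a browser User-Agent without a Weibo Referer is rejected. wget is not a browser, so it is not subject to the check.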
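The fix above can be sketched as follows. This is a minimal illustration, not the asker's script: it uses the standard library's urllib.request instead of requests so that it has no third-party dependency, and the helper names build_request and download_image are hypothetical. The Referer value is the one suggested in the answer; whether the server requires exactly this value is an assumption.

```python
import urllib.error
import urllib.request

# Assumption: the Referer value suggested in the answer is what the image host expects.
WEIBO_REFERER = "https://weibo.com/"


def build_request(url: str, user_agent: str,
                  referer: str = WEIBO_REFERER) -> urllib.request.Request:
    """Build a GET request that carries a Referer next to the User-Agent."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Referer": referer,
    }
    return urllib.request.Request(url, headers=headers)


def download_image(url: str, path: str, user_agent: str) -> bool:
    """Download url to path; return False if the server answers with an HTTP error."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            data = response.read()
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example a hotlink rejection) end up here.
        print(f"Failed to download image. Status code: {err.code}")
        return False
    with open(path, "wb") as file:
        file.write(data)
    return True
```

To try it, call download_image with the image URL from the question, an output path such as "image.jpg", and the Firefox User-Agent string. If the hotlink-protection guess is right, the call should return True once the Referer is present.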
