The test image URL comes from: https://weibo.com/3209519182/OixHyvyva
The code below fails to download the image:
import requests
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    parsed_url = urlparse(url)
    return parsed_url.netloc

# URL of the image
url = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# Headers to be included in the request (browser User-Agent)
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
    "Host": extract_domain(url),
    "Connection": "Keep-Alive",
}

# Send a GET request to the URL with the headers
print(headers)
response = requests.get(url, headers=headers, timeout=10)

# Save the image to a file
if response.status_code == 200:
    with open("G18LCI_2023.png", "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
else:
    print(f"Failed to download image. Status code: {response.status_code}")
It fails with the following output:
╰─➤ python -u "/home/pon/code/work/pon/pon-it/crawler_console/dev/download_image.py"
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0', 'Accept': '*/*', 'Accept-Encoding': 'identity', 'Host': 'wx1.sinaimg.cn', 'Connection': 'Keep-Alive'}
Failed to download image. Status code: 40
However, the code below downloads the image successfully (the only change is setting the User-Agent to Wget):
import requests
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    parsed_url = urlparse(url)
    return parsed_url.netloc

# URL of the image
url = "https://wx1.sinaimg.cn/wap360/bf4d604egy1hqlqupxg7jj20yw1db15q.jpg"

# Headers to be included in the request (Wget User-Agent)
headers = {
    "User-Agent": "Wget/1.21.2",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
    "Host": extract_domain(url),
    "Connection": "Keep-Alive",
}

# Send a GET request to the URL with the headers
print(headers)
response = requests.get(url, headers=headers, timeout=10)

# Save the image to a file
if response.status_code == 200:
    with open("G18LCI_2023.png", "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
else:
    print(f"Failed to download image. Status code: {response.status_code}")
The output is:
╰─➤ python -u "/home/pon/code/work/pon/pon-it/crawler_console/dev/download_image.py" 130 ↵
{'User-Agent': 'Wget/1.21.2', 'Accept': '*/*', 'Accept-Encoding': 'identity', 'Host': 'wx1.sinaimg.cn', 'Connection': 'Keep-Alive'}
Image downloaded successfully.
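Adding "Referer": "https://weibo.com/" to the headers makes both versions work.
So my guess is that this is hotlink protection: the server refuses requests that look like a browser loading the image from another website (Wget is not a browser, so it gets through).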
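
To illustrate the Referer fix described above, here is a minimal sketch of both scripts folded into one. The helper names (build_headers, filename_from_url, download_image) are hypothetical and not part of the original scripts. The Host header is left out because requests already sets it from the URL. The output file name is taken from the URL path instead of being hard-coded, which is a choice of this sketch.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: this is the Referer value reported to work in the post.
WEIBO_REFERER = "https://weibo.com/"


def build_headers(user_agent: str, referer: str = WEIBO_REFERER) -> dict:
    """Return request headers that carry a Referer next to the User-Agent."""
    return {
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Accept-Encoding": "identity",
        "Referer": referer,
    }


def filename_from_url(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    return PurePosixPath(urlparse(url).path).name


def download_image(url: str, user_agent: str) -> bool:
    """Download url with the Referer header set; return True on HTTP 200."""
    # Third-party dependency, imported here so the helpers above
    # can be used even where requests is not installed.
    import requests

    response = requests.get(url, headers=build_headers(user_agent), timeout=10)
    if response.status_code != 200:
        print(f"Failed to download image. Status code: {response.status_code}")
        return False
    with open(filename_from_url(url), "wb") as file:
        file.write(response.content)
    print("Image downloaded successfully.")
    return True
```

Calling download_image with the image URL and the Firefox User-Agent from the failing script is the quickest way to check the guess. Whether the server keeps accepting this Referer is not something the sketch can guarantee.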