Proxy IPs are now part of everyday technical work: web crawling, website monitoring, ad verification, and other businesses all depend on them. There are three common types of proxy IP, namely HTTP proxies, HTTPS proxies, and SOCKS proxies. The most common is the HTTP proxy.
What are the usage scenarios for HTTP proxies? (http proxy)
An HTTP proxy is a common network application that relays HTTP requests and responses between a client and a server through an intermediary. Here are some common usage scenarios for HTTP proxies:
1. Access control: HTTP proxies can be used to restrict access to certain websites or content. For example, a school or company can control the websites that employees or students can access through a proxy server.
2. Caching: HTTP proxies can cache requests and responses to improve performance. When the proxy server receives a request, it can first check the cache, and if there is a response to the request in the cache, it can immediately return the response without having to make a request to the server.
3. Geolocation masking: HTTP proxies can be used to disguise the geographic location of clients. When the proxy server forwards a request, the target server sees the proxy's IP address as the source, so the request appears to come from the proxy's geographic location.
There are many other scenarios beyond these.
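As a concrete illustration of that last scenario, here is a minimal sketch of routing a request through an HTTP proxy with Python's requests library. The proxy address below is a placeholder, not a real server, and make_proxies is just an illustrative helper:

```python
import requests

def make_proxies(proxy_addr):
    # Build the proxies mapping that requests expects:
    # one entry per URL scheme, both pointing at the same HTTP proxy.
    return {'http': proxy_addr, 'https': proxy_addr}

# Placeholder address; substitute a proxy server you actually control.
proxies = make_proxies('http://127.0.0.1:8080')

# The target server then sees the proxy's IP, not the client's:
# response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5)
```

Passing the same proxy URL for both schemes is the common pattern: requests picks the entry that matches the scheme of the URL being fetched.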
So how do you use a purchased IP proxy? What are the methods? (python proxy)
1. Use in a web crawler
Among users of dynamic IP proxies, web crawler operators are the most common, because crawlers need to change IP addresses constantly to avoid being blocked. Let's take a look at how a crawler program can fetch a list of proxy IPs:
The code is as follows:
import requests
from bs4 import BeautifulSoup

url = 'https://www.SmartProxy.cn//nn/' # smart proxy IP list page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}

# Get the source code of the page
html = requests.get(url, headers=headers).text

# Parse the page source using BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Find all table rows that contain proxy entries
ips = soup.find_all('tr')

# Loop over the rows and extract the details of each proxy
for i in range(1, len(ips)):
    ip_info = ips[i]
    tds = ip_info.find_all('td')
    ip = tds[1].text
    port = tds[2].text
    address = tds[3].text.replace('\n', '').replace(' ', '')
    proxy_type = tds[5].text.replace('\n', '').replace(' ', '')
    # Display the detailed proxy information
    print(f'IP: {ip} port: {port} address: {address} proxy type: {proxy_type}')
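Scraped proxies are not guaranteed to work, so a natural next step is to verify each one before using it. Here is a minimal sketch; check_proxy and the httpbin.org test URL are illustrative choices, not part of the original program:

```python
import requests

def check_proxy(ip, port, timeout=5):
    """Return True if the HTTP proxy at ip:port answers a request in time."""
    proxy = f'http://{ip}:{port}'
    try:
        r = requests.get('http://httpbin.org/ip',
                         proxies={'http': proxy, 'https': proxy},
                         timeout=timeout)
        return r.status_code == 200
    except requests.RequestException:
        # Unreachable, refused, or too slow: treat the proxy as dead.
        return False
```

Calling check_proxy(ip, port) inside the loop above would let the crawler keep only the proxies that actually respond.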
1.2 How can a crawler program automatically change its proxy IP address? (rotate proxy python)
Since different sites may require different crawler logic, here is a sample program in Python that uses the requests library and a proxy pool; you can modify it to suit your needs.
import requests
from urllib3.exceptions import MaxRetryError, NewConnectionError
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
import random
# Custom requests function using a proxy pool and a retry mechanism
def requests_retry_session(retries=3, backoff_factor=0.3, status_forcelist=(500, 502, 504), proxy=None):
    session = requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    if proxy:
        session.proxies = {
            'http': proxy,
            'https': proxy
        }
    return session
# Define the proxy pool list
proxies_list = [
    'http://proxy1:port',
    'http://proxy2:port',
    'http://proxy3:port',
    # More proxy addresses can be added
]
# Select a proxy address at random
proxy = random.choice(proxies_list)

# Send the request using the requests_retry_session function
url = 'https://example.com'  # placeholder; set this to the page you want to crawl
try:
    response = requests_retry_session(proxy=proxy).get(url)
    # Handle the response content here
except (MaxRetryError, NewConnectionError, requests.RequestException) as e:
    print(f"Request error: {str(e)}")
This example program uses the custom requests_retry_session function, which implements a retry mechanism and sets the proxy address through the session.proxies attribute.
Before each request, a proxy address is randomly selected from the proxy pool list with random.choice.
If the request still fails after all retries, an exception such as MaxRetryError or NewConnectionError is raised (requests usually wraps these urllib3 errors in a RequestException), and it can be handled as needed.
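The same idea can be pushed one step further: instead of choosing a single proxy up front, try several proxies in turn until one succeeds. This is a sketch under the same assumptions; the proxy addresses are placeholders and fetch_with_rotation is an illustrative helper, not part of the original program:

```python
import random
import requests

def fetch_with_rotation(url, proxies_list, attempts=3, timeout=5):
    """Try up to `attempts` randomly ordered proxies; return the first good response."""
    last_error = None
    for proxy in random.sample(proxies_list, min(attempts, len(proxies_list))):
        try:
            return requests.get(url,
                                proxies={'http': proxy, 'https': proxy},
                                timeout=timeout)
        except requests.RequestException as e:
            last_error = e  # this proxy failed; move on to the next one
    raise last_error  # every sampled proxy failed
```

Re-raising the last error lets the caller decide whether to refill the proxy pool, back off, or give up.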
2. Set the proxy directly in the operating system
To configure an IP proxy at the operating-system level on Windows, follow these steps:
1. Open the Control Panel: click the "Start" button, type "Control Panel" in the search box, and press "Enter".
2. Open "Internet Options": in the Control Panel, locate the "Internet Options" item (under "Network and Internet" in category view) and click it.
3. Open the LAN settings: in the "Internet Properties" window, switch to the "Connections" tab and click "LAN settings".
4. Enable the proxy server: in the "Local Area Network (LAN) Settings" window, check "Use a proxy server for your LAN" and enter the proxy server address and port number.
5. Click "OK" in both windows to save the changes.
On Windows 10 and 11 the same setting is also available under Settings > Network & Internet > Proxy.
Please note that this is a basic guide; the exact steps may vary slightly between operating system versions and network configurations.
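Once the system proxy is configured, you can check from Python which proxy settings the operating system exposes. The standard-library function urllib.request.getproxies() reads the registry on Windows and the HTTP_PROXY/HTTPS_PROXY environment variables on other platforms:

```python
import urllib.request

# Returns a dict mapping scheme -> proxy URL, e.g.
# {'http': 'http://proxy:port', 'https': 'http://proxy:port'},
# or an empty dict when no system proxy is configured.
system_proxies = urllib.request.getproxies()
print(system_proxies)
```

This is also what requests consults by default, so a correctly set system proxy is picked up automatically unless you override it with an explicit proxies argument.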