Many people doing overseas business (crawler proxy) need to introduce proxy IPs to crawl content. To sustain business operations, proxy IPs must be constantly built, maintained, and verified. To bypass a server's restrictions on IP and request frequency, the server must be prevented from seeing the crawler's real IP address.
So which overseas HTTP proxy is suitable for a crawler (node crawler)?
Here are the criteria we can use to test whether an overseas HTTP proxy is suitable for crawlers.
1. Availability
The availability rate is the percentage of tested proxy IPs that work normally. If a request through a proxy fails or times out, that proxy IP counts as unavailable. For example, with a test sample size of 1,000, extract 1,000 proxies and measure what percentage of them are usable.
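A minimal Python sketch of such a test is below; the proxy list format ("host:port"), the test URL http://httpbin.org/ip, and the 5-second timeout are all assumptions for illustration, not any provider's actual tooling.
import requests

def availability_rate(proxy_list, test_url="http://httpbin.org/ip", timeout=5):
    # Count how many proxies can complete a request without an error or timeout
    usable = 0
    for proxy in proxy_list:
        proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
        try:
            r = requests.get(test_url, proxies=proxies, timeout=timeout)
            if r.status_code == 200:
                usable += 1
        except requests.RequestException:
            pass  # connection error or timeout: counts as unavailable
    return usable / len(proxy_list) * 100  # availability rate in percent
For a sample of 1,000 proxies, availability_rate(sample) returns the percentage of them that can be used normally.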
2. Response speed
The response speed of a crawler proxy can be measured as elapsed time: the time from sending a request through the proxy IP to receiving the website's response. The shorter the response time, the faster the proxy. Note that response speed also depends on the geographical location of the machine using the proxy; different locations will give different results.
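Response time can be read directly from the requests library; this sketch assumes a placeholder proxy address and the same hypothetical test URL as above.
import requests

def response_time(proxy, test_url="http://httpbin.org/ip", timeout=10):
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    r = requests.get(test_url, proxies=proxies, timeout=timeout)
    # elapsed measures the time from sending the request to receiving the response headers
    return r.elapsed.total_seconds()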
3. Stability
The stability of proxy IP resources directly affects work progress and data quality, and it shows up as connection timeouts during testing. If the first response is particularly fast but the next request waits 60 seconds or even longer for a response, the proxy is extremely unstable and will seriously hurt crawling efficiency.
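One way to check this is to repeat the same request several times through one proxy and compare the timings; the round count and the 60-second ceiling below are assumptions chosen to match the example above.
import requests

def stability_check(proxy, test_url="http://httpbin.org/ip", rounds=5, timeout=60):
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    times = []
    for _ in range(rounds):
        try:
            r = requests.get(test_url, proxies=proxies, timeout=timeout)
            times.append(r.elapsed.total_seconds())
        except requests.RequestException:
            times.append(None)  # timed out or failed: a sign of instability
    return times
A proxy whose timings swing from well under a second to tens of seconds (or None) is too unstable for crawling.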
4. Service
Finally, during testing we should also examine the company's after-sales service, which is easy to overlook. If everything works in the test but problems appear in actual use and there is no one to turn to, the loss outweighs the gain and work is still affected, so after-sales service is also very important!
How does a crawler disguise itself (python requests proxy)?
1. Browser camouflage
A web server can easily identify the source of a request: the default requests header contains no browser information, so the library is effectively "streaking" when it talks to the server. We can add a "User-Agent" header to pretend to be a real browser, as follows:
import requests

# Simulate a Firefox browser by sending a real User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'}
response = requests.get("http://www.baidu.com", headers=headers)  # request the url with the disguised header
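To actually route the request through a proxy, requests also accepts a proxies parameter; here is a minimal sketch, where the address 127.0.0.1:8080 is a placeholder rather than a real proxy endpoint.
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'}
proxies = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}  # placeholder proxy address
response = requests.get("http://www.baidu.com", headers=headers, proxies=proxies, timeout=10)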
Smartproxy is an overseas HTTP proxy server provider whose IPs can be located accurately down to the city level, with the IP pool updated every month. With first-hand IPs, Smartproxy serves the big-data collection field and helps enterprises and individuals obtain data sources quickly and efficiently. It is cheap and affordable, yet fast and stable.