Scraping Pokémon images from a website

import os
import re
import time

import requests

# Download counter for output
pok_counter = 1


def Process(id):
    "Get source code of url"
    link = "http://www.psypokes.com/dex/sprites.php?view=regular&gen=" + str(id)
    response = requests.get(link)
    Extract(response.text)


def Extract(source):
    "Extract pokemon download number and name, also download image"
    global pok_counter
    regex = re.compile("<br />(.*?)/a>")
    pokes = re.findall(regex, source)
    count = 0
    for poke in pokes:
        number = GetBetween(poke, '#', '<br />')
        name = GetBetween(poke, '<br />', '<')
        print(str(pok_counter) + '. Downloading: ' + name)
        pok_counter += 1
        DownloadImage(number, name)
        count += 1
        # Pause for 3 seconds after every 30 downloads
        if count == 30:
            time.sleep(3)
            print("PAUSE")
            count = 0


def DownloadImage(number, name):
    "Download image from given url"
    link = "http://www.psypokes.com/dex/regular/" + number + ".png"
    file_download = './Images/' + name + '.png'
    response = requests.get(link)
    if response.status_code == 200:
        # The with block closes the file even if the write fails
        with open(file_download, 'wb') as f:
            f.write(response.content)


def GetBetween(line, start, end):
    "Get substring between two strings"
    return line.split(start)[1].split(end)[0]


if __name__ == "__main__":
    # Make sure the output folder exists before downloading
    if not os.path.isdir('./Images'):
        os.makedirs('./Images')
    # Generation pages 1 to 7 (range excludes the upper bound)
    for i in range(1, 8):
        Process(i)

This was an Upwork job. The task was to download all Pokémon images from a few pages.
Link to the pages: http://www.psypokes.com/dex/sprites.php?view=regular&gen=1 (gen goes from 1 to 7)

On the server the image files were named by number (001.png, 002.png and so on), but the client wanted every image renamed to the real Pokémon name.
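
The renaming step can be sketched roughly like this. The helper names sprite_url and local_filename are made up for this example, and replacing characters that are awkward in file names is my own assumption, not something the client asked for.

```python
import re


def sprite_url(number):
    # Server-side files are named by zero-padded number, e.g. 001.png
    return "http://www.psypokes.com/dex/regular/" + str(number).zfill(3) + ".png"


def local_filename(name):
    # Assumption: swap characters that many file systems reject for "_"
    safe = re.sub(r'[\\/:*?"<>|]', "_", name.strip())
    return safe + ".png"
```

So the image is fetched by its number and saved under its name, for example number 1 is fetched as 001.png and saved as Bulbasaur.png.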

In tasks like this it is better not to use threads, because many parallel requests may get our IP banned by the server. It is also a good idea to pause after every few connections.
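
The pause logic can also be pulled out into a small helper. This is only a sketch: the name throttled and its parameters are made up for this example, and the defaults (30 items, 3 seconds) are simply the values used in the script.

```python
import time


def throttled(items, batch_size=30, pause_seconds=3, sleep=time.sleep):
    """Yield items one by one, pausing after every batch_size items."""
    for index, item in enumerate(items, start=1):
        yield item
        # The pause runs when the caller asks for the next item
        if index % batch_size == 0:
            sleep(pause_seconds)
```

A loop such as "for poke in throttled(pokes):" would then download sequentially and sleep after each batch. The sleep parameter is there only so the pause can be swapped out when testing.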
