…and here’s how you can hack for good: make porn accessible

by Gilbert Keith

http://pythonadventures.wordpress.com/2013/11/08/extracting-relevant-images-from-xxx-galleries-using-text-clustering/

" Problem
On the web you can find lots of free XXX galleries. There are also sites that collect these galleries and update their lists daily. When you visit such a gallery, you get either (1) images, or (2) links to images through thumbnails. But besides these relevant images, there is always some noise: banners, other thumbnails, links to other galleries, etc.

How do you write a universal scraper that takes the URL of a gallery and extracts just the relevant images, without any noise? How do you separate real content from noise?"
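The linked post's title points to text clustering as the approach. Here is a minimal sketch of that idea, not the linked author's implementation. It assumes that the relevant images in a gallery have near-identical URLs (same host and folder, differing only in a counter), while banners and stray thumbnails look different. The function names, the 0.9 threshold, and the greedy grouping strategy are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_urls(urls, threshold=0.9):
    """Greedy single-pass clustering (an assumed, simple strategy).

    Each URL joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for url in urls:
        for cluster in clusters:
            if similarity(url, cluster[0]) >= threshold:
                cluster.append(url)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([url])
    return clusters


def largest_cluster(urls, threshold=0.9):
    """Assume the gallery's own images form the biggest group of look-alike URLs."""
    clusters = cluster_urls(urls, threshold)
    return max(clusters, key=len) if clusters else []
```

Calling `largest_cluster` on the image URLs scraped from a page would return the biggest group of similar-looking URLs. The threshold would likely need tuning per site, and a page where the noise outnumbers the content would defeat this heuristic.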
