Those of you rolling around the internet back in the day might remember some images that displayed seemingly "personal" information to you. Granted, I'm not such an old-timer that I programmed my own Ethernet protocol in assembly or anything, so you too might remember something like the image at right. It's a little disconcerting at first: "How does some random image know this much about my computer?" Well, today I'd like to revisit one of my old favorites for quick prototyping, Django, to figure out just that.
When you think about it, it's not really too complicated. When you load a website, your browser sends packets to the site requesting information, the server creates a response addressed to you, and then the browser assembles all this disparate information into a single webpage. IP addresses are clearly part of this scheme: all data has to be sent somewhere, after all, and the server needs to know where to send its responses back to its clients (you). Every packet carries your IP address as its return address, and on top of that the browser's HTTP request carries its own metadata, called headers. HTTP headers give a server or client context about how to interpret the actual data that's about to come across. Whether a site uses them is up to the developer, but the key point is that almost all browsers send this information. You can take my word that it's not really dangerous or revealing... ...that is, if you choose to trust me.
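To make the header idea concrete, here is a minimal sketch of what a browser's request might look like on the wire and how its header lines split into name/value pairs. The request text and host name are made-up examples, and `parse_headers` is a hypothetical helper; a real server does considerably more validation than this.

```python
# A made-up example of a raw HTTP/1.1 request, as a browser might send it.
RAW_REQUEST = (
    "GET /image.png HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)\r\n"
    "Accept: image/png,image/*;q=0.8\r\n"
    "\r\n"
)

def parse_headers(raw):
    """Split a raw request into its request line and a dict of headers."""
    # Headers end at the first blank line; anything after it is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return request_line, headers

request_line, headers = parse_headers(RAW_REQUEST)
print(request_line)           # GET /image.png HTTP/1.1
print(headers["User-Agent"])  # Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```

Notice that the client's IP address is not among these lines: the server learns it from the underlying TCP/IP connection, which is why Django exposes it separately rather than as a header.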
Okay, let's get down to the meat of this thing. There are a couple of key features we want here (assuming we'll just duplicate the image above).
We need to get the IP address. This one's pretty easy, since Django hands it to us in the same request dictionary as the headers.
We need a way to get the ISP, probably from the IP address.
We need to figure out the operating system, probably from a header as well.
Finally, we need to return all of this as an image.
Looking through the list of request.META keys in the Django documentation (request.META is just a dictionary on the request), the most relevant seem to be REMOTE_ADDR, for the client's IP, and HTTP_USER_AGENT, which tends to give the most information about the client. The user agent string is how the client identifies itself to the server: it contains some info about the browser and operating system to let the server know whether it is speaking to a Windows 3.1 desktop or an iPhone 4. User agents aren't always reliable and they're pretty easy to spoof, but for this project that isn't a mission-critical problem.
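Django's documentation describes how header names become request.META keys: uppercase the name, replace hyphens with underscores, and add an HTTP_ prefix, with Content-Length and Content-Type as the exceptions that get no prefix. Here is a small sketch of that rule as a standalone helper (`meta_key` is a hypothetical name, not a Django function):

```python
def meta_key(header_name):
    """Convert an HTTP header name to the key Django uses in request.META."""
    key = header_name.upper().replace("-", "_")
    # Content-Length and Content-Type are stored without the HTTP_ prefix.
    if key in ("CONTENT_LENGTH", "CONTENT_TYPE"):
        return key
    return "HTTP_" + key

print(meta_key("User-Agent"))       # HTTP_USER_AGENT
print(meta_key("Accept-Language"))  # HTTP_ACCEPT_LANGUAGE
print(meta_key("Content-Type"))     # CONTENT_TYPE
```

REMOTE_ADDR has no HTTP_ prefix because it isn't a header at all; it comes from the connection. And since a client is free to omit User-Agent, `request.META.get('HTTP_USER_AGENT', '')` is a safer read than indexing the dictionary directly.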
So the IP and OS can be gleaned from the request... what about the ISP? This one is a little trickier, but essentially ISPs buy up blocks of the IP address space and then lease addresses out to their users. Fortunately, these mappings don't change that often, and there are many sources online for a mapping from IP to ISP. I chose to go with pygeoip, an open source Python wrapper for MaxMind's GeoIP databases. It loads a mapping .dat file and then lets you query easily by IP, returning an ISP name or region.
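pygeoip does the lookup against a binary database, but the underlying idea (ISPs own blocks of addresses, so find the block an address falls into) can be sketched with Python's standard `ipaddress` module. The blocks below are the reserved documentation ranges and the ISP names are invented; this illustrates the concept, not how the GeoIP .dat format is actually searched.

```python
import ipaddress

# Hypothetical block-to-ISP table built from reserved documentation ranges.
ISP_BLOCKS = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example Cable Co."),
    (ipaddress.ip_network("198.51.100.0/24"), "Sample DSL Inc."),
    (ipaddress.ip_network("198.51.100.128/25"), "Sample DSL Business"),
    (ipaddress.ip_network("203.0.113.0/24"), "Demo Wireless"),
]

def isp_for(ip_string):
    """Return the ISP whose block contains the address, preferring the most
    specific (longest-prefix) match, or None if no block matches."""
    addr = ipaddress.ip_address(ip_string)
    matches = [(net, name) for net, name in ISP_BLOCKS if addr in net]
    if not matches:
        return None
    # A longer prefix means a smaller, more specific allocation.
    return max(matches, key=lambda pair: pair[0].prefixlen)[1]

print(isp_for("198.51.100.7"))    # Sample DSL Inc.
print(isp_for("198.51.100.200"))  # Sample DSL Business
print(isp_for("10.0.0.1"))        # None
```

A linear scan is fine for a handful of blocks; a real database covering the whole address space needs an indexed structure, such as a sorted range table or a prefix tree.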
Well, we've identified the information we need; let's look at how to extract it efficiently. The first part of my function gathers the information and turns it into nice, user-friendly strings. So far, this is what we have in my application's views.py:
```python
from django.http import HttpResponse
import pygeoip as geo
import re

gi = geo.GeoIP('GeoIPISP.dat')
SYSTEMS = ("Windows", "Macintosh", "Linux")

def generate(request):
    # Get the names for things
    os = findOS(request.META.get('HTTP_USER_AGENT', ''))
    ip = request.META['REMOTE_ADDR']
    org = gi.org_by_addr(ip) or "Unknown"  # None if the IP isn't in the database

    # Put them in a friendly string
    ip_string = "Welcome, your IP is: " + ip
    org_string = "Your ISP is: " + org
    os_string = "And your operating system is: " + os
    # (drawing the image comes next)

# I <3 Regex for extracting the OS name
def findOS(ua_string):
    result = [system for system in SYSTEMS
              if re.search(system, ua_string, re.I) is not None]
    # Return the first match as a string, not the whole list
    return result[0] if result else "Other"
```
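One caveat when trying this on a development server: requests from your own machine arrive from 127.0.0.1, which isn't in any ISP database, and the GeoIP lookup then typically returns None, so concatenating it onto a string raises a TypeError. Below is a minimal sketch of the string-building step with a fallback, kept independent of Django and pygeoip by taking a plain dict and a lookup function. `build_strings` and the stand-in lookups are hypothetical names for illustration.

```python
def build_strings(meta, lookup_org, os_name):
    """Build the three display lines from a request.META-like dict.

    lookup_org is any callable mapping an IP string to an ISP name or None,
    e.g. a GeoIP object's org_by_addr method.
    """
    ip = meta["REMOTE_ADDR"]
    org = lookup_org(ip) or "Unknown"  # fall back when the IP isn't in the database
    return (
        "Welcome, your IP is: " + ip,
        "Your ISP is: " + org,
        "And your operating system is: " + os_name,
    )

# A stand-in lookup that knows nothing, like a GeoIP query for 127.0.0.1.
lines = build_strings({"REMOTE_ADDR": "127.0.0.1"}, lambda ip: None, "Linux")
print(lines[1])  # Your ISP is: Unknown
```

Passing the lookup in as a function also means this step can be exercised without the .dat file on hand.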
So that's pretty simple: we do a little bit of regex fanciness combined with a list comprehension to get the operating system name we need, and a single function call gets the ISP from the IP in request.META. Man, I love Python.
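It's worth checking what a substring search like this actually reports for a few typical user agent strings. The sketch below is a standalone version (`find_os` is a hypothetical name) that returns the first matching name and falls back to "Other"; the user agent strings are abbreviated examples, not captured traffic.

```python
import re

SYSTEMS = ("Windows", "Macintosh", "Linux")

def find_os(ua_string):
    """Return the first name in SYSTEMS found in the user agent, else 'Other'."""
    for system in SYSTEMS:
        if re.search(system, ua_string, re.I):
            return system
    return "Other"

samples = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "mac": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "iphone": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
    "android": "Mozilla/5.0 (Linux; Android 14; Pixel 8)",
}
for label, ua in samples.items():
    print(label, "->", find_os(ua))
```

Note the last two: an iPhone reports "Mac OS X" but not "Macintosh", so it lands in "Other", while Android phones do include "Linux". The order of SYSTEMS matters as well, since the first match wins.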
But wait! We're not quite done. It has to be an image; just the info isn't good enough! For this I used PIL, the aptly named Python Imaging Library. PIL makes it easy to create a new solid-color image, so that's what I did. I added a little text at strategic locations and gave both the text and the background a random color with one more helper function. That's it! Here's the entire views.py file:
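The PIL part can be tried outside Django as well. This sketch assumes the Pillow fork of PIL is installed, and swaps the original's arial.ttf for `ImageFont.load_default()` so it doesn't depend on a font file being present. It renders three lines onto a randomly colored 400x100 canvas and saves the PNG into an in-memory buffer, much as the view saves into its HttpResponse. `render_png` and `random_color` are illustrative names.

```python
import io
import random

from PIL import Image, ImageDraw, ImageFont

SIZE = (400, 100)
POSITIONS = [(5, 10), (5, 40), (5, 70)]  # top, middle, bottom

def random_color():
    """Return a random (R, G, B) tuple with each channel in 0..255."""
    return tuple(random.randint(0, 255) for _ in range(3))

def render_png(lines):
    """Draw up to three lines of text on a random background; return PNG bytes."""
    image = Image.new("RGB", SIZE, random_color())
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()  # assumption: built-in font instead of arial.ttf
    color = random_color()
    for position, text in zip(POSITIONS, lines):
        draw.text(position, text, fill=color, font=font)
    buffer = io.BytesIO()  # file-like target; the view passes its HttpResponse here instead
    image.save(buffer, "PNG")
    return buffer.getvalue()

png = render_png(["Welcome, your IP is: 203.0.113.5",
                  "Your ISP is: Unknown",
                  "And your operating system is: Linux"])
print(png[:8])  # b'\x89PNG\r\n\x1a\n'
```

Because the text and background colors are drawn independently, some images will come out hard to read; that's part of the charm here rather than a bug.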
```python
from django.http import HttpResponse
import pygeoip as geo
import re
import random
from PIL import Image, ImageDraw, ImageFont

gi = geo.GeoIP('GeoIPISP.dat')
SYSTEMS = ("Windows", "Macintosh", "Linux")
size = (400, 100)
font = ImageFont.truetype("arial.ttf", 15)
top, middle, bottom = (5, 10), (5, 40), (5, 70)

def generate(request):
    # Get the names for things
    os = findOS(request.META.get('HTTP_USER_AGENT', ''))
    ip = request.META['REMOTE_ADDR']
    org = gi.org_by_addr(ip) or "Unknown"  # None if the IP isn't in the database

    # Put them in a friendly string
    ip_string = "Welcome, your IP is: " + ip
    org_string = "Your ISP is: " + org
    os_string = "And your operating system is: " + os

    # Solid background in one random color, text in another
    image = Image.new("RGB", size, randomColor())
    draw = ImageDraw.Draw(image)
    color = randomColor()

    # Draw the text
    draw.text(top, ip_string, fill=color, font=font)
    draw.text(middle, org_string, fill=color, font=font)
    draw.text(bottom, os_string, fill=color, font=font)

    # HttpResponse is file-like, so PIL can save the PNG straight into it
    # (older Django versions called this argument mimetype)
    response = HttpResponse(content_type="image/png")
    image.save(response, "PNG")
    return response

# I <3 Regex for extracting the OS name
def findOS(ua_string):
    result = [system for system in SYSTEMS
              if re.search(system, ua_string, re.I) is not None]
    # Return the first match as a string, not the whole list
    return result[0] if result else "Other"

# Generate a random color
def randomColor():
    return tuple(random.randint(0, 255) for _ in range(3))
```
BAM! Was that quick or what? I've put up the entire Django project here if you'd like to read through the source. It comes with two free GeoIP.dat files, so you lucked out! The result is something like the image at right: not a bad facsimile for only a few lines of code, and if the image template had already existed, it would be even shorter. Well, that's it for now. Until next time, feel free to dig into HTTP headers; it's interesting to see how much extra data is being passed around!