Introduction
In this post I will explain how I made an interactive map of my Flickr contacts using the Python Flickr API along with a few Python packages:
- pandas to parse and structure the information from JSON strings,
- geopy to geocode the contact locations,
- folium to create the map.
I used the IPython notebook, which is a great environment for working interactively with data.
First, let's see what the map looks like:
from IPython.display import IFrame
IFrame('assets/map.html', 700, 500)
The cool thing is that not only can you zoom in and out, but clicking on a marker also opens a popup with the name of the contact and their location.
Note: Some fields are marked as 'unknown' because folium does not seem to handle UTF-8, so I had to convert everything to plain strings, and in some cases that conversion raised an error.
I. Extracting the contact list for one user
As we will see, the data returned by the Flickr API needs a little cleaning before we can use it for our purpose. This means:
- Extracting the valid json part
- Putting it into a data frame
- Extracting the list nested inside that frame and creating a new DataFrame object from it.
But first, let's import the modules we need:
import flickrapi
import pandas as pd
from geopy.geocoders import Nominatim
import folium
from IPython.display import HTML
from math import isnan  # used later to skip contacts without coordinates
Next, we create a flickr object that we'll use for our API requests. For this, you'll need an API key from Flickr.
api_key = '' # put your api key in between the quotes.
flickr = flickrapi.FlickrAPI(api_key, format='json')
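Rather than pasting the key into the notebook, you could read it from an environment variable so it doesn't end up in a shared file. A minimal sketch, assuming a hypothetical variable named FLICKR_API_KEY (the name is an assumption of this sketch, not something flickrapi looks for):

```python
import os


def load_api_key(var_name="FLICKR_API_KEY"):
    """Return the API key stored in an environment variable.

    FLICKR_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set the %s environment variable first" % var_name)
    return key
```

You would then create the object with `flickr = flickrapi.FlickrAPI(load_api_key(), format='json')`.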
Our first API request gets the public contact list of a Flickr user. Here I use my own user ID.
strPubList = flickr.contacts_getPublicList(user_id='92362511@N00')
strPubList[:600]
Kind of a mess, huh? The JSON is wrapped in a jsonFlickrApi(...) callback, so let's strip those characters to get a valid JSON string (readable by pandas) and load it into a pandas data frame.
dfPubList = pd.read_json(strPubList[14:-1])
dfPubList
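The [14:-1] slice works because, with format='json', the response text is a JSONP call: `jsonFlickrApi(` is exactly 14 characters long and the closing `)` is the last character. A slightly sturdier sketch strips the wrapper by name and parses the rest with the standard json module, so it keeps working if the response has surrounding whitespace (`parse_flickr_jsonp` is a hypothetical helper, not part of flickrapi). Flickr's REST API also accepts a `nojsoncallback=1` parameter that returns plain JSON; whether your flickrapi version passes it through unchanged is worth checking.

```python
import json


def parse_flickr_jsonp(text, callback="jsonFlickrApi"):
    """Parse a response such as 'jsonFlickrApi({...})' into Python objects.

    Strips the callback wrapper by name instead of relying on fixed
    offsets like [14:-1]; plain JSON is accepted unchanged.
    """
    text = text.strip()
    prefix = callback + "("
    if text.startswith(prefix) and text.endswith(")"):
        text = text[len(prefix):-1]
    return json.loads(text)
```

The resulting dict can go straight into pandas, e.g. `pd.DataFrame(data['contacts']['contact'])`.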
The structure is now much easier to read. Notice that the 'contacts' column, at the 'contact' index, contains a list. Let's extract this list and see what is inside by loading it into another data frame.
dfContacts=pd.DataFrame(dfPubList['contacts']['contact'])
dfContacts.head()
This is the information we are looking for: a list of contacts with their user IDs. Now, for each user, we want to find the location (if available).
Let's add an empty, string-typed 'location' column to our data frame.
dfContacts['location']=''
We can also delete the columns we don't need.
del dfContacts['iconfarm']
del dfContacts['iconserver']
del dfContacts['ignored']
del dfContacts['rev_ignored']
dfContacts.head()
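The same nesting can also be walked with plain dictionaries once the response has been parsed with json, keeping only the fields we care about instead of deleting the others. A minimal sketch; `contact_rows` and the sample payload are hypothetical, shaped only after the field names used in this post (nsid, username, iconserver, ignored):

```python
def contact_rows(payload, keep=("nsid", "username")):
    """Flatten payload['contacts']['contact'] into a list of small dicts.

    Only the keys in `keep` are retained, which plays the role of the
    column deletions above; keys missing from a contact become None.
    """
    contacts = payload.get("contacts", {}).get("contact", [])
    return [{key: contact.get(key) for key in keep} for contact in contacts]


# Hypothetical sample payload, not real Flickr output.
sample = {
    "contacts": {
        "contact": [
            {"nsid": "11111111@N00", "username": "alice", "iconserver": "1", "ignored": 0},
            {"nsid": "22222222@N00", "username": "bob", "iconserver": "2", "ignored": 0},
        ]
    },
    "stat": "ok",
}

rows = contact_rows(sample)
```

A list of dicts like `rows` can be turned into a data frame with `pd.DataFrame(rows)`.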
II. Extracting the location for each contact
Let's fetch the info of one user to see what it contains:
strInfo = flickr.people_getInfo(user_id='92362511@N00')
print(strInfo[14:-1])
This user does not have a 'location'. Let's try another one:
strInfo = flickr.people_getInfo(user_id='37451064@N00')
print(strInfo[14:-1])
Again we put this into a data frame:
dfInfo = pd.read_json(strInfo[14:-1])
dfInfo
Now we know that the index 'location' contains a dict object with the location we need.
We can then extract the corresponding string. To avoid errors, we only do so if the index exists, i.e. if the user entered a location:
if 'location' in dfInfo.index:
    strLoc = dfInfo.loc['location', 'person']['_content']
strLoc
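The same guard can be written against the parsed JSON directly, returning None when a user never filled in a location. A sketch assuming the person → location → _content nesting that the data frame above exposes; `get_location` is a hypothetical helper name:

```python
def get_location(info):
    """Return info['person']['location']['_content'], or None if absent.

    Mirrors the "'location' in dfInfo.index" check: users who never
    entered a location simply have no 'location' key.
    """
    location = info.get("person", {}).get("location")
    if isinstance(location, dict):
        return location.get("_content") or None  # treat "" as missing too
    return None
```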
Victory!
We can do this for every user in our first data frame by looping over the 'nsid' column (this may take a while, since it sends one API request per contact).
for idx, nsid in enumerate(dfContacts['nsid']):
    strInfo = flickr.people_getInfo(user_id=nsid)
    dfInfo = pd.read_json(strInfo[14:-1])
    if 'location' in dfInfo.index:
        strLoc = dfInfo.loc['location', 'person']['_content']
        # .loc writes into dfContacts itself; chained indexing may only update a copy
        dfContacts.loc[idx, 'location'] = strLoc
dfContacts.head()
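Because this loop sends one people_getInfo request per contact, a single network error halfway through would stop it. One way to guard against that is to isolate the fetching logic. A sketch under assumptions: `collect_locations` and the `fetch_location` callable it receives are hypothetical (in practice `fetch_location` might wrap flickr.people_getInfo plus the parsing shown above), and the optional pause is a courtesy to the API, not a documented Flickr requirement:

```python
import time


def collect_locations(nsids, fetch_location, pause=0.0):
    """Return {nsid: location string or None}, one lookup per distinct nsid.

    `fetch_location` is a hypothetical callable that takes an nsid and
    returns that user's location string or None.
    """
    locations = {}
    for nsid in nsids:
        if nsid in locations:
            continue  # already looked up: no second request
        try:
            locations[nsid] = fetch_location(nsid)
        except Exception:  # one failed request should not stop the loop
            locations[nsid] = None
        if pause:
            time.sleep(pause)  # optional delay between requests
    return locations
```

The resulting dict could then be mapped back onto the frame, e.g. with `dfContacts['nsid'].map(locations)`.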
Now we have filled in the location names of our contacts (where available). Note that they are not always very precise (sometimes only the country is given). To place them on a map we need the coordinates of these locations: this is where geopy comes in!
III. Extracting location coordinates with geopy
We add two columns for latitude and longitude. They will be set to NaN if the location is unknown.
dfContacts['lat'] = float('nan')
dfContacts['lon'] = float('nan')
We select a geolocator (we chose Nominatim, the first one described in the geopy docs), which is essentially a web service that returns coordinates for a given location string.
Then we loop over each row of the data frame to geocode its location string.
# recent geopy versions require a user_agent; 'flickr-contacts-map' is just an example name
geolocator = Nominatim(user_agent='flickr-contacts-map')
for idx, strLoc in enumerate(dfContacts['location']):
    try:  # avoids interrupting the loop on time-out errors
        location = geolocator.geocode(strLoc)
    except Exception:
        location = None
    if location is not None:
        dfContacts.loc[idx, 'lat'] = location.latitude
        dfContacts.loc[idx, 'lon'] = location.longitude
dfContacts.head()
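Many contacts may share the same location string, and Nominatim's usage policy asks for at most one request per second. A sketch that geocodes each distinct, non-empty string once and spaces out the requests; `geocode_all` is a hypothetical helper, and `geocode` can be any callable that returns an object with .latitude and .longitude (as geopy geocoders do) or None. Recent geopy versions also ship geopy.extra.rate_limiter.RateLimiter, which wraps a geocode function with a minimum delay between calls.

```python
import time


def geocode_all(queries, geocode, min_delay=1.0, sleep=time.sleep):
    """Geocode each distinct, non-empty query once.

    Returns {query: (lat, lon)} with None for queries that could not be
    located. `sleep` is injectable so tests can skip the real delay.
    """
    results = {}
    first_request = True
    for query in queries:
        if not query or query in results:
            continue  # skip empty strings and locations already looked up
        if not first_request:
            sleep(min_delay)  # space out requests to respect rate limits
        first_request = False
        try:
            location = geocode(query)
        except Exception:  # time-outs or service errors: leave it unlocated
            location = None
        if location is None:
            results[query] = None
        else:
            results[query] = (location.latitude, location.longitude)
    return results
```

Usage could look like `coords = geocode_all(dfContacts['location'], geolocator.geocode)`, after which the coordinates are copied back into the lat and lon columns.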
Each located contact now has a latitude and a longitude. The only thing left to do is to put them on a map with folium.
# Optional: write the data frame to a file so you don't have to re-run everything if your browser crashes.
# Encoding to UTF-8 avoids an error while writing the file.
# Uncomment the following line to write the data frame to a file:
#dfContacts.to_csv('myContactInfo.csv', encoding='utf-8')
# Uncomment to read it back (index_col=0 restores the index instead of adding an extra column):
#dfContacts = pd.read_csv('myContactInfo.csv', encoding='utf-8', index_col=0)
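Since geocoding is the slow, rate-limited step, another option is to cache only the location-to-coordinates lookups, so that a re-run skips strings that were already resolved. A Python 3 sketch; `load_cache`, `save_cache`, and the file name used with them are hypothetical. Note that JSON has no tuples, so coordinates come back as [lat, lon] lists.

```python
import json
import os


def load_cache(path):
    """Return the cached {location: [lat, lon] or None} dict, or {} if the file is absent."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache, path):
    """Write the cache as UTF-8 JSON; ensure_ascii=False keeps names like 'Zürich' readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f, ensure_ascii=False, indent=2)
```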
IV. Putting the contacts on a map with folium
Folium is a simple tool, built on the Leaflet.js library, for mapping data. Maps are saved as HTML files. We can choose custom tiles from those listed at http://leaflet-extras.github.io/leaflet-providers/preview/ and display user names and locations in popups that appear when one clicks on a marker.
Let's create a map:
mapContacts = folium.Map(location=[20, 0], zoom_start=2,
tiles= r'http://{s}.tile.thunderforest.com/landscape/{z}/{x}/{y}.png',
attr='© <a href="http://www.opencyclemap.org">OpenCycleMap</a>,\
© <a href="http://openstreetmap.org">OpenStreetMap</a> contributors,\
<a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>',
width=700,
height=500)
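The tile URL above contains placeholders that the map library fills in: {z} is the zoom level, {x} and {y} index a square tile, and {s} is a subdomain used to spread requests. Tile servers of this kind generally follow the standard "slippy map" Web Mercator scheme, in which the world is split into 2^z × 2^z tiles. A small sketch of that conversion (`tile_xy` is a hypothetical helper, shown only to explain the URL template):

```python
import math


def tile_xy(lat, lon, zoom):
    """Return the (x, y) indices of the Web Mercator tile containing (lat, lon)."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # clamp: lon = 180 or latitudes beyond about +/-85.05 degrees fall off the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom_start=2 the whole map is only 4 × 4 tiles, which is why the initial view shows the entire world.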
and populate this map with the contacts that were successfully located. As the cherry on top, we use the 'popup' option, which shows a text when a marker is clicked; we use it to display the user name and the location string interactively.
Note the try/except blocks: a name or location that str cannot convert from UTF-8 is replaced by "unknown" instead of raising an error.
for idx in dfContacts.index:
    lat = dfContacts.lat[idx]
    lon = dfContacts.lon[idx]
    # folium doesn't seem to accept utf-8 strings in the popup
    try:
        name = str(dfContacts.username[idx].decode())
    except:
        name = "unknown"
    try:
        loc = str(dfContacts.location[idx].decode())
    except:
        loc = "unknown"
    if not(isnan(lat)) and not(isnan(lon)):
        mapContacts.polygon_marker(location=[lat, lon],
                                   radius=4,
                                   line_color='#0063db',
                                   fill_color='#0063db',
                                   popup=name + " (" + loc + ")")
mapContacts.create_map('map.html')
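The try/except approach turns any name containing a non-ASCII character into "unknown". A gentler alternative, swapped in here and not what this post uses, is to strip accents with Unicode NFKD normalization so that "Zürich" becomes "Zurich", and to HTML-escape the result, since folium popups are rendered as HTML. A Python 3 sketch with hypothetical helper names, which also writes the NaN filter as a plain function. Recent folium releases should accept Unicode popups directly, and they replace polygon_marker/create_map with marker classes such as folium.CircleMarker(...).add_to(map) and Map.save(), so the ASCII step may not be needed there.

```python
import html
import math
import unicodedata


def to_ascii(text, fallback="unknown"):
    """Strip accents ('Zürich' -> 'Zurich'); return `fallback` if nothing usable remains."""
    if not isinstance(text, str) or not text:
        return fallback
    decomposed = unicodedata.normalize("NFKD", text)  # split letters from their accents
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii").strip()
    return ascii_text or fallback


def popup_text(name, location):
    """Build the 'name (location)' popup text, escaped for HTML."""
    return html.escape("%s (%s)" % (to_ascii(name), to_ascii(location)))


def located(rows):
    """Keep only rows whose lat and lon are both real numbers (not NaN)."""
    return [r for r in rows if not (math.isnan(r["lat"]) or math.isnan(r["lon"]))]
```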
V. Summary
Finally, once you know where the information to extract is, it only takes a few steps:
1) Parse the contact list with contacts_getPublicList
strPubList = flickr.contacts_getPublicList(user_id='92362511@N00')
dfPubList = pd.read_json(strPubList[14:-1])
dfContacts=pd.DataFrame(dfPubList['contacts']['contact'])
dfContacts['location']=''
2) Extract the locations with people_getInfo
for idx, nsid in enumerate(dfContacts['nsid']):
    strInfo = flickr.people_getInfo(user_id=nsid)
    dfInfo = pd.read_json(strInfo[14:-1])
    if 'location' in dfInfo.index:
        strLoc = dfInfo.loc['location', 'person']['_content']
        dfContacts.loc[idx, 'location'] = strLoc
3) Geocode the locations with geopy
dfContacts['lat'] = float('nan')
dfContacts['lon'] = float('nan')
# recent geopy versions require a user_agent; 'flickr-contacts-map' is just an example name
geolocator = Nominatim(user_agent='flickr-contacts-map')
for idx, strLoc in enumerate(dfContacts['location']):
    try:  # avoids interrupting the loop on time-out errors
        location = geolocator.geocode(strLoc)
    except Exception:
        location = None
    if location is not None:
        dfContacts.loc[idx, 'lat'] = location.latitude
        dfContacts.loc[idx, 'lon'] = location.longitude
4) Create and populate the map with folium
mapContacts = folium.Map(location=[10, 0], zoom_start=2,
tiles= r'http://{s}.tile.thunderforest.com/landscape/{z}/{x}/{y}.png',
attr='© <a href="http://www.opencyclemap.org">OpenCycleMap</a>, \
© <a href="http://openstreetmap.org">OpenStreetMap</a> \
contributors, \
<a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>',
width= 700, height= 500)
for idx in dfContacts.index:
    lat = dfContacts.lat[idx]
    lon = dfContacts.lon[idx]
    # folium doesn't seem to accept utf-8 strings in the popup
    try:
        name = str(dfContacts.username[idx].decode())
    except:
        name = "unknown"
    try:
        loc = str(dfContacts.location[idx].decode())
    except:
        loc = "unknown"
    if not(isnan(lat)) and not(isnan(lon)):
        mapContacts.polygon_marker(location=[lat, lon],
                                   radius=4,
                                   line_color='#0063db',
                                   fill_color='#0063db',
                                   popup=name + " (" + loc + ")")
mapContacts.create_map('map.html')
5) Open the HTML map in your favorite browser!
# The end!