Making a map of your flickr contacts

Introduction

In this post I explain how I made an interactive map of my Flickr contacts using the Python Flickr API along with a few Python packages:

  • pandas to parse and structure the information from json strings,
  • geopy to geocode the contact locations,
  • folium to create the map.

I used the IPython notebook, which is a great environment to work interactively with data.

First, let's see what the map looks like:

In [88]:
from IPython.display import IFrame
IFrame('assets/map.html', 700, 500)
Out[88]:

The cool thing is that you can not only zoom in and out but also click on a marker to open a popup with the contact's name and location.

Note: Some fields are marked as 'unknown' because it seems folium doesn't deal with utf-8, so I had to convert everything to plain strings. Whenever that conversion raised an error, I fell back to 'unknown'.

I. Extracting the contact list for one user

As we will see, we need to clean up the data returned by the Flickr API a little before we can use it. This means:

  • Extracting the valid json part
  • Putting it into a data frame
  • Extracting the list nested inside that frame and creating a new DataFrame object from it.

But first, let's import the modules we need:

In [3]:
import flickrapi
import pandas as pd
from geopy.geocoders import Nominatim
import folium
from IPython.display import HTML

Then we create a flickr object which we'll use for our api requests. To do so, you'll need to get an api key from flickr.

In [5]:
api_key = '' # put your api key in between the quotes.
flickr = flickrapi.FlickrAPI(api_key, format='json')

Our first api request will be to get the public contact list of a flickr user. Here I use my id number.

In [6]:
strPubList = flickr.contacts_getPublicList(user_id='92362511@N00')
In [59]:
strPubList[:600]
Out[59]:
'jsonFlickrApi({"contacts":{"page":1, "pages":1, "per_page":1000, "perpage":1000, "total":730, "contact":[{"nsid":"37451064@N00", "username":"*Cinnamon", "iconserver":"3751", "iconfarm":4, "ignored":0, "rev_ignored":0}, {"nsid":"77557459@N02", "username":"*December Sun", "iconserver":"3844", "iconfarm":4, "ignored":0, "rev_ignored":0}, {"nsid":"33564809@N02", "username":"*eduardoa*", "iconserver":"2848", "iconfarm":3, "ignored":0, "rev_ignored":0}, {"nsid":"33645378@N04", "username":"*green leaves*", "iconserver":"7456", "iconfarm":8, "ignored":0, "rev_ignored":0}, {"nsid":"64387153@N00", "user'

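Kind of a mess, huh? The response is JSON wrapped in a jsonFlickrApi(...) call. Let's strip the 14-character prefix 'jsonFlickrApi(' and the closing parenthesis so that we get a valid JSON string (readable by pandas), and load it into a pandas data frame.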

In [7]:
dfPubList = pd.read_json(strPubList[14:-1])
In [8]:
dfPubList
Out[8]:
contacts stat
contact [{u'username': u'*Cinnamon', u'ignored': 0, u'... ok
page 1 ok
pages 1 ok
per_page 1000 ok
perpage 1000 ok
total 730 ok

The structure is now much easier to read. Notice that the 'contacts' column holds a list at the 'contact' index. Let's extract this list and see what is inside by loading it into another data frame.

In [9]:
dfContacts=pd.DataFrame(dfPubList['contacts']['contact'])
In [10]:
dfContacts.head()
Out[10]:
iconfarm iconserver ignored nsid rev_ignored username
0 4 3751 0 37451064@N00 0 *Cinnamon
1 4 3844 0 77557459@N02 0 *December Sun
2 3 2848 0 33564809@N02 0 *eduardoa*
3 8 7456 0 33645378@N04 0 *green leaves*
4 1 4 0 64387153@N00 0 *laikanet*

This is the information we are looking for: a list of the contacts with their user ids. Now we want to find the location of each user (if it is available).

Let's add a location column to our data frame, initialized to empty strings.

In [11]:
dfContacts['location']=''

We can also delete the columns we don't need.

In [13]:
del dfContacts['iconfarm']
del dfContacts['iconserver']
del dfContacts['ignored']
del dfContacts['rev_ignored']
In [15]:
dfContacts.head()
Out[15]:
nsid username location
0 37451064@N00 *Cinnamon
1 77557459@N02 *December Sun
2 33564809@N02 *eduardoa*
3 33645378@N04 *green leaves*
4 64387153@N00 *laikanet*
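
As an aside, the slice [14:-1] used in this section depends on the exact length of the 'jsonFlickrApi(' prefix. A slightly more defensive variant is sketched below using only the standard json module. The helper name strip_jsonp is made up for this post, and the sample string is a trimmed-down response in the same shape as the one shown above.

```python
import json

def strip_jsonp(raw, callback="jsonFlickrApi"):
    """Return the JSON text inside a callback(...) wrapper, or the text unchanged."""
    text = raw.strip()
    prefix = callback + "("
    if text.startswith(prefix) and text.endswith(")"):
        return text[len(prefix):-1]
    return text

# Trimmed-down response with only two contacts (illustrative, not the full reply).
sample = ('jsonFlickrApi({"contacts":{"total":2,"contact":['
          '{"nsid":"37451064@N00","username":"*Cinnamon"},'
          '{"nsid":"77557459@N02","username":"*December Sun"}]},'
          '"stat":"ok"})')

data = json.loads(strip_jsonp(sample))
contacts = data["contacts"]["contact"]   # the nested list of contact dicts
usernames = [c["username"] for c in contacts]
print(usernames)
```

The contacts list obtained this way is a plain list of dicts, so it can be handed to pd.DataFrame just like the nested list extracted in In [9].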

II. Extracting the location for each contact

Let's request the info of one user to see what is inside:

In [17]:
strInfo = flickr.people_getInfo(user_id='92362511@N00')
print strInfo[14:-1]
{"person":{"id":"92362511@N00", "nsid":"92362511@N00", "ispro":1, "iconserver":"7376", "iconfarm":8, "path_alias":"tyldurd", "username":{"_content":"Enhanced Reality"}, "description":{"_content":"I mostly post pictures of my wanderings in Berlin and travels.\n\nI like taking pictures of lesser known places in the city and small details that most people won't notice. Trying to capture the atmosphere in the moment.\n\n<b>New!<\/b> \nI also have a photoblog where I post series:\n<a href=\"http:\/\/extrealphotography.weebly.com\/\" rel=\"nofollow\">extrealphotography.weebly.com\/<\/a>\nPlease, feel free to visit and comment!"}, "photosurl":{"_content":"https:\/\/www.flickr.com\/photos\/tyldurd\/"}, "profileurl":{"_content":"https:\/\/www.flickr.com\/people\/tyldurd\/"}, "mobileurl":{"_content":"https:\/\/m.flickr.com\/photostream.gne?id=2233731"}, "photos":{"firstdatetaken":{"_content":"2006-07-18 10:23:00"}, "firstdate":{"_content":"1153923091"}, "count":{"_content":976}}}, "stat":"ok"}

This user does not have a 'location'. Let's try another one:

In [18]:
strInfo = flickr.people_getInfo(user_id='37451064@N00')
print strInfo[14:-1]
{"person":{"id":"37451064@N00", "nsid":"37451064@N00", "ispro":1, "iconserver":"3751", "iconfarm":4, "path_alias":"cloughridge", "username":{"_content":"*Cinnamon"}, "realname":{"_content":"Cindy"}, "location":{"_content":"San Francisco, USA"}, "timezone":{"label":"Pacific Time (US & Canada); Tijuana", "offset":"-08:00"}, "description":{"_content":"I would randomly take pictures of nothing in particular. How else could you record life as it happens?\n\nSimon Von Booy\n\n\nI've accumulated a lot of gear over the years. You can view it <a href=\"https:\/\/www.flickr.com\/photos\/cloughridge\/5812300380\/in\/photostream\">here<\/a>\n\nElsewhere:\n\n<a href=\"http:\/\/cindyloughridgephotography.com\/portfolio.php\" rel=\"nofollow\">portfolio<\/a>\n\n<a href=\"http:\/\/www.cinnamonroseactions.com\" rel=\"nofollow\">Cinnamonroseactions<\/a>\n\n\n<a href=\"http:\/\/www.gettyimages.com\/Search\/Search.aspx?assettype=image&amp;family=creative&amp;artist=Cindy Loughridge#\" rel=\"nofollow\">Getty Images<\/a>\n\n<a href=\"http:\/\/cindyloughridge.tumblr.com\/\" rel=\"nofollow\">Tumblr<\/a>    \n\nFollow me on <a href=\"http:\/\/twitter.com\/\" rel=\"nofollow\">twitter<\/a>\n\nPurchase prints <a href=\"http:\/\/cindyloughridge.smugmug.com\/\" rel=\"nofollow\">here<\/a>\n\n\n\n\n\n\n\n\n"}, "photosurl":{"_content":"https:\/\/www.flickr.com\/photos\/cloughridge\/"}, "profileurl":{"_content":"https:\/\/www.flickr.com\/people\/cloughridge\/"}, "mobileurl":{"_content":"https:\/\/m.flickr.com\/photostream.gne?id=2502313"}, "photos":{"firstdatetaken":{"_content":"2002-08-03 01:17:18"}, "firstdate":{"_content":"1140392914"}, "count":{"_content":5103}}}, "stat":"ok"}

Again we put this into a data frame:

In [19]:
dfInfo = pd.read_json(strInfo[14:-1])
In [20]:
dfInfo
Out[20]:
person stat
description {u'_content': u'I would randomly take pictures... ok
iconfarm 4 ok
iconserver 3751 ok
id 37451064@N00 ok
ispro 1 ok
location {u'_content': u'San Francisco, USA'} ok
mobileurl {u'_content': u'https://m.flickr.com/photostre... ok
nsid 37451064@N00 ok
path_alias cloughridge ok
photos {u'count': {u'_content': 5103}, u'firstdatetak... ok
photosurl {u'_content': u'https://www.flickr.com/photos/... ok
profileurl {u'_content': u'https://www.flickr.com/people/... ok
realname {u'_content': u'Cindy'} ok
timezone {u'offset': u'-08:00', u'label': u'Pacific Tim... ok
username {u'_content': u'*Cinnamon'} ok

Now we know that the index 'location' contains a dict object with the location we need.

We can then extract the corresponding string, but only if the index exists (i.e. the user entered a location); otherwise the lookup would raise an error.

In [21]:
if 'location' in dfInfo.index:
    # .loc is the label-based indexer; .ix is deprecated in recent pandas
    strLoc=dfInfo.loc['location']['person']['_content']
In [22]:
strLoc
Out[22]:
u'San Francisco, USA'

Victory!
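
For comparison, here is a minimal sketch of the same lookup without a data frame, using dict.get so that a missing 'location' simply yields an empty string. The function name extract_location is hypothetical, and the two sample strings are heavily shortened versions of the responses printed above.

```python
import json

def extract_location(info_json):
    """Return the location string from a people.getInfo JSON text, or '' if absent."""
    person = json.loads(info_json).get("person", {})
    return person.get("location", {}).get("_content", "")

# Shortened responses: one user with a location, one without.
with_location = ('{"person":{"nsid":"37451064@N00",'
                 '"location":{"_content":"San Francisco, USA"}},"stat":"ok"}')
without_location = '{"person":{"nsid":"92362511@N00"},"stat":"ok"}'

print(extract_location(with_location))
print(repr(extract_location(without_location)))
```

Returning an empty string matches the default value of the 'location' column created earlier, so no special case is needed for users without a location.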

We can do this for every user in our first data frame by looping over the 'nsid' column (this might take a while depending on the number of contacts).

In [23]:
for idx, nsid in enumerate(dfContacts['nsid']):
    strInfo = flickr.people_getInfo(user_id=nsid)
    dfInfo = pd.read_json(strInfo[14:-1])
    if 'location' in dfInfo.index:
        strLoc = dfInfo.loc['location']['person']['_content']
        # .loc writes into dfContacts itself rather than a possible copy
        dfContacts.loc[idx, 'location'] = strLoc
In [25]:
dfContacts.head()
Out[25]:
nsid username location
0 37451064@N00 *Cinnamon San Francisco, USA
1 77557459@N02 *December Sun Croatia
2 33564809@N02 *eduardoa* Milan, Italy
3 33645378@N04 *green leaves* Canada
4 64387153@N00 *laikanet* Helsinki, Finland

We have now filled in the location names of our contacts (where available). Note that they are not always very precise (sometimes only the country is given). To place the contacts on a map we need the coordinates of these locations: this is where geopy comes in!

III. Extracting location coordinates with geopy

We add two columns for latitude and longitude. They will be set to NaN if the location is unknown.

In [26]:
dfContacts['lat'] = float('nan')
dfContacts['lon'] = float('nan')

We select a geolocator, which is basically a web service that returns coordinates for a given location string (I chose Nominatim, the first one described in the docs).

Then we loop over the rows of the data frame to get the coordinates for each location string.

In [28]:
geolocator = Nominatim()

for idx, strLoc in enumerate(dfContacts.location):
    try: # Avoids interrupting the loop by time-out errors
        location = geolocator.geocode(strLoc.encode('utf-8'))
    except Exception:
        location = None
    if location is not None:
        # .loc writes into dfContacts itself rather than a possible copy
        dfContacts.loc[idx, 'lat'] = location.latitude
        dfContacts.loc[idx, 'lon'] = location.longitude
In [29]:
dfContacts.head()
Out[29]:
nsid username location lat lon
0 37451064@N00 *Cinnamon San Francisco, USA 37.778960 -122.419199
1 77557459@N02 *December Sun Croatia 45.564344 17.011895
2 33564809@N02 *eduardoa* Milan, Italy 45.466621 9.190617
3 33645378@N04 *green leaves* Canada 61.066692 -107.991707
4 64387153@N00 *laikanet* Helsinki, Finland 60.171320 24.941457

We have added the latitude and longitude for each user. The only thing left to do is to put them on a map with folium.

In [147]:
# Optional: write to a file to avoid re-running everything if your browser crashes.
# encoding to utf 8 avoids an error while writing the file
# Uncomment the following line to write the data frame to a file
#dfContacts.to_csv('myContactInfo.csv', encoding='utf-8')
# Uncomment to read it from the file
#dfContacts = pd.read_csv('myContactInfo.csv', encoding='utf-8')
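
Several contacts share the same location string (a bare country name, for instance), and the public Nominatim service asks clients to keep to about one request per second. The sketch below shows one way to geocode each distinct string only once with a pause between requests. It is an illustration under assumptions, not what I ran for this post: geocode_unique and fake_geocode are hypothetical names, and the fake geocoder stands in for a wrapper around geolocator.geocode so that the example runs offline.

```python
import time

def geocode_unique(locations, geocode, delay=1.0):
    """Geocode each distinct non-empty string once; return {location: result or None}."""
    cache = {}
    for loc in locations:
        if not loc or loc in cache:
            continue            # skip empty strings and strings already looked up
        if cache:
            time.sleep(delay)   # pause between consecutive requests
        try:
            result = geocode(loc)
        except Exception:       # e.g. a time-out from the web service
            result = None
        cache[loc] = result
    return cache

# Offline stand-in for the real geocoder (coordinates rounded from the table above).
fake_table = {"Canada": (61.07, -107.99), "Milan, Italy": (45.47, 9.19)}
calls = []

def fake_geocode(loc):
    calls.append(loc)
    return fake_table.get(loc)

coords = geocode_unique(["Canada", "", "Milan, Italy", "Canada", "Atlantis"],
                        fake_geocode, delay=0)
print(coords)
print(len(calls))
```

With the real service, the geocode argument would be a small function that calls geolocator.geocode and returns the latitude and longitude of the result, or None when nothing is found.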

IV. Putting the contacts on a map with folium

Folium is a simple tool for mapping data, built on the Leaflet.js framework. The maps are saved to an HTML file. We can choose custom tiles from those previewed at http://leaflet-extras.github.io/leaflet-providers/preview/ and display user names and locations in popups that appear when one clicks on a marker.

Let's create a map:

In [60]:
mapContacts = folium.Map(location=[20, 0], zoom_start=2, 
                         tiles= r'http://{s}.tile.thunderforest.com/landscape/{z}/{x}/{y}.png',
                         attr='&copy; <a href="http://www.opencyclemap.org">OpenCycleMap</a>,\
                         &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors,\
                         <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>',
                         width=700,
                         height=500)

Next, we populate this map with the contacts that were successfully located. As the cherry on top, we use the 'popup' option, which shows a text when a marker is clicked; here it displays the user name and the location string.

Note the try/except blocks, which fall back to "unknown" when str cannot convert a string from utf-8.

In [61]:
for idx in dfContacts.index:
    lat = dfContacts.lat[idx]
    lon = dfContacts.lon[idx]
    
    # folium doesn't seem to accept utf-8 strings in the popup
    try:
        name = str(dfContacts.username[idx].decode())
    except Exception:
        name = "unknown"
    try:
        loc = str(dfContacts.location[idx].decode())
    except Exception:
        loc = "unknown"
    # pd.notnull skips contacts whose coordinates are still NaN
    if pd.notnull(lat) and pd.notnull(lon):
        mapContacts.polygon_marker(location=[lat, lon],
                                  radius=4,
                                  line_color='#0063db', 
                                  fill_color='#0063db',
                                  popup=name + " (" + loc + ")")
mapContacts.create_map('map.html')
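
The selection logic of the loop above (skip contacts without coordinates, fall back to "unknown" for missing text) can also be sketched independently of folium. The helper marker_rows and the second sample row are made up for illustration; the rows are plain dicts with the same column names as dfContacts.

```python
import math

def marker_rows(rows):
    """Yield (lat, lon, popup_text) for every row that has valid coordinates."""
    for row in rows:
        lat, lon = row["lat"], row["lon"]
        if math.isnan(lat) or math.isnan(lon):
            continue                        # contact could not be geocoded
        name = row.get("username") or "unknown"
        loc = row.get("location") or "unknown"
        yield lat, lon, "%s (%s)" % (name, loc)

rows = [
    {"username": "*Cinnamon", "location": "San Francisco, USA",
     "lat": 37.778960, "lon": -122.419199},
    # Hypothetical contact without a location, hence without coordinates.
    {"username": "nobody", "location": "",
     "lat": float("nan"), "lon": float("nan")},
]

markers = list(marker_rows(rows))
print(markers)
```

Rows of this shape could be obtained with something like dfContacts.to_dict('records'), and each yielded tuple carries what a marker needs: a position and a popup text.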

V. Summary

In the end, once you know where the information to extract is, you only need a few steps:

1) Parse the contact list with contacts_getPublicList

In []:
strPubList = flickr.contacts_getPublicList(user_id='92362511@N00')
dfPubList = pd.read_json(strPubList[14:-1])
dfContacts=pd.DataFrame(dfPubList['contacts']['contact'])
dfContacts['location']=''

2) Extract the locations with people_getInfo

In []:
for idx, nsid in enumerate(dfContacts['nsid']):
    strInfo = flickr.people_getInfo(user_id=nsid)
    dfInfo = pd.read_json(strInfo[14:-1])
    if 'location' in dfInfo.index:
        strLoc = dfInfo.loc['location']['person']['_content']
        # .loc writes into dfContacts itself rather than a possible copy
        dfContacts.loc[idx, 'location'] = strLoc

3) Geocode the locations with geopy

In []:
dfContacts['lat'] = float('nan')
dfContacts['lon'] = float('nan')

geolocator = Nominatim()

for idx, strLoc in enumerate(dfContacts.location):
    try: # Avoids interrupting the loop by time-out errors
        location = geolocator.geocode(strLoc.encode('utf-8'))
    except Exception:
        location = None
    if location is not None:
        # .loc writes into dfContacts itself rather than a possible copy
        dfContacts.loc[idx, 'lat'] = location.latitude
        dfContacts.loc[idx, 'lon'] = location.longitude

4) Create and populate the map with folium

In [89]:
mapContacts = folium.Map(location=[10, 0], zoom_start=2, 
                         tiles= r'http://{s}.tile.thunderforest.com/landscape/{z}/{x}/{y}.png',
                         attr='&copy; <a href="http://www.opencyclemap.org">OpenCycleMap</a>, \
                         &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> \
                         contributors, \
                         <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>',
                         width= 700, height= 500)

for idx in dfContacts.index:
    lat = dfContacts.lat[idx]
    lon = dfContacts.lon[idx]
    
    # folium doesn't seem to accept utf-8 strings in the popup
    try:
        name = str(dfContacts.username[idx].decode())
    except Exception:
        name = "unknown"
    try:
        loc = str(dfContacts.location[idx].decode())
    except Exception:
        loc = "unknown"
    # pd.notnull skips contacts whose coordinates are still NaN
    if pd.notnull(lat) and pd.notnull(lon):
        mapContacts.polygon_marker(location=[lat, lon],
                                  radius=4,
                                  line_color='#0063db', 
                                  fill_color='#0063db',
                                  popup=name + " (" + loc + ")")
mapContacts.create_map('map.html')

5) Open the html map in your favorite browser!

In []:
# The end!
