Monday, April 8, 2013
Use Foursquare to locate a Twitter user using R
I've been doing some work with Twitter data. Much of this work would be far easier if I could geographically locate the origin of the tweets. The Twitter APIs offer a couple of ways to do this. If a user has geo-location turned on, you can get the precise lat-lng for a specific tweet. Each user can also set a location in their profile. This free-form field gives you an idea of where the user is located, but it is a user attribute, not something tied to each specific tweet, so a Chicago native traveling and tweeting in Florida will be a problem. Between the two, you can get some location information for some people, but coverage is incomplete, so I thought about trying some other ways of locating a Twitter user.
My first crack at this was to see if a given Twitter user is also a Foursquare user. Foursquare users use the service to check in to places for points or other purposes. Anyhow, using the Foursquare API, you can retrieve a lat-lng pair for a given check-in. My idea was to look through a user's tweet history for Foursquare links, then use the Foursquare API to turn those links into lat-lng pairs. Then, cluster these points and take the midpoint of the largest cluster (the cluster with the most points) as the guess for the Twitter user's location.
First, we start off using the twitteR package to download a user's n most recent tweets...
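A minimal sketch of this download step, not the code from the original post. It assumes the twitteR package is installed and the session is already authenticated with Twitter; `fetch_recent_tweets` is a hypothetical helper name.

```r
# Hypothetical helper: download a user's n most recent tweets as a data frame.
# Assumes twitteR is installed and the session is already authenticated.
fetch_recent_tweets <- function(user, n = 200) {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("The twitteR package is required for this sketch.")
  }
  # userTimeline() returns a list of status objects for the given user
  statuses <- twitteR::userTimeline(user, n = n)
  # twListToDF() flattens that list into a data frame (tweet text is in $text)
  twitteR::twListToDF(statuses)
}
```

The `text` column of the returned data frame is what the later steps would scan for links.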
Then, I extract the links and resolve them using a link expander service. Once that is done, I take the Foursquare links and bounce them off the Foursquare API to get a lat-lng pair for each. I made a function to do this. Note that you will need a Foursquare API key saved in a file, as noted in the function.
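The link-handling part can be sketched in base R as follows. This covers only pulling URLs out of tweet text and keeping the ones that point at Foursquare; the link expander and the Foursquare API call are left out. The helper names and the two domains matched (`4sq.com` and `foursquare.com`) are my assumptions, not taken from the original function.

```r
# Hypothetical helper: pull every http(s) URL out of a vector of tweet texts.
extract_links <- function(tweets) {
  matches <- regmatches(tweets, gregexpr("https?://[^[:space:]]+", tweets))
  unique(unlist(matches))
}

# Hypothetical helper: TRUE for URLs hosted on 4sq.com or foursquare.com
# (assumed to be the domains Foursquare check-in links use once expanded).
is_foursquare_link <- function(urls) {
  pattern <- "^https?://([a-z0-9-]+\\.)*(4sq\\.com|foursquare\\.com)(/|$)"
  grepl(pattern, urls, ignore.case = TRUE)
}

tweets <- c("I'm at Wrigley Field http://4sq.com/abc123",
            "no link here",
            "see https://example.com/x and https://foursquare.com/user/checkin/1")
links <- extract_links(tweets)
foursquare_links <- links[is_foursquare_link(links)]
```

In practice the URLs in a tweet are usually shortened, which is why the expansion step has to run before the domain filter.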
Now you have a list of lat-lng pairs from Foursquare. I use Mclust (from the mclust package) to find the largest cluster. Then, using the points in that largest cluster, I take the average lat and average lng as the center of the cluster. That is my guess for where my Twitter user of interest lives.
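Here is a rough sketch of the clustering step. `largest_cluster_center` and `guess_location` are hypothetical names, and `points` is assumed to be a data frame with numeric `lat` and `lng` columns. The Mclust call uses default settings, which may differ from what the original code does.

```r
# Hypothetical helper: average lat/lng of the cluster with the most points.
# Ties go to the first cluster in table() order.
largest_cluster_center <- function(lat, lng, cluster) {
  sizes <- table(cluster)
  biggest <- names(sizes)[which.max(sizes)]
  keep <- as.character(cluster) == biggest
  c(lat = mean(lat[keep]), lng = mean(lng[keep]), n = sum(keep))
}

# Hypothetical wrapper: cluster the check-ins with Mclust, then take the center.
# Assumes the mclust package is installed.
guess_location <- function(points) {
  library(mclust)  # Mclust() expects the package to be attached
  fit <- Mclust(points[, c("lat", "lng")])
  largest_cluster_center(points$lat, points$lng, fit$classification)
}

# The center calculation on its own, with hand-assigned cluster labels:
lat <- c(41.88, 41.90, 41.89, 27.95, 27.97)
lng <- c(-87.63, -87.65, -87.64, -82.46, -82.44)
center <- largest_cluster_center(lat, lng, cluster = c(1, 1, 1, 2, 2))
```

Averaging raw lat and lng is a reasonable approximation when the cluster covers a small area, as it would for check-ins around one city.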
Next, I create an OpenStreetMap map to display this. The red dots represent the various lat-lng pairs from Foursquare, and the blue dot is the cluster center.
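One way to draw such a map, sketched under the assumption that the OpenStreetMap R package is the one used (it needs Java and a network connection to fetch tiles). `bounding_box` and `plot_checkins` are hypothetical helpers, and `center` is assumed to be a named vector with `lat` and `lng` entries.

```r
# Hypothetical helper: padded bounding box around the points, in the
# (lat, lng) corner format that openmap() takes.
bounding_box <- function(lat, lng, pad = 0.05) {
  list(upperLeft  = c(max(lat) + pad, min(lng) - pad),
       lowerRight = c(min(lat) - pad, max(lng) + pad))
}

# Hypothetical helper: red dots for check-ins, blue dot for the cluster center.
# Assumes the OpenStreetMap package is installed.
plot_checkins <- function(lat, lng, center) {
  library(OpenStreetMap)
  box <- bounding_box(c(lat, center[["lat"]]), c(lng, center[["lng"]]))
  map <- openmap(box$upperLeft, box$lowerRight, type = "osm")
  plot(map)
  # The map tiles are in Mercator coordinates, so project the points to match
  pts <- projectMercator(lat, lng, drop = FALSE)
  ctr <- projectMercator(center[["lat"]], center[["lng"]], drop = FALSE)
  points(pts[, 1], pts[, 2], col = "red", pch = 19)
  points(ctr[, 1], ctr[, 2], col = "blue", pch = 19, cex = 1.5)
}

box <- bounding_box(c(41.88, 41.90), c(-87.65, -87.63), pad = 0.01)
```

The padding keeps the outermost dots from sitting right on the edge of the map.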
The code can be found on github.