rtweet library

This section covers the functions in the rtweet library that I have learned to use. I provide understandable code and examples of how to use each function.

Scraping Twitter for data is a superpower; it is wise to wield it with caution in the name of data science.

Twitter API Rate Limits

Returns the rate limit data for all Twitter API function calls.

It is important to know your rate limits when calling functions so you can avoid being rate limited.

The tibble that gets returned includes a time stamp of when you last ran a specific function call and shows when you are safe to call it again.

#--- returns a full list of functions and their rate limits
Rate_Limit =  rate_limit()

#--- get rate limit info for specific token (function)
token <- get_tokens()

rate_limit(token)

rate_limit(token, "search_tweets")
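Building on the calls above, here is a minimal sketch (assuming an authenticated token is already set up) of checking the remaining quota for search_tweets and sleeping until the window resets; the reset column is a difftime, so it converts cleanly to seconds:

```r
library(rtweet)

# check how many search_tweets calls remain in the current window
srch_limit <- rate_limit(query = "search_tweets")

# if the quota is exhausted, pause until the limit resets
if (srch_limit$remaining == 0) {
  wait_secs <- as.numeric(srch_limit$reset, units = "secs")
  message("Rate limited - sleeping ", round(wait_secs), " seconds")
  Sys.sleep(wait_secs)
}
```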

Search

Search Tweets by User

Returns up to 90,000 statuses (tweets).

  • grab tweets by status_id or screen_name
  • up to 90,000 tweets per request
  • when grabbing more than 90,000 tweets you MUST avoid the rate limit while iterating: use next_cursor() to wait and scrape in 15-minute batches (p. 46 in the rtweet docs PDF)
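For collections that span more than one rate-limit window, a sketch of one option (assuming an authenticated token): search_tweets accepts a retryonratelimit argument that waits out each 15-minute window for you, instead of manual iteration.

```r
library(rtweet)

# retryonratelimit = TRUE makes search_tweets sleep through each
# 15-minute rate-limit window until n tweets have been collected
big_pull <- search_tweets(
  "#rstats",
  n = 100000,
  retryonratelimit = TRUE
)
```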

Note: It is important to hold onto these status_id numbers; they are one sure way to retrieve old tweets without involving the premium Twitter API, which is otherwise required for older tweets.

This code, from the documentation, shows how a status_id can fetch old tweets.

statuses <- c(
  "567053242429734913", # Andrew Malcolm   2015-02-15 
  "266031293945503744", # Barack Obama     2012-11-07
  "440322224407314432"  # Ellen DeGeneres  2014-03-03
)
tweet_statuses = lookup_statuses(statuses)

tweet_statuses %>% 
  select(status_id, name, screen_name, user_id, created_at, text)

Search Multiple Users

Using a vector to grab tweets.

twitter_names = c("usr1",'usr2', 'usr3','usr4')

twitter_users_search = lookup_users(
  users = twitter_names,
  parse = TRUE
)

Search Multiple Queries

There are 2 methods for retrieving multiple Twitter search queries, but I will show you only the easiest one.

Note: there are search_tweets() and search_tweets2(); the latter is more flexible.

This is how to search for 3 queries using a vector, capturing 1000 tweets each.

dataSci_tweets = search_tweets2(
  c("data science","RStats","dataviz"),
  n = 1000
)

#-- look at the dataframe
head(dataSci_tweets)

#--- look at each query tweet tally of the 3 queries 
table(dataSci_tweets$query)

# Tally the 3 queries
## data science      dataviz       RStats 
##          999         1000         1000

Get a User’s Friends

The Twitter API calls the accounts a user follows that user’s friends (followers, by contrast, are the accounts following the user; those come from get_followers()). Here is how to get the list of accounts a user follows.

  • 5,000 is the rate limit max (and the default)
  • page = "-1" requests the 1st page of JSON; if the user follows more than 5,000 accounts, use the returned cursor to page through the rest
usr_friends = get_friends(
  '<user_name>',       # @<user_name>
  n = 5000,  
  page = "-1", 
  parse= TRUE
)

This returns a dataframe of user_id values.
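The IDs alone are not very readable. A quick sketch (assuming the usr_friends dataframe from above) for resolving them to account details with lookup_users():

```r
library(rtweet)
library(dplyr)

# resolve the friend IDs to full user records
friend_info <- lookup_users(usr_friends$user_id)

# peek at the human-readable columns
friend_info %>%
  select(screen_name, name, followers_count)
```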

Get User’s Likes

The Twitter API calls a user’s likes favorites, the {❤️}. The rate limit is <= 3,000 statuses (tweets).

usr_likes = get_favorites(
    '<user_name>',  # @<user_name>
    n = 200,
    parse = TRUE
) 

Get a List of Users’ Likes

Use a vector of multiple Twitter accounts to get each account’s liked tweets.

  • users = c('usr1','usr2','usr3','usr4','usr5')
usr_faves = get_favorites( users, n = 400)

usr_faves %>% view()

usr_faves %>% 
  select(screen_name, text, favorite_count) %>% view()

List Search Tweets

Grab tweets from more than one Twitter user with the function lapply().

  • Twitter users : c('usr1','usr2','usr3','usr4')
list_tweets = lapply(c("usr1", 
                       "usr2", 
                       "usr3", 
                       "usr4"), 
                     search_tweets, # function applied to each user
                     n = 5000       # number of tweets per user
                     )

Note: list_tweets %>% view() doesn’t work for this; you need to call do_call_rbind() to bind the list into a dataframe.

tweet_df = do_call_rbind( list_tweets )

Now we have a dataframe: tweet_df %>% view()

View the users data: users_data( tweet_df ) %>% view()

Get Twitter Mentions of User

This returns <= 200 mentions of a Twitter user, i.e. the last 200 tweets in which you were tagged in a reply.

usr_mentions = get_mentions(
  n = 200,
  parse = TRUE
)

usr_mentions$text

Your Twitter Timeline

Returns your timeline, the ‘home’ Twitter tab if you were in the app. The default number of timeline tweets is 100; check = TRUE guards against the rate limit.

my_twitter_ = get_my_timeline(
  n = 100, 
  parse = TRUE,
  check = TRUE  
)

my_twitter_$screen_name

Get a User’s Timeline

Returns <= 3,200 statuses (tweets) of a single Twitter user. Set home = FALSE for the user timeline and home = TRUE for the home timeline.

user_timeline = get_timeline(
  '<user_name>',
  n = 100,
  home = FALSE, 
  parse = TRUE,
  check = TRUE
)

user_timeline %>% 
  select(screen_name, text)

Get Users’ Timelines

Returns <= 3200 statuses (tweets) for each Twitter user specified.

  • users: c('usr1','usr2', 'usr3')
group_timelines = get_timelines(
  c('usr1','usr2', 'usr3'), 
  n = 100,
  home = FALSE,
  parse = TRUE,
  check = TRUE
)

group_timelines %>% 
  select(screen_name, text) %>% view()

group_timelines$text

Grab Direct Messages

Retrieve 50 of your direct messages from the last 30 days.

direct_messages(n=50, 
                next_cursor = NULL,
                parse = TRUE,
                token = NULL)

Get Twitter Retweeters

Returns the IDs of users who retweeted a status. The maximum per request is 100.

status_id is required and is the long integer associated with a tweet (status).

retweeters = get_retweeters(
  '< 19 integer numbers >',  
  n = 100,
  parse = TRUE
)
retweeters

Get Retweets

Returns a collection of 100 recent retweets of a specific tweet (status). The maximum for queries is 100.

  • One way to find a status_id: go to your own Twitter timeline or notifications and find a tweet that was retweeted; click on one of the accounts that retweeted your tweet, scroll their timeline to the retweet, click on it, and read the integer value from the URL address bar.
re_tweets = get_retweets(
  '<long integer number>', 
  n = 100,
  parse = TRUE
)

re_tweets %>% 
  select(screen_name, text) %>% view()

Get Twitter Trends

You want to stay up with the trends of today, this hour, this minute; you can get the latest trends by using get_trends(), so trendy! Returns Twitter trends for a specific location.

  • ‘city-name’ or ‘country-name’ can be used.
  • Where On Earth ID (WOEID) can be used : e.g. Toronto: 4118
  • can use lat = and lng = values

Here are some coordinates for starting:

  • Vancouver, BC, Canada 49.2827, -123.1207
  • Halifax, NS, Canada 44.651070, -63.582687
  • Yellowknife, NT, Canada 62.453972, -114.371788
  • Edmonton, AB, Canada 53.631611, -113.323975

This code shows how to get the trending tweets for Canada, and then for Vancouver by coordinates, and to see whether the trends are promoted (ads) and their tweet volume.

trending = get_trends('canada')

trending %>% 
  select(trend, place, promoted_content, tweet_volume) %>% 
  arrange( desc(tweet_volume) )

get_city_trend = get_trends(lat = 49.28,  # Vancouver
                            lng = -123.12)
get_city_trend

Twitter List Members

Returns the users who are members of a given list.

  • A slug is the short name of the list (part of its URL); it can be used in the function call instead of list_id
  • list_id is a numeric value
  • owner_user is the account that created/owns the list
  • the query max limit is 4,000, which is also the default

Example: the rstats list created by @owenlhjphillips.

You can use slug (together with owner_user) instead of list_id.

membersList = lists_members(
  list_id = "1................8",     # option 1 (no slug)
  # slug = 'rstats',                  # optional 2 (no list_id)
  owner_user = "<usr_name>", 
  parse = TRUE,
  n= 4000 
)

membersList %>% 
  select(name, 
        screen_name, 
        location, 
        followers_count) %>%
  view()

This will return a large list of Twitter users who are members of the List.

Twitter user Lists memberships

Returns the lists a Twitter user is a member of.

usr_memberships = lists_memberships(
  user = "<user_name>",
  n = 200,
  parse = TRUE
)

usr_memberships %>% 
  select(name, full_name) %>% view()

Timeline Tweets by User of a List

Returns a timeline of tweets posted by the members of a list.

  • include_rts (optional) takes TRUE or FALSE for including retweets
  • parse is set to TRUE by default
  • since_id (optional) returns only tweets newer than the specified ID and is subject to rate limits
  • max_id (optional) returns tweets older than or equal to the specified ID
  • A slug is the short name of the list; it can be used (with owner_user) instead of list_id
usr_list_timeline = lists_statuses(
  slug = "<slug_name>", 
  owner_user = "<usr_name>",
  n = 200,
  parse = TRUE,
  include_rts = FALSE
)

usr_list_timeline %>% view()

Twitter List Subscribers

A Twitter list can be subscribed to; this function returns the subscribers of the specified list.

This example uses the New York Times politics list subscribers.

NYT_subs = lists_subscribers(
  slug = "new-york-times-politics",
  owner_user = "nytpolitics",
  n= 1000
)

NYT_subs %>% head()

Twitter List Subscriptions by User

Returns a list of a user’s subscriptions, answering the question: what lists does Twitter user X subscribe to?

  • user is user_id or screen_name
  • n has maximum of 1000
usr_List_Subs = lists_subscriptions(
  user = "<usr_name>",
  n = 400,
  parse= TRUE
)

usr_List_Subs %>% view()

Tidy Text

Tidy Twitter Text

Suppose you have searched Twitter for a hashtag, saved the data, and loaded it; here you can clean the Twitter text with the plain_tweets() function. There are 2 steps to get the clean text.

# step 1 
twitter_df_text = searched_tweets$text

# step 2
clean_tweets_text = plain_tweets( twitter_df_text )

clean_tweets_text

Twitter Stop Words

Returns rtweet’s dataframe of stop words.

  • stopwordslangs has about 24,000 rows

  • words from 10 languages: c("ar","en","es","fr","in","ja","pt","ru","tr","und")

  • variables:

    • word - potential stop word
    • lang - 2- or 3-letter language code
    • p - probability value associated with frequency; higher values mean the word occurs more frequently (and vice versa)

head(stopwordslangs)
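As a sketch of putting stopwordslangs to use: you can tokenize cleaned tweet text and drop common English stop words. This assumes the tidytext package for tokenizing, and the p >= 0.98 cutoff and sample text are hypothetical choices for illustration.

```r
library(rtweet)
library(dplyr)
library(tidytext)  # for unnest_tokens()

# hypothetical cleaned tweet text (stand-in for plain_tweets() output)
clean_tweets_text <- c("the quick brown fox jumps over the lazy dog")

# keep only high-probability English stop words as the filter list
en_stops <- stopwordslangs %>%
  filter(lang == "en", p >= 0.98) %>%
  pull(word)

# tokenize the text and drop the stop words
tibble(text = clean_tweets_text) %>%
  unnest_tokens(word, text) %>%   # one token per row
  filter(!word %in% en_stops)     # remove stop words
```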


Post

Post Direct Message

Posts a direct message to a specified user.

  • use screen_name or user_id to target the message

Send a message from RStudio to a user!

Note: assigning the result to a variable still runs the function, and lets you inspect all the information returned about the direct message.

# ===== post a direct message to user

DM_message = post_message(
  "Hi from RStudio {rtweet} message #1",
  user = "<usr_name>",
  media = NULL
)

Post a Tweet

Posts a tweet to your Twitter timeline.

Tweet from RStudio!

  • status (tweet) must be <= 280 characters
  • media is the file path for an image or video to include in the tweet
  • destroy_id is used to delete a tweet; you need to provide a single status_id value for it to work
post_tweet(
  status = "my 1st {rtweet} #rstats from Rstudio",
  media = NULL,
  token = NULL,
  in_reply_to_status_id = NULL,
  destroy_id = NULL,
  retweet_id = NULL,
  auto_populate_reply_metadata = FALSE
)
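As a sketch of the destroy_id bullet above, deleting one of your own tweets looks like this ("<status_id>" is a placeholder for the real ~19-digit ID):

```r
library(rtweet)

# delete one of your own tweets by its status_id
# ("<status_id>" is a hypothetical placeholder)
post_tweet(destroy_id = "<status_id>")
```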

Post a Tweet to a Thread 🧵

The first step is to find a status_id to reply to; get_timeline() returns up to 3,200 tweets posted to a timeline by one or more Twitter users.

  • user or user_id can be used
  • the parse argument is meant to save you anger and frustration when using this data; be kind to yourself and always set it to TRUE, as it returns a parsed dataframe
##------ lookup status_id for my own timeline
my_timeline <- get_timeline(rtweet:::home_user()) 
my_timeline

##------ ID for reply, slice the first one (latest tweet) to get status_id integer
reply_id <- my_timeline$status_id[1] 
reply_id

##------ post reply
post_tweet("second in the thread {rtweet}",
           in_reply_to_status_id = reply_id)

Plots

#TidyTuesday plot

You can post your #TidyTuesday plots from your RStudio or any generated plots. This is an example from the documentation.

##---------- generate data to make/save plot (as a .png file)
x <- rnorm(300)
y <- x + rnorm(300, 0, .75)
col <- c(rep("#002244aa", 50), rep("#440000aa", 50))
bg <- c(rep("#6699ffaa", 50), rep("#dd6666aa", 50))

##--------- create temporary file name
tmp <- tempfile(fileext = ".png")

##-------- save as png
png(tmp, 6, 6, "in", res = 127.5)
par(tcl = -.15, family = "Inconsolata",
    font.main = 2, bty = "n", xaxt = "l", yaxt = "l",
    bg = "#f0f0f0", mar = c(3, 3, 2, 1.5))
plot(x, y, xlab = NULL, ylab = NULL, pch = 21, cex = 1,
     bg = bg, col = col,
     main = "This image was uploaded by rtweet")
grid(8, lwd = .15, lty = 2, col = "#00000088")
dev.off()

##------- post tweet with media attachment

post_tweet("a tweet with media attachment {rtweet}", media = tmp)

Time series plot 1

Returns a ggplot2 time interval plot based on Twitter data. This is an example of searching Twitter for the Trending #ClimateEmergency.

  • by can be secs, mins, hours, days, weeks, months, or years; when an integer is given the default unit is seconds
#--------- search for the #ClimateEmergency 
# ClimateEmerg = search_tweets2(
#   "ClimateEmergency",
#   n = 10000
# )

ClimateEmerg %>% head()

#-- time series plot
ClimateEmerg_freq = ts_plot(ClimateEmerg, by="mins")
ClimateEmerg_freq

ClimateEmerg %>% 
  group_by(is_retweet) %>% 
  ts_plot("hours")
  
# Compare tweets by retweet or not

Time Series plot 1.1

Extract the tweets data from a users data object (parsed data); tweets_data() returns a dataframe.

ClimateEmerg_users =  tweets_data( users = ClimateEmerg )
ClimateEmerg_users

Parse the data into dataframes/ tibbles

tweets.and.users=  tweets_with_users(ClimateEmerg)
tweets.and.users

Time series plot 2

Using the searched-tweets dataframe (from a previously run Twitter search) we can use the time series plotting function to generate a plot. ts_plot() makes a time series frequency plot; you can use ggplot2 on its own if you like, or in conjunction with ts_plot().

# Twitter searched: 'rstats' OR 'RStats' tweets by minute

ts_plot(rstats_searched_tweets,   # dataframe of scraped tweets
        by = "mins",              # secs | mins | hours | days | weeks
        tz = "America/Edmonton",  # your timezone
        trim = 1) +               # slice 1 min from start and end of data
  ggplot2::labs(
    title = "'rstats' tweets by minute",
    x = NULL, y = "tweet count"
  )

Live Twitter

Returns Twitter data on the specified query for the duration set in the function call.

Returns public tweets, with 4 methods:

  • 1 - small random sample of tweets available
  • 2 - filtering using search query (<= 400 keywords)
  • 3 - tracking vector of user ids (<= 5000 user_ids)
  • 4 - geolocation coordinates

This function can be used for trends and to grab users data.
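Method 4 (geolocation) can be sketched as follows, assuming an authenticated token; lookup_coords() ships with presets such as "usa" and "world", so no geocoding API key is needed for those.

```r
library(rtweet)

# stream 30 seconds of tweets geolocated to the USA
usa_stream <- stream_tweets(
  lookup_coords("usa"),  # preset bounding box
  timeout = 30
)
```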

Note:

  • the timeout argument can be set to higher values
  • rtweet writes the JSON file into a generated folder whose name contains an integer string; moving a copy to the current working directory makes the file easier to use

This data stream collection is on the Trend #ADayOffTwitch which was a protest against online hate and harassment.

Twitter_LiveStream = stream_tweets2(
  "ADayOffTwitch",
  timeout = 90,     # 30 sec is default
  parse = TRUE,
  verbose = TRUE,
  file_name = "TwitterLiveTweets",
  append = TRUE  
  # default is FALSE which overwrites pre-existing data
)

#-- parse
Twitter_LiveStream = parse_stream('TwitterLiveTweets.json')

#-- users data into dataframe
twitch_tweet_users = users_data(Twitter_LiveStream)
twitch_tweet_users %>% view()

# time series plot of tweets based on seconds
ts_plot( Twitter_LiveStream, "secs")