Using the rtweet pkg

What you need to know:

  • You need a Twitter Developer account to run these tasks yourself
  • You need an access token to use the API
  • Running the line that defines the search_words variable opens a Twitter Developer page in your browser, where you click to authorize rtweet's access; your code then runs (it helps to have a browser window open and ready for the authorization prompt from the rtweet package)

User API requests

Basic use of the Twitter API limits how often you can request data: rate limits reset every 15 minutes, so leave 15 minutes between request calls (small data requests can be spaced more closely, but caution is advised). Based on personal experience, it is very easy to hit the rate limit and have access to your Twitter bot / Developer account restricted. It is strongly advised to assign the result of every function call made to Twitter to a variable. This avoids re-running code unnecessarily when typos and changes occur, and it makes saving the data easier. Check your function calls and variable names before running them.
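One way to enforce the 15-minute gap is to record when you last sent a request and check that timestamp before sending another. The sketch below is only an illustration in base R: the helper ok_to_request() and the variable last_request are hypothetical names of mine, not part of rtweet, and the real search call is left commented out.

```r
# Hypothetical guard (not an rtweet function): only allow a new request
# once the waiting period since the previous one has passed.
min_wait_secs <- 15 * 60

ok_to_request <- function(last_request, now = Sys.time(), wait = min_wait_secs) {
  # TRUE when nothing has been sent yet, or when enough time has elapsed
  is.null(last_request) ||
    as.numeric(difftime(now, last_request, units = "secs")) >= wait
}

last_request <- NULL  # no request sent yet in this session

if (ok_to_request(last_request)) {
  # tweets_df <- search_tweets("#rstats", n = 100)  # the real API call would go here
  last_request <- Sys.time()
} else {
  message("Too soon since the last request; wait before running this again.")
}
```

Because the result of the call is assigned to a variable, running the guarded block again within 15 minutes leaves the data you already have untouched.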

Factors that strongly influence your data when you run rtweet functions:

  • the hashtag or phrase being searched
  • the day and time the data is scraped
  • the language of the searched tweets

Run your functions more than once if you want to capture as many people as possible in the data.
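Repeated runs usually overlap, so the same tweet can appear in more than one result. Here is a minimal sketch of combining two runs and dropping the duplicates. The two data frames are made-up stand-ins for real search results, and I am assuming the tweet ID column is called status_id; check the column names in your own data.

```r
# Made-up stand-ins for the results of two separate search runs
run_1 <- data.frame(status_id   = c("101", "102", "103"),
                    screen_name = c("ana", "ben", "cy"),
                    stringsAsFactors = FALSE)
run_2 <- data.frame(status_id   = c("103", "104"),
                    screen_name = c("cy", "dee"),
                    stringsAsFactors = FALSE)

# Stack the runs, then keep only the first occurrence of each tweet ID
all_runs <- rbind(run_1, run_2)
unique_tweets <- all_runs[!duplicated(all_runs$status_id), ]

nrow(unique_tweets)  # 4: tweet "103" appears in both runs but is kept once
```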


4 steps to Twitter data

I split the work into 4 steps purely as good working practice, to avoid rate limit warnings.

step 1 Libraries

library(rtweet)
library(tidytext)
library(ggplot2)

step 2 Search

search_tweets() returns tweets that contain a given word, phrase, or hashtag.

  • only tweets from roughly the last 6 to 9 days are returned

  • the maximum number of tweets per rate limit window is 18,000. For more than 18,000 tweets, set retryonratelimit = TRUE; the function then waits about 15 minutes for the rate limit to reset before it continues.

  • query needs to be <= 500 characters

  • spaces are treated as ‘AND’

  • the word OR must be capitalized, e.g. “rstats OR RStats”

  • filtering options:

    • prefix a filter with ‘-’ to exclude matching tweets: ‘-filter’
    • “-filter:quote” excludes quote tweets
    • “-filter:replies” excludes replies
    • “filter:news” returns tweets with links to news articles only
    • “filter:media” returns tweets with media only
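The query rules above can be put together in plain R before anything is sent to Twitter. This is a rough sketch: the search terms and the chosen filters are example values of mine, and the search_tweets() call is left commented out so that running the block does not use up an API request.

```r
# Example values; swap in your own terms and filters
terms   <- c("rstats", "RStats")
filters <- c("-filter:quote", "-filter:replies")

# Terms are joined with a capitalized OR; filters are appended after a space
query <- paste(paste(terms, collapse = " OR "),
               paste(filters, collapse = " "))
query  # "rstats OR RStats -filter:quote -filter:replies"

# Stop early if the query breaks the 500-character limit
stopifnot(nchar(query) <= 500)

# tweets_df <- search_tweets(q = query, n = 100, include_rts = FALSE)
```

Building the string first means typos show up when you print query, not after a request has already been spent.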

I suggest commenting out the whole function call once it has run, as one more safety measure. It is easy to hit Run by mistake, and this ensures the function is not called again unintentionally. This happened to me: I accidentally executed all of my code within 15 minutes of my last API request and got the rate limit warning.
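If you would rather not comment code in and out, another option is to make the call skip itself whenever a saved copy of the result already exists. The sketch below is one possible approach, not something rtweet provides: fetch_once() is a hypothetical helper, fake_search() is a stand-in for the real search so the example runs without a token, and I use base R's saveRDS() / readRDS() here instead of a CSV.

```r
# Hypothetical helper: run `fetch` only when no saved copy exists at `path`
fetch_once <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the saved copy, no API request made
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Stand-in for the real search; counts how often it is actually called
api_calls <- 0
fake_search <- function() {
  api_calls <<- api_calls + 1
  data.frame(status_id = c("101", "102"),
             text = c("first tweet", "second tweet"),
             stringsAsFactors = FALSE)
}

cache_file <- tempfile(fileext = ".rds")
tweets_df <- fetch_once(cache_file, fake_search)  # calls fake_search()
tweets_df <- fetch_once(cache_file, fake_search)  # reads the saved file instead
api_calls  # 1
```

With a real search you would pass something like function() search_tweets("#rstats", n = 100) as the second argument, and a fixed file path in place of the temporary file.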

step 3 Save your Twitter data

Here is how to save your Twitter data as a CSV. Setting prepend_ids to TRUE is helpful for later data manipulation: it puts an "x" in front of the Twitter IDs so they are not treated as numbers and rounded. The first argument is whatever variable name you used for the tweet search, followed by the new file name (no .csv required in the name).

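# twitter_var_df is the data frame returned by your tweet search
save_as_csv(twitter_var_df,
            file_name = "new-name",  # no .csv needed in the name
            prepend_ids = TRUE,      # keep Twitter IDs from being read as numbers
            na = "",                 # write missing values as empty cells
            fileEncoding = "UTF-8")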

step 4 Read in the Twitter data

twitter_df <- read_twitter_csv(file = "<path>", unflatten = FALSE)

For an in-depth guide, see Detailed Guide.