Movie Data

The data for week 30 of Tidy Tuesday documents movies and their production costs and profits.


The plotly library creates interactive graphs. In R, static graphs can be created using base R graphics or ggplot2, and a ggplot2 graph can then be converted to a plotly graph.

The graph below shows the median production cost of movies by genre and year. Genres can be hidden interactively by clicking their entries in the legend.


Loading and tidying data


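# Load the packages used in this post
library(tidyverse)  # read_csv, dplyr verbs, ggplot2
library(scales)     # dollar_format
library(plotly)     # ggplotly, layout

# Read data
horror_movie <- read_csv("")

# Tidy dates and calculate return (worldwide gross minus production budget)
horror_movie <- horror_movie %>%
  select(-X1) %>%  # drop the unused X1 column
  mutate(release_date = lubridate::mdy(release_date),
         release_year = lubridate::year(release_date),
         return = worldwide_gross - production_budget)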

# Aggregate and summarise data by year and genre
movie_agg <- horror_movie %>%
  group_by(release_year, genre) %>%
  summarise(avg_return = mean(return),
            median_return = median(return),
            avg_production = mean(production_budget),
            median_production = median(production_budget),
            avg_gross = mean(worldwide_gross),
            median_gross = median(worldwide_gross))
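To make the tidying and aggregation steps easier to follow, here is a minimal sketch that runs the same pipeline on a small made-up table. The `toy_movies` data and its values are invented for illustration and are not part of the Tidy Tuesday dataset; only the column names mirror the ones used above.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: four movies with dates stored as month/day/year text
toy_movies <- tibble(
  release_date      = c("1/5/2015", "6/20/2015", "3/14/2016", "10/31/2016"),
  genre             = c("Horror", "Horror", "Horror", "Horror"),
  production_budget = c(10e6, 30e6, 5e6, 15e6),
  worldwide_gross   = c(25e6, 20e6, 50e6, 15e6)
)

toy_agg <- toy_movies %>%
  mutate(release_date = mdy(release_date),       # parse text into Date values
         release_year = year(release_date),      # pull out the year
         return = worldwide_gross - production_budget) %>%
  group_by(release_year, genre) %>%
  summarise(median_return = median(return),
            median_production = median(production_budget),
            .groups = "drop")                    # return an ungrouped tibble

print(toy_agg)
```

Each year and genre pair collapses to a single row, which is the shape the line graph needs: one point per genre per year.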

ggplot graph

# ggplot of median production cost versus year by genre
yearly_cost_plot <- movie_agg %>%
  ggplot(aes(release_year, median_production, colour = genre)) +
  geom_line() +
  # Change y-axis labels to dollars, in millions
  scale_y_continuous(labels = dollar_format(scale = 0.000001, suffix = "M")) +
  # colour = NULL removes the legend title
  labs(title = "Median Production Cost", x = "Year", y = NULL, colour = NULL)
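The y-axis labeller can be tried on its own to see what it does. As a small sketch with made-up budget values: `dollar_format()` returns a function, and calling that function on a numeric vector gives the text labels that ggplot2 would draw on the axis.

```r
library(scales)

# Build the same labeller used in the plot: scale raw dollars down to
# millions, prefix "$" (the default) and append "M"
to_millions <- dollar_format(scale = 0.000001, suffix = "M")

# Hypothetical budgets in raw dollars
budgets <- c(5000000, 25000000, 100000000)

budget_labels <- to_millions(budgets)
print(budget_labels)
```

Scaling inside the labeller keeps the underlying data in raw dollars, so only the axis text changes.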


plotly graph

Once the ggplot graph has been created, passing it to the ggplotly function converts it to an interactive plotly graph.

# Create plotly graph
yearly_cost_plotly <- ggplotly(yearly_cost_plot) %>%
  layout(margin = list(l = 45))  # add extra space on left for axis labels

# Wrap the graph in a div to centre it on the webpage
shiny::div(yearly_cost_plotly, align = "center")
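The conversion can also be tried without the movie data. The sketch below uses an invented `toy` data frame (the values are made up) and follows the same two steps: build a ggplot line graph, then hand it to `ggplotly()` and widen the left margin with `layout()`.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data: yearly cost for two genres
toy <- data.frame(
  year  = rep(2015:2018, times = 2),
  cost  = c(10, 12, 15, 14, 30, 28, 35, 40) * 1e6,
  genre = rep(c("Horror", "Action"), each = 4)
)

# Static ggplot graph, one line per genre
toy_plot <- ggplot(toy, aes(year, cost, colour = genre)) +
  geom_line()

# Convert to plotly, then add extra space on the left for axis labels
toy_plotly <- plotly::layout(ggplotly(toy_plot), margin = list(l = 45))

class(toy_plotly)
```

Printing `toy_plotly` in an interactive session or an R Markdown document renders the graph as an HTML widget, with a clickable legend entry for each genre.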