Which Are the Most Expensive Cities in America to Book a Tree Fort?

It’s the year 2020 and the sharing-economy is in full gear. Everything from hosting city dinners to ride-sharing the great outdoors is fair game. Capitalizing on their own interests and experience, eight-year-old startup prodigees, Sally Hershfield and Felicia Alvarez, have taken the sharing-economy by storm with their ever-growing TreefortBnB enterprise.

Their nascent business venture has become such a hit that industry powerhouse AirBnb has not only taken notice but also action. They’ve identified markets where TreefortBnb has expanded most successfully and it’s been rumored that AirBnb aims to undercut their marketshare by providing incentives that push their users to advertise more competitive prices.

Using data obtained from Priceonomics, I’ve identified the median price of tree fort rentals for the top 100 cities with the most units on the market. Indianapolis IN, Malibu CA, and Park City UT are the most expensive cities in the US to rent a tree fort by median pricing, something AirBnb hopes to use to its own advantage by provding lower cost classical home alternatives.

##Load dependencies
library(pipeR)
library(dplyr)
library(formattable)

##URL to download data
url<-"https://s3.amazonaws.com/pix-media/Data+for+TreefortBnB+Puzzle.csv"

#Read in data and aggregate # of units by city and subset to top 100
df1 <- url %>%
  read.csv(col.names=c("ID","City","State","Price","Reviews")) %>%
  group_by(State, City) %>%
  summarize(Unit_Sum=n()) %>% 
  arrange(desc(Unit_Sum)) %>%
  head(100) 

#Read in data again and obtain median rental price by city
df2 <- url %>%
  read.csv(col.names=c("ID","City","State","Price","Reviews")) %>%
  group_by(State, City) %>%
  summarize(Price_Median=median(Price)) %>%
  arrange(desc(Price_Median)) 

#Merge two data frames
df3 <-merge(df1, df2, by=c("City", "State"), all.x=TRUE, all.y=FALSE)
df3<-arrange(df3, desc(Price_Median))

#Rename columns
colnames(df3)[3]<-"Unit Total"
colnames(df3)[4]<-"Median Price"

#Produce output table in html
formattable(df3, list(
  'Median Price' = color_bar("orange")))
City State Unit Total Median Price
Indianapolis IN 251 650.0
Malibu CA 92 304.0
Park City UT 229 299.0
Healdsburg CA 49 275.0
Truckee NV 67 275.0
Laguna Beach CA 68 268.5
Incline Village NV 118 259.0
Manhattan Beach CA 55 209.0
Charlotte NC 225 200.0
Napa CA 68 200.0
Sonoma CA 115 200.0
Austin TX 2836 199.0
La Jolla CA 53 195.0
Sausalito CA 71 190.0
Hermosa Beach CA 48 189.5
Sunny Isles Beach FL 161 180.0
New York NY 8043 170.0
Beverly Hills CA 74 160.0
Boston MA 613 160.0
Newport Beach CA 84 160.0
Marina del Rey CA 80 155.0
Venice CA 242 155.0
Louisville KY 86 150.0
Miami Beach FL 1345 150.0
Mill Valley CA 80 150.0
New Orleans LA 833 150.0
San Francisco CA 3622 150.0
Santa Monica CA 500 150.0
Miami FL 640 149.5
Taos NM 52 148.5
Sebastopol CA 49 145.0
San Diego CA 986 130.0
Scottsdale AZ 121 129.0
Washington DC 1402 129.0
West Hollywood CA 229 129.0
Cambridge MA 300 125.0
Charleston SC 84 125.0
Las Vegas NV 291 125.0
Naples FL 50 125.0
San Rafael CA 61 125.0
Santa Cruz CA 127 125.0
Anchorage AK 50 123.5
Brookline MA 47 120.0
Palo Alto CA 100 120.0
Santa Rosa CA 71 120.0
Hollywood FL 150 115.5
Colorado Springs CO 54 113.0
Davenport FL 108 113.0
Mountain View CA 72 113.0
Brooklyn NY 4368 110.0
Kissimmee FL 266 110.0
Los Angeles CA 3236 110.0
Savannah GA 91 107.0
Arlington VA 214 106.5
Paris RI 48 105.5
Honolulu HI 363 105.0
Alexandria VA 66 104.5
Berkeley CA 328 100.0
Boulder CO 265 100.0
Carlsbad CA 54 100.0
Chicago IL 1255 100.0
Fort Lauderdale FL 151 100.0
Houston TX 175 100.0
Long Beach CA 113 100.0
Philadelphia PA 457 100.0
San Jose CA 104 100.0
Seattle WA 824 100.0
Somerville MA 99 100.0
Tampa FL 78 100.0
Pasadena CA 86 99.5
Queens NY 457 99.0
Ann Arbor MI 63 98.0
Minneapolis MN 138 98.0
Long Island City NY 190 95.5
Glendale CA 70 90.0
Oakland CA 434 90.0
Tucson AZ 195 90.0
Salt Lake City UT 140 89.5
Atlanta GA 302 89.0
Dallas TX 114 87.0
Denver CO 283 86.0
Eugene OR 78 85.0
Jersey City NJ 82 85.0
Madison WI 59 85.0
Phoenix AZ 167 85.0
Providence RI 72 85.0
Baltimore MD 137 80.0
Nashville TN 185 80.0
Pittsburgh PA 99 80.0
San Antonio TX 77 80.0
Portland OR 819 79.0
Albuquerque NM 95 75.0
Bronx NY 113 75.0
New Haven CT 52 75.0
Orlando FL 116 75.0
Silver Spring MD 53 75.0
Cincinnati OH 47 70.0
Sacramento CA 57 65.0
Durham NC 63 60.0
Raleigh NC 57 60.0

Basic listed prices, however, may not reflect actual rental costs because not all advertised listings are actually rented out. One way to identify median rental prices for listings that have actually been rented is to subset the data to include only those listings with reviews. The working assumption here is that listings with no reviews may never have actually been rented. For listings with at least one review, Carmel CA, Malibu CA, and Incline Village NV are the most expensive by median rental price.

#Read in data and aggregate # of units by city and subset to top 100
df1.b <- url %>%
  read.csv(col.names=c("ID","City","State","Price","Reviews")) %>%
  subset(Reviews > 0) %>% #subset to only include reviewed listings
  group_by(State, City) %>%
  summarize(Unit_Sum=n()) %>% 
  arrange(desc(Unit_Sum)) %>%
  head(100) 

#Read in data again and obtain median rental price by city
df2.b <- url %>%
  read.csv(col.names=c("ID","City","State","Price","Reviews")) %>%
  subset(Reviews > 0) %>% #subset to only include reviewed listings
  group_by(State, City) %>%
  summarize(Price_Median=median(Price)) %>%
  arrange(desc(Price_Median)) 

#Merge two data frames
df3.b <-merge(df1.b, df2.b, by=c("City", "State"), all.x=TRUE, all.y=FALSE)
df3.b<-arrange(df3.b, desc(Price_Median))

#Rename columns
colnames(df3.b)[3]<-"Unit Total"
colnames(df3.b)[4]<-"Median Price"

#Produce output table in html
df3.b<-head(df3.b, 10)
formattable(df3.b, list(
  'Median Price' = color_bar("orange")))
City State Unit Total Median Price
Carmel CA 21 300.0
Malibu CA 39 225.0
Incline Village NV 21 200.0
Laguna Beach CA 25 200.0
Truckee NV 30 200.0
Napa CA 44 184.5
Park City UT 101 179.0
Sunny Isles Beach FL 98 177.5
Sonoma CA 92 175.0
New York NY 5597 165.0

Let’s Map it!

Let’s see if these rental prices cluster by location across the lower 48 states. Using a function I wrote based on Microsoft’s Bing’s API, I’ve managed to batch geocode the locations provided in the TreefortBnb data file.

#Load Dependencies
library(ggplot2)
library(cowplot)

#Batch geocode with Bing
options(BingMapsKey="AtquMkrNaB7ME7krIpwQgrTwEqwB0HbUEpRKb9wfpBW-xCbgBzrGabEyUGkdpO0G")
df3$Location<-paste(df3$City, df3$State, sep = ", ")
df30<-t(geocodeVect(df3$Location, service="bing", returntype="coordinates"))
df10<-as.data.frame(df30)
colnames(df10)<-c("lat","lon")
df3<-cbind(df3, df10)
df3<-as.data.frame(df3)

#Subset to only include lower 48
df3<- df3 %>%
  subset(State!="HI") %>%
  subset(State!="AK")
row.names(df3) <- NULL 

states <- map_data("state")
states <- states[order(states$order), ]

ggplot() +
  geom_polygon(data=states, aes(long, lat, group=group),
               size=0.1,fill="black", color="white", alpha=0.75) +
  geom_point(data=df3, aes(lon, lat, size= `Median Price`), 
             color="blue", alpha=0.5) +
  geom_point(data=df3, shape = 1, 
             aes(lon, lat, size=`Median Price`), 
             colour = "white", alpha=0.7) +
   theme(axis.line=element_blank(),
         axis.text.x=element_blank(),
         axis.text.y=element_blank(),
         axis.ticks=element_blank(),
         axis.title.x=element_blank(),
         axis.title.y=element_blank(),
         legend.position="bottom",
         panel.background=element_blank(),
         panel.border=element_blank(),
         panel.grid.major=element_blank(),
         panel.grid.minor=element_blank(),
         plot.background=element_blank())

plot of chunk unnamed-chunk-4

As a final exercise, let’s compare median AirBnb prices to median TreefortBnb prices. To accomplish that, I’ve scraped data from Priceonomics’s online AirBnb to hotel comparison table. Unfortunately, the data on AirBnb rental prices is not as exhaustive as the data for TreefortBnb (less populated locations are missing for AirBnb), thus, I’ve restricted my analysis to the top 25 most rented locations for TreefortBnb in an effort to increase the overlap between the two samples. Data on AirBnb rental prices for cities on the TreeforBnb list but not on Priceonomics’s list of AirBnb prices I’ve imputed using data drawn from current averages for each location as shown on AirBnb’s own website (I used this method for 5 cities: Brooklyn, Queens, Kissimmee, Miami Beach, and Santa Monica).

The graph below shows the first difference between TreefortBnb and AirBnb median rental prices across the 25 busiest TreefortBnb cities across the US. Negative values indicate locations where median AirBnb rental prices are more expensive than their TreefortBnb counterparts. Queens, NY (highlighted in red) is the only location where median prices are equivalent, although, median prices in Washington DC, Boulder, CO, and Denver, CO are very close.

# Load dependencies
library(jsonlite)
library(data.table)

# Scrape data from site
appData <- fromJSON("http://priceonomics.com/static/js/hotels/all_data.json")

# replicate table
data2 <- data.frame(City = names(appData), Price = sapply(appData, function(x) x$air$apt$p), 
    stringsAsFactors = FALSE)

# Arrange data
setDT(data2)
data2 <- data2[order(Price, decreasing = TRUE)]
data2 <- as.data.frame(data2)
colnames(data2) <- c("Location", "AirPrice")

# Subset to only include 25 busiest tree fort markets
dddd <- df3
colnames(dddd)[3] <- "UnitTotal"
data3 <- arrange(dddd, desc(UnitTotal))
data3 <- head(data3, 25)

# Imput missing AirBnb values
data4 <- merge(data3, data2, by = "Location", all.x = TRUE, all.y = FALSE)
data4$AirPrice <- as.character(data4$AirPrice)
data4[6, 8] <- 104
data4[10, 8] <- 172
data4[13, 8] <- 158
data4[20, 8] <- 99
data4[23, 8] <- 141
colnames(data4)[5] <- c("PriceMedian")
data4$PriceMedian <- as.numeric(data4$PriceMedian)
data4$AirPrice <- as.numeric(data4$AirPrice)

# Generate first difference
data4$first_diff <- data4$PriceMedian - data4$AirPrice  #Negative value 
# mean AirBnb more expensive - positive means TreefortBnb is more
# expensive

# Create variable distinguishing neg from pos values of 1st difference
data4$abs_first_diff <- abs(data4$first_diff)
data4$Air_pricier <- ifelse(data4$first_diff < 0, 0, 1)
data4$Air_pricier <- as.factor(data4$Air_pricier)

# Plot
library(cowplot)  #theme
a <- ifelse(data4$first_diff < 1, ifelse(data4$Air_pricier != 0, "red", 
    "darkgrey"), "blue")

ggplot(data4, aes(x = Location, y = first_diff)) + geom_bar(stat = "identity", 
    aes(fill = Air_pricier)) + ylab("First Difference") + xlab("") + theme(axis.text.x = element_text(angle = 35, 
    hjust = 1)) + theme(axis.text.x = element_text(colour = a, size = 10)) + 
    scale_fill_manual(values = c("darkgrey", "darkblue")) + theme(legend.position = "none", 
    panel.background = element_blank())

plot of chunk unnamed-chunk-5