Case Study: How does a bike-share navigate speedy success?

Background information

In 2016, BikeMeNow launched a successful bike-share offering across Chicago. The bikes can be unlocked from one station and returned to any other station and anyone can choose from flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders and customers who purchase annual memberships are BikeMeNow members. The finance department has concluded that annual members are more profitable than casual riders. The analytics department will focus to understand how casual riders and annual members use BikeMeNow bikes differently by analyzing historical bike trip data to identify trends. Primary stakeholders are the director of marketing and executive team, while the marketing analytics team is a secondary stakeholder.

Ask

The main problem for both teams is designing marketing strategies aimed at converting BikeMeNow casual riders into annual members. My role will be to analyze how annual members and casual riders use BikeMeNow bikes differently. Examining the data will help me get a broad sense of certain patterns that are occurring in the two different groups. Understanding the differences will provide more accurate customer profiles for each group - insights produced will help the marketing analytics team design high quality targeted ads for converting casual riders into members. The executive team will benefit from these insights to maximize the number of annual members causing growth to the company.

Prepare

I will utilize BikeMeNow historical trip data from February 2021 to January 2022 in order to analyze and identify trends. (Datasets have a different name because BikeMeNow is a fictional company. The data has been made available by Motivate International Inc. under this license. Due to data privacy issues, personal information has been removed or encrypted.) This is public data that can be used to explore how different customer types are using BikeMeNow bikes. Data is stored in ".csv" format in spreadsheets with 12 .CSV files and is organized into records and fields.  For data visualization purposes I will be using this bicycle station data

Data description 

01) 2021-02_divvy_trip-data.csv  02) 2021-03_divvy_trip-data.csv  03) 2021-04_divvy_trip-data.csv  04) 2021-05_divvy_trip-data.csv  05) 2021-06_divvy_trip-data.csv  06) 2021-07_divvy_trip-data.csv  07) 2021-08_divvy_trip-data.csv  08) 2021-09_divvy_trip-data.csv  09) 2021-10_divvy_trip-data.csv  10) 2021-11_divvy_trip-data.csv  11) 2021-12_divvy_trip-data.csv  12) 2022-01_divvy_trip-data.csv 

* ride_id               #Ride id - unique* rideable_type         #Bike type - Classic, Docked, Electric* started_at            #Trip start day and time* ended_at              #Trip end day and time* start_station_name    #Trip start station* start_station_id      #Trip start station id* end_station_name      #Trip end station* end_station_id        #Trip end station id* start_lat             #Trip start latitude  * start_lng             #Trip start longitute   * end_lat               #Trip end latitude  * end_lat               #Trip end longitute   * member_casual         #Rider type - Member or Casual  

Bias and credibility of data

Reliable: Data used is public, contains accurate, complete, and unbiased information on BikeMeNow historical bike trips. 

Original: Primary data source, BikeMeNow gathers this data first-hand. 

Comprehensive: Data includes trip duration, user type, and gender, and destination station name which is needed for analyzing user trends.

Current: Historical data records are from 2013 - 2022. 

Cited: All datasets used are publicly available by BikeMeNow and the City of Chicago - government data is often reliable. 

Process

I used Excel to remove any unwanted, duplicate data, cells with identical values but different letter cases, and formatting issues. I verified that the data is stored appropriately and ready for analysis. I stationed the files in a temporary folder, created subfolders for the .CSV and .XLS files to have a copy of the original data. After launching Excel I opened each file and clicked to Save As an Excel Workbook file. I did the following for each .XLS file. After making these updates, I saved each .XLS file as a new .CSV file.

Changed format of started_at and ended_at columns 

Created column ride_length  

Created column ride_date  

Created column ride_month 

Created column ride_year

Created column start_time 

Created column end_time

Created column day_of_week

The datasets being used are large therefore I moved the analysis to BigQuery, it's a great platform to use since there's no infrastructure to manage and database administrator, so that the user can focus on analyzing data using familiar SQL. 

 Creating quarterly tables 


To perform analysis by season I created Q1, Q2, Q3 and Q4 tables. Note: There will be two Q1 tables– one for 2021 and 2022 since the data being used is from February / March  2021 and January 2022:

# I created 2021_Q2 and then repeat it again for the four tables# Using UNION to join 2021_Q2 tables: APR, MAY, JUN  SELECT             ride_id,             rideable_type,             started_at,             ended_at,            ride_length,             ride_date,            ride_month,            ride_year,            start_time,            end_time,            day_of_week,            start_station_name,             start_station_id,             end_station_name,             end_station_id,             start_lat,             start_lng,             end_lat,             end_lng,             member_casual,            'Q2' AS quarterFROM         `BikeMeNow-cs-341119.biketrips.2021_04_biketrips`UNION DISTINCT  SELECT         ride_id,         rideable_type,         started_at,         ended_at,         ride_length,         ride_date,        ride_month,        ride_year,        start_time,        end_time,        day_of_week,        start_station_name,        start_station_id,         end_station_name,         end_station_id,         start_lat,         start_lng,        end_lat,         end_lng,         member_casual, 'Q2' AS quarterFROM         `BikeMeNow-cs-341119.biketrips.2021_05_biketrips`UNION DISTINCT  SELECT         ride_id,         rideable_type,         started_at,         ended_at,         ride_length,         ride_date,        ride_month,        ride_year,        start_time,        end_time,        day_of_week,        start_station_name,         start_station_id,         end_station_name,         end_station_id,         start_lat,         start_lng,        end_lat,         end_lng,         member_casual, 'Q2' AS quarterFROM         `BikeMeNow-cs-341119.biketrips.2021_06_biketrips`

Clean and transform day of week

Additional data cleaning is needed on the new table. I updated the format for day_of_week from FLOAT to STRING. Then I changed the values from numbers to their corresponding day names (i.e. 1 = Sunday, 7 = Saturday. We’ll start with 2021_Q1 and repeat for the remaining four tables. After the tables are organized into quarters, I can start to delete the original monthly tables. 

# Update 'day_of_week' format with CAST() SELECT  ride_id,         rideable_type,         started_at,         ended_at,  ride_length,         ride_date,        ride_month,        ride_year,        start_time,        end_time,        CAST(day_of_week AS STRING) AS day_of_week,  start_station_name,  start_station_id,         end_station_name,  end_station_id,         start_lat,         start_lng,  end_lat,         end_lng,         member_casual, quarterFROM  `BikeMeNow-cs-341119.biketrips.2021_Q1`
# Update 'day_of_week' values with CASE WHEN UPDATE  `BikeMeNow-cs-341119.biketrips.2021_Q1`SET   day_of_week =             CASE                WHEN day_of_week = '1' THEN 'Sunday'                WHEN day_of_week = '2' THEN 'Monday'                WHEN day_of_week = '3' THEN 'Tuesday'                WHEN day_of_week = '4' THEN 'Wednesday'                WHEN day_of_week = '5' THEN 'Thursday'                WHEN day_of_week = '6' THEN 'Friday'                WHEN day_of_week = '7' THEN 'Saturday'             ENDWHERE        day_of_week IN ('1', '2', '3', '4', '5', '6', '7')

Analyze

Quarterly data exploration  

I selected a couple of columns from 2021_Q1 to preview in a temporary table - doing this allows me spot any potential trends and relationships to explore more in depth. The query below returned 278,119 rows which is the number of recorded trips in this quarter. 

# Select columns from Q1 data to preview
SELECT          ride_id,        started_at,        ended_at,        ride_length,        day_of_week,         start_station_name,        end_station_name,        member_casualFROM         `BikeMeNow-cs-341119.biketrips.2021_Q1`ORDER BY         ride_id DESC

Total Trips    

I created total columns for overall, annual members, casual rider and calculated the overall total percentages for both types. Out of 278,118 total trips in 2021_Q1, 66% were from annual members while 34% were from casual riders.

# Total Trips: Members vs Casual # Looking at overall, annual member and casual rider totals
SELECT         TotalTrips,        TotalMemberTrips,        TotalCasualTrips,        ROUND(TotalMemberTrips/TotalTrips,2)*100 AS MemberPercentage,        ROUND(TotalCasualTrips/TotalTrips,2)*100 AS CasualPercentageFROM         (        SELECT                COUNT(ride_id) AS TotalTrips,                COUNTIF(member_casual = 'member') AS TotalMemberTrips,                COUNTIF(member_casual = 'casual') AS TotalCasualTrips,        FROM                `BikeMeNow-cs-341119.biketrips.2021_Q1`        ) 

Average ride length

I want to know how average ride_length is different than these groups. The results are 14 minutes for member riders, 37 minutes for casual riders, and 22 minutes for overall riders.  Casual riders average about 23 minutes  more per ride. 

# Avergage Ride Lengths: Members vs Casual # Looking at overall, member and casual average ride lengths

SELECT        (        SELECT                 AVG(ride_length)        FROM                 `BikeMeNow-cs-341119.biketrips.2021_Q1`        ) AS AvgRideLength_Overall,        (        SELECT                 AVG(ride_length)         FROM                 `BikeMeNow-cs-341119.biketrips.2021_Q1`        WHERE                 member_casual = 'member'        ) AS AvgRideLength_Member,        (        SELECT                 AVG(ride_length)         FROM                 `BikeMeNow-cs-341119.biketrips.2021_Q1`        WHERE                 member_casual = 'casual'        ) AS AvgRideLength_Casual

Maximum ride lengths 

Next, I will want to find the maximum values for ride_length to see if there's anything influencing the average casual rider. The results show that the casual riders average was impacted by outlier(s) - the longest trip duration for causal riders was 528 hours and for annual riders it was about 26 hours. 

# Max Ride Lengths: Members vs Casual  # Looking at max ride lengths to check for outliers
SELECT         member_casual,        MAX(ride_length) AS ride_length_MAXFROM         `BikeMeNow-cs-341119.biketrips.2021_Q1`GROUP BY         member_casualORDER BY         ride_length_MAX DESCLIMIT         2

Median ride lengths

I am going to use the median instead of average because there's outliers impacting the average - median will be more accurate for the analysis. The results show 18 minutes for casual riders and 10 minutes for annual members.  

# Looking at median ride lengths
SELECT        DISTINCT median_ride_length,        member_casualFROM         (        SELECT                 ride_id,                member_casual,                ride_length,                PERCENTILE_DISC(ride_length, 0.5 IGNORE NULLS) OVER(PARTITION BY member_casual) AS  median_ride_length        FROM                 `BikeMeNow-cs-341119.biketrips.2021_Q1`        )ORDER BY         median_ride_length DESC LIMIT 2

Busiest day for rides

I want to find out which day out of the week has the most rides for annual members and casual rides. It turns out that Saturday is the most popular day for annual members and casual riders. 

# Which days have the highest number of rides SELECT        member_casual,         day_of_week AS mode_day_of_weekFROM         (        SELECT                DISTINCT member_casual, day_of_week, ROW_NUMBER() OVER (PARTITION BY member_casual ORDER BY COUNT(day_of_week) DESC) rn        FROM                `BikeMeNow-cs-341119.biketrips.2021_Q1`        GROUP BY                member_casual, day_of_week        )WHERE        rn = 1ORDER BY        member_casual DESC LIMIT 2

Median ride length per day   

I'm going to determine what the median ride lengths per day are for annual members and casual riders. The results indicate that median ride length for casual riders on Sunday, Saturday, Monday, Tuesday, and Wednesday is almost double the amount for annual members on Saturday, Sunday, Monday, Tuesday, and Wednesday. 

# Median ride lengths per day for annual members  
SELECT        DISTINCT median_ride_length,        member_casual,        day_of_weekFROM         (        SELECT                 ride_id,                member_casual,                day_of_week,                ride_length,                PERCENTILE_DISC(ride_length, 0.5 IGNORE NULLS) OVER(PARTITION BY day_of_week) AS  median_ride_length        FROM                 `BikeMeNow-cs-341119.biketrips.2021_Q1`        WHERE                member_casual = 'member'        )ORDER BY         median_ride_length DESC LIMIT 7
# Median ride lengths per day for casual riders   SELECT        DISTINCT median_ride_length,        member_casual,        day_of_weekFROM         (        SELECT                 ride_id,                member_casual,                day_of_week,                ride_length,                PERCENTILE_DISC(ride_length, 0.5 IGNORE NULLS) OVER(PARTITION BY day_of_week) AS  median_ride_length        FROM                 `BikeMeNow-cs-341119.biketrips.2021_Q1`        WHERE                member_casual = 'casual'        )ORDER BY         median_ride_length DESC LIMIT 7

Total rides per day

I want to know the total rides per day therefore I will create columns for overall total, annual members and casual riders.  

# Looking at total number of trips per day 
SELECT          day_of_week,        COUNT(DISTINCT ride_id) AS TotalTrips,        SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END) AS MemberTrips,        SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END) AS CasualTripsFROM         `BikeMeNow-cs-341119.biketrips.2021_Q1`GROUP BY         1ORDER BY         total_trips DESC LIMIT 7

Share

Over the course of 12 months, annual members represented 55% and casual riders represented  45% of BikeMeNow total trips. As shown in the chart below, the percentage shifts throughout different parts of the year. 

Summer vs Winter

Quarter 3 (July, August, and September) was the busiest time of the year for overall bike trips due to the warm weather during the summer months. There's a major decrease in total rides during the winter months of quarter 1 (January, February and March). Annual members outnumbered casual riders in every quarter except quarter 3. The annual members nearly doubled the casual ridership in quarter 1 and quarter 4.

Median ride length 

According to the analysis; casual riders consistently have longer rides than annual members.

Preferred day of the week

According to the mode for each quarter, casual riders preferred Saturday as the most popular day of week and annual members favored Wednesday being the most popular day. 

Median ride length per day of the week 

Type of bike

The bar chart below represents the type of bike preferred by casual riders and annual members during a 12 month period. It turns out that classic bicycles are favored by both groups. 

Left: Annual average ride length by bike type, Right: Annual longest ride length by bike type

From the charts above I noticed that docked bikes are the outliers influencing ride length averages. 

Act

Based on the data collected I have produced recommendations that will help the marketing analytics team design targeted ads for converting casual riders into members. 

Focus

Due to the increased demand by casual riders during Spring and Summer it serves as an opportunity for promotional and advertisement campaigns.  

Recommendation

Begin an email campaign that's targeted towards casual riders starting in Spring through the Summer. The campaign will offer a variety of promotions that will be released on a set schedule. For example; offering discounts on annual memberships if riders sign up before the busy season, introduce visual advertisements on social media platforms that emphasizes to pros of becoming an annual member compared to casual rider, and launch aminated countdowns until Summer starts to engage the audience. 

Focus: 

Annual members use BikeMeNow to travel back and forth from work, it's more active during the week and there's steady ride length. On the contrast casual riders are most noticeable during the weekends for entertainment use. Ride length increases during the weekends. 

Recommendation

Build new member packages that are custom-made towards entertainment and weekend activities, create user-centered marketing channels to recommend personalized membership packages. These packages include entertainment pairings that users can pick from, discounted weekend rates, and discounted entertainment packages for users that refer friends to the program. 

Focus: 

The frequency of trips from annual members and casual riders are similar and seem to be the highest during evening commute hours except mornings, specifically between 5 AM and 9 AM. 

Recommendation

Develop digital media campaigns that highlight the advantages of early morning exercising. The mornings are an opportunity for growth and could potentially lead casual riders to switch over to an annaul membership plan.