**Analysis of Potential Matatu Income Data**

Brief History of Matatus

Matatus are the go-to mode of public transportation in Kenya, originating in the 1960s. The term "matatu" comes from the Swahili word for "three," referring to the three cents charged per trip when they first began operating. Initially, they were informal, privately owned minibuses or vans that filled a gap in public transportation. Over the years, matatus have become a cornerstone of Kenya's transport sector, evolving into a more structured yet still vibrant and culturally significant industry.

**Background on Analyzing Hypothetical Matatu Income Data**

In this project, I aim to analyze hypothetical income data for matatus, which has been generated synthetically to simulate real scenarios. Using hypothetical data allows us to focus on analytical methods, data modeling, and interpretation without the need for access to actual operational data, which may be sensitive or unavailable.

What the Hypothetical Data Includes:

The dataset covers one simulated week and includes the following fields:

Daily Income:

The fare paid by passengers.

Time Period:

The income data is categorized into specific time periods to better analyze trends throughout the day. Below is a breakdown of the time periods:

| Time Period | Description | Approximate Time Range |
| --- | --- | --- |
| Early Morning | The first part of the day, often busy with commuters starting their day. | 4:00 AM - 6:59 AM |
| Morning | The regular morning rush with peak passenger traffic. | 7:00 AM - 9:59 AM |
| Late Morning | A quieter period as the morning rush subsides. | 10:00 AM - 11:59 AM |
| Early Afternoon | Transition period with moderate passenger flow. | 12:00 PM - 1:59 PM |
| Afternoon | Midday activity with commuters. | 2:00 PM - 3:59 PM |
| Late Afternoon | Increasing activity as people begin their commute home. | 4:00 PM - 5:59 PM |
| Evening | High traffic as the majority return home from work. | 6:00 PM - 6:59 PM |
| Night | Reduced activity with occasional late-night travelers. | 7:00 PM - 11:59 PM |

This breakdown helps identify peak and off-peak periods, providing insights into revenue fluctuations and operational efficiency.
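
The time ranges in the table can be turned into a small lookup that labels each transaction time. This is a minimal sketch using only the standard library; the names `TIME_PERIODS` and `time_period` are illustrative and are not taken from the notebook. Times between 12:00 AM and 3:59 AM fall outside the table, so the sketch returns `None` for them.

```python
from datetime import time
from typing import Optional

# (start hour inclusive, end hour exclusive, label), copied from the table.
TIME_PERIODS = [
    (4, 7, "Early Morning"),
    (7, 10, "Morning"),
    (10, 12, "Late Morning"),
    (12, 14, "Early Afternoon"),
    (14, 16, "Afternoon"),
    (16, 18, "Late Afternoon"),
    (18, 19, "Evening"),
    (19, 24, "Night"),
]


def time_period(t: time) -> Optional[str]:
    """Return the time-period label for a clock time, or None if uncovered."""
    for start, end, label in TIME_PERIODS:
        if start <= t.hour < end:
            return label
    return None  # 12:00 AM - 3:59 AM is not covered by the table


print(time_period(time(8, 15)))
print(time_period(time(18, 30)))
```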

Date:

The simulated date on which each transaction occurred.

Payment Channels:

Cash and mobile money channels (M-PESA, Airtel Money, and T-Kash).

Gender:

The passenger's gender, recorded as male or female.

The generated matatu data can be accessed here: Download the Excel file

Data Findings

Total Revenue per Day

The above plot breaks down the total revenue earned per day, with the highest revenue earned on 07/11/2024.
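
A per-day total like this is a group-and-sum over the transaction rows. The sketch below uses the standard library and a handful of hypothetical rows; the dates and fares are made up for illustration and are not values from the dataset.

```python
from collections import defaultdict

# Hypothetical (date, fare in KES) rows, not taken from the dataset.
transactions = [
    ("2024-01-01", 100),
    ("2024-01-01", 120),
    ("2024-01-02", 90),
    ("2024-01-03", 150),
    ("2024-01-03", 110),
]


def revenue_per_day(rows):
    """Sum the fares recorded for each date."""
    totals = defaultdict(int)
    for date, fare in rows:
        totals[date] += fare
    return dict(totals)


daily = revenue_per_day(transactions)
best_day = max(daily, key=daily.get)  # date with the highest total
print(daily)
print(best_day)
```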

Number of transactions per time period

The graph shows the average number of transactions in each time period, with the highest number occurring in the late morning.

Gender Distribution

The above plot shows that females appear to use matatus more than males.

Payment Distribution by Time Period

The above plot shows how payments are distributed across payment channels in each time period.

Average Fare Per Time Period

The above plot shows the average fare paid during each time period.

Payment Channel by Gender

The above plot shows the breakdown of payment channels by gender.

Average Weekly Fare Comparison

The above plot compares the average fares paid during off-peak and peak periods.
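
A peak versus off-peak comparison needs a rule for which time periods count as peak. The text does not state that rule, so the sketch below assumes Morning and Evening are the peak periods, and it uses hypothetical fares rather than values from the dataset.

```python
from statistics import mean

# Assumption: the set of peak periods is not given in the text.
PEAK_PERIODS = {"Morning", "Evening"}

# Hypothetical (time period, fare in KES) rows.
rows = [
    ("Morning", 120),
    ("Morning", 110),
    ("Evening", 130),
    ("Late Morning", 90),
    ("Night", 100),
    ("Afternoon", 95),
]


def average_fare_by_peak(rows, peak_periods=PEAK_PERIODS):
    """Return the mean fare for peak and off-peak transactions."""
    peak = [fare for period, fare in rows if period in peak_periods]
    off_peak = [fare for period, fare in rows if period not in peak_periods]
    return {"peak": mean(peak), "off_peak": mean(off_peak)}


result = average_fare_by_peak(rows)
print(result)
```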

Transaction Load vs Vehicle Availability

The above plot compares the transaction load with the vehicle availability of one matatu.

Transaction Volume Trend

The above plot shows how transaction volume changes across different days.

Profitability by Payment Channel

The above plot shows the profitability of each payment channel that a matatu passenger may prefer.

Forecast of transactions

The above plot shows the observed and predicted fare rates.

Linear Regression of Actual Vs Predicted Fare Amounts

The above plot shows a linear regression of predicted against actual fare amounts, comparing the model's predicted values with the actual values to show how well they align.
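
How well predicted and actual values align can also be summarized numerically. The sketch below computes two common measures, the coefficient of determination (R²) and the mean absolute error, on hypothetical fare amounts. It is not the notebook's evaluation code, and the text does not say which metrics the notebook reports.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical fare amounts in KES.
actual = [100, 110, 90, 120]
predicted = [102, 108, 93, 117]

print(f"R^2: {r_squared(actual, predicted):.3f}")
print(f"MAE: {mean_absolute_error(actual, predicted):.2f} KES")
```

An R² close to 1 and a small error mean the points lie close to the diagonal in an actual-versus-predicted plot.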

Findings from the Apriori Analysis

The analysis conducted using the Apriori algorithm yielded a set of association rules that highlight patterns in the hypothetical matatu income data.

Link to the association rule spreadsheet: Download the Excel file

Below is a summary of the key findings from the spreadsheet:

Key Metrics Overview:

Support:
- Support values across the rules are relatively low (<5%), indicating that the specific combinations of antecedents (e.g., time periods) and consequents (e.g., payment methods) are not very frequent.
- Example: The rule "Early Morning → Cash" has a support of 3.1%, meaning this combination is present in 3.1% of all transactions.
Confidence:
- Confidence values indicate the likelihood of a consequent occurring given the antecedent.
- Higher-confidence rules include "Morning → Cash" at 28.8%, meaning that about 29% of morning transactions were paid in cash. This is a moderate rather than a strong association.
Lift:
- Lift values mostly hover slightly above 1 (e.g., 1.05–1.12), indicating weak but positive associations.
- For example, the rule "Afternoon → Cash" has a lift of 1.079, showing that afternoon transactions are only about 8% more likely to involve cash than would be expected if time period and payment method were independent.
Jaccard, Leverage, and Conviction:
- These secondary metrics are consistent with the weak associations described above:
  - Jaccard values are low (<0.05), reinforcing that the intersections of antecedents and consequents occur infrequently.
  - Leverage values are close to zero, confirming that the discovered rules do not drastically deviate from random co-occurrences.
  - Conviction values (>1) suggest modest dependency between antecedents and consequents.
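
The metrics above all follow from four counts: the total number of transactions, the number containing the antecedent, the number containing the consequent, and the number containing both. The sketch below uses the standard textbook definitions with hypothetical counts. It is not the code that produced the spreadsheet, and `rule_metrics` is an illustrative name.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return association-rule metrics for the rule A -> C from raw counts."""
    p_a = n_antecedent / n_total   # P(A)
    p_c = n_consequent / n_total   # P(C)
    p_ac = n_both / n_total        # P(A and C)

    confidence = p_ac / p_a        # P(C | A)
    return {
        "support": p_ac,
        "confidence": confidence,
        "lift": confidence / p_c,      # 1.0 means A and C are independent
        "leverage": p_ac - p_a * p_c,  # 0.0 means A and C are independent
        "jaccard": n_both / (n_antecedent + n_consequent - n_both),
        # Conviction is unbounded when the rule never fails.
        "conviction": (1 - p_c) / (1 - confidence) if confidence < 1 else float("inf"),
    }


# Hypothetical counts, not taken from the spreadsheet.
metrics = rule_metrics(n_total=1000, n_antecedent=100, n_consequent=250, n_both=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the lift is 1.2 and the leverage is 0.005. A lift near 1 and a leverage near 0 both say the same thing: the antecedent changes the chance of the consequent only slightly.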

Notable Patterns:

Cash as a Predominant Payment Method:
- Many rules indicate Cash as a frequent consequent across various time periods, reflecting its dominance as the primary payment channel.
Time Period Influence:
- Rules highlight specific patterns, such as:
  - Early Morning → Cash (28.8% confidence)
  - Late Morning → Cash (27.1% confidence)
Weak Relationships Overall:
- While the confidence levels suggest some predictable trends, the low support and lift values close to 1 indicate that these rules are not strongly definitive for most transactions.

Practical Applications:

Optimize Payment Methods:
- Insights into dominant cash payments during specific times can help introduce digital payment incentives during low-confidence periods (e.g., late afternoon or night).
Peak Time Adjustments:
- High-confidence rules for busy periods like Morning can guide scheduling and resource allocation strategies.
Future Data Needs:
- The weak associations suggest potential benefits in expanding the dataset or including additional variables, such as passenger demographics or route types, to uncover more actionable patterns.

Cluster Analysis Findings

The cluster analysis conducted on the transaction amounts grouped the data into three clusters. Each cluster represents a group of transactions with similar patterns in terms of the average transaction amount. Below is a summary of the findings:

Cluster Grouping

| Cluster | Average Transaction Amount (KES) | Description |
| --- | --- | --- |
| 0 | 103.38 | Transactions with slightly lower-than-average amounts. |
| 1 | 104.42 | Cluster with the highest average amount, possibly representing peak times or premium routes. |
| 2 | 103.43 | Similar to Cluster 0 but with a marginally higher average amount. |
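
The text does not name the clustering algorithm, so the following is only an illustration of how amounts can be grouped into three clusters. It assumes k-means (Lloyd's algorithm) on a single numeric feature, uses evenly spaced starting centroids so the result is deterministic, and runs on hypothetical amounts rather than the dataset.

```python
def kmeans_1d(values, k=3, max_iterations=100):
    """Group scalar values into k clusters with Lloyd's algorithm."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: pick k values spread evenly through the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, clusters


# Hypothetical transaction amounts in KES.
amounts = [48, 50, 52, 98, 100, 102, 148, 150, 152]
centroids, clusters = kmeans_1d(amounts, k=3)
print(centroids)
```

In this toy data the groups are far apart, so the cluster averages differ clearly. When the underlying amounts are nearly uniform, the averages end up close together instead.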

Key Observations

Marginal Differences:
- The average transaction amounts across clusters range from KES 103.38 to KES 104.42, showing only slight variations.
Cluster 1 Dominance:
- Cluster 1 stands out with the highest average amount, suggesting potential alignment with high-demand periods or specific routes.
Homogeneity in Data:
- The minimal variation in amounts suggests a relatively uniform dataset in terms of pricing or income patterns.
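
To put a number on how marginal the differences are, the spread between the highest and lowest cluster averages can be computed directly from the table values:

```python
# Cluster averages in KES, copied from the Cluster Grouping table.
cluster_means = {0: 103.38, 1: 104.42, 2: 103.43}

lowest = min(cluster_means.values())
highest = max(cluster_means.values())
spread = highest - lowest
relative_spread = spread / lowest

print(f"Spread: KES {spread:.2f} ({relative_spread:.2%} of the lowest average)")
```

The gap is about KES 1.04, roughly 1% of the lowest average, which is why the clusters are hard to tell apart on amount alone.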

Random Forest Classifier Results

The Random Forest Classifier was applied to the income data and returned perfect scores on every metric. Below is a summary and interpretation of the findings:

Performance Metrics

| Metric | Value |
| --- | --- |
| Precision | 1.00 |
| Recall | 1.00 |
| F1-Score | 1.00 |
| Accuracy | 1.00 |

Key Observations:

Perfect Classification:
- The classifier achieved 100% accuracy, with precision, recall, and F1-score all at their maximum values of 1.00. This means the model made no errors in classifying the dataset.
Support:
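- A total of 421 data points were used, all belonging to a single class (labeled as "0"). With only one class present, perfect scores are trivially achievable, since a model that always predicts "0" would score the same. The scores therefore reflect uniformity or imbalance in the dataset rather than predictive power.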
Feature Importance:
- Random Forest provides a measure of feature importance, which can identify the most influential factors in income classification.
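
The effect of having a single class can be checked with a short calculation. The sketch below scores a baseline that always predicts class 0 against 421 labels that are all 0, matching the support figure above. The function name is illustrative, and treating class 0 as the scored class is an assumption.

```python
def classification_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = [0] * 421                # every label is class 0
constant_predictions = [0] * 421  # baseline that always predicts class 0
baseline = classification_metrics(y_true, constant_predictions, positive=0)
print(baseline)
```

Because the constant baseline reaches the same scores without looking at any feature, a target with at least two classes is needed before the metrics say anything about the model.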

Insights from Feature Importance:

The most critical features influencing classification should be analyzed to guide practical decision-making. For example:

- Time of Day: Which time periods generate the highest or lowest income?
- Payment Method: Is cash or digital payment a more significant contributor to income patterns?

This information can help optimize operations or pricing strategies.
