Affordable Housing in San Diego


Identifying Housing Needs Through Machine Learning

Purpose and Overview

The purpose of this project is to determine, by zip code, where in San Diego residents would benefit most from low- to moderate-income housing. We used a K-means model to analyze several categories. The categories we chose to focus on are access to transit, food (markets), hospitals, and parks. To gather this data we obtained census flat files for population, income, and housing-cost percentages, web-scraped hospital information, used San Diego transit GeoJSON data, and queried Google's Places API for markets and parks. We then preprocessed and compiled all of the data, trained the model, and evaluated it to get our results.
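Because the inputs come from several sources, they have to be joined into one row per zip code before any model can be trained. The sketch below shows one way to do that with plain dictionaries. It is an illustration under assumptions, not the project's code: the function name `merge_by_zip` and the field names are hypothetical, and the amenity counts are placeholders.

```python
# Minimal sketch: inner-join per-source feature tables on zip code.
# Function and field names are hypothetical; amenity counts are placeholders.

def merge_by_zip(*tables):
    """Combine several {zip_code: {feature: value}} dicts into one.

    Only zip codes present in every source are kept, so each merged row
    has a complete feature vector for clustering.
    """
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = {}
    for zip_code in sorted(common):
        row = {}
        for table in tables:
            row.update(table[zip_code])
        merged[zip_code] = row
    return merged


# Census values copied from the Cluster 4 table on this page.
census = {
    "91910": {"median_income": 70283.0, "housing_cost_pct": 46.8},
    "92113": {"median_income": 43958.0, "housing_cost_pct": 41.9},
}
# Placeholder amenity counts, invented for illustration only.
amenities = {
    "91910": {"parks": 12, "markets": 9},
    "92113": {"parks": 7, "markets": 5},
    "00000": {"parks": 1, "markets": 0},  # no census row, so it is dropped
}

features = merge_by_zip(census, amenities)
print(features)
```

An inner join keeps the clustering input free of missing values. The trade-off is that a zip code missing from any one source silently disappears, so it is worth checking how many rows are dropped.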

Census Data

2020 Census data was sourced from Census.gov. We filtered this data to show the number of households, families, and non-family households in each zip code. From this dataset we display the median income, mean income, and percent of income allocated to housing cost for each household type.
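Filtering the census flat file down to San Diego zip codes, and turning its text cells into numbers, could look like the following sketch. The layout is an assumption. The `NAME` value in the form `ZCTA5 <zip>` resembles how Census tables label zip code tabulation areas, but the other column names are hypothetical. Census tables also use non-numeric markers such as `-` for unavailable values, so the sketch maps those to `None` instead of failing.

```python
import csv
import io

# Illustrative flat file. The "ZCTA5 <zip>" NAME format is assumed; the
# other column names are hypothetical.
SAMPLE = """NAME,median_income,housing_cost_pct
ZCTA5 91910,70283,46.8
ZCTA5 92113,43958,41.9
ZCTA5 00000,-,N
"""


def parse_number(text):
    """Return a float, or None for non-numeric missing-value markers such as "-"."""
    try:
        return float(text)
    except ValueError:
        return None


def load_zip_rows(handle, keep_zips):
    """Keep only rows for the requested zip codes, keyed by the 5-digit zip."""
    rows = {}
    for record in csv.DictReader(handle):
        zip_code = record["NAME"].split()[-1]
        if zip_code not in keep_zips:
            continue
        rows[zip_code] = {
            key: parse_number(value) for key, value in record.items() if key != "NAME"
        }
    return rows


table = load_zip_rows(io.StringIO(SAMPLE), keep_zips={"91910", "92113"})
print(table)
```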


Machine Learning Model

For our analysis we built an unsupervised K-Means machine learning model. The model's input features are the median income, mean income, and percent of income allocated to housing costs for households, families, and non-family households. Additionally, we counted how many amenities are found in each zip code, including hospitals, schools, markets, parks, and public transportation.
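K-Means is distance-based, so features on very different scales need to be standardized first. Without that step, incomes in the tens of thousands would dominate percentages in the tens. The sketch below is a dependency-free illustration of Lloyd's algorithm on z-scored features. It is not necessarily how the project trained its model: a library implementation such as scikit-learn's `KMeans` (which defaults to k-means++ seeding) may well have been used. Here a deterministic farthest-point initialization is swapped in so the result is reproducible. The rows are synthetic.

```python
import math


def standardize(rows):
    """Z-score each column so dollar amounts do not swamp percentages."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic rows: [median income, % of income to housing]; not project data.
rows = [[45000, 50.0], [47000, 48.0], [46000, 49.0],
        [100000, 30.0], [102000, 31.0], [98000, 29.0]]
labels, _ = kmeans(standardize(rows), k=2)
print(labels)
```

The cluster numbers K-Means assigns are arbitrary. A label such as "Class 4" only gains meaning after inspecting which zip codes land in it.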

3D Scatter Plot of Principal Component Analysis (PCA)
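A 3D scatter plot like this one requires reducing the clustering features to three coordinates per zip code, which principal component analysis (PCA) provides. The sketch below is a stdlib-only PCA that mean-centers the data, builds the sample covariance matrix, finds its leading eigenvectors by power iteration with deflation, and projects each row onto them. It is an illustration, not the project's plotting code, which may have used a library PCA. The test points are synthetic, chosen so the principal axes are the coordinate axes.

```python
import math


def center_and_cov(rows):
    """Mean-center the data and return it with its sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def top_components(cov, k, iters=500):
    """Leading eigenvectors of a symmetric matrix via power iteration + deflation."""
    d = len(cov)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        # Assumes this start vector is not orthogonal to the target eigenvector.
        v = [1.0 / math.sqrt(d)] * d
        for _step in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_project(rows, k=3):
    """Project rows onto their first k principal components (e.g. for a 3D plot)."""
    centered, cov = center_and_cov(rows)
    comps = top_components(cov, k)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
            for row in centered]


# Synthetic points with variance 3.6, 1.6, and 0.4 along the three axes.
rows = [[3, 0, 0], [-3, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 1], [0, 0, -1]]
coords = pca_project(rows, k=3)
print([[round(x, 6) for x in p] for p in coords])
```

Each principal component's sign is arbitrary, so a plot may appear mirrored between runs or libraries without any change in meaning.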

Cluster 4

The zip codes that the K-Means model assigned to Class 4 should be the first priority when deciding where to place low-income affordable housing units. These areas are highlighted in the map on the right, and information on them is listed in the table below.

Zip Code | Median Income (USD) | Percent of Income Allocated to Housing Cost | Class
91910 | 70283 | 46.8 | 4
91911 | 65074 | 50.9 | 4
91950 | 48359 | 44.8 | 4
91977 | 71396 | 44.8 | 4
92020 | 61830 | 43.0 | 4
92021 | 60510 | 43.7 | 4
92105 | 48072 | 47.2 | 4
92113 | 43958 | 41.9 | 4
92114 | 73090 | 51.2 | 4
92115 | 56137 | 36.2 | 4
92126 | 99376 | 33.0 | 4
92154 | 70846 | 45.2 | 4

Cluster 0

The zip codes that the K-Means model assigned to Class 0 would be the second priority when deciding where to place low-income affordable housing units. These areas are highlighted in the map on the left, and information on them is listed in the table below.

Zip Code | Median Income (USD) | Percent of Income Allocated to Housing Cost | Class
91932 | 59795 | 39.6 | 0
91941 | 94111 | 39.4 | 0
91942 | 66551 | 34.1 | 0
91945 | 67236 | 44.1 | 0
92008 | 86046 | 25.8 | 0
92019 | 85067 | 40.1 | 0
92025 | 56866 | 40.4 | 0
92026 | 76534 | 36.3 | 0
92027 | 70009 | 43.5 | 0
92028 | 80775 | 37.1 | 0
92040 | 83692 | 40.3 | 0
92054 | 63355 | 31.2 | 0
92056 | 85047 | 34.3 | 0
92057 | 81339 | 33.8 | 0
92058 | 57213 | 31.7 | 0
92065 | 100645 | 34.1 | 0
92069 | 77618 | 35.7 | 0
92071 | 85751 | 36.2 | 0
92078 | 91564 | 33.6 | 0
92081 | 80584 | 30.3 | 0
92083 | 68551 | 32.1 | 0
92084 | 77970 | 39.8 | 0
92102 | 54862 | 35.3 | 0
92107 | 84190 | 35.0 | 0
92108 | 80572 | 26.8 | 0
92110 | 77579 | 33.7 | 0
92111 | 77249 | 40.8 | 0
92117 | 91347 | 34.9 | 0
92120 | 102597 | 31.4 | 0
92123 | 90602 | 35.6 | 0
92124 | 94485 | 30.3 | 0
92139 | 75576 | 52.7 | 0
92173 | 48967 | 44.1 | 0
92672 | 89029 | 35.7 | 0
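The two-tier priority (Class 4 first, Class 0 second) can be turned into a single ordered list of zip codes. In the sketch below, a few rows are copied from the tables above. The sort orders them by cluster priority and then, within each cluster, by housing-cost burden. That secondary sort is an assumption added for illustration and is not a rule stated by the project. The name `rank_zips` is hypothetical.

```python
# Cluster priority as stated on this page: Class 4 first, Class 0 second.
PRIORITY = {4: 1, 0: 2}

# (zip code, median income in USD, % of income to housing cost, class),
# a subset of rows copied from the two tables above.
ROWS = [
    ("91910", 70283.0, 46.8, 4),
    ("92113", 43958.0, 41.9, 4),
    ("92114", 73090.0, 51.2, 4),
    ("92139", 75576.0, 52.7, 0),
    ("92008", 86046.0, 25.8, 0),
]


def rank_zips(rows, priority):
    """Order zip codes by cluster priority, then by housing-cost burden.

    The secondary sort (highest % of income to housing first) is an
    assumption for illustration, not a rule stated by the project.
    """
    eligible = [r for r in rows if r[3] in priority]
    return sorted(eligible, key=lambda r: (priority[r[3]], -r[2]))


for zip_code, income, burden, cls in rank_zips(ROWS, PRIORITY):
    print(zip_code, cls, burden)
```

Note that 92139 has the highest housing-cost burden in this subset (52.7%) but still ranks after every Class 4 zip code, because cluster priority is compared first.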

Tableau Storyboard

Affordability and Accessibility

Using the K-Means model, we identified zip codes that need more affordable housing.

Next we asked the question: “In which of these areas would the people living in those accommodations benefit the most?”

To answer this question we investigated the quantity and quality of amenities in these areas. Using Google Places API searches, we collected the number of operational supermarkets and public parks in each zip code. Public transportation data was gathered from the San Diego MTS website. We found the number of public hospitals in each zip code using ushospitalfinder.com. Lastly, using data from greatschools.com, we identified which of these zip codes have the highest-ranked schools.
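Counting "operational" places per zip code from Places API search results means filtering on business status and deduplicating places that appear in more than one search. The sketch below works on an already-parsed response and makes no network calls. It assumes the shape of a Places Text Search result page: a `results` array whose entries carry `place_id`, `business_status`, and a `formatted_address` that ends like `CA 92113, United States`. The address pattern and the helper name `count_operational_by_zip` are assumptions, and the sample response is hand-written.

```python
import json
import re

# Assumes the address contains "CA <5-digit zip>", e.g. "..., CA 92113, United States".
ZIP_RE = re.compile(r"\bCA (\d{5})\b")


def count_operational_by_zip(pages):
    """Count unique OPERATIONAL places per zip code across result pages."""
    seen = set()
    counts = {}
    for page in pages:
        for place in page.get("results", []):
            if place.get("business_status") != "OPERATIONAL":
                continue  # skip temporarily or permanently closed places
            if place["place_id"] in seen:
                continue  # the same place can appear in overlapping searches
            seen.add(place["place_id"])
            match = ZIP_RE.search(place.get("formatted_address", ""))
            if match:
                zip_code = match.group(1)
                counts[zip_code] = counts.get(zip_code, 0) + 1
    return counts


# Hand-written response in the assumed shape of a Text Search result page.
page = json.loads("""
{"status": "OK", "results": [
  {"place_id": "p1", "business_status": "OPERATIONAL",
   "formatted_address": "1 Example St, San Diego, CA 92113, United States"},
  {"place_id": "p2", "business_status": "CLOSED_PERMANENTLY",
   "formatted_address": "2 Example St, San Diego, CA 92113, United States"},
  {"place_id": "p3", "business_status": "OPERATIONAL",
   "formatted_address": "3 Example Ave, Chula Vista, CA 91910, United States"}
]}
""")

# Passing the same page twice shows that duplicate place_ids are counted once.
print(count_operational_by_zip([page, page]))
```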

Finally, we compared the accessibility data with the K-Means clustering results to identify three areas where affordable housing is needed and would bring the greatest benefit to the people who would live there.
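One simple way to combine amenity counts into a single accessibility ranking is to min-max scale each amenity across the candidate zip codes and sum the scaled values. The sketch below does that and returns the top three. This scoring scheme, its equal weights, and the helper name `top_areas` are assumptions for illustration, not the project's documented method. The candidate zip codes and counts are placeholders.

```python
def top_areas(candidates, amenity_keys, n=3):
    """Rank zip codes by the sum of min-max-scaled amenity counts.

    Equal weights per amenity are an assumption; ties break alphabetically.
    """
    totals = {zip_code: 0.0 for zip_code in candidates}
    for key in amenity_keys:
        values = [counts[key] for counts in candidates.values()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid dividing by zero when all counts match
        for zip_code, counts in candidates.items():
            totals[zip_code] += (counts[key] - lo) / span
    return sorted(totals, key=lambda z: (-totals[z], z))[:n]


# Hypothetical candidates from the priority clusters; all counts are placeholders.
candidates = {
    "zip_a": {"parks": 10, "markets": 4, "transit_stops": 30, "hospitals": 1},
    "zip_b": {"parks": 2, "markets": 8, "transit_stops": 10, "hospitals": 0},
    "zip_c": {"parks": 6, "markets": 6, "transit_stops": 50, "hospitals": 2},
    "zip_d": {"parks": 0, "markets": 1, "transit_stops": 5, "hospitals": 0},
}
print(top_areas(candidates, ["parks", "markets", "transit_stops", "hospitals"]))
```

Min-max scaling puts each amenity on a 0 to 1 range, so a zip code with many transit stops cannot win on that count alone. Weighting amenities differently (for example, transit above parks) would be a policy choice layered on top of this.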