rmwenzel/ames_house_prices


This is a learning project based on the Kaggle knowledge competition House Prices: Advanced Regression Techniques. The aim is to work through and document, in detail, every step of an end-to-end predictive modeling problem.

Think of it as an overblown kernel :)

The original dataset is available here. A version of the dataset is available on Kaggle; this is the version we'll be working with.

Overview

The project consists of three stages: processing, exploratory analysis, and predictive modeling. It has the following directories:

  • /notebooks - Jupyter notebooks for processing, exploratory analysis, and predictive modeling
  • /codes - Supplemental code for the notebooks
  • /data - Datasets
  • /training - Model training artifacts, saved for persistence
  • /submissions - Model predictions for submission to the Kaggle competition (necessary for evaluating the performance of predictive models, since Kaggle holds the test set target values)
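
As a rough sketch of what a file in /submissions might contain, the snippet below builds a two-column table in the `Id,SalePrice` layout that the competition expects. The helper name `make_submission` and the example values are made up for illustration and are not taken from the project's code.

```python
import io

import pandas as pd


def make_submission(ids, predictions) -> pd.DataFrame:
    """Pair each test Id with its predicted SalePrice (hypothetical helper)."""
    return pd.DataFrame({"Id": list(ids), "SalePrice": list(predictions)})


# Made-up Ids and predictions, standing in for real model output.
submission = make_submission([1461, 1462], [120000.0, 150000.0])

# Write to an in-memory buffer here; a real run would pass a file path instead.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

In practice the buffer would be replaced by a path under /submissions; the file naming scheme is left open here since the project does not state one.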

Data

There are several related data files in /data:

  • train.csv, test.csv - Original Kaggle train and test data
  • orig.csv - Train and test data combined, with some metadata (dtypes and MultiIndex) for convenient loading into a pandas.DataFrame
  • clean.csv - Processed and cleaned version of orig.csv
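
To give a feel for how train and test data can live in one frame, here is a minimal sketch that stacks two small stand-in frames under a two-level MultiIndex. The level names (`split`, `Id`), the `combine` helper, and the toy values are assumptions for illustration only; the actual layout of orig.csv is the one defined in process.ipynb.

```python
import pandas as pd


def combine(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Stack train and test under a (split, Id) MultiIndex (assumed layout)."""
    return pd.concat(
        [train.set_index("Id"), test.set_index("Id")],
        keys=["train", "test"],
        names=["split", "Id"],
    )


# Made-up stand-ins for train.csv and test.csv; the test frame has no SalePrice.
train = pd.DataFrame({"Id": [1, 2], "LotArea": [8000, 9500], "SalePrice": [200000, 180000]})
test = pd.DataFrame({"Id": [3, 4], "LotArea": [11000, 14000]})

full = combine(train, test)
train_part = full.loc["train"]  # rows from the training split
test_part = full.loc["test"]    # SalePrice is NaN for these rows
```

Keeping both splits in one frame means a cleaning step is applied to train and test in a single pass, and the outer index level recovers each split afterwards.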

Both orig.csv and clean.csv are created and discussed in the notebook process.ipynb. They can also be built by running the script process.py.

About

Analysis and predictive modeling of Ames housing dataset
