Skip to content

Analysis for the ICPSR 36404 dataset using descriptive machine learning

License

Notifications You must be signed in to change notification settings

gahag-ml/icpsr-36404-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ICPSR 36404 Analysis

Analysis of the ICPSR 36404 dataset using descriptive machine learning. This work was produced as our final project for the Descriptive Learning discipline in Universidade Federal de Minas Gerais.

Paper

The paper (portuguese) for this work can be found under the paper directory.

Frequent Itemset Mining

Author: Gabriel Bastos [email protected]

First, download the delimited version of the dataset. It is a tsv file, which is used as input for the analysis program.

Then, install the Rust stable toolchain.

Compile this project with cargo build --release. No additional steps should be necessary in order to compile.

The produced program provides the following usage:

analyzer 0.1.0
gahag <[email protected]>


USAGE:
    icpsr-36404-analysis [SUBCOMMAND]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    distribution    load the original dataset from stdin and display the data distribution
    help            Prints this message or the help of the given subcommand(s)
    load            load the serialized matrix from stdin and run the algorithm
    run             runs the entire pipeline
    save            load the original dataset from stdin and output the serialized matrix to stdout
icpsr-36404-analysis-run 
runs the entire pipeline

USAGE:
    icpsr-36404-analysis run [FLAGS] [OPTIONS] <min_sup>

FLAGS:
    -h, --help           Prints help information
        --recidivists    whether to include only recidivists
    -V, --version        Prints version information

OPTIONS:
        --admission-type <admission_type>    include only the given admission type [possible values: parole, new, other]
        --race <race>                        include only the given race [possible values: black, white, hispanic,
                                             other]
        --sex <sex>                          include only the given sex [possible values: male, female]

ARGS:
    <min_sup>    the minimum support ratio ([0, 1.0])

Subgroup Discovery

Author: Fernanda [email protected]

First, install the necessary python packages to run the notebook:

pip install scikit-learn
pip install pandas
pip install datetime
pip install numpy
pip install pysugbroup

After, it is necessary to put the data file on the same directory, or update the path in the notebook:

data_path = "36404-0001-Data.tsv"

That's it. Now just run the notebook with Jupyter. You can also select the subgroup max_size by altering the depth parameter in the Subgroup Discovery section.

Additional work

The following Rust crates were developed in order to support this work:

Licence

This project is licenced under the MIT Licence.

About

Analysis for the ICPSR 36404 dataset using descriptive machine learning

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published