Skip to content
/ vframes Public

A powerful data manipulation library inspired by Pandas, designed specifically for the V language, using VDuckDB under the hood.

License

Notifications You must be signed in to change notification settings

rodabt/vframes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

vframes 0.1.2

A DataFrame library inspired by Python's Pandas. Should work on Linux, Windows and Mac (still testing). Uses the powerful DuckDB database as a backend.

This is still a WIP. More functions, documentation, tutorials, and examples will be added soon.

Dependencies

VDuckDB wrapper library

Installation

v install https://github.com/rodabt/vduckdb
v install https://github.com/rodabt/vframes

Basic usage example

Make sure the files people-500000.csv, titanic.parquet, and data.json are in the same directory as your .v file (check the examples dir)

import vframes

// A convenience function for better printing
fn printlne(s string) {
    println('\n${s}\n')
}

fn main() {

    printlne("VFrames version: ${vframes.version()}")

    printlne("First initialize a new context. If no arguments are give, memory is used")
    mut ctx := vframes.init()
    
    printlne("Load 500.000 records from a CSV")
    df := ctx.read_auto('people-500000.csv')!
    
    printlne("Print first 5 records:")
    df.head(5)
    
    printlne("Assign first 10 records to variable x as []map[string]json2.Any")
    data := df.head(10, to_stdout: false)
    println(data)

    printlne("Print last 5 records:")
    df.tail(5)

    printlne("DataFrame info:")
    df.info()

    printlne("DataFrame shape: ${df.shape()}")

    printlne("Describe DataFrame:")
    df.describe()

    printlne("Create new DF with new column 'new_col'=Index*5, and select a subset of columns (Email, Phone, new_col):")
    df2 := df
        .add_column('new_col', 'Index*5')
        .subset(['Email','Phone','new_col'])  
    df2.head(10)

    printlne("Delete Email from new DF:")
    df3 := df2.delete_column('Email')
    df3.head(10)
    
    printlne("Load parquet (Titanic):")
    df4 := ctx.read_auto('titanic.parquet')!
    df4.head(10)

    printlne("Describe:")
    df4.describe()
    
    printlne("Average of Age and Fare by Sex and Embarked:")
    df5 := df4.group_by(['Sex','Embarked'],{"age_avg": "avg(Age)", "avg_fare": "avg(Fare)"})
    df5.head(10)
    
    printlne("Slice(2,3) of first DataFrame:")
    df6 := df.slice(2,3)
    df6.head(10)
    
    println("Reading a JSON file:")
    df7 := ctx.read_auto("data.json")!
    df7.head(10)

    printlne("Error control: try to load a non valid file")
    _ := ctx.read_auto('no_valid.csv') or { 
        eprintln(err.msg())
        vframes.empty()
    }

    ctx.close()
}

Considerations

  • VFrames uses DuckDB under the hood through the VDuckDB wrapper library, so in theory all operations allowed by DuckDB should be supported by VFrames eventually.
  • Currently by design DataFrames are inmutable, so when mutating you should create a new DataFrame to store each new result

Initial settings

DataFrameContext

To use the library you must first initialize a DataFrame to define which kind of storage you will use for DataFrames, like this:

mut ctx := vframes.init()                               // In memory
mut ctx := vframes.init(location: 'mycontext.db')       // Persisted to 'mycontext.db'

DataFrame

If you want to suppress console output for functions returning Data, set the optional parameter to_stdout to false (see examples dir).

Accepted data file formats

The structure of most CSV, Parquet, and JSON files is infered automatically. In the future, there will be options to fine tune loading parameters such as delimiters, column renames, partial loading, etc.

Current functions

Last updated on 2024-12-21

  • columns
  • dtypes
  • empty
  • groupby
  • head
  • info
  • query
  • shape
  • tail
  • values
  • abs
  • add
  • add_prefix
  • add_suffix
  • max
  • min
  • mean
  • median
  • sum
  • dropna
  • pow

Roadmap

Although the library is very inspired by Pandas, it's purpose is NOT to be a one-on-one replacement. Some of the funcionalities planned are listed below:

  • DataFrame joins
  • Query deferral
  • Basic plotting

How to contribute

Comments, bug reports, requests, and pull requests are welcome.

About

A powerful data manipulation library inspired by Pandas, designed specifically for the V language, using VDuckDB under the hood.

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published