This is the DATA-FRAME Reference Manual, version 1.1.0, generated automatically by Declt version 4.0b2.
Copyright © 2019-2022 Steve Nunez
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled “Copying” is included exactly as in the original.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be translated as well.
This program is distributed under the terms of the Microsoft Public License.
The main system appears first, followed by any subsystem dependency.
A data manipulation library for statistical computing
Data frames for Common Lisp
Steve Nunez <steve@symbolics.tech>
(GIT https://github.com/Lisp-Stat/data-frame.git)
MS-PL
A data frame is a common way of storing data for statistical analysis. Under the hood, a data frame is a vector of equal-length vectors. Each element of the vector can be thought of as a column, and the length of each element is the number of rows. As a result, data frames can store a different class of object in each column (e.g. numeric, character, factor). The easiest way to think of a data frame is as an Excel worksheet whose columns may hold different types of data but all have the same number of rows.
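The "vector of equal-length vectors" layout can be illustrated without the library. The following is a minimal sketch in standard Common Lisp; the names *TOY-COLUMNS*, TOY-NROW and TOY-ROW are hypothetical and are not part of the DATA-FRAME API, whose internal representation (ordered keys plus columns) is richer.

```lisp
;; Hypothetical illustration of the layout, not the DATA-FRAME implementation:
;; a data frame as a vector of column vectors, all of the same length.
(defparameter *toy-columns*
  (vector (vector "Mazda RX4" "Datsun 710" "Valiant") ; a string column
          (vector 21.0d0 22.8d0 18.1d0)               ; a numeric column
          (vector 6 4 6)))                            ; an integer column

(defun toy-nrow (columns)
  "Number of rows: the common length of every column."
  (let ((n (length (aref columns 0))))
    (assert (every (lambda (col) (= (length col) n)) columns)
            () "All columns must have the same length")
    n))

(defun toy-row (columns i)
  "Collect the Ith element of every column, i.e. one observation."
  (map 'list (lambda (col) (aref col i)) columns))
```

Here (toy-nrow *toy-columns*) is 3 and (toy-row *toy-columns* 1) collects the second observation across all three columns.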
From a design perspective, Lisp-Stat’s data frame is conceptually most similar to the ’tibble’ from the tidyverse, but using Common Lisp idioms, style and syntax.
1.1.0
Files are sorted by type and then listed depth-first from the system's component trees.
pkgdcl.lisp (file).
data-frame (system).
utils.lisp (file).
data-frame (system).
data-frame.lisp (file).
data-frame (system).
pprint.lisp (file).
data-frame (system).
formatted-output.lisp (file).
data-frame (system).
summary.lisp (file).
data-frame (system).
defdf.lisp (file).
data-frame (system).
conditions.lisp (file).
data-frame (system).
show-properties (function).
properties.lisp (file).
data-frame (system).
drop-na (function).
missing.lisp (file).
data-frame (system).
filter-rows (function).
key-list (function).
filter.lisp (file).
data-frame (system).
ensure-plist (macro).
plist-aops.lisp (file).
data-frame (system).
data (function).
data.lisp (file).
data-frame (system).
Packages are listed by definition order.
df
dfio.
common-lisp.
Definitions are sorted by export status, category, package, and then by lexicographic order.
If non-nil, the system will ask the user for confirmation before redefining a data frame
If a string/factor variable has > *distinct-maximum* values, exclude it
If an integer variable has no more than this many distinct values, consider it a factor
An indication that the data set is large for a particular use case.
This should be bound by the user to the maximum number of data points they consider to be ’normal’. A function can then signal a large-data warning if it is exceeded.
E.g. (let ((df:*large-data* 50000))
(handler-bind ((large-data ...
(some-data-operation ; this will signal if the data is too large
(restart-bind ...
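The example above is elided. As a self-contained sketch of the same warn-and-handle pattern, the code below uses hypothetical stand-ins (*TOY-LARGE-DATA*, the condition TOY-LARGE-DATA and TOY-OPERATION) rather than df:*large-data* and the library's LARGE-DATA condition, and it muffles the warning instead of using RESTART-BIND.

```lisp
;; Hypothetical stand-ins for df:*large-data* and the LARGE-DATA condition.
(defvar *toy-large-data* nil
  "Maximum number of data points considered 'normal', or NIL for no limit.")

(define-condition toy-large-data (warning)
  ((size :initarg :size :reader toy-large-data-size))
  (:report (lambda (c stream)
             (format stream "Data set of ~D points exceeds ~D"
                     (toy-large-data-size c) *toy-large-data*))))

(defun toy-operation (data)
  "Signal TOY-LARGE-DATA if DATA is longer than the user's limit, then sum DATA."
  (when (and *toy-large-data* (> (length data) *toy-large-data*))
    (warn 'toy-large-data :size (length data)))
  (reduce #'+ data))

(defun sum-with-limit (data limit)
  "Run TOY-OPERATION with LIMIT bound; return the sum and whether it warned."
  (let ((*toy-large-data* limit)
        (warned nil))
    (handler-bind ((toy-large-data
                     (lambda (c)
                       (setf warned t)
                       (muffle-warning c)))) ; record it, then continue silently
      (values (toy-operation data) warned))))
```

A handler could equally decline the warning or invoke a restart of its own; muffling simply lets the operation finish.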
If the number of unique reals exceeds this threshold, they are summarized with quantiles; otherwise a frequency table is printed
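A sketch of this quantiles-versus-frequency-table rule in standard Common Lisp. The variable *TOY-DISTINCT-THRESHOLD* and its value of 10 are assumptions (this excerpt does not show the real variable's name or default), and the nearest-rank quantile method may differ from the one the library uses.

```lisp
;; Hypothetical threshold; the library's own variable may differ in name and value.
(defparameter *toy-distinct-threshold* 10)

(defun nearest-rank-quantile (sorted p)
  "Quantile P (0..1) of the sorted vector SORTED, using the nearest-rank method."
  (let* ((n (length sorted))
         (rank (max 1 (ceiling (* p n)))))
    (aref sorted (1- rank))))

(defun frequency-table (values)
  "Alist of (value . count), in order of first appearance."
  (let ((table '()))
    (map nil (lambda (x)
               (let ((entry (assoc x table)))
                 (if entry
                     (incf (cdr entry))
                     (push (cons x 1) table))))
         values)
    (nreverse table)))

(defun summarize-reals (values)
  "Quantiles if VALUES has many distinct reals, otherwise a frequency table."
  (if (> (length (remove-duplicates values)) *toy-distinct-threshold*)
      (let ((sorted (sort (copy-seq (coerce values 'vector)) #'<)))
        (list :min (aref sorted 0)
              :q25 (nearest-rank-quantile sorted 0.25)
              :median (nearest-rank-quantile sorted 0.5)
              :q75 (nearest-rank-quantile sorted 0.75)
              :max (aref sorted (1- (length sorted)))))
      (frequency-table values)))
```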
All data sets included by default in R
Columns are only summarised when longer than this, otherwise they are returned as is.
Ability and Intelligence Tests
Passenger Miles on Commercial US Airlines, 1937-1960
Monthly Airline Passenger Numbers 1949-1960
New York Air Quality Measurements
Anscombe’s Quartet of ’Identical’ Simple Linear Regressions
The Joyner-Boore Attenuation Data
The Chatterjee-Price Attitude Data
Quarterly Time Series of the Number of Australian Residents
Base URL for datasets included in R
Sales Data with Leading Indicator
Biochemical Oxygen Demand
Speed and Stopping Distances of Cars
Weight versus age of chicks on different diets
Chicken Weights by Feed Type
Carbon Dioxide Uptake in Grass Plants
Mauna Loa Atmospheric CO2 Concentration
Student’s 3000 Criminals Data
Yearly Numbers of Important Discoveries
Elisa assay of DNase
Smoking, Alcohol and (O)esophageal Cancer
Conversion Rates of Euro Currencies
Daily Closing Prices of Major European Stock Indices, 1991-1998
Old Faithful Geyser Data
Determination of Formaldehyde
Freeny’s Revenue Data
Hair and Eye Color of Statistics Students
Harman Example 2.3
Harman Example 7.4
Pharmacokinetics of Indomethacin
Infertility after Spontaneous and Induced Abortion
Effectiveness of Insect Sprays
Edgar Anderson’s Iris Data
Edgar Anderson’s Iris Data
Areas of the World’s Major Landmasses
Quarterly Earnings per Johnson & Johnson Share
Level of Lake Huron 1875-1972
Luteinizing Hormone in Blood Samples
Intercountry Life-Cycle Savings Data
Growth of Loblolly pine trees
Longley’s Economic Regression Data
Annual Canadian Lynx trappings 1821-1934
Michelson Speed of Light Data
Fuel economy data from 1999 to 2008 for 38 popular models of cars
Motor Trend Car Road Tests
Average Yearly Temperatures in New Haven
Flow of the River Nile
Average Monthly Temperatures at Nottingham, 1920-1939
Classical N, P, K Factorial Experiment
Airline name lookup table by carrier code
Airport metadata
On-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013
Metadata for all airplane tail numbers found in the FAA aircraft registry
Hourly meteorological data for LGA, JFK and EWR in 2013
Occupational Status of Fathers and their Sons
Growth of Orange Trees
Potency of Orchard Sprays
Results from an Experiment on Plant Growth
Annual Precipitation in US Cities
Quarterly Approval Ratings of US Presidents
Vapor Pressure of Mercury as a Function of Temperature
Reaction Velocity of an Enzymatic Reaction
Locations of Earthquakes off Fiji
Random Numbers from Congruential Generator RANDU
Lengths of Major North American Rivers
Measurements on Petroleum Rock Samples
Road Casualties in Great Britain 1969-84
Brownlee’s Stack Loss Plant Data
Monthly Sunspot Data, from 1749 to Present
Yearly Sunspot Data, 1700-1988
Monthly Sunspot Numbers, 1749-1983
Swiss Fertility and Socioeconomic Indicators (1888) Data
Pharmacokinetics of Theophylline
Survival of passengers on the Titanic
The Effect of Vitamin C on Tooth Growth in Guinea Pigs
Yearly Treering Data, -6000-1979
Diameter, Height and Volume for Black Cherry Trees
Student Admissions at UC Berkeley
Road Casualties in Great Britain 1969-84
UK Quarterly Gas Consumption
Accidental Deaths in the US 1973-1978
Violent Crime Rates by US State
Lawyers’ Ratings of State Judges in the US Superior Court
Personal Expenditure Data
Populations Recorded by the US Census
Death Rates in Virginia (1940)
Topographic Information on Auckland’s Maunga Whau Volcano
The Number of Breaks in Yarn during Weaving
Average Heights and Weights for American Women
The World’s Telephones
Internet Usage per Minute
Define a data-frame and package by the same name.
Also defines symbol-macros for variable access, e.g. mtcars:mpg
Destructively modify SEQUENCE by removing its Nth item.
Example:
LS-USER> (defparameter *v* #(a b c d))
*V*
LS-USER> (delete-nth* *v* 1)
#(A C D)
LS-USER> *v*
#(A C D)
Modify DATA (a data-frame or data-vector) by adding COLUMN with KEY. Return DATA.
Return a new data-frame or data-vector with keys and columns added. Does not modify DATA.
Modify DATA (a data-frame or data-vector) by adding columns with keys. If a data-frame environment exists, add columns to it as well.
Return column corresponding to key.
Set column corresponding to key.
Return a list of column names in DF, as strings
Return the most specific type found in COL
Return the columns of DATA as a vector, or a selection if given (keys are resolved).
Copy a data frame or vector. Keys are copied (and thus can be modified); columns or elements are copied using KEY, so by default the copy is shallow.
Count the number of rows for which PREDICATE called on the columns corresponding to KEYS returns non-NIL.
Load a data frame from a CSV or Lisp data source, named by D, located on the local filesystem. Intended for the example data sets of the Lisp-Stat system. Parameters may be either KEYWORDs or STRINGs. JSON files require application-specific loaders, so they are not handled here; use (read-vega ...), for example.
Description
Each package using lisp-stat should define its own logical host, and a directory called DATA. Once done, you can load the example data sets like so:
LS-USER> (data my-example :system :glimpse)
This assumes the system is named GLIMPSE. To load a data set from R, assuming you have configured the logical host RDATA:
LS-USER> (data :antigua :system :rdata :directory :daag :type :csv)
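Logical hosts are standard Common Lisp. The sketch below defines a hypothetical GLIMPSE host that maps DATA; onto the made-up directory /tmp/glimpse/data/. The physical path is an assumption, and how DATA resolves file names against the host is not described in this manual.

```lisp
;; Hypothetical logical host GLIMPSE pointing at a made-up physical directory.
;; Adjust the right-hand side to wherever your system keeps its DATA directory.
(setf (logical-pathname-translations "GLIMPSE")
      '(("DATA;**;*.*.*" "/tmp/glimpse/data/**/*.*")))

(defun glimpse-data-file (name)
  "Physical pathname of NAME (e.g. \"MTCARS.CSV\") in the GLIMPSE DATA directory."
  (translate-logical-pathname (concatenate 'string "GLIMPSE:DATA;" name)))
```

Implementations may differ in the letter case of the translated file name, so compare such names case-insensitively.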
Create a package with the same name as DATA-FRAME. Within it, create a symbol-macro for each column that returns the column's value. Can also be used to remove or update the environment when destructive operations change the DATA-FRAME
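The package-plus-symbol-macro idea can be sketched with MAKE-PACKAGE, INTERN and DEFINE-SYMBOL-MACRO. This is not DEFDF's implementation: the package TOY-MTCARS and the variables *TOY-DF* and *TOY-KEYS* are hypothetical stand-ins for a data frame and its keys.

```lisp
;; Hypothetical data frame stand-in: a vector of columns plus their keys.
(defparameter *toy-df* (vector (vector 21.0d0 22.8d0) (vector 6 4)))
(defparameter *toy-keys* '("MPG" "CYL"))

(defun define-column-macros (package-name)
  "Create PACKAGE-NAME if needed and define one symbol-macro per column in it."
  (let ((pkg (or (find-package package-name)
                 (make-package package-name :use '()))))
    (loop for key in *toy-keys*
          for i from 0
          for sym = (intern key pkg)
          ;; DEFINE-SYMBOL-MACRO is a macro, so build the form and EVAL it.
          do (eval `(define-symbol-macro ,sym (aref *toy-df* ,i))))
    pkg))
```

Because each expansion reads *TOY-DF* when evaluated, replacing a column in place is visible through the corresponding macro without redefining it.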
Return SEQUENCE with the Nth item removed.
Note: DELETE-IF makes no guarantee of being destructive, so you cannot rely on this side effect. You must SETF the original sequence to the value returned from this function, or use the modify-macro DELETE-NTH*.
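A sketch of the non-destructive and destructive variants, using the :START and :END arguments of REMOVE-IF and DELETE-IF to target a single index. REMOVE-NTH, DELETE-NTH-SKETCH and DELETE-NTH-SKETCH! are hypothetical names; the modify macro shows why DELETE-NTH* can be used without an explicit SETF.

```lisp
(defun remove-nth (sequence n)
  "Return SEQUENCE without its Nth element (0-based); SEQUENCE is not modified."
  ;; :START/:END restrict REMOVE-IF to the single element at index N.
  (remove-if (constantly t) sequence :start n :end (1+ n)))

(defun delete-nth-sketch (sequence n)
  "Like REMOVE-NTH but may destroy SEQUENCE; always use the returned value."
  (delete-if (constantly t) sequence :start n :end (1+ n)))

;; A modify macro stores the result back into the place:
;; (delete-nth-sketch! place n) == (setf place (delete-nth-sketch place n))
(define-modify-macro delete-nth-sketch! (n) delete-nth-sketch)
```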
Print DF to *standard-output* in table format
Return a modified copy of DATA from which any element (row, if a DATA-FRAME) that matches another element has been removed
Traverse rows from first to last, calling FUNCTION on the columns corresponding to KEYS. Return no values.
Filter DATA by a predicate given in BODY
Example
(data :mtcars) ; load a data set
(head mtcars)  ; view first 6 rows
;;   MODEL              MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
;; 2 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
;; 3 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
;; 4 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
;; 5 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
(filter-rows mtcars '(< mpg 17))
#<DATA-FRAME (11 observations of 12 variables)>
(head *) ; view first 6 rows of filtered data frame
;;   MODEL                MPG CYL  DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
;; 1 Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
;; 2 Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
;; 3 Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
;; 4 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
;; 5 Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Return the PROPERTY of data VARIABLE
Return a list of summaries of the variables in DF
Coerce each element of the column vectors to the most specific type in the column
Often when reading in a data set, the types within a variable will be inconsistent. For example, one observation might be 5.1 and another 5 (rather than 5.0). Although both are numbers, we want our variable vectors to have identical types. COLUMN-TYPE returns the most specific numeric type in the column, and all the vector elements are then coerced to this type
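A simplified sketch of the coercion step in standard Common Lisp. MOST-SPECIFIC-NUMERIC-TYPE and COERCE-COLUMN-SKETCH are hypothetical, and widening a mix of integers and floats to DOUBLE-FLOAT is our assumption; the library's COLUMN-TYPE may choose types differently.

```lisp
(defun most-specific-numeric-type (column)
  "A simplified guess at the narrowest common numeric type of COLUMN."
  (cond ((every #'integerp column) 'integer)
        ((every #'rationalp column) 'rational)
        ((every #'realp column) 'double-float) ; mixed integers and floats
        (t t)))                                ; non-numeric or mixed: leave as is

(defun coerce-column-sketch (column)
  "Return a fresh vector with the elements of COLUMN coerced to one type.
In this sketch only the floating-point case needs any conversion."
  (let ((type (most-specific-numeric-type column)))
    (map 'vector
         (if (eq type 'double-float)
             (lambda (x) (coerce x 'double-float))
             #'identity)
         column)))
```

With this sketch, a column holding 5.1d0 and 5 becomes a vector of two DOUBLE-FLOATs, while an all-integer column is returned unchanged (as a fresh vector).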
Wrap FUNCTION in a closure that removes missing values (:MISSING, :NA or NIL) from its arguments before applying FUNCTION. Intended for functions accepting vectors.
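A minimal sketch of such a wrapper, assuming that every SEQUENCE argument should be cleaned and that :MISSING, :NA and NIL are the markers, as the docstring states. IGNORE-MISSING-SKETCH and MISSING-VALUE-P are hypothetical names.

```lisp
(defun missing-value-p (x)
  "True for the missing-value markers :MISSING, :NA or NIL."
  (member x '(:missing :na nil)))

(defun ignore-missing-sketch (function)
  "Return a closure that strips missing values from each sequence argument
and then applies FUNCTION to the cleaned arguments."
  (lambda (&rest args)
    (apply function
           (mapcar (lambda (arg)
                     (if (typep arg 'sequence)
                         (remove-if #'missing-value-p arg)
                         arg))           ; non-sequence arguments pass through
                   args))))
```

For example, wrapping a summing function lets it run over #(1 :na 2 nil 3) as if the column were #(1 2 3).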
Return a vector of keys.
Map columns of DATA-FRAME or DATA-VECTOR using FUNCTION. The result is a new DATA-FRAME with the same keys.
Map DATA-FRAME to another one by rows. FUNCTION is called on the columns corresponding to KEYS, and should return a sequence with the same length as RESULT-KEYS, which gives the keys of the resulting data frame. RESULT-KEYS should be either symbols, or of the form (symbol &optional (element-type t)).
Map rows using FUNCTION, on the columns corresponding to KEYS. Return the result with the given ELEMENT-TYPE.
Return a bit-vector containing the result of calling PREDICATE on rows of the columns corresponding to KEYS (0 for NIL, 1 otherwise).
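The bit-vector mask can be sketched on plain column vectors. MASK-ROWS-SKETCH is a hypothetical helper; counting the 1 bits in its result gives the same kind of answer that COUNT-ROWS describes.

```lisp
(defun mask-rows-sketch (predicate &rest columns)
  "Bit-vector with 1 where PREDICATE, called on one row's values from COLUMNS,
returns non-NIL, and 0 elsewhere."
  (let* ((n (length (first columns)))
         (mask (make-array n :element-type 'bit :initial-element 0)))
    (dotimes (i n mask)
      (when (apply predicate (mapcar (lambda (col) (elt col i)) columns))
        (setf (sbit mask i) 1)))))
```

Usage: (count 1 (mask-rows-sketch (lambda (mpg cyl) (and (< mpg 20) (= cyl 8))) mpg-vector cyl-vector)) counts the matching rows, where MPG-VECTOR and CYL-VECTOR are placeholder column vectors.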
Convert a matrix to a data-frame with the given keys.
Print an array to STREAM, defaulting to *standard-output*, in a tabular format. If ROW-NUMBERS-P, print row numbers.
Print DATA-FRAME to STREAM using the pretty printer
Print data frame DF, in markdown format, to STREAM
If ROW-NUMBERS is true, also print row numbers as the first column
Modify DATA (a data-frame or data-vector) by removing COLUMN with KEY. Return DATA.
Return a new data-frame or data-vector with keys and columns removed. Does not modify DATA.
ARGS: DATA data frame
KEYS list of keys (variables) to be removed
Create a new data frame from data-frame DATA in which column KEY is replaced either with the given column or with the result of applying the function to the current values (ELEMENT-TYPE is used.)
Modify column KEY of data-frame DATA by replacing it either with the given column or with the result of applying the function to the current values (ELEMENT-TYPE is used.)
Return the rows of DATA as a vector
Set the PROPERTY of each variable in DF to a value. The value is specified in the plist PROP-VALUES.
Example:
To give the variables in the mtcars dataset a unit, use:
(set-properties mtcars :unit '(:mpg  m/g
                               :cyl  :NA
                               :disp in³
                               :hp   hp
                               :drat :NA
                               :wt   lb
                               :qsec s
                               :vs   :NA
                               :am   :NA
                               :gear :NA
                               :carb :NA))
Set the PROPERTY of SYMBOL to VALUE
Return up to the first newline
This is useful when docstrings are multi-line. By convention, the first line is the title.
Print all data frames in the current environment in reverse order of creation, i.e. most recently created first. If HEAD is not NIL, print the first six rows, similar to the (head) function
Return a summary struct for COLUMN
Print a summary of DF to STREAM, using heuristics for better formatting
Remove one or more data frames from the environment
PARAMS: a list of DATA-FRAMEs
Essentially reverses what DEFDF does. Returns the data frames that were removed. Don’t use this if you have a data frame bound via DEFPARAMETER.
Examples:
(undef mtcars vlcars)
Remove all values from VAR that are missing according to PREDICATE.
Returns values:
1. the vector with missing values removed
2. the number of elements removed
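A sketch of the two-value return convention. DROP-MISSING-SKETCH is hypothetical, and its default predicate (treating :NA and :MISSING as missing) is an assumption based on the missing-value indicators listed elsewhere in this manual.

```lisp
(defun drop-missing-sketch (vector &key (predicate (lambda (x) (member x '(:na :missing)))))
  "Return VECTOR without the elements satisfying PREDICATE,
and, as a second value, how many elements were removed."
  (let ((kept (remove-if predicate vector)))
    (values kept (- (length vector) (length kept)))))
```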
Remove all rows from DF that are missing values according to PREDICATE
Return the first N rows of DF; N defaults to 6
Return a vector indicating the position of any missing value indicators. Currently these are :na and :missing
The name of the data frame. MUST be the same as the symbol whose value cell points to this data frame. This slot essentially allows us to go ’backwards’ and get the symbol that names the data frame.
name.
Substitute NEW, a SYMBOL, for OLD in DF
Useful when reading data files that have an empty or generated column name.
Example: (rename-column! cars 'name :||) will replace an empty symbol with 'name
Replace missing values with the values specified
The alist consists of a column name in the CAR and the replacement value in the CDR
Example: (replace-missing mtcarsm '((mpg . foo)))
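A sketch of the alist-driven replacement over a hypothetical representation in which the data frame is an alist of (column-name . vector) pairs. REPLACE-MISSING-SKETCH is not the library function, and treating only :NA and :MISSING as missing is an assumption.

```lisp
(defun replace-missing-sketch (columns replacements)
  "COLUMNS is an alist of (name . vector); REPLACEMENTS an alist of (name . value).
Return a new alist in which :NA/:MISSING in each named column become its value."
  (mapcar (lambda (entry)
            (let ((replacement (assoc (car entry) replacements)))
              (if replacement
                  (cons (car entry)
                        ;; SUBSTITUTE-IF is non-destructive: it returns a new vector.
                        (substitute-if (cdr replacement)
                                       (lambda (x) (member x '(:na :missing)))
                                       (cdr entry)))
                  entry)))              ; columns without a replacement are kept
          columns))
```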
Return the last N rows of DF; N defaults to 6
array-operations/generic.
array-operations/generic.
select-dev.
select-dev.
array-operations/generic.
array-operations/generic.
array-operations/generic.
array-operations/generic.
Print DATA-FRAME dimensions and type
After defining this method it is permanently associated with data-frame objects
select.
select.
select.
An operation attempted to use a key that already exists in ORDERED-KEYS
error.
:key
An operation was attempted on a non-existent key.
An operation was requested on a data set large enough to potentially cause problems.
warning.
Summary of a bit vector.
common-lisp.
alexandria:array-index
0
This slot is read-only.
Summary for factor variables
list
This slot is read-only.
Summary for generic variables, i.e. those with mixed types.
Summary of real elements (using quantiles).
common-lisp.
real
0
This slot is read-only.
real
0
This slot is read-only.
real
0
This slot is read-only.
alexandria.
real
0
This slot is read-only.
real
0
This slot is read-only.
common-lisp.
real
0
This slot is read-only.
This class is used for implementing both data-vector and data-frame, and represents an ordered collection of key-column pairs. Columns are not assumed to have any specific attributes. This class is not exported.
The name of the data frame. MUST be the same as the symbol whose value cell points to this data frame. This slot essentially allows us to go ’backwards’ and get the symbol that names the data frame.
string
nil
name.
data-frame::ordered-keys
:ordered-keys
vector
:columns
A statistical type for a data variable. All data columns must be one of these types if they are to be interpreted properly by Lisp-Stat
Global list of all data frames
Data frames corresponding to the default R datasets
Student’s Sleep Data
Convert an array to a list of lists
Modify ORDERED-KEYS by adding KEY.
Add KEYS to ORDERED-KEYS
Return the string used to represent ‘thing‘ when printing aesthetically.
Create an object of CLASS (subclass of DATA) from ALIST which contains key-column pairs.
Return a format string for the most specific type found in SEQUENCE. Use this for sequences of type T to determine how to format the column.
Return a copy of ORDERED-KEYS
Returns T if there is environment set-up for the data frame, or NIL if there isn’t one.
Returns the number of distinct elements in COLUMN, a symbol naming a variable. Useful for formatting columns for human output.
Remove all rows from DF that are missing values. Convenience R-like function.
Recognizes the following and converts them to an alist:
plist
alist
(plist)
(alist)
(data-frame)
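A sketch of the plist and alist cases only; the wrapped forms and data frames in the list above are not handled. PLIST-TO-ALIST and ENSURE-ALIST-SKETCH are hypothetical helpers, and the rule used to tell the two shapes apart (every element a cons means alist; symbol keys at even positions means plist) is our assumption.

```lisp
(defun plist-to-alist (plist)
  "Convert (:a 1 :b 2) into ((:a . 1) (:b . 2))."
  (loop for (key value) on plist by #'cddr
        collect (cons key value)))

(defun ensure-alist-sketch (data)
  "Return DATA as an alist if it is a plist or an alist."
  (cond ((every #'consp data) data)     ; already an alist
        ((and (evenp (length data))
              (loop for key in data by #'cddr always (symbolp key)))
         (plist-to-alist data))
        (t (error "Neither a plist nor an alist: ~S" data))))
```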
When REAL is a RATIO, convert it to a float, otherwise return as is. Used for printing.
Return the most specific type symbol for x
A user prompt, using DUOLOGUE, to select a valid data frame name.
Return the index for KEY.
Return a list of keys used in REST, a form
Number of keys.
Vector of all keys.
Create a DATA object from KEYS and COLUMNS. FOR INTERNAL USE. Always creates a copy of COLUMNS in order to ensure that it is an adjustable array with a fill pointer. KEYS are converted to ORDERED-KEYS if necessary.
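Why the copy matters: a literal or SIMPLE-VECTOR column cannot grow, while an adjustable vector with a fill pointer works with VECTOR-PUSH-EXTEND. ADJUSTABLE-COPY is a hypothetical sketch of that copying step, not the library's code.

```lisp
(defun adjustable-copy (vector)
  "Fresh adjustable vector with a fill pointer, holding the elements of VECTOR."
  (let ((copy (make-array (length vector)
                          :adjustable t
                          :fill-pointer (length vector))))
    ;; REPLACE fills COPY from VECTOR and returns COPY.
    (replace copy vector)))
```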
Return the maximum number of digits to the right of the decimal point in the numbers of SEQUENCE, equal to or less than MAX-DIGITS
Return the largest printed string size of the elements of SEQUENCE, equal to or less than MAX-WIDTH
Returns T if all elements of COLUMN, a SYMBOL, are monotonically increasing. Useful for detecting row numbers in imported data.
Create an ORDERED-KEYS object from KEYS (a sequence).
Create an object of CLASS (subclass of DATA) from PLIST which contains keys and columns, interleaved.
Print COUNT as is and also as a rounded percentage
Print ROWS as a nicely-formatted table.
Each row should have the same number of columns.
Columns will be justified properly to fit the longest item in each one.
Example:
(print-table '((1 :red something)
               (2 :green more)))
=>
1 | RED   | SOMETHING
2 | GREEN | MORE
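A minimal sketch that produces output like the example above. PRINT-TABLE-SKETCH is hypothetical: it pads each column with FORMAT's ~vA directive and trims the padding after the last column. The library's PRINT-TABLE may differ in details.

```lisp
(defun print-table-sketch (rows &optional (stream *standard-output*))
  "Print ROWS (lists of equal length) with each column padded to its widest item."
  (let* ((strings (mapcar (lambda (row) (mapcar #'princ-to-string row)) rows))
         ;; Width of each column = length of its longest printed item.
         (widths (apply #'mapcar
                        (lambda (&rest column) (reduce #'max column :key #'length))
                        strings)))
    (dolist (row strings)
      (let ((line (format nil "~{~A~^ | ~}"
                          (mapcar (lambda (item width) (format nil "~vA" width item))
                                  row widths))))
        ;; Drop the padding after the last column.
        (write-line (string-right-trim " " line) stream)))))
```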
Print values of all the printer variables
Modify DATA (a data-frame or data-vector) by removing columns with keys. If a data-frame environment exists, remove the columns from it as well.
Modify ORDERED-KEYS by removing KEY.
Return DF with columns in reverse order
Show the standard properties of the variables of the data frame DF. Standard properties are ’label’, ’type’ and ’unit’
Print all symbols in PKG. Example: (show-symbols 'mtcars)
Return an alist of factor/count pairs
Return an object that summarizes COLUMN of a DATA-FRAME. Primarily intended for printing rather than analysis; returned values should print nicely. This function can be used on any type of column, even one with mixed types
Return a summary for a float variable
Return a list of the types found in SEQ
Return a list whose elements alternate between each of the lists ‘lists‘. Weaving stops when any of the lists has been exhausted.
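MAPCAN already stops at the shortest of its list arguments, so the weaving described above can be sketched in one line; WEAVE-SKETCH is a hypothetical name. Weaving a list of keys with a list of columns this way yields the interleaved keys-and-columns plist shape described for the plist constructor earlier in this manual.

```lisp
(defun weave-sketch (&rest lists)
  "Alternate the elements of LISTS, stopping when the shortest list runs out."
  ;; LIST builds a fresh group per step, so MAPCAN may safely splice them.
  (apply #'mapcan #'list lists))
```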
Check if COLUMN is compatible with DATA.
Return the length of column.
Return a list of formatting strings for ARRAY
The method returns a set of default formatting strings using heuristics.
An attempt to redefine an existing data frame. Triggered if either the symbol is bound or the package exists.
error.
:data-frame
This slot is read-only.
A variable has missing data, e.g. :na, nil
Representation of ordered keys
Ordered keys provide a mapping from column keys (symbols) to nonnegative integers. They are used internally and the corresponding interface is NOT EXPORTED.
TABLE maps keys to indexes, starting from zero.
structure-object.
hash-table
(make-hash-table :test (function eq))
This slot is read-only.
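A sketch of the ordered-keys idea: a vector preserving key order plus an EQ hash table mapping each key to its index, with read-only slots as in the slot listing above. The structure TOY-ORDERED-KEYS and its functions are hypothetical, and they signal plain ERRORs where the library defines its own conditions.

```lisp
(defstruct (toy-ordered-keys (:constructor %make-toy-ordered-keys))
  ;; Keys in insertion order; adjustable so new keys can be appended.
  (keys (make-array 0 :adjustable t :fill-pointer 0) :read-only t)
  ;; Maps each key to its index, starting from zero.
  (table (make-hash-table :test #'eq) :read-only t))

(defun add-toy-key (ordered-keys key)
  "Append KEY and return its index; signal an error if KEY is already present."
  (let ((table (toy-ordered-keys-table ordered-keys)))
    (when (nth-value 1 (gethash key table))
      (error "Duplicate key ~S" key))
    ;; VECTOR-PUSH-EXTEND returns the index at which KEY was stored.
    (setf (gethash key table)
          (vector-push-extend key (toy-ordered-keys-keys ordered-keys)))))

(defun toy-key-index (ordered-keys key)
  "Index of KEY; signal an error for a non-existent key."
  (multiple-value-bind (index found)
      (gethash key (toy-ordered-keys-table ordered-keys))
    (if found index (error "Key ~S not found" key))))
```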
Base class for summarizing variables. Summary functions take SYMBOLs, rather than values, because the symbol property lists naming the variables have meta-data, e.g. type, label, that we want to print. Not exported.
structure-object.
common-lisp.
alexandria:array-index
0
This slot is read-only.
fixnum
0
This slot is read-only.
string
""
This slot is read-only.
string
""
This slot is read-only.