Importing data#
In this first example, we will import the data from Iris, that I downloaded for another course (git course). For that we only need to:
using DataFrames, DataFramesMeta
using RDatasets
using CategoricalArrays
iris = dataset("datasets", "iris")
#=iris[!, :Species] = categorical(iris[!, :Species])=#
first(iris, 5)5×5 DataFrame
| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| Float64 | Float64 | Float64 | Float64 | Cat… | |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
Basic plotting#
And a simple boxplot showing the sepal length per species:
using StatsPlots: boxplot
boxplot(iris.Species, iris.SepalLength)using CairoMakie
fig, ax, plot = scatter(iris.PetalLength, iris.SepalLength,
color=levelcode.(iris.Species))
ax2 = Axis(fig[1,2])
scatter!(ax2, iris.PetalLength, iris.PetalWidth,
color = levelcode.(iris.Species))
fig┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions. └ @ Makie ~/.julia/packages/Makie/kJl0u/src/scenes.jl:264
Principal Component Analysis (PCA)#
using MultivariateStats
X = Matrix(iris[:, Not(:Species)])'
y = iris.Species
species = unique(iris.Species)
model = fit(PCA, X; maxoutdim = size(X)[1])PCA(indim = 4, outdim = 3, principalratio = 0.9947878161267246)
Pattern matrix (unstandardized loadings):
────────────────────────────────────
PC1 PC2 PC3
────────────────────────────────────
1 0.743108 0.323446 -0.16277
2 -0.173801 0.359689 0.167212
3 1.76155 -0.0854062 0.0213202
4 0.736739 -0.0371832 0.152647
────────────────────────────────────
Importance of components:
─────────────────────────────────────────────────────────
PC1 PC2 PC3
─────────────────────────────────────────────────────────
SS Loadings (Eigenvalues) 4.22824 0.242671 0.0782095
Variance explained 0.924619 0.0530665 0.0171026
Cumulative variance 0.924619 0.977685 0.994788
Proportion explained 0.929463 0.0533445 0.0171922
Cumulative proportion 0.929463 0.982808 1.0
─────────────────────────────────────────────────────────
X_transform = MultivariateStats.transform(model, X)'
X_df = DataFrame(Matrix(X_transform), :auto)
rename!(X_df, :x1 => :PC1, :x2 => :PC2, :x3 => :PC3)
X_df.Species = iris.Species;fig_pca = Figure();
ax_pca = Axis(fig_pca[1, 1],
xlabel = "PC1",
ylabel = "PC2")
@with X_df begin
scatter!(ax_pca, :PC1, :PC2,
color = levelcode.(:Species),
label = levelcode.(:Species))
end
fig_pca┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions. └ @ Makie ~/.julia/packages/Makie/kJl0u/src/scenes.jl:264
Basic inference: ANOVA on a linear model#
using AnovaGLM
aovlm = anova_lm(@formula(SepalLength ~ PetalLength * Species), iris)Analysis of Variance
Type 1 test / F test
SepalLength ~ 1 + PetalLength + Species + PetalLength & Species
Table:
────────────────────────────────────────────────────────────────────────
DOF Exp.SS Mean Square F value Pr(>|F|)
────────────────────────────────────────────────────────────────────────
(Intercept) 1 5121.68 5121.68 45244.8660 <1e-99
PetalLength 1 77.64 77.64 685.8998 <1e-55
Species 2 7.8434 3.9217 34.6441 <1e-12
PetalLength & Species 2 0.3810 0.1905 1.6828 0.1895
(Residuals) 144 16.30 0.1132
────────────────────────────────────────────────────────────────────────
Adding the regression lines#
using ColorSchemes
import ColorSchemes.viridis
lmmod = lm(@formula(SepalLength ~ PetalLength), iris)
lmmod_gp = lm(@formula(SepalLength ~ PetalLength * Species), iris)
gp_mod_coef = coef(lmmod_gp)
fig = Figure()
ax = Axis(fig[1, 1],
xlabel = "Petal Length",
ylabel = "Petal Width")
scatter!(iris.PetalLength, iris.SepalLength,
color=levelcode.(iris.Species));
lines!(ax,
iris.PetalLength,
predict(lmmod),
linewidth = 2,
color = :black)
ablines!(gp_mod_coef[1],
gp_mod_coef[2],
linewidth = 2,
linestyle = :dash,
color = viridis[0.0])
ablines!(gp_mod_coef[1] + gp_mod_coef[3],
gp_mod_coef[2] + gp_mod_coef[5],
linewidth = 2,
linestyle = :dash,
color = viridis[0.5])
ablines!(gp_mod_coef[1] + gp_mod_coef[4],
gp_mod_coef[2] + gp_mod_coef[6],
linewidth = 2,
linestyle = :dash,
color = viridis[1.0])
fig┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions. └ @ Makie ~/.julia/packages/Makie/kJl0u/src/scenes.jl:264
