In this first example, we will import the data from Iris, that I downloaded for another course (git course). For that we only need to:
using DataFrames, DataFramesMeta using RDatasets using CategoricalArrays iris = dataset("datasets", "iris") #=iris[!, :Species] = categorical(iris[!, :Species])=# first(iris, 5)
Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Cat… | |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
And a simple boxplot showing the sepal length per species:
using StatsPlots: boxplot boxplot(iris.Species, iris.SepalLength)
using CairoMakie fig, ax, plot = scatter(iris.PetalLength, iris.SepalLength, color=levelcode.(iris.Species)) ax2 = Axis(fig[1,2]) scatter!(ax2, iris.PetalLength, iris.PetalWidth, color = levelcode.(iris.Species)) fig
using MultivariateStats X = Matrix(iris[:, Not(:Species)])' y = iris.Species species = unique(iris.Species) model = fit(PCA, X; maxoutdim = size(X)[1])
PCA(indim = 4, outdim = 3, principalratio = 0.9947878161267246) Pattern matrix (unstandardized loadings): ──────────────────────────────────── PC1 PC2 PC3 ──────────────────────────────────── 1 0.743108 0.323446 -0.16277 2 -0.173801 0.359689 0.167212 3 1.76155 -0.0854062 0.0213202 4 0.736739 -0.0371832 0.152647 ──────────────────────────────────── Importance of components: ───────────────────────────────────────────────────────── PC1 PC2 PC3 ───────────────────────────────────────────────────────── SS Loadings (Eigenvalues) 4.22824 0.242671 0.0782095 Variance explained 0.924619 0.0530665 0.0171026 Cumulative variance 0.924619 0.977685 0.994788 Proportion explained 0.929463 0.0533445 0.0171922 Cumulative proportion 0.929463 0.982808 1.0 ─────────────────────────────────────────────────────────
X_transform = MultivariateStats.transform(model, X)' X_df = DataFrame(Matrix(X_transform), :auto) rename!(X_df, :x1 => :PC1, :x2 => :PC2, :x3 => :PC3) X_df.Species = iris.Species;
fig_pca = Figure(); ax_pca = Axis(fig_pca[1, 1], xlabel = "PC1", ylabel = "PC2") @with X_df begin scatter!(ax_pca, :PC1, :PC2, color = levelcode.(:Species), label = levelcode.(:Species)) end fig_pca
using AnovaGLM aovlm = anova_lm(@formula(SepalLength ~ PetalLength * Species), iris)
Analysis of Variance Type 1 test / F test SepalLength ~ 1 + PetalLength + Species + PetalLength & Species Table: ──────────────────────────────────────────────────────────────────────── DOF Exp.SS Mean Square F value Pr(>|F|) ──────────────────────────────────────────────────────────────────────── (Intercept) 1 5121.68 5121.68 45244.8660 <1e-99 PetalLength 1 77.64 77.64 685.8998 <1e-55 Species 2 7.8434 3.9217 34.6441 <1e-12 PetalLength & Species 2 0.3810 0.1905 1.6828 0.1895 (Residuals) 144 16.30 0.1132 ────────────────────────────────────────────────────────────────────────
using ColorSchemes import ColorSchemes.viridis lmmod = lm(@formula(SepalLength ~ PetalLength), iris) lmmod_gp = lm(@formula(SepalLength ~ PetalLength * Species), iris) gp_mod_coef = coef(lmmod_gp) fig = Figure() ax = Axis(fig[1, 1], xlabel = "Petal Length", ylabel = "Petal Width") scatter!(iris.PetalLength, iris.SepalLength, color=levelcode.(iris.Species)); lines!(ax, iris.PetalLength, predict(lmmod), linewidth = 2, color = :black) ablines!(gp_mod_coef[1], gp_mod_coef[2], linewidth = 2, linestyle = :dash, color = viridis[0.0]) ablines!(gp_mod_coef[1] + gp_mod_coef[3], gp_mod_coef[2] + gp_mod_coef[5], linewidth = 2, linestyle = :dash, color = viridis[0.5]) ablines!(gp_mod_coef[1] + gp_mod_coef[4], gp_mod_coef[2] + gp_mod_coef[6], linewidth = 2, linestyle = :dash, color = viridis[1.0]) fig