Using DataFrames.jl

Author: Urtzi Enriquez-Urzelai
Evolutionary ecophysiologist
Julia Tutorials - This article is part of a series.
Part 1: This article

In this example, I will work with population data for the world's countries, downloaded from the internet. This lesson is based on a YouTube video [1].

Importing data

using DataFrames
using CSV

# Read the CSV file into a DataFrame and preview the first five rows
wp = DataFrame(CSV.File("./world_pop.csv"))
first(wp, 5)
5×8 DataFrame
 Row │ Country        Population 2024  Population 2023  Area (km2)  Density (/km2)  Growth Rate  World %   World Rank
     │ String         Int64            Int64            String7     Float64         Float64      Float64?  Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ India          1441719852       1428627663       3M          485.0           0.0092       0.1801    1
   2 │ China          1425178782       1425671352       9.4M        151.0           -0.0003      0.178     2
   3 │ United States  341814420        339996563        9.1M        37.0            0.0053       0.0427    3
   4 │ Indonesia      279798049        277534122        1.9M        149.0           0.0082       0.035     4
   5 │ Pakistan       245209815        240485658        770.9K      318.0           0.0196       0.0306    5
describe(wp)
8×7 DataFrame
 Row │ variable         mean        min          median     max         nmissing  eltype
     │ Symbol           Union…      Any          Union…     Any         Int64     Type
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Country                      Afghanistan             Zimbabwe    0         String
   2 │ Population 2024  3.46886e7   526          5.62636e6  1441719852  0         Int64
   3 │ Population 2023  3.43744e7   518          5.6439e6   1428627663  0         Int64
   4 │ Area (km2)                   1.1K                    < 1         0         String7
   5 │ Density (/km2)   453.788     0.14         98.5       21674.0     0         Float64
   6 │ Growth Rate      0.00920043  -0.0309      0.00795    0.0483      0         Float64
   7 │ World %          0.00444649  0.0          0.00075    0.1801      6         Union{Missing, Float64}
   8 │ World Rank       117.5       1            117.5      234         0         Int64
names(wp)
8-element Vector{String}:
 "Country"
 "Population 2024"
 "Population 2023"
 "Area (km2)"
 "Density (/km2)"
 "Growth Rate"
 "World %"
 "World Rank"

Data wrangling

# Add a row identifier column running from 1 to the number of rows
wp.id = 1:nrow(wp)
first(wp, 5)
5×9 DataFrame
 Row │ Country        Population 2024  Population 2023  Area (km2)  Density (/km2)  Growth Rate  World %   World Rank  id
     │ String         Int64            Int64            String7     Float64         Float64      Float64?  Int64       Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ India          1441719852       1428627663       3M          485.0           0.0092       0.1801    1           1
   2 │ China          1425178782       1425671352       9.4M        151.0           -0.0003      0.178     2           2
   3 │ United States  341814420        339996563        9.1M        37.0            0.0053       0.0427    3           3
   4 │ Indonesia      279798049        277534122        1.9M        149.0           0.0082       0.035     4           4
   5 │ Pakistan       245209815        240485658        770.9K      318.0           0.0196       0.0306    5           5
# Replace the original headers with short names, one per column and in the same order
colnames = [:country, :pop2024, :pop2023, :area, :density, :growth_rate, :world_perc, :world_rank, :id]
rename!(wp, colnames)
describe(wp)
9×7 DataFrame
 Row │ variable         mean        min          median     max         nmissing  eltype
     │ Symbol           Union…      Any          Union…     Any         Int64     Type
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ country                      Afghanistan             Zimbabwe    0         String
   2 │ pop2024          3.46886e7   526          5.62636e6  1441719852  0         Int64
   3 │ pop2023          3.43744e7   518          5.6439e6   1428627663  0         Int64
   4 │ area                         1.1K                    < 1         0         String7
   5 │ density          453.788     0.14         98.5       21674.0     0         Float64
   6 │ growth_rate      0.00920043  -0.0309      0.00795    0.0483      0         Float64
   7 │ world_perc       0.00444649  0.0          0.00075    0.1801      6         Union{Missing, Float64}
   8 │ world_rank       117.5       1            117.5      234         0         Int64
   9 │ id               117.5       1            117.5      234         0         Int64
wp_clean = select!(wp, :id, :country, :pop2024, :growth_rate);

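Be careful: the exclamation mark (!) means that select! modifies the original table in place. No copy is made. The function returns the very DataFrame it changed, so wp_clean and wp are two names for the same object in memory, which is why the two summaries below are identical. To leave wp untouched, use select (without the !) instead.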

describe(wp_clean)
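4×7 DataFrame
 Row │ variable         mean        min          median     max         nmissing  eltype
     │ Symbol           Union…      Any          Union…     Any         Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────────
   1 │ id               117.5       1            117.5      234         0         Int64
   2 │ country                      Afghanistan             Zimbabwe    0         String
   3 │ pop2024          3.46886e7   526          5.62636e6  1441719852  0         Int64
   4 │ growth_rate      0.00920043  -0.0309      0.00795    0.0483      0         Float64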
describe(wp)
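4×7 DataFrame
 Row │ variable         mean        min          median     max         nmissing  eltype
     │ Symbol           Union…      Any          Union…     Any         Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────────
   1 │ id               117.5       1            117.5      234         0         Int64
   2 │ country                      Afghanistan             Zimbabwe    0         String
   3 │ pop2024          3.46886e7   526          5.62636e6  1441719852  0         Int64
   4 │ growth_rate      0.00920043  -0.0309      0.00795    0.0483      0         Float64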
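
The difference between the mutating and non-mutating forms can be checked on a small table. The sketch below is only illustrative: `df` and its values are hypothetical stand-ins for `wp`, and it assumes a recent DataFrames.jl release in which `select!` returns the data frame it modified.

```julia
using DataFrames

# Hypothetical toy table standing in for `wp`
df = DataFrame(id = 1:3, country = ["A", "B", "C"], pop = [10, 20, 30])

# Non-mutating form: builds a new DataFrame, `df` keeps its three columns
df_small = select(df, :id, :country)
ncols_after_select = ncol(df)          # still 3
same_object_select = df_small === df   # false: two distinct objects

# Mutating form: drops the column from `df` itself and returns that same object
df_alias = select!(df, :id, :country)
ncols_after_select! = ncol(df)         # now 2
same_object_select! = df_alias === df  # true: two names, one object
```

If an independent copy is needed after the fact, `copy(df)` is the usual way to get one.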

Subsetting

To check whether a string (for example, a country name) or any other value is present in a column, we can use the in operator.

"Tanzania" in wp.country
true

To find the index at which a given country sits, we can use findall(), which returns every matching index, or findfirst(), which returns only the first one.

# with an anonymous function
findall(x -> x == "Tanzania", wp.country)

# or with the partially applied form of ==
findall(==("Tanzania"), wp.country)
1-element Vector{Int64}:
 21
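
The two functions differ in what they return, which matters when a value occurs more than once or not at all. Here is a minimal sketch using only Base Julia; the `countries` vector is a hypothetical toy example, not the real column.

```julia
# Hypothetical toy vector with a repeated entry
countries = ["India", "China", "Tanzania", "China"]

# findall returns a vector with every matching index
all_hits = findall(==("China"), countries)     # [2, 4]

# findfirst returns a single index, the first match
first_hit = findfirst(==("China"), countries)  # 2

# when nothing matches, findfirst returns `nothing`, findall an empty vector
no_hit = findfirst(==("Spain"), countries)     # nothing
no_hits = findall(==("Spain"), countries)      # Int64[]
```

Because `findfirst` can return `nothing`, it is worth checking the result before using it as an index. Note also that indexing a data frame with a single integer, as in `wp[first_hit, :]`, yields one row (a `DataFrameRow`) rather than a one-row DataFrame.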

This lets us select subsets of our data frame in several ways:

# using findall() or findfirst()
wp[findall(==("Tanzania"), wp.country), :]
1×4 DataFrame
 Row │ id     country   pop2024   growth_rate
     │ Int64  String    Int64     Float64
─────┼────────────────────────────────────────
   1 │ 21     Tanzania  69419073  0.0294

# or by broadcasting, similar to R syntax
wp[wp.country .== "Tanzania", :]
1×4 DataFrame
 Row │ id     country   pop2024   growth_rate
     │ Int64  String    Int64     Float64
─────┼────────────────────────────────────────
   1 │ 21     Tanzania  69419073  0.0294
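
Besides indexing, DataFrames.jl also offers `filter` and `subset` for row selection. The sketch below shows them on a hypothetical three-row table (the `id` values are toy values, not the real ones); it is one possible alternative, not the approach used in this tutorial.

```julia
using DataFrames

# Hypothetical toy table; populations copied from the output shown earlier
df = DataFrame(id = 1:3,
               country = ["India", "China", "Tanzania"],
               pop2024 = [1441719852, 1425178782, 69419073])

# filter takes a `column => predicate` pair and tests each row
a = filter(:country => ==("Tanzania"), df)

# subset applies the function to the whole column, hence the ByRow wrapper
b = subset(df, :country => ByRow(==("Tanzania")))

# in(collection) builds a predicate that keeps several countries at once
c = filter(:country => in(["China", "Tanzania"]), df)
```

Both functions return a new data frame and leave `df` untouched; as with `select!`, the variants ending in `!` modify it in place.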

The expression wp.country .== "Tanzania" returns a Boolean vector (a BitVector, which Julia displays as 0s and 1s) with one element per row, and the rows where it is true are the ones selected.
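
To make the mask explicit, the following sketch builds it as a separate step using only Base Julia. The two vectors are hypothetical stand-ins for the `country` and `pop2024` columns, with values copied from the output shown earlier.

```julia
countries = ["India", "China", "Tanzania"]
pop = [1441719852, 1425178782, 69419073]

# Elementwise comparison: one Bool per element
mask = countries .== "Tanzania"
mask_type = typeof(mask)     # BitVector
selected = pop[mask]         # [69419073]

# Masks can be combined elementwise with .& (and) or .| (or)
big_not_india = (pop .> 1_000_000_000) .& (countries .!= "India")
result = countries[big_not_india]   # ["China"]
```

The parentheses around each comparison matter, because `&` binds more tightly than the comparison operators.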


  1. Based on this YouTube video.
