Be careful: the bang (!) version also modifies the original table! Remember that we are not making copies, but creating new references to the same objects in memory.
describe(wp_clean)
4×7 DataFrame
 Row │ variable     mean        min          median     max         nmissing  eltype
     │ Symbol       Union…      Any          Union…     Any         Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ id           117.5       1            117.5      234                0  Int64
   2 │ country                  Afghanistan             Zimbabwe           0  String
   3 │ pop2024      3.46886e7   526          5.62636e6  1441719852         0  Int64
   4 │ growth_rate  0.00920043  -0.0309      0.00795    0.0483             0  Float64
describe(wp)
4×7 DataFrame
 Row │ variable     mean        min          median     max         nmissing  eltype
     │ Symbol       Union…      Any          Union…     Any         Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ id           117.5       1            117.5      234                0  Int64
   2 │ country                  Afghanistan             Zimbabwe           0  String
   3 │ pop2024      3.46886e7   526          5.62636e6  1441719852         0  Int64
   4 │ growth_rate  0.00920043  -0.0309      0.00795    0.0483             0  Float64
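The two identical summaries above follow from both names pointing at the same table. A minimal sketch of that behaviour on a made-up table is given below; `toy`, `alias`, and `backup` are hypothetical names, and `sort!` merely stands in for whichever bang function is being used.

```julia
using DataFrames

# Hypothetical toy table, not the population data used in this tutorial.
toy = DataFrame(x = [3, 1, 2])

backup = copy(toy)   # an independent copy with its own column vectors
alias = toy          # no copy: a second name bound to the same DataFrame

sort!(alias, :x)     # the bang version sorts the table in place

alias === toy        # true: both names refer to the same object
toy.x                # [1, 2, 3]: the "original" was sorted as well
backup.x             # [3, 1, 2]: the copy is unaffected
```

If the original table has to stay untouched, one option is to call `copy()` first, or to use the non-bang version of the function, which returns a new table.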
Subsetting
It is possible to check whether a string (e.g. a country name) or any other value is present in a column by using the in operator.
"Tanzania" in wp.country
true
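A few equivalent spellings of the membership test are sketched below on a short, made-up vector (`countries` is a hypothetical stand-in for `wp.country`). The last line shows one way to test several values at once.

```julia
# Hypothetical stand-in for wp.country
countries = ["Kenya", "Tanzania", "Uganda"]

has_tz = "Tanzania" in countries       # true
has_tz_sym = "Tanzania" ∈ countries    # same test, typed as \in<TAB>
no_narnia = "Narnia" ∉ countries       # true: ∉ (\notin<TAB>) negates the test

# Several values at once: broadcast `in` over the left-hand side and wrap
# the collection in Ref() so it is treated as a single value.
found = in.(["Kenya", "Narnia"], Ref(countries))   # elements: true, false
```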
We can get the index of a specific country with the findall() function, which returns every matching position, or with findfirst(), which returns only the first one.
# with an anonymous function
findall(x -> x == "Tanzania", wp.country)
# or using the == function
findall(==("Tanzania"), wp.country)
1-element Vector{Int64}:
21
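The difference between the two functions shows up when a value occurs more than once or not at all. Here is a small sketch on a made-up vector (`names_vec` is a hypothetical name):

```julia
# Hypothetical vector with a repeated entry
names_vec = ["Kenya", "Tanzania", "Uganda", "Tanzania"]

all_hits = findall(==("Tanzania"), names_vec)     # [2, 4]: every match
first_hit = findfirst(==("Tanzania"), names_vec)  # 2: only the first match
no_hit = findfirst(==("Narnia"), names_vec)       # nothing: no match found

# findfirst() can return `nothing`, so check before using it as an index
if !isnothing(first_hit)
    println(names_vec[first_hit])
end
```

When nothing matches, findall() returns an empty vector, which is safe to use as an index, whereas findfirst() returns `nothing`, which is not.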
And this allows us to subset our dataframe in several ways:
# using any of the possible ways with findall() or findfirst()
wp[findall(==("Tanzania"), wp.country), :]
1×4 DataFrame
 Row │ id     country   pop2024   growth_rate
     │ Int64  String    Int64     Float64
─────┼───────────────────────────────────────
   1 │    21  Tanzania  69419073       0.0294
# or using broadcasting, similar to R syntax
wp[wp.country .== "Tanzania", :]
1×4 DataFrame
 Row │ id     country   pop2024   growth_rate
     │ Int64  String    Int64     Float64
─────┼───────────────────────────────────────
   1 │    21  Tanzania  69419073       0.0294
The wp.country .== "Tanzania" statement returns a BitVector, i.e. a vector of Boolean values (displayed as 0s and 1s) with one element per row, which is used as a mask for selecting the rows.
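The mask idea can be sketched on a made-up table as follows. The table `toy` and its values are hypothetical, and the `filter()` call at the end is an alternative not used elsewhere in this section.

```julia
using DataFrames

# Hypothetical toy table, not the real population data
toy = DataFrame(country = ["A", "B", "C"],
                growth_rate = [0.01, -0.02, 0.03])

mask = toy.country .== "B"   # BitVector with elements false, true, false
one_row = toy[mask, :]       # keeps only the rows where the mask is true

# Conditions are combined with .& (and) or .| (or). The parentheses are
# needed because & binds more tightly than the comparison operators.
both = toy[(toy.growth_rate .> 0) .& (toy.country .!= "A"), :]

# The same single-column selection written with filter()
filtered = filter(:country => ==("B"), toy)
```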