”;
DataFrame may be defined as a table or spreadsheet which we can be used to sort as well as explore a set of related data values. In other words, we can call it a smarter array for holding tabular data. Before we use it, we need to download and install DataFrame and CSV packages as follows −
(@v1.5) pkg> add DataFrames (@v1.5) pkg> add CSV
To start using the DataFrames package, type the following command −
julia> using DataFrames
Loading data into DataFrames
There are several ways to create new DataFrames (which we will discuss later in this section) but one of the quickest ways to load data into DataFrames is to load the Anscombe dataset. For better understanding, let us see the example below −
anscombe = DataFrame( [10 10 10 8 8.04 9.14 7.46 6.58; 8 8 8 8 6.95 8.14 6.77 5.76; 13 13 13 8 7.58 8.74 12.74 7.71; 9 9 9 8 8.81 8.77 7.11 8.84; 11 11 11 8 8.33 9.26 7.81 8.47; 14 14 14 8 9.96 8.1 8.84 7.04; 6 6 6 8 7.24 6.13 6.08 5.25; 4 4 4 19 4.26 3.1 5.39 12.5; 12 12 12 8 10.84 9.13 8.15 5.56; 7 7 7 8 4.82 7.26 6.42 7.91; 5 5 5 8 5.68 4.74 5.73 6.89]);
julia> rename!(anscombe, [Symbol.(:N, 1:4); Symbol.(:M, 1:4)]) 11×8 DataFrame │ Row │ N1 │ N2 │ N3 │ N4 │ M1 │ M2 │ M3 │ M4 │ │ │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ 10.0 │ 10.0 │ 10.0 │ 8.0 │ 8.04 │ 9.14 │ 7.46 │ 6.58 │ │ 2 │ 8.0 │ 8.0 │ 8.0 │ 8.0 │ 6.95 │ 8.14 │ 6.77 │ 5.76 │ │ 3 │ 13.0 │ 13.0 │ 13.0 │ 8.0 │ 7.58 │ 8.74 │ 12.74 │ 7.71 │ │ 4 │ 9.0 │ 9.0 │ 9.0 │ 8.0 │ 8.81 │ 8.77 │ 7.11 │ 8.84 │ │ 5 │ 11.0 │ 11.0 │ 11.0 │ 8.0 │ 8.33 │ 9.26 │ 7.81 │ 8.47 │ │ 6 │ 14.0 │ 14.0 │ 14.0 │ 8.0 │ 9.96 │ 8.1 │ 8.84 │ 7.04 │ │ 7 │ 6.0 │ 6.0 │ 6.0 │ 8.0 │ 7.24 │ 6.13 │ 6.08 │ 5.25 │ │ 8 │ 4.0 │ 4.0 │ 4.0 │ 19.0 │ 4.26 │ 3.1 │ 5.39 │ 12.5 │ │ 9 │ 12.0 │ 12.0 │ 12.0 │ 8.0 │ 10.84 │ 9.13 │ 8.15 │ 5.56 │ │10 │ 7.0 │ 7.0 │ 7.0 │ 8.0 │ 4.82 │ 7.26 │ 6.42 │ 7.91 │ │11 │ 5.0 │ 5.0 │ 5.0 │ 8.0 │ 5.68 │ 4.74 │ 5.73 │ 6.89 │
We assigned the DataFrame to a variable named Anscombe, convert them to an array and then rename columns.
Collected Datasets
We can also use another dataset package named RDatasets package. It contains several other famous datasets including Anscombe’s. Before we start using it, we need to first download and install it as follows −
(@v1.5) pkg> add RDatasets
To start using this package, type the following command −
julia> using DataFrames julia> anscombe = dataset("datasets","anscombe") 11×8 DataFrame │ Row │ X1 │ X2 │ X3 │ X4 │ Y1 │ Y2 │ Y3 │ Y4 │ │ │ Int64 │ Int64 │ Int64 │ Int64 │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼───────┼───────┼───────┼───────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ 10 │ 10 │ 10 │ 8 │ 8.04 │ 9.14 │ 7.46 │ 6.58 │ │ 2 │ 8 │ 8 │ 8 │ 8 │ 6.95 │ 8.14 │ 6.77 │ 5.76 │ │ 3 │ 13 │ 13 │ 13 │ 8 │ 7.58 │ 8.74 │ 12.74│ 7.71 │ │ 4 │ 9 │ 9 │ 9 │ 8 │ 8.81 │ 8.77 │ 7.11 │ 8.84 │ │ 5 │ 11 │ 11 │ 11 │ 8 │ 8.33 │ 9.26 │ 7.81 │ 8.47 │ │ 6 │ 14 │ 14 │ 14 │ 8 │ 9.96 │ 8.1 │ 8.84 │ 7.04 │ │ 7 │ 6 │ 6 │ 6 │ 8 │ 7.24 │ 6.13 │ 6.08 │ 5.25 │ │ 8 │ 4 │ 4 │ 4 │ 19 │ 4.26 │ 3.1 │ 5.39 │ 12.5 │ │ 9 │ 12 │ 12 │ 12 │ 8 │ 10.84 │ 9.13 │ 8.15 │ 5.56 │ │ 10 │ 7 │ 7 │ 7 │ 8 │ 4.82 │ 7.26 │ 6.42 │ 7.91 │ │ 11 │ 5 │ 5 │ 5 │ 8 │ 5.68 │ 4.74 │ 5.73 │ 6.89 │
Empty DataFrames
We can also create DataFrames by simply providing the information about rows, columns as we give in an array.
Example
julia> empty_df = DataFrame(X = 1:10, Y = 21:30) 10×2 DataFrame │ Row │ X │ Y │ │ │ Int64 │ Int64 │ ├─────┼───────┼───────┤ │ 1 │ 1 │ 21 │ │ 2 │ 2 │ 22 │ │ 3 │ 3 │ 23 │ │ 4 │ 4 │ 24 │ │ 5 │ 5 │ 25 │ │ 6 │ 6 │ 26 │ │ 7 │ 7 │ 27 │ │ 8 │ 8 │ 28 │ │ 9 │ 9 │ 29 │ │ 10 │ 10 │ 30 │
To create completely empty DataFrame, we only need to supply the Column Names and define their types as follows −
julia> Complete_empty_df = DataFrame(Name=String[], W=Float64[], H=Float64[], M=Float64[], V=Float64[]) 0×5 DataFrame
julia> Complete_empty_df = vcat(Complete_empty_df, DataFrame(Name="EmptyTestDataFrame", W=5.0, H=5.0, M=3.0, V=5.0)) 1×5 DataFrame │ Row │ Name │ W │ H │ M │ V │ │ │ String │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼────────────────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ EmptyTestDataFrame │ 5.0 │ 5.0 │ 3.0 │ 5.0 │
julia> Complete_empty_df = vcat(Complete_empty_df, DataFrame(Name="EmptyTestDataFrame2", W=6.0, H=6.0, M=5.0, V=7.0)) 2×5 DataFrame │ Row │ Name │ W │ H │ M │ V │ │ │ String │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼─────────────────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ EmptyTestDataFrame │ 5.0 │ 5.0 │ 3.0 │ 5.0 │ │ 2 │ EmptyTestDataFrame2 │ 6.0 │ 6.0 │ 5.0 │ 7.0 │
Plotting Anscombe’s Quarter
Now the Anscombe dataset has been loaded, we can do some statistics with it also. The inbuilt function named describe() enables us to calculate the statistics properties of the columns of a dataset. You can supply the symbols, given below, for the properties −
-
mean
-
std
-
min
-
q25
-
median
-
q75
-
max
-
eltype
-
nunique
-
first
-
last
-
nmissing
Example
julia> describe(anscombe, :mean, :std, :min, :median, :q25) 8×6 DataFrame │ Row │ variable │ mean │ std │ min │ median │ q25 │ │ │ Symbol │ Float64 │ Float64 │ Real │ Float64 │ Float64 │ ├─────┼──────────┼─────────┼─────────┼──────┼─────────┼─────────┤ │ 1 │ X1 │ 9.0 │ 3.31662 │ 4 │ 9.0 │ 6.5 │ │ 2 │ X2 │ 9.0 │ 3.31662 │ 4 │ 9.0 │ 6.5 │ │ 3 │ X3 │ 9.0 │ 3.31662 │ 4 │ 9.0 │ 6.5 │ │ 4 │ X4 │ 9.0 │ 3.31662 │ 8 │ 8.0 │ 8.0 │ │ 5 │ Y1 │ 7.50091 │ 2.03157 │ 4.26 │ 7.58 │ 6.315 │ │ 6 │ Y2 │ 7.50091 │ 2.03166 │ 3.1 │ 8.14 │ 6.695 │ │ 7 │ Y3 │ 7.5 │ 2.03042 │ 5.39 │ 7.11 │ 6.25 │ │ 8 │ Y4 │ 7.50091 │ 2.03058 │ 5.25 │ 7.04 │ 6.17 │
We can also do a comparison between XY datasets as follows −
julia> [describe(anscombe[:, xy], :mean, :std, :median, :q25) for xy in [[:X1, :Y1], [:X2, :Y2], [:X3, :Y3], [:X4, :Y4]]] 4-element Array{DataFrame,1}: 2×5 DataFrame │ Row │ variable │ mean │ std │ median │ q25 │ │ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ X1 │ 9.0 │ 3.31662 │ 9.0 │ 6.5 │ │ 2 │ Y1 │ 7.50091 │ 2.03157 │ 7.58 │ 6.315 │ 2×5 DataFrame │ Row │ variable │ mean │ std │ median │ q25 │ │ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ X2 │ 9.0 │ 3.31662 │ 9.0 │ 6.5 │ │ 2 │ Y2 │ 7.50091 │ 2.03166 │ 8.14 │ 6.695 │ 2×5 DataFrame │ Row │ variable │ mean │ std │ median │ q25 │ │ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ X3 │ 9.0 │ 3.31662 │ 9.0 │ 6.5 │ │ 2 │ Y3 │ 7.5 │ 2.03042 │ 7.11 │ 6.25 │ 2×5 DataFrame │ Row │ variable │ mean │ std │ median │ q25 │ │ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ ├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤ │ 1 │ X4 │ 9.0 │ 3.31662 │ 8.0 │ 8.0 │ │ 2 │ Y4 │ 7.50091 │ 2.03058 │ 7.04 │ 6.17 │
Let us reveal the true purpose of Anscombe, i.e., plot the four sets of its quartet as follows −
julia> using StatsPlots [ Info: Precompiling StatsPlots [f3b207a7-027a-5e70-b257-86293d7955fd] julia> @df anscombe scatter([:X1 :X2 :X3 :X4], [:Y1 :Y2 :Y3 :Y4], smooth=true, line = :red, linewidth = 2, title= ["X$i vs Y$i" for i in (1:4)''], legend = false, layout = 4, xlimits = (2, 20), ylimits = (2, 14))
Regression and Models
In this section, we will be working with Linear Regression line for the dataset. For this we need to use Generalized Linear Model (GLM) package which you need to first add as follows −
(@v1.5) pkg> add GLM
Now let us create a liner regression model by specifying a formula using the @formula macro and supplying columns names as well as name of the DataFrame. An example for the same is given below −
julia> linearregressionmodel = fit(LinearModel, @formula(Y1 ~ X1), anscombe) StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}} Y1 ~ 1 + X1 Coefficients: ─────────────────────────────────────────────────────────────────────── Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95% ─────────────────────────────────────────────────────────────────────── (Intercept) 3.00009 1.12475 2.67 0.0257 0.455737 5.54444 X1 0.500091 0.117906 4.24 0.0022 0.23337 0.766812 ───────────────────────────────────────────────────────────────────────
Let us check the summary and the coefficient of the above created linear regression model −
julia> summary(linearregressionmodel) "StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}" julia> coef(linearregressionmodel) 2-element Array{Float64,1}: 3.0000909090909054 0.5000909090909096
Now let us produce a function for the regression line. The form of the function is y = ax +c.
julia> f(x) = coef(linearmodel)[2] * x + coef(linearmodel)[1] f (generic function with 1 method)
Once we have the function that describes the regression line, we can draw a plot as follows −
julia> p1 = plot(anscombe[:X1], anscombe[:Y1], smooth=true, seriestype=:scatter, title = "X1 vs Y1", linewidth=8, linealpha=0.5, label="data") julia> plot!(f, 2, 20, label="correlation")
Working with DataFrames
As we know that nothing is perfect. This is also true in case of datasets because not all the datasets are consistent and tidy. To show how we can work with different items of DataFrame, let us create a test DataFrame −
julia> testdf = DataFrame( Number = [3, 5, 7, 8, 20 ], Name = ["Lithium", "Boron", "Nitrogen", "Oxygen", "Calcium" ], AtomicWeight = [6.941, 10.811, 14.0067, 15.9994, 40.078 ], Symbol = ["Li", "B", "N", "O", "Ca" ], Discovered = [1817, 1808, 1772, 1774, missing ]) 5×5 DataFrame │ Row │ Number │ Name │ AtomicWeight │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼──────────┼──────────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │ │ 5 │ 20 │ Calcium │ 40.078 │ Ca │ missing │
Handling missing values
There can be some missing values in datasets. It can be checked with the help of describe() function as follows −
julia> describe(testdf) 5×8 DataFrame │ Row │ variable │ mean │ min │ median │ max │ nunique │ nmissing │ eltype │ │ │ Symbol │ Union… │ Any │ Union… │ Any │ Union… │ Union… │ Type │ ├─────┼──────────────┼─────────┼───────┼─────────┼────────┼─────────┼──────────┼───────────────────────┤ │ 1 │ Number │ 8.6 │ 3 │ 7.0 │ 20 │ │ │ Int64 │ │ 2 │ Name │ │ Boron │ │ Oxygen │ 5 │ │ String │ │ 3 │ AtomicWeight │ 17.5672 │ 6.941 │ 14.0067 │ 40.078 │ │ │ Float64 │ │ 4 │ Symbol │ │ B │ │ O │ 5 │ │ String │ │ 5 │ Discovered │ 1792.75 │ 1772 │ 1791.0 │ 1817 │ │ 1 │ Union{Missing, Int64} │
Julia provides a special datatype called Missing to address such issue. This datatype indicates that there is not a usable value at this location. That is why the DataFrames packages allow us to get most of our datasets and make sure that the calculations are not tampered due to missing values.
Looking for missing values
We can check with ismissing() function that whether the DataFrame has any missing value or not.
Example
julia> for row in 1:nrows for col in 1:ncols if ismissing(testdf [row,col]) println("$(names(testdf)[col]) value for $(testdf[row,:Name]) is missing!") end end end
Discovered value for Calcium is missing!
Repairing DataFrames
We can use the following code to change values that are not acceptable like “n/a”, “0”, “missing”. The below code will look in every cell for above mentioned non-acceptable values.
Example
julia> for row in 1:size(testdf, 1) # or nrow(testdf) for col in 1:size(testdf, 2) # or ncol(testdf) println("processing row $row column $col ") temp = testdf [row,col] if ismissing(temp) println("skipping missing") elseif temp == "n/a" || temp == "0" || temp == 0 testdf [row, col] = missing println("changed row $row column $col ") end end end processing row 1 column 1 processing row 1 column 2 processing row 1 column 3 processing row 1 column 4 processing row 1 column 5 processing row 2 column 1 processing row 2 column 2 processing row 2 column 3 processing row 2 column 4 processing row 2 column 5 processing row 3 column 1 processing row 3 column 2 processing row 3 column 3 processing row 3 column 4 processing row 3 column 5 processing row 4 column 1 processing row 4 column 2 processing row 4 column 3 processing row 4 column 4 processing row 4 column 5 processing row 5 column 1 processing row 5 column 2 processing row 5 column 3 processing row 5 column 4 processing row 5 column 5 skipping missing
Working with missing values
Julia provides support for representing missing values in the statistical sense, that is for situations where no value is available for a variable in an observation, but a valid value theoretically exists.
completecases()
The completecases() function is used to find the maximum value of the column that contains the missing value.
Example
julia> maximum(testdf[completecases(testdf), :].Discovered) 1817
dropmissing()
The dropmissing() function is used to get the copy of DataFrames without having the missing values.
Example
julia> dropmissing(testdf) 4×5 DataFrame │ Row │ Number │ Name │ AtomicWeight │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64 │ ├─────┼────────┼──────────┼──────────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │
Modifying DataFrames
The DataFrames package of Julia provides various methods using which you can add, remove, rename columns, and add/delete rows.
Adding Columns
We can use hcat() function to add a column of integers to the DataFrame. It can be used as follows −
julia> hcat(testdf, axes(testdf, 1)) 5×6 DataFrame │ Row │ Number │ Name │ AtomicWeight │ Symbol │ Discovered │ x1 │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ Int64 │ ├─────┼────────┼──────────┼──────────────┼────────┼────────────┼───────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ 1 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ 2 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ 3 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │ 4 │ │ 5 │ 20 │ Calcium │ 40.078 │ Ca │ missing │ 5 │
But as you can notice that we haven’t changed the DataFrame or assigned any new DataFrame to a symbol. We can add another column as follows −
julia> testdf [!, :MP] = [180.7, 2300, -209.86, -222.65, 839] 5-element Array{Float64,1}: 180.7 2300.0 -209.86 -222.65 839.0 julia> testdf 5×6 DataFrame │ Row │ Number │ Name │ AtomicWeight │ Symbol │ Discovered │ MP │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ Float64 │ ├─────┼────────┼──────────┼──────────────┼────────┼────────────┼─────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ 180.7 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ 2300.0 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ -209.86 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │ -222.65 │ │ 5 │ 20 │ Calcium │ 40.078 │ Ca │ missing │ 839.0 │
We have added a column having melting points of all the elements to our test DataFrame.
Removing Columns
We can use select!() function to remove a column from the DataFrame. It will create a new DataFrame that contains the selected columns, hence to remove a particular column, we need to use select!() with Not. It is shown in the given example −
julia> select!(testdf, Not(:MP)) 5×5 DataFrame │ Row │ Number │ Name │ AtomicWeight │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼──────────┼──────────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │ │ 5 │ 20 │ Calcium │ 40.078 │ Ca │ missing │
We have removed the column MP from our Data Frame.
Renaming Columns
We can use rename!() function to rename a column in the DataFrame. We will be renaming the AtomicWeight column to AW as follows −
julia> rename!(testdf, :AtomicWeight => :AW) 5×5 DataFrame │ Row │ Number │ Name │ AW │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼──────────┼─────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │ │ 5 │ 20 │ Calcium │ 40.078 │ Ca │ missing │
Adding rows
We can use push!() function with suitable data to add rows in the DataFrame. In the below given example we will be adding a row having element Cooper −
Example
julia> push!(testdf, [29, "Copper", 63.546, "Cu", missing]) 6×5 DataFrame │ Row │ Number │ Name │ AW │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼──────────┼─────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │ │ 4 │ 8 │ Oxygen │ 15.9994 │ O │ 1774 │ │ 5 │ 20 │ Calcium │ 40.078 │ Ca │ missing │ │ 6 │ 29 │ Copper │ 63.546 │ Cu │ missing │
Deleting rows
We can use deleterows!() function with suitable data to delete rows from the DataFrame. In the below given example we will be deleting three rows (4th, 5th,and 6th) from our test data frame −
Example
julia> deleterows!(testdf, 4:6) 3×5 DataFrame │ Row │ Number │ Name │ AW │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼──────────┼─────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │
Finding values in DataFrame
To find the values in DataFrame, we need to use an elementwise operator examining all the rows. This operator will return an array of Boolean values to indicate whether cells meet the criteria or not.
Example
julia> testdf[:, :AW] .< 10 3-element BitArray{1}: 1 0 0 julia> testdf[testdf[:, :AW] .< 10, :] 1×5 DataFrame │ Row │ Number │ Name │ AW │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼─────────┼─────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │
Sorting
To sort the values in DataFrame, we can use sort!() function. We need to give the columns on which we want to sort.
Example
julia> sort!(testdf, [order(:AW)]) 3×5 DataFrame │ Row │ Number │ Name │ AW │ Symbol │ Discovered │ │ │ Int64 │ String │ Float64 │ String │ Int64? │ ├─────┼────────┼──────────┼─────────┼────────┼────────────┤ │ 1 │ 3 │ Lithium │ 6.941 │ Li │ 1817 │ │ 2 │ 5 │ Boron │ 10.811 │ B │ 1808 │ │ 3 │ 7 │ Nitrogen │ 14.0067 │ N │ 1772 │
The DataFrame is sorted based on the values of column AW.
”;