13
Aug

Data Frames – Introduction to R Programming – Part 6


A data frame is a two-dimensional array or, to put it simply, a table: data arranged in rows and columns. This makes it easy to work with in R and means we can store a mix of numeric variables, categorical variables, and character strings. The data frame is one of the main objects you'll be working with in R, so let's get ourselves familiar with it.

In our video on reading and writing data, you read in the income data set as a data frame. You can see a mix of data types here, along with the variable names.

A data frame makes it easy to subset data in R. As you would've seen in our operations video, you can use conditions to extract something you might be interested in. For example, I might be interested in extracting the jobs with an average income of 90,000 or above. To do this, I simply write "income", and within my income data set I use the dollar sign ($) to refer to the variable that I'm interested in, which is "average.income", and I would like it to be greater than or equal to 90,000.

In our data frame income, we can refer to specific rows and/or columns inside the square brackets. Here we are specifically referring to rows that meet this condition: we want to extract all the rows where the average income value is greater than or equal to 90,000. The comma that follows means that we can also extract at the column level. So within income we can specify the rows, and we can specify the columns.

Let's say I'm interested in the third row of the third column. What I'm saying here is that I would like the income value that sits at row 3 of the "average.income" column, which is the third column. If we run this, you can see it has extracted the relevant value. The same goes for meeting a condition: we specify the rows we want before the comma, and if we want to specify any columns, we do this after the comma.

We can also extract a range of rows and columns in our data set. For example, I might want rows one to three, and only the values from columns one to two of income. Now we can see rows one to three showing, and only columns one to two of those rows.

In a data frame we can easily add or remove columns too. To add a column, we simply type "income", use the dollar sign ($) to add a variable, and call it "new.column". I'm just going to fill it with "NA" missing values for a quick demonstration. Let's have a look at this. Okay, cool. To remove a column we follow a similar kind of command: "income", and I want to get rid of the fourth column, so I put -4 here, which is the last column in our data set. Let's have a look. Okay, great.

Another useful command in R is str(), or what we call structure. If we look at the structure of "income", for example, we can see which variables are numeric, character, or categorical, how many categories or factor levels there are, how many rows of data we have to work with, and the like.

So now that you're familiar with data frames, we'll move on to vectors in the next video.
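
The commands described in this video can be sketched roughly as follows. The transcript does not show the actual contents of the income data set, so the small data frame below is a made-up stand-in: only the "average.income" name, its position as the third column, and "new.column" come from the video, and every other name and value is an assumption for illustration.

```r
# Hypothetical stand-in for the income data set from the earlier video.
# The "job" and "sector" columns and all values are invented for this sketch.
income <- data.frame(
  job            = c("Analyst", "Engineer", "Nurse", "Teacher", "Surgeon"),
  sector         = factor(c("Finance", "Tech", "Health", "Education", "Health")),
  average.income = c(85000, 110000, 72000, 68000, 250000),
  stringsAsFactors = FALSE
)

# Rows where average.income is greater than or equal to 90,000.
# Nothing after the comma means "keep every column".
high.earners <- income[income$average.income >= 90000, ]

# The value at row 3 of the third column (average.income).
third.value <- income[3, 3]

# Rows one to three, columns one to two.
first.rows <- income[1:3, 1:2]

# Add a column filled with NA missing values.
income$new.column <- NA

# Remove the fourth column, which is the column just added.
income <- income[, -4]

# Show each variable's type, the factor levels, and the number of rows.
str(income)
```

Note that a negative index such as -4 drops a column by position, so it is worth checking names(income) first if the column order is not certain.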


One Comment

  • Герман Черняев says:

    Another important thing about data frames (DF) in the context of this video: if you want to extract, delete, or print only specific columns or rows of a DF (1 to 3 and also 7, for instance), you need to combine them inside the brackets with c(), such as df <- df[c(1:3, 7), ] for rows and df <- df[, c(1:3, 7)] for columns.
