Descriptive Statistics

Once loaded, the dataset is listed in the Environment pane in the top-right corner of RStudio. There you can inspect several of its properties, such as the number of observations and the list of variables.

Environment Pane

  1. Several other ways to view the whole dataset (be careful with large datasets!):
scores
## # A tibble: 8 x 3
##   student   gpa   ACT
##     <dbl> <dbl> <dbl>
## 1       1   2.8    21
## 2       2   3.4    24
## 3       3   3      26
## 4       4   3.5    27
## 5       5   3.6    29
## 6       6   3      25
## 7       7   2.7    25
## 8       8   3.7    30
View(scores)
head(scores)
## # A tibble: 6 x 3
##   student   gpa   ACT
##     <dbl> <dbl> <dbl>
## 1       1   2.8    21
## 2       2   3.4    24
## 3       3   3      26
## 4       4   3.5    27
## 5       5   3.6    29
## 6       6   3      25
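
For a large dataset, printing everything is rarely practical, so it can help to check the size and structure first. The following is a minimal sketch, assuming the data can be rebuilt as a plain data frame from the values printed above (the original `scores` is a tibble, which prints slightly differently):

```r
# Assumption: a plain data.frame stands in for the tibble used in the text;
# the values are copied from the printout above.
scores <- data.frame(
  student = 1:8,
  gpa     = c(2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7),
  ACT     = c(21, 24, 26, 27, 29, 25, 25, 30)
)

dim(scores)          # number of observations (rows) and variables (columns)
str(scores)          # compact overview: type and first values of each variable
head(scores, n = 3)  # only the first 3 observations
tail(scores, n = 3)  # only the last 3 observations
```
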
  1. To list the variables
names(scores)
## [1] "student" "gpa"     "ACT"
  1. To access a particular variable, try first
gpa
## Error in eval(expr, envir, enclos): object 'gpa' not found

This does not work: the variable cannot be accessed directly, without specifying the dataset.

scores$gpa
## [1] 2.8 3.4 3.0 3.5 3.6 3.0 2.7 3.7
scores$gpa[2]
## [1] 3.4
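
Besides `$`, base R offers a few other ways to reach a variable inside a dataset. The sketch below assumes the same values as above, rebuilt as a plain data frame rather than the tibble used in the text:

```r
# Assumption: a plain data.frame stands in for the tibble used in the text.
scores <- data.frame(
  student = 1:8,
  gpa     = c(2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7),
  ACT     = c(21, 24, 26, 27, 29, 25, 25, 30)
)

scores[["gpa"]]                # same vector as scores$gpa, selected by name
scores[["gpa"]][2]             # second element of that vector
with(scores, gpa[ACT > 25])    # evaluate an expression "inside" the dataset
```

`with()` gives direct access to the variables for a single expression only, so nothing has to be undone afterwards.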
  1. But we may work directly with variables once the dataset is attached
attach(scores)
gpa
## [1] 2.8 3.4 3.0 3.5 3.6 3.0 2.7 3.7
gpa[2]
## [1] 3.4
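
One caveat worth knowing: `attach()` places a copy of the dataset on the search path, so later changes to `scores` are not seen through the attached names. A small sketch, meant for a fresh session and assuming the data are rebuilt as a plain data frame (the names `attached_value` and `dataset_value` are only illustrative):

```r
# Assumption: a plain data.frame stands in for the tibble used in the text.
scores <- data.frame(
  student = 1:8,
  gpa     = c(2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7),
  ACT     = c(21, 24, 26, 27, 29, 25, 25, 30)
)

attach(scores)
scores$gpa[2] <- 4.0             # modify the dataset after attaching it

attached_value <- gpa[2]         # still 3.4: the attached copy is unchanged
dataset_value  <- scores$gpa[2]  # 4: the dataset itself was updated

detach(scores)                   # clean up the search path
```
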
  1. Some ways to restrict the data to work with
scores[1:5,]                # only observations 1 to 5 selected
## # A tibble: 5 x 3
##   student   gpa   ACT
##     <dbl> <dbl> <dbl>
## 1       1   2.8    21
## 2       2   3.4    24
## 3       3   3      26
## 4       4   3.5    27
## 5       5   3.6    29
scores[student %% 2 == 0,]  # only observations with an even student number selected
## # A tibble: 4 x 3
##   student   gpa   ACT
##     <dbl> <dbl> <dbl>
## 1       2   3.4    24
## 2       4   3.5    27
## 3       6   3      25
## 4       8   3.7    30
scores[ACT > 25,]           # only students with ACT > 25 selected
## # A tibble: 4 x 3
##   student   gpa   ACT
##     <dbl> <dbl> <dbl>
## 1       3   3      26
## 2       4   3.5    27
## 3       5   3.6    29
## 4       8   3.7    30
scores[ACT > 25,]$gpa       # and the corresponding gpa values
## [1] 3.0 3.5 3.6 3.7
gpa[ACT > 25]               # shorter way
## [1] 3.0 3.5 3.6 3.7

The general subsetting function (the row condition is passed in the argument named subset):

subset(scores, subset = ACT > 25, select = c(student, gpa))
## # A tibble: 4 x 2
##   student   gpa
##     <dbl> <dbl>
## 1       3   3  
## 2       4   3.5
## 3       5   3.6
## 4       8   3.7
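
The same selection can also be written with bracket indexing, which makes explicit which rows and columns are kept. This is a sketch under the assumption that the data are rebuilt as a plain data frame; `by_subset` and `by_bracket` are illustrative names:

```r
# Assumption: a plain data.frame stands in for the tibble used in the text.
scores <- data.frame(
  student = 1:8,
  gpa     = c(2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7),
  ACT     = c(21, 24, 26, 27, 29, 25, 25, 30)
)

# Rows with ACT > 25, columns student and gpa, via subset()
by_subset <- subset(scores, subset = ACT > 25, select = c(student, gpa))

# The same rows and columns via [rows, columns]
by_bracket <- scores[scores$ACT > 25, c("student", "gpa")]

by_subset
by_bracket
```

Neither form needs the dataset to be attached, because the variables are looked up inside `scores` explicitly.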
  1. If direct access is no longer needed, detach the dataset
detach(scores)

(The dataset is detached automatically when closing RStudio.)