Using the select() function from the "dplyr" package to select only certain columns from a data frame:
library(dplyr)
mydata %>%
  select(x, y)
This returns a data frame that contains only the x and y columns.
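To try this out without a real dataset, here is a minimal sketch on a made-up data frame. The contents of mydata and the extra column z are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical example data; the column z exists only to show that it gets dropped.
mydata <- data.frame(
  x = c("a", "a", "b", "b"),
  y = c(1, 3, 10, 20),
  z = c(TRUE, FALSE, TRUE, FALSE)
)

# Keep only the x and y columns.
selected <- mydata %>%
  select(x, y)

names(selected)  # "x" "y"

# A minus sign drops a column instead of keeping it.
without_z <- mydata %>%
  select(-z)
```

Both calls give the same two columns here; which form is more convenient depends on whether it is shorter to list what you keep or what you drop.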
Using the group_by() and summarize() functions from the "dplyr" package to group data by certain columns and calculate summary statistics:
mydata %>%                        # start from the mydata data frame, and then...
  group_by(x) %>%                 # group the rows by the x column, and then...
  summarize(mean = mean(y),       # mean of y within each group
            median = median(y),   # median of y within each group
            sd = sd(y))           # standard deviation of y within each group
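A small worked example may make the grouping clearer. This is only a sketch: the values in mydata are invented so that the results are easy to check by hand.

```r
library(dplyr)

# Hypothetical example data with two groups, "a" and "b".
mydata <- data.frame(
  x = c("a", "a", "b", "b"),
  y = c(1, 3, 10, 20)
)

stats <- mydata %>%
  group_by(x) %>%
  summarize(mean = mean(y),
            median = median(y),
            sd = sd(y))

stats
# One row per group:
#   group "a": mean 2,  median 2,  sd about 1.41
#   group "b": mean 15, median 15, sd about 7.07
```

The result has one row per distinct value of x, so a data frame with thousands of rows collapses to as many rows as there are groups.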
Using the ggplot() function from the "ggplot2" package to create a scatterplot of two columns in a data frame:
mydata %>%
  ggplot(aes(x, y)) +
  geom_point()
ggplot2 is a very powerful tool for visualization in R. The code above is a simple example that plots two variables against each other. I will look at it in more detail in a later post.
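For anyone who wants to run the scatterplot right away, here is a minimal sketch with made-up numeric data. The values, the axis labels and the object name p are assumptions, not part of any real dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical numeric data for the two axes.
mydata <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8)
)

# Map x and y to the axes and draw one point per row.
p <- mydata %>%
  ggplot(aes(x, y)) +
  geom_point() +
  labs(x = "x value", y = "y value")

print(p)  # draws the plot on the active graphics device
```

Note that the ggplot2 layers are joined with + rather than %>%, because they add pieces to a plot object instead of passing a data frame along.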
Using the spread() function from the "tidyr" package to "spread" a column of data into multiple columns:
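library(tidyr)
mydata %>%
  spread(key, value)
Here key is the column whose values become the new column names, and value is the column whose values fill those new columns.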
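Here is a small sketch of what that reshaping looks like on made-up long-format data. The id column and the measurements are assumptions for illustration. Recent versions of tidyr mark spread() as superseded by pivot_wider(), so the equivalent call is shown as well.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per (id, key) pair.
mydata <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("height", "weight", "height", "weight"),
  value = c(170, 65, 180, 80)
)

# Each distinct value in key becomes its own column, filled from value.
wide <- mydata %>%
  spread(key, value)

names(wide)  # "id" "height" "weight"

# The same reshaping with the newer pivot_wider() interface.
wide_new <- mydata %>%
  pivot_wider(names_from = key, values_from = value)
```

After spreading there is one row per id, with height and weight side by side.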