Thursday, December 8, 2022

Here are some more examples of how to use the tidyverse in your R code:


Using the select() function from the "dplyr" package to select only certain columns from a data frame:

mydata %>%
  select(x, y)

This keeps only the x and y columns and drops every other column.
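To try this out yourself, here is a minimal sketch. The data frame below is made up for illustration (the columns x, y and z are assumptions, not part of any real dataset), and the second call shows that select() can also drop a column with a minus sign.

```r
library(dplyr)

# Hypothetical example data; the column names x, y and z are made up
mydata <- tibble(
  x = c("a", "b", "c"),
  y = c(1, 2, 3),
  z = c(TRUE, FALSE, TRUE)
)

# Keep only the x and y columns
selected <- mydata %>%
  select(x, y)

# Alternatively, drop a column by prefixing its name with a minus sign
without_z <- mydata %>%
  select(-z)

print(selected)
print(without_z)
```

Both results contain the same two columns here, because z is the only other column in the example data.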

Using the group_by() and summarize() functions from the "dplyr" package to group data by certain columns and calculate summary statistics:

mydata %>%                        # start from the mydata data frame, and then...
  group_by(x) %>%                 # group the rows by the x column, and then...
  summarize(mean = mean(y),       # take the average of y in each group
            median = median(y),   # take the median of y in each group
            sd = sd(y))           # find the standard deviation of y in each group
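Here is a small runnable sketch of the same pipeline. The data is invented for the example (two groups, "a" and "b", with two values each), so you can check the numbers by hand.

```r
library(dplyr)

# Hypothetical example data: x is the grouping column, y holds the values
mydata <- tibble(
  x = c("a", "a", "b", "b"),
  y = c(1, 3, 10, 20)
)

# One output row per distinct value of x, with three summary columns
result <- mydata %>%
  group_by(x) %>%
  summarize(mean = mean(y),
            median = median(y),
            sd = sd(y))

print(result)
```

The result has one row per group: for group "a" the mean and median of 1 and 3 are both 2, and for group "b" the mean and median of 10 and 20 are both 15.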



Using the ggplot() function from the "ggplot2" package to create a scatterplot of two columns in a data frame:

mydata %>%
  ggplot(aes(x, y)) +
  geom_point()

ggplot2 is a very powerful tool for visualization in R. The code above is a simple example that plots two variables against each other. I will examine ggplot2 in more detail in a later post.
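If you want to run the scatterplot end to end, a minimal sketch could look like this. The numbers are made up, and the title and axis labels added with labs() are optional extras that are not part of the snippet above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical example data: two numeric columns to plot against each other
mydata <- tibble(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8)
)

# Map x and y to the axes, then draw one point per row
p <- mydata %>%
  ggplot(aes(x, y)) +
  geom_point() +
  labs(title = "y against x", x = "x", y = "y")  # optional labels

# Printing the plot object draws it
print(p)
```

Note that ggplot2 layers are combined with + rather than %>%, which is why the pipe stops after the call to ggplot().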


Using the spread() function from the "tidyr" package to "spread" a column of data into multiple columns:

mydata %>%
  spread(key, value)
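To make the reshaping concrete, here is a short sketch. The data is invented for the example: the id column and the measurements are assumptions, while key and value match the column names used above. The tidyr documentation marks spread() as superseded by pivot_wider(), so the sketch shows both forms side by side.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per (id, key) pair
mydata <- tibble(
  id    = c(1, 1, 2, 2),
  key   = c("height", "weight", "height", "weight"),
  value = c(170, 65, 180, 80)
)

# Each distinct entry in the key column becomes its own column,
# filled with the matching entries from the value column
wide <- mydata %>%
  spread(key, value)

# The same reshaping with the newer pivot_wider() interface
wide_pivot <- mydata %>%
  pivot_wider(names_from = key, values_from = value)

print(wide)
print(wide_pivot)
```

Both results have one row per id, with separate height and weight columns.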



