Statistical Analysis using R

Try R for Statistical Data Analysis

R is a tool for statistics and data modeling. During my course, I found that the R programming language is elegant and versatile, with a highly expressive syntax designed around working with data. It also has extremely powerful graphics capabilities. I found it very helpful for manipulating data easily, and it is a very useful tool for statistical analysis.

Based on this Code School course, I am going to work through an example using some data that I have created, and use R graphics to visualise the data.

First of all, I saved my data file from Microsoft Excel as a .CSV file.

Then I imported the data file into RStudio. First, I set the working directory to the folder that contains the file:

> setwd("c:/users/tehseen/desktop/R assignment")

Alternatively, the working directory can be selected with RStudio's "Choose Directory" option.

Next, I import the file:

> data1 <- read.csv("data1.csv", header = TRUE)
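The original data1.csv is not included here, so the sketch below shows the same import step on a small made-up file. It writes five hypothetical rows (with the same six column names) to a temporary path using write.csv() and reads them back with read.csv(). In the real analysis, the file comes from Excel and the folder comes from setwd().

```r
# Hypothetical rows using the six column names from data1.csv (values are made up)
toy <- data.frame(
  LungCap   = c(6.5, 10.1, 9.6, 8.2, 5.0),
  Age       = c(6, 18, 16, 14, 5),
  Height    = c(62, 75, 70, 71, 57),
  Smoke     = c("no", "yes", "no", "no", "no"),
  Gender    = c("male", "female", "female", "male", "male"),
  Caesarean = c("no", "no", "yes", "no", "no")
)

# Write them to a temporary .csv file, standing in for the file saved from Excel
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file back; header = TRUE takes the column names from the first row
data1 <- read.csv(path, header = TRUE)

names(data1)  # the six variable names
str(data1)    # numeric columns plus text columns (character in R >= 4.0)
```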

Lung Capacity Data

This data file has 6 variables. It records the lung capacity of smokers and non-smokers by age, gender and height.

I am going to use this data for a statistical analysis in R.

First of all, I am going to make bar charts and pie charts, which summarize the distribution of a categorical variable.


The names of the six variables are:

[1] "LungCap" "Age" "Height" "Smoke" "Gender" "Caesarean"

For the Gender variable, I first make a frequency table:

> table(data1$Gender)

female   male

93       207

To save this frequency table, I assign it to an object called count:

> count <- table(data1$Gender)

> count

female   male

93      207

To get the relative frequencies (the proportions of females and males), I divide the counts by the total number of observations, 300:

> table(data1$Gender)/300

female   male

0.31    0.69

To save these proportions in R, I use this command:

> percent <- table(data1$Gender)/300

> percent

female   male

0.31    0.69
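Dividing by 300 hard-codes the number of observations. Here is a sketch of an alternative, using a short made-up Gender vector in place of data1$Gender. prop.table() divides each count by the table total, and multiplying by 100 gives actual percentages. (The values stored in percent above are proportions between 0 and 1.)

```r
# Hypothetical gender column standing in for data1$Gender
gender <- c("male", "female", "male", "male", "female",
            "male", "male", "female", "male", "male")

count <- table(gender)         # frequency table: female 3, male 7

percent <- prop.table(count)   # same as count / sum(count)
percent_100 <- 100 * percent   # proportions expressed as percentages

print(count)
print(percent)
print(percent_100)
```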

For the bar plot, I plot the proportions of females and males in the Gender variable:


> barplot(percent, main="Percentage ratio between Male & Female", xlab="Gender", ylab="%", las=1, names.arg=c("Female", "Male"))


A pie chart shows the same split between females and males in the Gender variable:

> pie(count, main="Percentage ratio between Male & Female")
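By default, pie() labels each slice only with its category name. This sketch rebuilds the table from the counts above (female 93, male 207) and adds the rounded percentage to each label. The slice colors are an arbitrary choice.

```r
# Rebuild the Gender frequency table from the counts shown above
count <- table(rep(c("female", "male"), times = c(93, 207)))

# Percentage for each slice, rounded to whole numbers
pct <- round(100 * prop.table(count))

# Labels such as "female (31%)"
slice_labels <- paste0(names(count), " (", pct, "%)")

pie(count, labels = slice_labels,
    main = "Percentage ratio between Male & Female",
    col = c(2, 4))
```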



Histograms of Lung Capacity:

Histograms are used to summarize the distribution of a numeric variable.

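> hist(data1$LungCap)

Setting freq = FALSE plots densities instead of counts, so the bar areas add up to 1. The y-axis limits of 0 to 0.2 and the density curve are on this density scale, so the remaining commands keep freq = FALSE:

> hist(data1$LungCap, freq = FALSE)

> hist(data1$LungCap, freq = FALSE, ylim = c(0, 0.2))

> hist(data1$LungCap, freq = FALSE, ylim = c(0, 0.2), col = 4, breaks = 14)

> hist(data1$LungCap, freq = FALSE, ylim = c(0, 0.2), col = 4, breaks = c(0, 2, 4, 6, 8, 10, 12, 14, 16))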
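To see what the histogram is drawing, call hist() with plot = FALSE; it then returns its bins instead of plotting them. The sketch uses a made-up vector in place of data1$LungCap, with the same breaks of width 2. It shows that each density equals count / (n times bin width), so density times bin width sums to 1.

```r
# Hypothetical lung capacity values standing in for data1$LungCap
lung_cap <- c(4.8, 5.5, 6.1, 6.3, 7.0, 7.2, 7.4, 8.1, 9.0, 9.6, 10.4, 11.1)

# Same fixed breaks as above, but return the bins instead of drawing them
h <- hist(lung_cap, breaks = seq(from = 0, to = 16, by = 2), plot = FALSE)

h$breaks   # bin edges 0, 2, ..., 16
h$counts   # number of values in each bin
h$density  # counts / (n * bin width)

# Densities times bin widths add up to 1
sum(h$density * diff(h$breaks))
```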


Here the red line shows the estimated density of the lung capacity variable.

> hist(data1$LungCap, freq = FALSE, ylim = c(0, 0.2), col = 4, breaks = seq(from = 0, to = 16, by = 2), main = "Histogram of Lung Capacity", xlab = "Lung capacity", ylab = "Density", las = 1)

> lines(density(data1$LungCap))

> lines(density(data1$LungCap), col=2, lwd=3)
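lines(density(...)) overlays a kernel density estimate on the histogram. A short sketch on made-up values shows what density() returns. With its defaults, it uses a Gaussian kernel, evaluates the estimate at 512 points, and stores the automatically chosen bandwidth (from bw.nrd0) in bw.

```r
# Hypothetical values standing in for data1$LungCap
lung_cap <- c(4.8, 5.5, 6.1, 6.3, 7.0, 7.2, 7.4, 8.1, 9.0, 9.6, 10.4, 11.1)

d <- density(lung_cap)   # Gaussian kernel, default bandwidth rule "nrd0"

length(d$x)   # number of points where the density was evaluated
d$bw          # bandwidth chosen automatically

hist(lung_cap, freq = FALSE, col = 4,
     breaks = seq(from = 0, to = 16, by = 2))
lines(d, col = 2, lwd = 3)   # the density object supplies the x and y values
```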



Next, a scatterplot shows lung capacity against age.

An abline (a straight line added to the plot) shows the trend of lung capacity with age.

> plot(data1$Age, data1$LungCap, xlab = "Age", ylab = "Lung Capacity", col = 2)

> plot(data1$Age, data1$LungCap, main = "Scatterplot", xlab = "Age", ylab = "Lung Capacity", col = 2)
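The abline mentioned above is not shown as a command. One common choice, assumed here, is the least-squares line from lm(), which abline() can draw directly from the fitted model. The data frame is made up, with LungCap exactly equal to 1 + 0.5 * Age, so the fitted coefficients are easy to check.

```r
# Hypothetical data: LungCap is exactly 1 + 0.5 * Age
toy <- data.frame(
  Age     = c(5, 7, 9, 11, 13, 15, 17, 19),
  LungCap = c(3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5)
)

fit <- lm(LungCap ~ Age, data = toy)   # least-squares straight line
coef(fit)                              # intercept and slope

plot(toy$Age, toy$LungCap, main = "Scatterplot",
     xlab = "Age", ylab = "Lung Capacity", col = 2)
abline(fit, lwd = 2)                   # draw the fitted line on the plot
```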

Now I will use a boxplot to summarize the distribution of a numeric variable.

I produced a boxplot of lung capacity.

To find the minimum and maximum (the 0% and 100% quantiles) of LungCap:

> quantile(data1$LungCap, probs = c(0, 1))

    0%   100%

 4.800 11.125

> quantile(data1$LungCap, probs = c(0, 0.25, 0.5, 0.75, 1))

    0%    25%    50%    75%   100%

 4.800  6.225  7.325  9.550 11.125
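A boxplot is drawn from five summary numbers plus any outliers. The sketch below uses boxplot.stats() on a made-up vector to show them. The box edges are Tukey's hinges, which can differ slightly from the 25% and 75% values that quantile() reports (here 3 and 8, versus 3.25 and 7.75). Points more than 1.5 box lengths beyond a hinge are drawn individually.

```r
# Made-up values; 30 is deliberately far from the rest
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

b <- boxplot.stats(x)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond 1.5 * box length from the hinges

boxplot(x, main = "Boxplot of made-up values")
```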

To compare lung capacity between males and females, I plot it by gender:

> boxplot(data1$LungCap ~ data1$Gender)

> boxplot(data1$LungCap ~ data1$Gender, main = "Boxplot By Gender")
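The boxplot compares the two groups visually. As a numeric companion, this sketch uses tapply() on a made-up data frame to compute the median lung capacity for each gender. The median is the thick line inside each box.

```r
# Hypothetical data with the same column names as data1
toy <- data.frame(
  Gender  = c("female", "female", "female", "male", "male", "male", "male"),
  LungCap = c(6.0, 7.0, 8.0, 7.5, 8.5, 9.0, 10.0)
)

# Median lung capacity for each gender (the line inside each box)
med <- tapply(toy$LungCap, toy$Gender, median)
med

boxplot(LungCap ~ Gender, data = toy, main = "Boxplot By Gender")
```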

Now I will examine the relationship between smoking (smokers vs. non-smokers) and lung capacity within age groups, or age strata. First, I divide Age into four groups:

> AgeGroups <- cut(data1$Age, breaks = c(0, 13, 15, 17, 25), labels = c("<13", "14/15", "16/17", "18+"))

> AgeGroups[1:5]

[1] <13   18+   16/17 14/15 <13 

Levels: <13 14/15 16/17 18+
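By default, cut() builds right-closed intervals (right = TRUE), so breaks = c(0, 13, 15, 17, 25) give the groups (0,13], (13,15], (15,17] and (17,25]. This sketch with made-up ages shows that a 13-year-old lands in the group labelled "<13", and that an age above 25 becomes NA.

```r
ages <- c(10, 13, 14, 15, 16, 17, 18, 25, 30)   # made-up ages

AgeGroups <- cut(ages, breaks = c(0, 13, 15, 17, 25),
                 labels = c("<13", "14/15", "16/17", "18+"))

data.frame(ages, AgeGroups)        # which group each age falls into
table(AgeGroups, useNA = "ifany")  # group sizes, including NA for ages > 25
```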

Before stratifying by age, I compare lung capacity between non-smokers and smokers:

> boxplot(data1$LungCap ~ data1$Smoke)

> boxplot(data1$LungCap ~ data1$Smoke, main = "LungCap Vs Smoke", ylab = "Lung Capacity")

To examine the relationship between lung capacity and smoking status within each age group:

> levels(AgeGroups)

> boxplot(data1$LungCap ~ data1$Smoke*AgeGroups, main = "LungCap Vs Smoke by AgeGroups", ylab = "Lung Capacity", las = 1)

With las = 2, the axis labels are drawn perpendicular to the axis, so the group names do not overlap:

> boxplot(data1$LungCap ~ data1$Smoke*AgeGroups, main = "LungCap Vs Smoke by AgeGroups", ylab = "Lung Capacity", las = 2)

For a stratified boxplot with color:

Here we can see the difference in lung capacity between smokers and non-smokers within each age group. Red boxes show smokers and blue boxes show non-smokers.

> boxplot(data1$LungCap ~ data1$Smoke*AgeGroups, main = "LungCap Vs Smoke by AgeGroups", ylab = "Lung Capacity", las = 2, col = c(4, 2))
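With a formula such as LungCap ~ Smoke*AgeGroups, the boxes follow the level order of interaction(Smoke, AgeGroups), in which the smoking level changes fastest. That is why col = c(4, 2) alternates blue and red, assuming Smoke is coded "no"/"yes" as in the made-up data below. The legend() call is an addition, not part of the original commands.

```r
# Hypothetical data: Smoke alternates "no"/"yes", one row per group
toy <- data.frame(
  LungCap = c(5.0, 4.6, 7.1, 6.8, 8.0, 7.5, 9.2, 8.6),
  Smoke   = rep(c("no", "yes"), times = 4),
  Age     = c(10, 11, 14, 15, 16, 17, 19, 20)
)
AgeGroups <- cut(toy$Age, breaks = c(0, 13, 15, 17, 25),
                 labels = c("<13", "14/15", "16/17", "18+"))

# Order of the boxes: the Smoke level varies fastest within each age group
box_order <- levels(interaction(toy$Smoke, AgeGroups))
box_order

boxplot(toy$LungCap ~ toy$Smoke * AgeGroups,
        main = "LungCap Vs Smoke by AgeGroups", ylab = "Lung Capacity",
        las = 2, col = c(4, 2))
legend("topleft", fill = c(4, 2), legend = c("Non-smoker", "Smoker"))
```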

Many other statistical analyses and graphical representations are possible in R.

For example, lung capacity could also be analysed by height.
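As a possible starting point for the height analysis, this sketch plots made-up LungCap values against Height and measures the linear association with cor() and cor.test(). The values are hypothetical, so the numbers it prints say nothing about the real data.

```r
# Hypothetical data with the same column names as data1
toy <- data.frame(
  Height  = c(50, 55, 58, 60, 63, 66, 70, 74),
  LungCap = c(3.9, 5.1, 5.6, 6.4, 7.0, 8.1, 9.2, 10.3)
)

r <- cor(toy$Height, toy$LungCap)   # Pearson correlation coefficient
r

cor.test(toy$Height, toy$LungCap)   # adds a p-value and confidence interval

plot(toy$Height, toy$LungCap, xlab = "Height", ylab = "Lung Capacity",
     main = "Lung Capacity by Height", col = 4)
abline(lm(LungCap ~ Height, data = toy), lwd = 2)   # least-squares line
```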


With R's statistical analysis and graphics tools, I can easily analyse the lung capacity of smokers and non-smokers by age, gender and height.
