# Statistical Analysis using R

Try R for Statistical Data Analysis

R is a tool for statistics and data modeling. During my course, I found that the R programming language is elegant and versatile, with a highly expressive syntax designed around working with data. It also includes extremely powerful graphics capabilities. I found it very helpful for manipulating data easily, and it is a very useful tool for statistical analysis.

Based on what I learned in this course, I am going to work through an example using some data that I have created, and use R graphics to visualise the data.

First of all, I saved my data file as a .CSV file with the help of Microsoft Excel.

Then I imported my data file into RStudio.

Before importing, I set the working directory to my assignment folder with this command:

>setwd("c:/users/tehseen/desktop/R assignment")

Alternatively, the working directory can be selected in RStudio from Session > Set Working Directory > Choose Directory.
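The import step itself can be sketched with `read.csv()`. The file name `lungcap_example.csv` and the three rows below are made up so that the example runs on its own; only the column names come from my data. With a real file in the working directory, the `read.csv()` call alone would be enough.

```r
# Hypothetical file name and made-up rows, used only to make this runnable.
csv_path <- file.path(tempdir(), "lungcap_example.csv")
example_rows <- data.frame(
  LungCap   = c(6.5, 10.1, 9.6),
  Age       = c(6, 18, 16),
  Height    = c(62.1, 74.7, 69.7),
  Smoke     = c("no", "yes", "no"),
  Gender    = c("male", "female", "female"),
  Caesarean = c("no", "no", "yes")
)
write.csv(example_rows, csv_path, row.names = FALSE)

# read.csv() reads a comma-separated file with a header row into a data frame.
# stringsAsFactors = TRUE turns text columns into factors (categorical variables).
data1 <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

names(data1)  # column names
str(data1)    # type of each column
```

Checking `str(data1)` right after the import is a quick way to confirm that categorical columns such as Smoke and Gender were read as factors.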

This data file has 6 variables. The data shows the lung capacity of smokers and non-smokers by age, gender and height.

I am going to use this data for statistical analysis using R.

First of all, I am going to make bar charts and a pie chart, which summarize the distribution of a categorical variable. The variable names in the data are:

>names(data1)

[1] "LungCap" "Age" "Height" "Smoke" "Gender" "Caesarean"

For the Gender variable, I first make a frequency table.

>table(data1\$Gender)

female   male

93       207

To save this frequency table, I assign it to the name count:

> count <- table(data1\$Gender)

> count

female   male

93      207

To get the relative frequencies (proportions) of males and females, I divide the counts by the total number of observations, 300:

> table(data1\$Gender)/300

female   male

0.31    0.69

To save these proportions in R, I use this command:

> percent <- table(data1\$Gender)/300

> percent

female   male

0.31    0.69
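Dividing by 300 works because I know the number of rows. As an alternative sketch, `prop.table()` computes the same proportions without hard-coding the total. The `gender` vector below is rebuilt from the counts shown above (93 female, 207 male) so that the example runs without my data file.

```r
# Rebuild a gender factor from the counts in the frequency table above.
gender <- factor(rep(c("female", "male"), times = c(93, 207)))

count <- table(gender)         # absolute frequencies
percent <- prop.table(count)   # relative frequencies, summing to 1

percent
round(100 * percent, 1)        # the same values expressed as percentages
```

This version keeps working if rows are added to or removed from the data, because the denominator is taken from the table itself.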

For the bar plot:

>barplot(count)

Here the bar plot shows the proportion of males and females in the Gender variable.

>barplot(percent)

> barplot(percent, main="Percentage ratio between Male & Female", xlab="Gender", ylab="%", las=1, names.arg=c("Female", "Male"))

>box()

Here the pie chart shows the share of males and females in the Gender variable.

>pie(count, main="Percentage ratio between Male & Female")

>box()
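The charts above can be made easier to read by printing the percentages on them. The following is a minimal sketch that uses the counts from the frequency table (93 female, 207 male); the object names `pct` and `slice_labels` are my own choices for illustration.

```r
# Counts taken from the gender frequency table above.
count <- c(female = 93, male = 207)

# Percentages rounded to whole numbers, then combined with the category names.
pct <- round(100 * count / sum(count))
slice_labels <- paste0(names(count), " (", pct, "%)")

# Pie chart with the percentage written on each slice.
pie(count, labels = slice_labels, main = "Gender distribution")

# Bar plot on a true percent scale, so the axis label matches the values.
barplot(pct, ylim = c(0, 100), xlab = "Gender", ylab = "Percent", las = 1,
        main = "Gender distribution")
box()
```

Plotting `pct` rather than the proportions means a y-axis labelled as percent actually shows values between 0 and 100.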

Histograms of Lung Capacity:

A histogram is used to summarize the distribution of a numeric variable.

> hist(data1\$LungCap)

> hist(data1\$LungCap, freq=FALSE)

> hist(data1\$LungCap, freq=FALSE, ylim=c(0, 0.2))

> hist(data1\$LungCap, freq=FALSE, ylim=c(0, 0.2), col=4, breaks=14)

> hist(data1\$LungCap, freq=FALSE, ylim=c(0, 0.2), col=4, breaks=c(0,2,4,6,8,10,12,14,16))

Here the red line shows the estimated density of the lung capacity variable.

> hist(data1\$LungCap, freq=FALSE, ylim=c(0, 0.2), col=4, breaks=seq(from=0, to=16, by=2), main="Histogram of Lung Capacity", xlab="Lung capacity", ylab="Density", las=1)

> lines(density(data1\$LungCap))

> lines(density(data1\$LungCap), col=2, lwd=3)

>box()
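The density curve only lines up with the bars when the histogram is drawn on the density scale (`freq = FALSE`), where the bar areas sum to 1. Here is a small sketch of that idea on simulated values; the numbers are generated with `rnorm()` and are not my lung capacity data.

```r
# Simulated stand-in for lung capacity (NOT the real data), kept inside 0..16.
set.seed(1)
lung_cap <- pmin(pmax(rnorm(300, mean = 8, sd = 2), 0.5), 15.5)

# freq = FALSE puts the y-axis on the density scale.
h <- hist(lung_cap, freq = FALSE, breaks = seq(from = 0, to = 16, by = 2),
          col = 4, las = 1, main = "Histogram of simulated lung capacity",
          xlab = "Lung capacity", ylab = "Density")

# Kernel density estimate drawn on top of the bars.
lines(density(lung_cap), col = 2, lwd = 3)
box()

# On the density scale, bar height times bar width adds up to 1.
sum(h$density * diff(h$breaks))
```

With the default frequency scale the bars would be counts in the tens or hundreds, and a density curve with values below 1 would be flattened against the x-axis.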

This scatterplot shows lung capacity by age.

A line added with abline() shows how lung capacity changes with age.

>plot(data1\$Age, data1\$LungCap, xlab="Age", ylab="Lung Capacity", col=2)

> plot(data1\$Age, data1\$LungCap, main="Scatterplot", xlab="Age", ylab="Lung Capacity", col=2)
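One common way to draw such a line is to fit a simple linear regression with `lm()` and pass the fit to `abline()`. This is a sketch on simulated values, and the linear relationship built into the simulation is an assumption made only for illustration, not a result from my data.

```r
# Simulated ages and lung capacities (NOT the real data).
set.seed(42)
age <- sample(3:19, 100, replace = TRUE)
lung_cap <- 1 + 0.5 * age + rnorm(100, sd = 1)

plot(age, lung_cap, main = "Scatterplot", xlab = "Age",
     ylab = "Lung Capacity", col = 2)

# Fit lung capacity as a linear function of age and draw the fitted line.
fit <- lm(lung_cap ~ age)
abline(fit, col = 4, lwd = 2)

coef(fit)            # intercept and slope of the fitted line
cor(age, lung_cap)   # Pearson correlation between the two variables
```

The slope from `coef(fit)` gives the estimated change in lung capacity per year of age, which is easier to report than reading it off the plot.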

Now I will use a boxplot to summarize the distribution of a numeric variable.

I produced a boxplot of lung capacity.

>boxplot(data1\$LungCap)

The quantiles of the LungCap variable (minimum and maximum):

>quantile(data1\$LungCap, probs=c(0, 1))

0%        100%

4.800    11.125

> quantile(data1\$LungCap, probs=c(0, 0.25, 0.5, 0.75, 1))

0%        25%        50%      75%        100%

4.800    6.225      7.325     9.550      11.125
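These five quantiles are closely related to what a boxplot draws: the box spans the lower and upper quartiles, the line inside it is the median, and the whiskers reach the most extreme points that are not flagged as outliers. The sketch below uses five illustrative values (not my data) chosen so the connection is easy to see.

```r
# Five illustrative values (NOT the real data).
x <- c(4.8, 6.0, 7.3, 9.5, 11.1)

# Minimum, quartiles and maximum.
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
q

# Interquartile range: the height of the box.
iqr <- q[["75%"]] - q[["25%"]]
iqr

# The numbers boxplot() would draw: whisker, hinge, median, hinge, whisker.
b <- boxplot(x, plot = FALSE)
b$stats
```

For larger samples the hinges used by `boxplot()` can differ slightly from the default `quantile()` quartiles, so small discrepancies between the two are expected.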

To compare lung capacity between males and females:

>boxplot(data1\$LungCap ~ data1\$Gender)

> boxplot(data1\$LungCap ~ data1\$Gender, main="Boxplot By Gender")
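The visual comparison can be backed up with numbers by summarising the variable within each group, for example with `tapply()` or `aggregate()`. This is a sketch on simulated values; the group sizes match the frequency table above, but the group means are invented for illustration.

```r
# Simulated data (NOT the real data): 93 females and 207 males.
set.seed(7)
gender <- factor(rep(c("female", "male"), times = c(93, 207)))
lung_cap <- rnorm(300, mean = ifelse(gender == "male", 8.3, 7.4), sd = 2)

# Median lung capacity within each gender.
medians <- tapply(lung_cap, gender, median)
medians

# Mean lung capacity within each gender, returned as a data frame.
aggregate(lung_cap ~ gender, FUN = mean)

# The same comparison as side-by-side boxplots.
boxplot(lung_cap ~ gender, main = "Boxplot By Gender",
        xlab = "Gender", ylab = "Lung Capacity")
```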

Now I will examine the relationship between smoking and lung capacity within age groups (age strata).

> AgeGroups <- cut(data1\$Age, breaks=c(0,13,15,17,25), labels=c("<13", "14/15", "16/17", "18+"))

> AgeGroups[1:5]

[1] <13   18+   16/17 14/15 <13

Levels: <13 14/15 16/17 18+
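It helps to know how `cut()` treats the boundaries. By default the intervals are closed on the right, so with these breaks the first group is (0, 13], which means an age of exactly 13 is placed in the group labelled "<13". A short sketch with a few hand-picked ages shows the assignment:

```r
# Hand-picked ages around each boundary.
ages <- c(12, 13, 14, 15, 16, 17, 18, 20)

# Same breaks and labels as above; intervals are (0,13], (13,15], (15,17], (17,25].
groups <- cut(ages, breaks = c(0, 13, 15, 17, 25),
              labels = c("<13", "14/15", "16/17", "18+"))

data.frame(age = ages, group = groups)  # age 13 lands in "<13"
table(groups)                           # number of ages per group
```

Any age above the last break (25 here) would become `NA`, so the final break should be at least as large as the maximum age in the data.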

>boxplot(data1\$LungCap ~ data1\$Smoke)

>boxplot(data1\$LungCap ~ data1\$Smoke, main="LungCap Vs Smoke", ylab="Lung Capacity")

To examine the relationship between lung capacity and smoking status by age group:

>data1\$Age[1:5]

>AgeGroups[1:5]

>levels(AgeGroups)

>boxplot(data1\$LungCap ~ data1\$Smoke*AgeGroups, main="LungCap Vs Smoke", ylab="LungCap Vs Smoke By AgeGroups", las=1)

>boxplot(data1\$LungCap ~ data1\$Smoke*AgeGroups, main="LungCap Vs Smoke", ylab="LungCap Vs Smoke By AgeGroups", las=2)

For the stratified boxplot with color:

Here we can observe the difference in lung capacity between smokers and non-smokers within each age group. Red shows smokers and blue shows non-smokers.

> boxplot(data1\$LungCap ~ data1\$Smoke*AgeGroups, main="LungCap Vs Smoke", ylab="LungCap Vs Smoke By AgeGroups", las=2, col=c(4,2))
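A legend makes the color coding explicit, so the reader does not have to rely on the text. The sketch below repeats the stratified boxplot on simulated values (not my data) and adds a legend; it assumes the Smoke factor has the levels "no" and "yes", in that order, so that `col = c(4, 2)` maps blue to non-smokers and red to smokers.

```r
# Simulated data (NOT the real data).
set.seed(3)
n <- 200
age <- sample(3:19, n, replace = TRUE)
smoke <- factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
lung_cap <- 1 + 0.5 * age + rnorm(n)

age_groups <- cut(age, breaks = c(0, 13, 15, 17, 25),
                  labels = c("<13", "14/15", "16/17", "18+"))

# One box per smoke/age-group combination; colors alternate blue (no), red (yes).
b <- boxplot(lung_cap ~ smoke * age_groups, col = c(4, 2), las = 2,
             main = "LungCap Vs Smoke", xlab = "",
             ylab = "Lung Capacity")

# Legend tying each color to a level of the smoke factor.
legend("topleft", legend = levels(smoke), fill = c(4, 2), title = "Smoke")

b$names  # labels of the boxes, in plotting order
```

Because the colors are recycled in the order of the factor levels, checking `levels(smoke)` before choosing `col` avoids swapping the two groups by accident.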

Many other statistical analyses and graphical representations are possible with R.

For example, lung capacity could also be analysed by height.

### Conclusion:

With the help of R's statistical analysis and graphical representations, I can easily analyse the lung capacity of smokers and non-smokers by their age, gender and height.