**Try R for Statistical Data Analysis**

R is a tool for statistics and data modeling. During my course, I found that the R programming language is elegant and versatile, with a highly expressive syntax designed around working with data. It also includes extremely powerful graphics capabilities. I found it very helpful for manipulating data easily, and it is a very useful tool for statistical analysis.

Based on this Code School course, I am going to work through an example using some data that I have created, with R graphics to visualise the data.

First of all, I saved my data file from Microsoft Excel as a .csv file.

Then I imported the data file into RStudio with the following commands.

> setwd("c:/users/tehseen/desktop/R assignment")

Alternatively, the working directory can be selected with RStudio's Choose Directory option.

> data1 <- read.csv("data1.csv", header=TRUE)
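After an import it is worth checking that the columns arrived as expected. The following is a minimal sketch, assuming a tiny made-up data frame written to a temporary file; the name `demo` and its three rows are hypothetical and are not my real data.

```r
# Hypothetical stand-in data: three made-up rows, not the real data1.csv.
demo <- data.frame(
  LungCap = c(6.5, 10.1, 9.6),
  Age     = c(6, 18, 16),
  Gender  = c("male", "female", "female")
)

# Write to a temporary CSV, then read it back the same way as data1.csv.
path <- file.path(tempdir(), "demo.csv")
write.csv(demo, path, row.names = FALSE)
demo_in <- read.csv(path, header = TRUE, stringsAsFactors = TRUE)

str(demo_in)    # column types as R parsed them
dim(demo_in)    # number of rows and columns
head(demo_in)   # first rows of the data
```

Running `str()` and `head()` on `data1` in the same way would show whether all 6 variables were read correctly.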

This data file has 6 variables. The data shows the lung capacity of smokers and non-smokers by age, gender and height.

I am going to use this data for statistical analysis using R.

First of all, I am going to make bar charts and a pie chart, which summarize the distribution of a categorical variable.

> names(data1)

[1] "LungCap" "Age" "Height" "Smoke" "Gender" "Caesarean"

For the Gender variable, I first make a frequency table.

> table(data1$Gender)

female male

93 207

To save this, I assign the gender frequency table to the name count.

> count <- table(data1$Gender)

> count

female male

93 207

For the relative frequency (proportion) of males and females, I divide the counts by the number of observations, 300:

> table(data1$Gender)/300

female male

0.31 0.69

To save these proportions in R, we use this command:

> percent <- table(data1$Gender)/300

> percent

female male

0.31 0.69
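Dividing by 300 works because the file has 300 rows, but the same proportions can be computed without hard-coding the sample size. This is a minimal sketch using base R's `prop.table()`; the `gender` vector is a hypothetical stand-in built only to match the counts shown above.

```r
# Hypothetical stand-in for data1$Gender, built to match the counts above.
gender <- factor(rep(c("female", "male"), times = c(93, 207)))

count   <- table(gender)
percent <- prop.table(count)   # same as count / sum(count)
percent
round(100 * percent, 1)        # the proportions expressed as percentages
```

With the real data, `prop.table(table(data1$Gender))` should give the same result as dividing by 300.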

**For the bar plot:**

> barplot(count)

**Here the bar plot shows the share of males and females in the Gender variable.**

> barplot(percent)

> barplot(100*percent, main="Percentage ratio between Male & Female", xlab="Gender", ylab="%", las=1, names.arg=c("Female", "Male"))

> box()

**Here the pie chart shows the share of males and females in the Gender variable.**

> pie(count, main="Percentage ratio between Male & Female")

> box()
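Both charts are easier to read when the percentages are printed on them. The following is a minimal sketch, assuming the counts from the frequency table above; the names `pct`, `lbls` and `bp` are only illustrative.

```r
count <- c(female = 93, male = 207)        # counts from the frequency table
pct   <- round(100 * count / sum(count))   # whole-number percentages
lbls  <- paste0(names(count), " (", pct, "%)")

# barplot() returns the x position of each bar, which text() can reuse.
bp <- barplot(pct, names.arg = lbls, ylim = c(0, 100), ylab = "%", las = 1)
text(bp, pct, labels = pct, pos = 3)       # write each value above its bar

pie(count, labels = lbls, main = "Gender")
```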


**Histograms of Lung Capacity:**

A histogram summarizes the distribution of a numeric variable.

> hist(data1$LungCap)

> hist(data1$LungCap, freq=FALSE)

> hist(data1$LungCap, freq=FALSE, ylim=c(0, 0.2))

> hist(data1$LungCap, freq=FALSE, ylim=c(0, 0.2), col=4, breaks=14)

> hist(data1$LungCap, freq=FALSE, ylim=c(0, 0.2), col=4, breaks=c(0,2,4,6,8,10,12,14,16))


**Here the red line shows the density of the lung capacity variable.**

> hist(data1$LungCap, freq=FALSE, ylim=c(0, 0.2), col=4, breaks=seq(from=0, to=16, by=2), main="Histogram of Lung Capacity", xlab="Lung capacity", ylab="Density", las=1)

> lines(density(data1$LungCap))

> lines(density(data1$LungCap), col=2, lwd=3)

> box()
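A density curve only lines up with a histogram when the histogram is drawn on the density scale (`freq=FALSE`), where the bar areas add up to 1. This is a minimal sketch on simulated values; the `lung` vector is randomly generated for illustration and is not my data.

```r
set.seed(1)
lung <- rnorm(300, mean = 8, sd = 2)   # simulated values, NOT the real data
lung <- pmin(pmax(lung, 0.1), 15.9)    # keep values inside the 0-16 breaks

h <- hist(lung, freq = FALSE, breaks = seq(0, 16, by = 2),
          col = 4, las = 1, main = "Histogram (density scale)",
          xlab = "Simulated lung capacity")
lines(density(lung), col = 2, lwd = 3) # kernel density estimate in red

# On the density scale, bar height times bar width sums to 1.
sum(h$density * diff(h$breaks))
```

On the default frequency scale the bars are counts, so a density curve drawn over them would sit almost flat along the x-axis.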


**This scatterplot shows lung capacity by age.**

**A line added with abline() shows how lung capacity changes with age.**

> plot(data1$Age, data1$LungCap, xlab="Age", ylab="Lung Capacity", col=2)

> plot(data1$Age, data1$LungCap, main="Scatterplot", xlab="Age", ylab="Lung Capacity", col=2)
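One common way to add such a line is to fit a simple linear regression with `lm()` and pass the fit to `abline()`. The following is a minimal sketch on simulated data; the `age` and `lung` vectors and the relationship between them are assumptions made up for the example, not results from my file.

```r
set.seed(2)
age  <- sample(3:19, 100, replace = TRUE)    # simulated ages
lung <- 1 + 0.5 * age + rnorm(100, sd = 1)   # simulated lung capacity

plot(age, lung, xlab = "Age", ylab = "Lung capacity", col = 2)
fit <- lm(lung ~ age)   # least-squares regression line
abline(fit, lwd = 2)    # draw the fitted line over the scatterplot

coef(fit)               # intercept and slope of the line
cor(age, lung)          # strength of the linear association
```

With the real data, the equivalent call would be `abline(lm(data1$LungCap ~ data1$Age))` right after the `plot()` command.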

**Now I will use a boxplot to summarize the distribution of a numeric variable.**

I produced a **boxplot of lung capacity**.

> boxplot(data1$LungCap)

**Quantiles of the LungCap variable:**

> quantile(data1$LungCap, probs=c(0, 1))

0% 100%

4.800 11.125

> quantile(data1$LungCap, probs=c(0, 0.25, 0.5, 0.75, 1))

0% 25% 50% 75% 100%

4.800 6.225 7.325 9.550 11.125
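These quantiles are closely related to what the boxplot draws: the box spans roughly the 25th to 75th percentiles and the thick line is the median. This is a minimal sketch with a handful of made-up values (the vector `x` is hypothetical) showing how to get the numbers behind a boxplot.

```r
x <- c(4.8, 6.2, 6.3, 7.1, 7.5, 9.4, 9.7, 11.1)   # made-up values

quantile(x, probs = c(0.25, 0.5, 0.75))   # quartiles
IQR(x)                                    # 75th minus 25th percentile

b <- boxplot(x, plot = FALSE)   # compute the box statistics without drawing
b$stats                         # lower whisker, lower hinge, median,
                                # upper hinge, upper whisker
```

The hinges used by `boxplot()` are close to the quartiles from `quantile()` but are not always identical, because the two use slightly different definitions.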

**To compare lung capacity between males and females:**

> boxplot(data1$LungCap ~ data1$Gender)

> boxplot(data1$LungCap ~ data1$Gender, main="Boxplot By Gender")

**Now I will examine the relationship between smoking and lung capacity within age groups (age strata).**

> AgeGroups <- cut(data1$Age, breaks=c(0,13,15,17,25), labels=c("<13", "14/15", "16/17", "18+"))

> AgeGroups[1:5]

[1] <13 18+ 16/17 14/15 <13

Levels: <13 14/15 16/17 18+
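By default `cut()` builds intervals that are open on the left and closed on the right, so the breaks above define the groups (0,13], (13,15], (15,17] and (17,25]. The following is a minimal sketch, with a few illustrative ages in a hypothetical `ages` vector, showing where the boundary ages land.

```r
ages <- c(12, 13, 14, 15, 16, 17, 18)   # illustrative ages, not the real data
grp  <- cut(ages, breaks = c(0, 13, 15, 17, 25),
            labels = c("<13", "14/15", "16/17", "18+"))

data.frame(ages, grp)   # age 13 falls in the first group
table(grp)              # number of ages in each group
```

Note that age 13 is placed in the group labelled "<13", so that label really means "13 or younger".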

> boxplot(data1$LungCap ~ data1$Smoke)

> boxplot(data1$LungCap ~ data1$Smoke, main="LungCap Vs Smoke", ylab="Lung capacity")

**To examine the relationship between lung capacity and smoking status by age group:**

>data1$Age[1:5]

>AgeGroups[1:5]

>levels(AgeGroups)

> boxplot(data1$LungCap ~ data1$Smoke*AgeGroups, main="LungCap Vs Smoke By AgeGroups", ylab="Lung capacity", las=1)

> boxplot(data1$LungCap ~ data1$Smoke*AgeGroups, main="LungCap Vs Smoke By AgeGroups", ylab="Lung capacity", las=2)

**Stratified boxplot with color:**

Here we can observe the difference in lung capacity between smokers and non-smokers within each age group. Red shows smokers and blue shows non-smokers.

> boxplot(data1$LungCap ~ data1$Smoke*AgeGroups, main="LungCap Vs Smoke By AgeGroups", ylab="Lung capacity", las=2, col=c(4,2))
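A legend and a table of group medians make the stratified plot easier to interpret. This is a minimal sketch on simulated data: the vectors `smoke`, `age` and `lung`, and the effect sizes used to generate them, are assumptions invented for the example and say nothing about my real results.

```r
set.seed(3)
n      <- 200
smoke  <- factor(sample(c("no", "yes"), n, replace = TRUE))   # simulated
age    <- sample(3:19, n, replace = TRUE)                     # simulated
lung   <- 1 + 0.5 * age - 0.8 * (smoke == "yes") + rnorm(n)   # simulated
groups <- cut(age, breaks = c(0, 13, 15, 17, 25),
              labels = c("<13", "14/15", "16/17", "18+"))

# Colors recycle in level order: "no" is blue (4), "yes" is red (2).
boxplot(lung ~ smoke * groups, las = 2, col = c(4, 2), xlab = "",
        ylab = "Lung capacity")
legend("topleft", legend = levels(smoke), fill = c(4, 2), title = "Smoke")

# Median lung capacity for every smoke-by-age-group cell.
med <- tapply(lung, list(smoke, groups), median)
med
```

The colors are assigned in the order of the factor levels, so it is worth checking `levels(data1$Smoke)` before deciding which color stands for smokers.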

Many other statistical analyses and graphical representations are possible with R.

For example, lung capacity could also be analysed by height.

**Conclusion:**

With the help of R's statistical analysis and graphics, I can easily analyse how lung capacity differs between smokers and non-smokers by age, gender and height.