# lab homework 3

**Lab Homework**

#save your plots as an image file and upload separately (export > save as image)

#save them as png with filename your_name_Lab3_plot

#Exercise 1

#The probability of rain on any given day in July in the city of Portland is 72%

#(a) What is the probability that it rains exactly 11 days?

#(b) What is the probability that it rains 11 or fewer days?

#(c) What is the probability that it rains 27 or more days?

#(d) What is the probability that it rains more than 20 but fewer than 25 days?

#Exercise 2

#Suppose the winnings of gamblers at Las Vegas are normally distributed with

#mean $670 and standard deviation $38.

#(a) What is the probability of winning exactly $500?

#(b) What is the probability of winning $500 or less?

#(c) What is the probability of winning more than $800?

#(d) What is the probability of winning between $500 and $875?

#(e) Generate 10000 random numbers using the rnorm() function. Show what this

#distribution looks like. Add the mean, median and 20% trimmed mean

#to the plot (provide these values).

#Exercise 3

#In the previous problem, how much does someone have to win to be in the top 5%?

——————————————————————-

**LAB 3 EXAMPLE**

#Lab 3-Contents

#1. Binomial Probability Distribution

#2. Normal Probability Distribution

#--------------------------------------

# 1. Binomial Probability Distribution

#--------------------------------------

# The binomial distribution is one of the most important distributions in statistics.

# As its name (bi-nomial) suggests, we are dealing with

# outcomes that have only two possibilities (e.g. Yes/No, Dead/Alive).

#There are three main types of problems dealing with binomial probabilities:

#A) Finding the probability of EXACTLY x successes (Yes responses, deaths, boys)

#B) Finding the probability of < x successes OR > x successes

#C) Finding the probability of a range of successes (> x1 AND < x2)

# Let’s try one by semi-hand (we’ll have R do some computations)

# and then learn how to use R to solve these kinds of problems.

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXAMPLE 1: Let’s say the probability for divorce is 45%. Out of 20 married couples,

#what is the probability of EXACTLY 5 of them getting divorced?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

N=20 #n

K=5 # x

p=0.45

q=1-p

Nf=factorial(N) #n!

Kf=factorial(K) # x!

# Binomial formula: n!/(x!(n-x)!) * p^x * q^(n-x)

Nf/(Kf*factorial(N-K)) * p^K * q^(N-K)
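
# The factorial arithmetic above can also be written with choose(), which computes
# the binomial coefficient n!/(x!(n-x)!) directly. This is only a sketch of the same
# calculation; the variable names (n_trials, x_success, p_success) are made up here
# for illustration and are not part of the lab.

```r
n_trials  <- 20    # number of couples (trials)
x_success <- 5     # number of divorces we want exactly
p_success <- 0.45  # probability of divorce for one couple

# choose(n, x) * p^x * (1 - p)^(n - x)
by_formula <- choose(n_trials, x_success) *
  p_success^x_success * (1 - p_success)^(n_trials - x_success)
by_formula
```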

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Binomial Probability (EXACT): dbinom(k, n, p)

#Where k: number of successes, n: number of trials,

#p: probability of success

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Let’s use the dbinom() command from R to answer this question

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 1-1: Use the dbinom() command to solve the problem in Example 1

#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

dbinom(5, 20, 0.45)

# What if we wanted to know the probability of 5 OR fewer divorces?

# One strategy is to compute the probability for 5, 4, 3, 2, 1, 0 divorces and sum the probabilities

p5=dbinom(5, 20, 0.45)

p4=dbinom(4, 20, 0.45)

p3=dbinom(3, 20, 0.45)

p2=dbinom(2, 20, 0.45)

p1=dbinom(1, 20, 0.45)

p0=dbinom(0, 20, 0.45)

p0+p1+p2+p3+p4+p5

# This is a bit tedious, so let’s just use a new command called pbinom()

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Binomial Probability (x <= k): pbinom(k, n, p)

#Where k: largest number of successes to include, n: number of trials,

#p: probability of success

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 1-2: Use the pbinom() command to solve the problem of computing the probability for

# 5 or fewer divorces.

#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

pbinom(5, 20, 0.45)
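
# As a quick check, pbinom() should agree with adding up the dbinom() values.
# A minimal sketch, relying on the fact that dbinom() accepts a vector of counts;
# the names p_0_to_5 and by_summing are made up for illustration.

```r
p_0_to_5 <- dbinom(0:5, 20, 0.45)  # P(X = 0), P(X = 1), ..., P(X = 5)
by_summing <- sum(p_0_to_5)        # P(X <= 5) built up by hand
by_pbinom  <- pbinom(5, 20, 0.45)  # P(X <= 5) in one call
all.equal(by_summing, by_pbinom)
```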

#??????????????????????????????????????????????????????????????#

#Thought Question 1: How would the command change if the question

# was worded as “Find the probability of fewer than 5 divorces”?

pbinom(4, 20, 0.45)

#??????????????????????????????????????????????????????????????#

# The pbinom() command is useful, but only gives us probabilities for <= k

# What if we wanted to know the probability of having >= k responses or > k?

# Because the probabilities have to add to exactly 1, we can use some simple math to figure this out

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXAMPLE 2: The probability for divorce is 45%. Out of 20 married couples,

#what is the probability of >= 6 of them getting divorced?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

# We know from above that the probability of <= 5 couples divorcing is 0.05533419

# If we think about it, then the probability of >=6 is just 1-0.05533419 = 0.9446658

# We could even use the pbinom() command with this by putting the 1 minus in front

1-pbinom(5, 20, 0.45)
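
# pbinom() also has a lower.tail argument. Setting lower.tail = FALSE asks for
# P(X > k) instead of P(X <= k), so it should match the "1 minus" approach.
# A small sketch; the names upper_a and upper_b are made up for illustration.

```r
upper_a <- 1 - pbinom(5, 20, 0.45)                  # 1 - P(X <= 5)
upper_b <- pbinom(5, 20, 0.45, lower.tail = FALSE)  # P(X > 5), i.e. P(X >= 6)
c(upper_a, upper_b)
```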

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 1-3: Use the pbinom() command to solve the problem of computing the probability for

# > 7 divorces.

#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

1-pbinom(7, 20, 0.45)

# Thus far we’ve covered how to answer 2 out of the 3 possible binomial probability problems (A and B).

# Let’s think about how we could answer a question asking about a range of possible successes.

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXAMPLE 3: The probability for divorce is 45%. Out of 20 married couples,

#what is the probability of >= 7 and <= 10 divorces?

# 7 <= x <= 10 ?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

# This problem has 2 parts.

#1) Compute the probability for <=10 divorces

pbinom(10, 20, 0.45)

#2) Compute the probability for <= 6 divorces (YES, 6 not 7)

pbinom(6, 20, 0.45)

# The difference between these two probabilities is the probability of 7 to 10 divorces (inclusive)

pbinom(10, 20, 0.45) - pbinom(6, 20, 0.45)
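
# One way to convince yourself that 6 (not 7) is the right cutoff: add up the
# exact probabilities for 7, 8, 9 and 10 and compare. This is just a sanity check;
# the names by_difference and by_summing are made up for illustration.

```r
by_difference <- pbinom(10, 20, 0.45) - pbinom(6, 20, 0.45)  # P(X <= 10) - P(X <= 6)
by_summing    <- sum(dbinom(7:10, 20, 0.45))                 # P(7) + P(8) + P(9) + P(10)
all.equal(by_difference, by_summing)
```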

#Note: A probability can never be negative, so make sure you do the subtraction in the correct order

#??????????????????????????????????????????????????????????????#

#Thought Question 2: How would computing the probability

# in example 3 change if the direction of the signs were reversed?

# e.g. <= 7 OR >= 10 divorces (a count cannot be both at once)

#x <= 7 or x >= 10

#??????????????????????????????????????????????????????????????#

pbinom(7, 20, 0.45)

1-pbinom(9, 20, 0.45)
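
# The two tails do not overlap, so the probability of landing in either one is
# their sum. A sketch of that last step; the names lower_tail, upper_tail and
# either_tail are made up for illustration.

```r
lower_tail  <- pbinom(7, 20, 0.45)      # P(X <= 7)
upper_tail  <- 1 - pbinom(9, 20, 0.45)  # P(X >= 10)
either_tail <- lower_tail + upper_tail  # P(X <= 7 or X >= 10)
either_tail
```

# This is the same as 1 minus the probability of the values left out (8 and 9).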

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 1-4: Use the pbinom() command to solve the problem of computing the probability for

# > 4 AND <= 15 divorces. 4 < x <= 15

#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

pbinom(15, 20, 0.45) - pbinom(4, 20, 0.45)

#--------------------------------------

# 2. The Normal Probability Distribution

#--------------------------------------

# Probabilities associated with continuous variables are often computed using the normal distribution.

# Rather than talking about the number of successes in a given number of trials, we are now concerned with

# first defining the properties of the normal distribution and then determining the probabilities of having certain

# values along that distribution.

#Let’s look at the standard normal distribution

# The standard normal has a mean of 0 and SD of 1

#I’m going to draw 10,000 random observations from a normal distribution with mean 0 and sd=1

set.seed(1)

x=rnorm(10000, mean=0, sd=1)

mean(x); sd(x)

hist(x)

#In the standard normal distribution, we can compute (or look up in a table) what the probability values

# should be for a Z score less than some value.

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXAMPLE 4: What is the probability of having a value (called a Z score) < -1 on the standard normal distribution?

#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

# First off, we can actually look this up in the appendix of your textbook.

# The listing of the Z scores and their probabilities begins on page 300.

# From the textbook, this value is: 0.1587

# This means that the probability of having a Z score < -1 is 0.1587

#Let’s look at this on a plot:

plot(density(x))

abline(v= -1, col="red")

#Instead of the textbook, we could use the pnorm() function in R

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Normal Dist Probability (Z <= z): pnorm(z, mean, sd)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

pnorm(-1, 0, 1)
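
# We can also compare pnorm() with simulated data: the share of simulated values
# below -1 should land close to the theoretical probability. A rough sketch only;
# the names z_sim, empirical and theoretical are made up for illustration, and the
# two numbers will be close but not identical because z_sim is a random sample.

```r
set.seed(1)
z_sim <- rnorm(10000, mean = 0, sd = 1)  # simulated standard normal values

empirical   <- mean(z_sim < -1)  # proportion of simulated values below -1
theoretical <- pnorm(-1, 0, 1)   # exact area to the left of -1
c(empirical = empirical, theoretical = theoretical)
```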

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 2-1: Thinking about what we learned from using the binomial functions, how could we use

# pnorm() to find the probability of having a Z score > 1.96 on the standard normal distribution (mean=0, sd=1)

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

1-pnorm(1.96, 0, 1)

#We can generalize this to normal distributions that do not have a mean of 0 and SD of 1

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXAMPLE 5: What is the probability of having a BMI < 25.2 if the mean BMI is 35 and SD = 5?

#

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#We can use pnorm() in this situation as well

#1)

pnorm(25.2, 35, 5)

# 2)

Z=(25.2-35)/5

pnorm(Z,0,1)

#Notice that the answer we get: 0.0249979 is the same as the answer we got in Exercise 2-1.

#??????????????????????????????????????????????????????????????#

#Thought Question 3: Why do you think the answers are the same in

# example 5 and exercise 2-1?

#??????????????????????????????????????????????????????????????#

(25.2-35)/5
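
# The Z score for a BMI of 25.2 is -1.96, and the normal distribution is symmetric,
# so the area below -1.96 equals the area above +1.96. A short sketch of that
# comparison; the names z_bmi, left_area and right_area are made up for illustration.

```r
z_bmi <- (25.2 - 35) / 5               # Z score for a BMI of 25.2

left_area  <- pnorm(z_bmi, 0, 1)       # area below the Z score (Example 5)
right_area <- 1 - pnorm(1.96, 0, 1)    # area above +1.96 (Exercise 2-1)
all.equal(left_area, right_area)
```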

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 2-2: The average age for incoming college freshmen is 18.2 years with SD=0.5.

# What is the probability that a freshman will be between 18 and 19 years old?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

pnorm(19, 18.2, .5) - pnorm(18, 18.2, .5)
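
# If you do many "between" problems, the subtraction can be wrapped in a small
# helper. prob_between() is a hypothetical function written here for illustration;
# it is not part of R or of the lab, just a sketch of the upper-minus-lower pattern.

```r
# P(lower < X < upper) for a normal distribution with the given mean and sd
prob_between <- function(lower, upper, mean, sd) {
  pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
}

age_prob <- prob_between(18, 19, mean = 18.2, sd = 0.5)
age_prob
```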

# We can also do these problems in reverse, where we are given the probability and have to find the values

# at which that probability exists. When we do this, we are calculating the quantile of the normal distribution.

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Normal Dist Probability (Quantile): qnorm(p, mean, sd)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# Let’s think about the problem we did in example 5

# We found the probability of a BMI < 25.2 when the mean of the distribution was 35, SD=5

# The probability was 0.0249979

# Let’s see if qnorm() returns the correct result:

qnorm(0.0249979, 35, 5)
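
# qnorm() and pnorm() undo each other, so feeding the output of one into the other
# should bring back the starting value. A quick round-trip sketch; the names p_bmi
# and cutoff are made up for illustration.

```r
p_bmi  <- pnorm(25.2, 35, 5)   # probability of a BMI below 25.2
cutoff <- qnorm(p_bmi, 35, 5)  # BMI value that has that probability below it
cutoff
```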

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 2-3: The average age for incoming college freshmen is 18.2 years with SD=0.5.

# A researcher tells you that 65% of the freshmen were less than…

# He can’t seem to remember the age that 65% of the freshmen were younger than. Help him out.

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

qnorm(0.65, 18.2, .5)

#????????????????????????????????????????????????????????????????????????????????????#

#Thought Question 4: What if, instead of the age cutoff for the bottom 65%,

# we wanted to know how old a person needs to be to be in the top 10% of ages in their class?

#?????????????????????????????????????????????????????????????????????????????????????????#

qnorm(0.9, 18.2, .5)
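
# The top 10% starts where the bottom 90% ends, which is why 0.9 is used above.
# qnorm() also accepts lower.tail = FALSE, which lets you enter the upper-tail
# probability (0.1) directly. A small sketch; the names top10_a and top10_b are
# made up for illustration.

```r
top10_a <- qnorm(0.9, 18.2, 0.5)                      # age with 90% below it
top10_b <- qnorm(0.1, 18.2, 0.5, lower.tail = FALSE)  # age with 10% above it
c(top10_a, top10_b)
```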