Summaries

Session 1.2

23rd August 2010

Predicting Sample Behavior from a Model

We use a population model to predict the behavior of random samples, and we check the predictions by direct inspection of samples. We sample repeatedly with replacement, obtaining multiple random samples from the same population by the same process. We combine (pool) compatible samples to form larger samples: pooling samples of size 50, we obtain samples of size 100, 150, 200 and 300. In general, as sample size increases, sample proportions become more precise and reliable, provided that the sampling process is reliable.

In general, if we are working with the correct model, then the predicted sample behavior reliably describes observed samples.

Session Overview

Estimation: Previously, we saw how random samples drawn from a population could be used to estimate the structure of a population as an empirical model.

Prediction: In this session, we continue our study of probability. We begin with a very basic example of a population with known structure, and use that structure to predict the behavior of random samples from that population.

We begin by constructing a probability model for our population and defining the concept of a perfect sample. We then relate the perfect sample to random samples, both observed and unobserved. Finally, we obtain real random samples and check them against the perfect samples.

Exclude Case Study 1.2.1

Case Study 1.2.2

This case study introduces the idea of a perfect sample: a sample that matches perfectly the population from which it is drawn. On average, real samples correspond nicely, though not perfectly, with their corresponding perfect samples.

We begin by building a color bowl. We then compute a probability model for draws with replacement (DWR) from the bowl, and compute perfect samples of size 50, 100, 150, 200, 250 and 300. Six groups then generate six samples, each of n = 50 draws with replacement. Finally, we compare sample frequencies and proportions to the model and to the perfect samples.

Bowl Counts and Perfect Sample Calculations

Expected Count Blue (for Sample Size n) = n*PBlue

Expected Count Green (for Sample Size n) = n*PGreen

Expected Count Red (for Sample Size n) = n*PRed

Expected Count Yellow (for Sample Size n) = n*PYellow
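The expected-count rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original session materials: the function name perfect_sample is hypothetical, and the bowl counts are the 6:30 counts (3 blue, 3 green, 2 red, 4 yellow). Exact fractions are used so that rounding only happens when the counts are printed.

```python
from fractions import Fraction

# Bowl composition for the 6:30 session (counts of each color in the bowl).
bowl = {"Blue": 3, "Green": 3, "Red": 2, "Yellow": 4}

def perfect_sample(bowl, n):
    """Return the expected (perfect) count n * P for each color.

    Hypothetical helper: P is the color's count divided by the bowl total.
    """
    total = sum(bowl.values())
    return {color: n * Fraction(count, total) for color, count in bowl.items()}

# Perfect samples for the six sample sizes used in the case study.
for n in (50, 100, 150, 200, 250, 300):
    counts = perfect_sample(bowl, n)
    row = ", ".join(f"{color} {float(c):.2f}" for color, c in counts.items())
    print(f"n={n}: {row}")
```

The expected counts for any sample size always add up to n, because the color probabilities add up to 1.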

6:30 Model

Color   N    P              E50                 E100    E150    E200    E250    E300
Blue    3    3/12 = 0.25    50*(3/12) = 12.5    25      37.5    50      62.5    75
Green   3    3/12 = 0.25    50*(3/12) = 12.5    25      37.5    50      62.5    75
Red     2    2/12 = 0.167   50*(2/12) = 8.333   16.67   25      33.33   41.67   50
Yellow  4    4/12 = 0.333   50*(4/12) = 16.67   33.33   50      66.67   83.33   100
Total   12   1              50                  100     150     200     250     300

 

Probabilities with Long Run Interpretation

 

PBlue = 3/12 = .25 

In long runs of draws with replacement from the bowl, approximately 25% of draws show blue.

 

PGreen = 3/12 = .25 

In long runs of draws with replacement from the bowl, approximately 25% of draws show green.

 

PRed = 2/12 = .1667 

In long runs of draws with replacement from the bowl, approximately 16.67% of draws show red.

 

PYellow = 4/12 = .3333 

In long runs of draws with replacement from the bowl, approximately 33.33% of draws show yellow.
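The long-run interpretation can be checked by simulation. The sketch below is an illustration under stated assumptions, not the procedure used in class: it replaces physical draws with Python's pseudo-random generator, the helper names are hypothetical, and the seed is arbitrary and only there to make the run repeatable.

```python
import random

# Bowl composition for the 6:30 session.
bowl = {"Blue": 3, "Green": 3, "Red": 2, "Yellow": 4}

def draw_with_replacement(bowl, n, rng):
    """Simulate n draws with replacement and return the count of each color."""
    colors = list(bowl)
    weights = list(bowl.values())  # each color is drawn in proportion to its count
    draws = rng.choices(colors, weights=weights, k=n)
    return {color: draws.count(color) for color in colors}

def proportions(counts):
    """Convert color counts to sample proportions."""
    n = sum(counts.values())
    return {color: count / n for color, count in counts.items()}

rng = random.Random(2010)  # arbitrary fixed seed (assumption) for repeatability
for n in (50, 300, 30000):
    p = proportions(draw_with_replacement(bowl, n, rng))
    print(n, {color: round(value, 3) for color, value in p.items()})
```

With short runs the proportions wander; with very long runs they should settle near 0.25, 0.25, 0.1667 and 0.3333.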

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement

Color   E50                 E100                E150                E200                E250                E300
Blue    50*(3/12) = 12.5    100*(3/12) = 25     150*(3/12) = 37.5   200*(3/12) = 50     250*(3/12) = 62.5   300*(3/12) = 75
Green   50*(3/12) = 12.5    100*(3/12) = 25     150*(3/12) = 37.5   200*(3/12) = 50     250*(3/12) = 62.5   300*(3/12) = 75
Red     50*(2/12) = 8.333   100*(2/12) = 16.67  150*(2/12) = 25     200*(2/12) = 33.33  250*(2/12) = 41.67  300*(2/12) = 50
Yellow  50*(4/12) = 16.67   100*(4/12) = 33.33  150*(4/12) = 50     200*(4/12) = 66.67  250*(4/12) = 83.33  300*(4/12) = 100
Total   50                  100                 150                 200                 250                 300

Perfect Samples 

n=50

E50Blue = n*PBlue = 50*(3/12) = 12.5

E50Green = n*PGreen = 50*(3/12) = 12.5

E50Red = n*PRed = 50*(2/12) = 8.333

E50Yellow = n*PYellow = 50*(4/12) = 16.67

 

In samples of 50 draws with replacement from the bowl, we expect approximately 12 or 13 blue draws, 12 or 13 green draws, 8 or 9 red draws, and 16 or 17 yellow draws.

 

n=100

E100Blue = n*PBlue = 100*(3/12) = 25

E100Green = n*PGreen = 100*(3/12) = 25

E100Red = n*PRed = 100*(2/12) = 16.67

E100Yellow = n*PYellow = 100*(4/12) = 33.33

 

In samples of 100 draws with replacement from the bowl, we expect approximately 25 blue draws, 25 green draws, 16 or 17 red draws, and 33 or 34 yellow draws.

 

n=150

E150Blue = n*PBlue = 150*(3/12) = 37.5

E150Green = n*PGreen = 150*(3/12) = 37.5

E150Red = n*PRed = 150*(2/12) = 25

E150Yellow = n*PYellow = 150*(4/12) = 50

 

In samples of 150 draws with replacement from the bowl, we expect approximately 37 or 38 blue draws, 37 or 38 green draws, 25 red draws, and 50 yellow draws.

 

n=200

E200Blue = n*PBlue = 200*(3/12) = 50

E200Green = n*PGreen = 200*(3/12) = 50

E200Red = n*PRed = 200*(2/12) = 33.33

E200Yellow = n*PYellow = 200*(4/12) = 66.67

 

In samples of 200 draws with replacement from the bowl, we expect approximately 50 blue draws, 50 green draws, 33 or 34 red draws, and 66 or 67 yellow draws.

 

n=250

E250Blue = n*PBlue = 250*(3/12) = 62.5

E250Green = n*PGreen = 250*(3/12) = 62.5

E250Red = n*PRed = 250*(2/12) = 41.67

E250Yellow = n*PYellow = 250*(4/12) = 83.33

 

In samples of 250 draws with replacement from the bowl, we expect approximately 62 or 63 blue draws, 62 or 63 green draws, 41 or 42 red draws, and 83 or 84 yellow draws.

 

n=300

E300Blue = n*PBlue = 300*(3/12) = 75

E300Green = n*PGreen = 300*(3/12) = 75

E300Red = n*PRed = 300*(2/12) = 50

E300Yellow = n*PYellow = 300*(4/12) = 100

 

In samples of 300 draws with replacement from the bowl, we expect approximately 75 blue draws, 75 green draws, 50 red draws, and 100 yellow draws.

 

Samples – 6:30

               Sample #1             Sample #2             Pooled 12
Color       n       p      E50    n       p      E50    n       p     E100
Blue       12   0.240    12.50    8   0.160    12.50   20   0.200    25.00
Green      15   0.300    12.50    9   0.180    12.50   24   0.240    25.00
Red        12   0.240     8.33   12   0.240     8.33   24   0.240    16.67
Yellow     11   0.220    16.67   21   0.420    16.67   32   0.320    33.33
Total      50   1.000    50.00   50   1.000    50.00  100   1.000   100.00

               Sample #3             Sample #4             Pooled 34            Pooled 1234
Color       n       p      E50    n       p      E50    n       p     E100    n       p     E200
Blue        8   0.160    12.50   11   0.220    12.50   19   0.190    25.00   39   0.195    50.00
Green      21   0.420    12.50   15   0.300    12.50   36   0.360    25.00   60   0.300    50.00
Red         6   0.120     8.33   10   0.200     8.33   16   0.160    16.67   40   0.200    33.33
Yellow     15   0.300    16.67   14   0.280    16.67   29   0.290    33.33   61   0.305    66.67
Total      50   1.000    50.00   50   1.000    50.00  100   1.000   100.00  200   1.000   200.00

               Sample #5             Sample #6             Pooled 56            Pooled 3456
Color       n       p      E50    n       p      E50    n       p     E100    n       p     E200
Blue       14   0.280    12.50   14   0.280    12.50   28   0.280    25.00   47   0.235    50.00
Green      10   0.200    12.50   10   0.200    12.50   20   0.200    25.00   56   0.280    50.00
Red        10   0.200     8.33    6   0.120     8.33   16   0.160    16.67   32   0.160    33.33
Yellow     16   0.320    16.67   20   0.400    16.67   36   0.360    33.33   65   0.325    66.67
Total      50   1.000    50.00   50   1.000    50.00  100   1.000   100.00  200   1.000   200.00

              Pooled 135            Pooled 246            Pooled All
Color       n       p     E150    n       p     E150    n       p     E300
Blue       34   0.227    37.50   33   0.220    37.50   67   0.223    75.00
Green      46   0.307    37.50   34   0.227    37.50   80   0.267    75.00
Red        28   0.187    25.00   28   0.187    25.00   56   0.187    50.00
Yellow     42   0.280    50.00   55   0.367    50.00   97   0.323   100.00
Total     150   1.000   150.00  150   1.000   150.00  300   1.000   300.00
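Pooling is simply the addition of color counts across compatible samples, followed by recomputing the proportions. The sketch below reproduces the pooled rows from the six observed 6:30 samples; the helper names pool and proportions are hypothetical, chosen only for this illustration.

```python
from collections import Counter

# Observed color counts for the six 6:30 samples (n = 50 each).
samples = {
    1: Counter(Blue=12, Green=15, Red=12, Yellow=11),
    2: Counter(Blue=8, Green=9, Red=12, Yellow=21),
    3: Counter(Blue=8, Green=21, Red=6, Yellow=15),
    4: Counter(Blue=11, Green=15, Red=10, Yellow=14),
    5: Counter(Blue=14, Green=10, Red=10, Yellow=16),
    6: Counter(Blue=14, Green=10, Red=6, Yellow=20),
}

def pool(*ids):
    """Add the color counts of the listed samples to form one larger sample."""
    total = Counter()
    for i in ids:
        total += samples[i]
    return total

def proportions(counts):
    """Convert pooled counts to sample proportions."""
    n = sum(counts.values())
    return {color: count / n for color, count in counts.items()}

pooled_all = pool(1, 2, 3, 4, 5, 6)
print(dict(pooled_all))             # counts for the pooled sample of 300
print(proportions(pool(1, 3, 5)))   # proportions for "Pooled 135"
```

Pooling is only meaningful when the samples come from the same bowl by the same sampling process, which is why the 6:30 and 8:00 samples are never combined.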

8:00 Model

Color   N    P                E50                   E100                  E150                  E200                  E250                  E300
Blue    8    8/28 = 0.2857    50*(8/28) = 14.29     100*(8/28) = 28.57    150*(8/28) = 42.86    200*(8/28) = 57.14    250*(8/28) = 71.43    300*(8/28) = 85.71
Green   8    8/28 = 0.2857    50*(8/28) = 14.29     100*(8/28) = 28.57    150*(8/28) = 42.86    200*(8/28) = 57.14    250*(8/28) = 71.43    300*(8/28) = 85.71
Red     2    2/28 = 0.0714    50*(2/28) = 3.57      100*(2/28) = 7.14     150*(2/28) = 10.71    200*(2/28) = 14.29    250*(2/28) = 17.86    300*(2/28) = 21.43
Yellow  10   10/28 = 0.3571   50*(10/28) = 17.86    100*(10/28) = 35.71   150*(10/28) = 53.57   200*(10/28) = 71.43   250*(10/28) = 89.29   300*(10/28) = 107.14
Total   28   1                50                    100                   150                   200                   250                   300

 

Probabilities with Long Run Interpretation

 

PBlue = 8/28 = 0.2857 

In long runs of draws with replacement from the bowl, approximately 28.57% of draws show blue.

 

PGreen = 8/28 = 0.2857 

In long runs of draws with replacement from the bowl, approximately 28.57% of draws show green.

 

PRed = 2/28 = 0.0714 

In long runs of draws with replacement from the bowl, approximately 7.14% of draws show red.

 

PYellow = 10/28 = 0.3571

In long runs of draws with replacement from the bowl, approximately 35.71% of draws show yellow.

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement

Color   E50                   E100                  E150                  E200                  E250                  E300
Blue    50*(8/28) = 14.29     100*(8/28) = 28.57    150*(8/28) = 42.86    200*(8/28) = 57.14    250*(8/28) = 71.43    300*(8/28) = 85.71
Green   50*(8/28) = 14.29     100*(8/28) = 28.57    150*(8/28) = 42.86    200*(8/28) = 57.14    250*(8/28) = 71.43    300*(8/28) = 85.71
Red     50*(2/28) = 3.57      100*(2/28) = 7.14     150*(2/28) = 10.71    200*(2/28) = 14.29    250*(2/28) = 17.86    300*(2/28) = 21.43
Yellow  50*(10/28) = 17.86    100*(10/28) = 35.71   150*(10/28) = 53.57   200*(10/28) = 71.43   250*(10/28) = 89.29   300*(10/28) = 107.14
Total   50                    100                   150                   200                   250                   300

 

Perfect Samples 

n=50

E50Blue = n*PBlue = 50*(8/28) = 14.29

E50Green = n*PGreen = 50*(8/28) = 14.29

E50Red = n*PRed = 50*(2/28) = 3.57

E50Yellow = n*PYellow = 50*(10/28) = 17.86

 

In samples of 50 draws with replacement from the bowl, we expect approximately 14 or 15 blue draws, 14 or 15 green draws, 3 or 4 red draws, and 17 or 18 yellow draws.

 

n=100

E100Blue = n*PBlue = 100*(8/28) = 28.57

E100Green = n*PGreen = 100*(8/28) = 28.57

E100Red = n*PRed = 100*(2/28) = 7.14

E100Yellow = n*PYellow = 100*(10/28) = 35.71

 

In samples of 100 draws with replacement from the bowl, we expect approximately 28 or 29 blue draws, 28 or 29 green draws, 7 or 8 red draws, and 35 or 36 yellow draws.

 

n=150

E150Blue = n*PBlue = 150*(8/28) = 42.86

E150Green = n*PGreen = 150*(8/28) = 42.86

E150Red = n*PRed = 150*(2/28) = 10.71

E150Yellow = n*PYellow = 150*(10/28) = 53.57

 

In samples of 150 draws with replacement from the bowl, we expect approximately 42 or 43 blue draws, 42 or 43 green draws, 10 or 11 red draws, and 53 or 54 yellow draws.

 

n=200

E200Blue = n*PBlue = 200*(8/28) = 57.14

E200Green = n*PGreen = 200*(8/28) = 57.14

E200Red = n*PRed = 200*(2/28) = 14.29

E200Yellow = n*PYellow = 200*(10/28) = 71.43

 

In samples of 200 draws with replacement from the bowl, we expect approximately 57 or 58 blue draws, 57 or 58 green draws, 14 or 15 red draws, and 71 or 72 yellow draws.

 

n=250

E250Blue = n*PBlue = 250*(8/28) = 71.43

E250Green = n*PGreen = 250*(8/28) = 71.43

E250Red = n*PRed = 250*(2/28) = 17.86

E250Yellow = n*PYellow = 250*(10/28) = 89.29

 

In samples of 250 draws with replacement from the bowl, we expect approximately 71 or 72 blue draws, 71 or 72 green draws, 17 or 18 red draws, and 89 or 90 yellow draws.

 

n=300

E300Blue = n*PBlue = 300*(8/28) = 85.71

E300Green = n*PGreen = 300*(8/28) = 85.71

E300Red = n*PRed = 300*(2/28) = 21.43

E300Yellow = n*PYellow = 300*(10/28) = 107.14

 

In samples of 300 draws with replacement from the bowl, we expect approximately 85 or 86 blue draws, 85 or 86 green draws, 21 or 22 red draws, and 107 or 108 yellow draws.
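The "14 or 15" style statements come from bracketing each perfect count between the whole numbers just below and just above it, since a real sample can only contain whole draws. A minimal sketch of that step for the 8:00 bowl follows; the function names are hypothetical, and when the perfect count is already a whole number the sketch reports that single value.

```python
import math
from fractions import Fraction

# Bowl composition for the 8:00 session.
bowl = {"Blue": 8, "Green": 8, "Red": 2, "Yellow": 10}

def whole_number_range(bowl, n):
    """Map each color to the whole counts that bracket its perfect count n * P."""
    total = sum(bowl.values())
    result = {}
    for color, count in bowl.items():
        expected = n * Fraction(count, total)  # exact perfect count
        low, high = math.floor(expected), math.ceil(expected)
        # A whole-number perfect count needs no bracket.
        result[color] = (low,) if low == high else (low, high)
    return result

def describe(bowl, n):
    """Build an 'about a or b color' statement for a sample of size n."""
    parts = []
    for color, bounds in whole_number_range(bowl, n).items():
        parts.append(f"{' or '.join(map(str, bounds))} {color.lower()}")
    return f"n={n}: about " + ", ".join(parts)

print(describe(bowl, 50))
# n=50: about 14 or 15 blue, 14 or 15 green, 3 or 4 red, 17 or 18 yellow
```

These brackets describe the center of what we expect, not the full range of what random samples can produce; individual samples routinely fall outside them.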

 

Samples – 8:00

               Sample #1             Sample #2             Pooled 12
Color       n       p      E50    n       p      E50    n       p     E100
Blue       12   0.240    14.29   13   0.260    14.29   25   0.250    28.57
Green      14   0.280    14.29   17   0.340    14.29   31   0.310    28.57
Red         5   0.100     3.57    5   0.100     3.57   10   0.100     7.14
Yellow     19   0.380    17.86   15   0.300    17.86   34   0.340    35.71
Total      50   1.000    50.00   50   1.000    50.00  100   1.000   100.00

               Sample #3             Sample #4             Pooled 34            Pooled 1234
Color       n       p      E50    n       p      E50    n       p     E100    n       p     E200
Blue       19   0.380    14.29   11   0.220    14.29   30   0.300    28.57   55   0.275    57.14
Green      13   0.260    14.29   21   0.420    14.29   34   0.340    28.57   65   0.325    57.14
Red         6   0.120     3.57    3   0.060     3.57    9   0.090     7.14   19   0.095    14.29
Yellow     12   0.240    17.86   15   0.300    17.86   27   0.270    35.71   61   0.305    71.43
Total      50   1.000    50.00   50   1.000    50.00  100   1.000   100.00  200   1.000   200.00

               Sample #5             Sample #6             Pooled 56            Pooled 3456
Color       n       p      E50    n       p      E50    n       p     E100    n       p     E200
Blue       15   0.300    14.29   17   0.340    14.29   32   0.320    28.57   62   0.310    57.14
Green      21   0.420    14.29   17   0.340    14.29   38   0.380    28.57   72   0.360    57.14
Red         2   0.040     3.57    2   0.040     3.57    4   0.040     7.14   13   0.065    14.29
Yellow     12   0.240    17.86   14   0.280    17.86   26   0.260    35.71   53   0.265    71.43
Total      50   1.000    50.00   50   1.000    50.00  100   1.000   100.00  200   1.000   200.00

              Pooled 135            Pooled 246            Pooled All
Color       n       p     E150    n       p     E150    n       p     E300
Blue       46   0.307    42.86   41   0.273    42.86   87   0.290    85.71
Green      48   0.320    42.86   55   0.367    42.86  103   0.343    85.71
Red        13   0.087    10.71   10   0.067    10.71   23   0.077    21.43
Yellow     43   0.287    53.57   44   0.293    53.57   87   0.290   107.14
Total     150   1.000   150.00  150   1.000   150.00  300   1.000   300.00

The structure of the bowl, expressed as color proportions, determines the basic structure of samples drawn from the bowl. Probability models allow the prediction of sample behavior, but the predictions are only as reliable as the original model and the sampling procedures.

In both models, the results were choppy – the sample sizes were insufficient to fully stabilize the sample frequencies. As a result, in most samples, one or two colors were appreciably off. Despite the volatility seen in the samples, we did see improvements with increasing sample size.
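One way to put a number on "appreciably off" is to take, for each sample, the largest gap between an observed color proportion and the model probability. The sketch below does this for the 6:30 samples. The choice of measure is an assumption of this illustration, not something defined in the session, and in any single session a pooled sample can still miss by more than a lucky small sample.

```python
# Model probabilities and observed counts for the 6:30 session.
model = {"Blue": 3 / 12, "Green": 3 / 12, "Red": 2 / 12, "Yellow": 4 / 12}
samples = [
    {"Blue": 12, "Green": 15, "Red": 12, "Yellow": 11},
    {"Blue": 8, "Green": 9, "Red": 12, "Yellow": 21},
    {"Blue": 8, "Green": 21, "Red": 6, "Yellow": 15},
    {"Blue": 11, "Green": 15, "Red": 10, "Yellow": 14},
    {"Blue": 14, "Green": 10, "Red": 10, "Yellow": 16},
    {"Blue": 14, "Green": 10, "Red": 6, "Yellow": 20},
]

def max_deviation(counts, model):
    """Largest absolute gap between a sample proportion and its model probability."""
    n = sum(counts.values())
    return max(abs(counts[color] / n - model[color]) for color in model)

def pool(samples):
    """Add color counts across samples to form one larger sample."""
    return {color: sum(s[color] for s in samples) for color in samples[0]}

for i, s in enumerate(samples, start=1):
    print(f"Sample #{i} (n=50): {max_deviation(s, model):.3f}")
print(f"Pooled all (n=300): {max_deviation(pool(samples), model):.3f}")
```

For the 6:30 data, the pooled sample of 300 sits closer to the model than any one of the six samples of 50, which is the improvement with sample size described above.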

The foundation of statistical applications is the careful preparation of a study population and the random sampling procedures to go with it. Proper execution of this sampling procedure ensures a usable sample.

You are now ready to learn the Long Run Argument and Perfect Sample case types in 1st Hourly Stuff.