联系我们: 手动添加方式: 微信>添加朋友>企业微信联系人>13262280223 或者 QQ: 1483266981
ASSIGNMENT 3
INFERENCES FOR CATEGORICAL DATA
IMPORTANT:
1) In this lab, you will need to enter many statistical formula to derive answers from Excel.
2) For all graphs and charts, please label the axis and ensure proper titles are used.
3) For all tables, please ensure the correct variable name(s) is/are used.
4) Each group will be expected to create a Google document for the lab report where students will
type their answers (in full sentences) and paste the Excel output (where necessary) for each lab
question.
5) Completed assignments will be saved as a PDF file, submitted, graded, and returned on eClass.
6) Each lab group MUST upload and submit only ONE lab report, so students MUST work together
to complete the lab assignment together.
7) Please see the Lab Submission Info tab through the Lab Information link in the Labs section on
eClass.
In this lab assignment, you will use descriptive, graphical, and inferential tools in Excel to analyze the
data to learn about the distribution of different age group in Edmonton and Calgary. You will display
and summarize the related categorical variables and explore the relationship between them with
contingency tables. The significance of the bivariate relationships will also be assessed. Tests and
confidence intervals for proportions will be used to compare the distribution of people in each age
group.
Various Age Groups in Edmonton and Calgary
The distribution of age groups is one of the important economical factors that a capitalist, an
economist, and the government want to know. For example, the distribution of age groups is important
for a capitalist to determine what type of business is more suitable for investment in a certain city, a
certain province, and/or a certain country. While the economist and the government want to know the
distribution of the age groups so that they can predict about the economy and know what policy
should be in place. For example, an aging population means the government needs to allocate more
budget to health and retirement programs for seniors, which can lead to a slow GDP growth.
Therefore, from time to time, the government will conduct a census and gather this information from
their citizens. A census is expensive to conduct each year, however, so Statistics Canada usually
completes one every 5 years, and the census data from year 2016 will be used in this lab. The first
dataset (lab3a – census.txt) relates to this census.
If a marketing company wants to learn more about the distribution of age groups for Edmonton and
Calgary, though, they might wonder if the distribution from the 2016 census can still be applied.
Using a computer program in 2021, suppose the company randomly selects 31 households in
Edmonton and another 31 households in Calgary, calling and asking each household member which
age group (Children (0 to 14 years old), Teen and Adults (15 to 64 years old), or Seniors (65 years old
or above) they belong to. The second dataset (lab3b – sample.txt) relates to this supposition.
The datasets are available in the Data link located in the Lab 3 tab display in the Labs section on
eClass. The data are not to be printed in your submission.
1
2
The following is a description of the variables in the file lab3a-census.txt:
Column Variable Name Description of Variable
1 City Name of city
2 Group Age group
3 Count Total number of persons in a particular age group
The data for Columns 1 to 3 were obtained for the year 2016 from this Statistics Canada Website:
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/dt-td/index-eng.cfm.
The website offers census data for the year 2016.
The following is a description of the variables in the file lab3b-sample.txt:
Column Variable Name Description of Variable
1 Household The household number
2 Subject A number assigned to each person in the sample.
3 City Name of city where the household is located
4 Age Age group to which the person in the household belongs
Answer the following questions using the data:
1. According to the census data provided by Statistics Canada, they have divided the age groups into
21 different age groups (data provided in the Census tab), but the marketing company is only
interested in observing the distribution of 3 age groups (Group 1: Children (0 to 14 years old);
Group 2: Teen and Adults (15 to 64 years old); Group 3: Seniors (65 years old or above).
(a) Find and summarize the frequency and relative frequency distribution of the 3 age groups for
Edmonton in 2016. Comment briefly.
(b) Find and summarize the frequency and relative frequency distribution of the 3 age groups for
Calgary in 2016. Comment briefly.
(c) Compare the two cities’ distributions. Are the distribution similar or noticeably difference
2. Looking at the study design of the second dataset, can you generalize the results of the study to
Edmonton and/or Calgary Explain briefly.
For the same dataset, identify the identifier variable and comment briefly. What is/are the
categorical and numerical variable(s) in the data, if any
3. Using the Sample sheet, use the PivotTable function to create frequency tables (showing both
frequency and relative frequency) to summarize the distribution of age groups for households in
Edmonton from the sample in 2021. Repeat this separately for the Calgary data.
(a) Paste the table for the Edmonton data into your report. Comment briefly on the distribution. Is
the distribution of household age groups in Edmonton approximately uniform Compare this
sample distribution with the distribution of the 3 age groups for Edmonton in the census from
Question 1, part (a). Specifically, provide the exact differences in the distribution of each of
the 3 age groups in Edmonton between the census data and the sample data.
(b) Repeat part (a) with the Calgary data. Compare this sample distribution with the distribution
of the 3 age groups for Calgary in the census from Question 1, part (b). Specifically, provide
the exact differences in distribution of each of the 3 age groups in Calgary between the census
data and the sample data.
3
(c) Based on the sample from 2021, carry out an appropriate hypothesis test at α = 0.05 to see
whether the proportion of individuals in Edmonton with ages between 0 and 14 years old is
now different from 17.64% (which is the rounded percentage of people with age between 0 to
14 in the 2016 census).
State the null and alternative hypotheses. Calculate the value of the appropriate test statistic
(show calculations), state the distribution of the test statistic under the null hypothesis, and
the P-value (use the “NORM.S.DIST” function in Excel) of the test to answer the question.
State your conclusion.
(d) Calculate a 95% confidence interval for the proportion of children (age 0 to 14) in Edmonton in
2021. State clearly all components (estimate, standard error, critical value) and show calculations.
State your conclusion.
Comment on how your 95% confidence interval is consistent with the results of your
hypothesis test in part (c).
4. Using the Sample sheet, the marketing company wants to answer various questions regarding the
relationship between the age groups and cities.
(a) Obtain a contingency table (using the PivotTable feature) of household age group by city.
Paste the table into your report.
(b) Refer to the output in part (a) to answer the following questions:
i) What percentage of all people are between 15 and 64 years old in Edmonton
ii) What percentage of people in Edmonton are between 15 and 64 years old
iii) What percentage of the group with age between 15 and 64 years old are in
Edmonton
(c) Using α = 0.05, is there any evidence of a difference in the proportion of seniors (ages 65 and
over) between Calgary and Edmonton in 2021 Carry out the appropriate two-sample
proportion test to answer the question.
State the null and alternative hypotheses. Calculate the value of the appropriate test statistic
(show calculations), state the distribution of the test statistic under the null hypothesis, and
the P-value (use the “NORM.S.DIST” function in Excel) of the test to answer the question.
State your conclusion.
(d) Calculate a 95% confidence interval for the difference in the proportion of seniors between
Calgary and Edmonton in 2021. State clearly all components (estimate, standard error, critical
value) and show calculations. State your conclusion.
Comment on how your 95% confidence interval is consistent with the results of your
hypothesis test in part (c).
4
LAB 3 ASSIGNMENT: MARKING SCHEMA
Proper Cover Page (5 marks): Please download the “Lab Assignment Template” for more information
about the cover page. Lab reports must be typed.
Question 1 (10)
(a) Distribution of the 3 age groups for Edmonton: 3 points
Comment: 1 point
(b) Distribution of the 3 age groups for Calgary: 3 points
Comment: 1 point
(c) Comparison: 2 points
Question 2 (4)
Generalize results: 1 point
Identifier variable: 1 point
Categorical variable(s): 1 point
Numerical variable(s): 1 point
Question 3 (34)
(a) Frequency and relative frequency table for Edmonton: 3 points
Comment: 1 point
Uniformity: 1 point
Comparison with census data: 2 points
(b) Frequency and relative frequency table for Calgary: 3 points
Comment: 1 point
Uniformity: 1 point
Comparison with census data: 2 points
(c) Hypotheses: 2 points
The value of the test statistic and calculations: 4 points
The distribution of the test statistic under the null hypothesis: 1 point
The P-value: 2 points
Conclusion: 2 points
(d) Components (Estimate, Standard Error, Critical Value): 3 points
Confidence interval: 2 points
Conclusion: 2 points
Comparison to the test in part (c): 2 points
5
Question 4 (30)
(a) Contingency table of household age by City: 4 points
(b) Percentages for households in Edmonton: 6 points (2 points each)
(c) Hypotheses: 2 points
The value of the test statistic and calculations: 4 points
The distribution of the test statistic under the null hypothesis: 1 point
The P-value: 2 points
Conclusion: 2 points
(d) Components (Estimate, Standard Error, Critical Value): 3 points
Confidence interval: 2 points
Conclusion: 2 points
Comparison to the test in part (c): 2 points
TOTAL = 5 + 10 + 4 + 34 + 30 = 83


发表评论