ST5213 Categorical Data Analysis II Assignment 3

Instructions

● This assignment accounts for 20% of your final grade.

● Your solutions must be uploaded as a PDF file named with your student number (e.g. A0123456M.pdf) to the Canvas submission folder by 6 Nov 2022. Any student who fails to submit by the deadline will receive a late-submission penalty unless the lecturer is informed as soon as possible of any extenuating circumstances.

● Write your name and student number at the top of the first page.

● The work that you submit must be your sole effort (not copied from anyone else). You will be severely penalized if found guilty of plagiarism.

● If you have made multiple attempts to submit your assignment, only the most recent submission will be graded.

Format

● The two tasks in this assignment involve the analysis of some real data using R software.

● Your report for the two tasks should be typed in Times New Roman font (11-12 point) and should not exceed 4 pages of A4 paper, including any relevant figures/tables. The first 4 pages of your submission should be your report for the two tasks; R code and output should be attached as an Appendix after that.

● Present the statistical techniques and results of your data analysis in a clear and concise manner, with appropriate use of figures/tables for summarization. Conclusions must be supported with reasoning and sufficient evidence.

● You will be assessed based on the report only, so all important results must be included in the report and not in the Appendix, which is for checking purposes only.

● You may be penalized for exceeding the page limit or if any of the above instructions are not followed.

Task 1. A cohort study in South Africa was designed to follow children born between April and June 1990, in the hope of identifying risk factors for cardiovascular disease. After 5 years, the children were invited to participate in interviews. Many children did not participate, raising the possibility of bias if inferences were based only on those who did. Morrell (1999) gave data comparing children who participated with those who did not, with respect to whether the mother had medical aid at the time of birth.

(a) Based on the information in Table 1, use Fisher's exact test to determine whether there is evidence of a relationship between medical aid status and participation in interviews, explaining how the probability of the observed table and the p-value are computed.

                   No interview   Interview
Had medical aid             195          46
No medical aid              979         370

Table 1
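A minimal, standard-library-only Python sketch of the calculation in (a), intended for checking a hand computation (the assignment itself asks for R). With both margins of a 2 x 2 table fixed, the (1,1) count is hypergeometric, so the probability of the table whose (1,1) cell equals t is C(r1, t) C(n - r1, c1 - t) / C(n, c1), where r1 is the first row total, c1 the first column total and n the grand total. The two-sided p-value sums these probabilities over every feasible table that is no more likely than the observed one; the small relative tolerance for near-ties follows the convention of R's fisher.test. The function names are illustrative, and rows/columns are assumed to be ordered as in Table 1.

```python
from math import comb


def table_prob(a, row1, col1, n):
    """Hypergeometric probability that the (1,1) cell equals a, given all margins:
    C(row1, a) * C(n - row1, col1 - a) / C(n, col1)."""
    return comb(row1, a) * comb(n - row1, col1 - a) / comb(n, col1)


def fisher_exact_2x2(table):
    """Return (probability of the observed table, two-sided p-value) for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    # Feasible values of the (1,1) cell once both margins are fixed.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [table_prob(t, row1, col1, n) for t in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # Two-sided p-value: total probability of all tables no more likely than the
    # observed one, with a small relative tolerance for floating-point near-ties.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return p_obs, min(1.0, p_value)


# Rows: had medical aid, no medical aid; columns: no interview, interview (Table 1).
TABLE1 = [[195, 46], [979, 370]]
p_obs, p_value = fisher_exact_2x2(TABLE1)
print(f"P(observed table) = {p_obs:.6f}, two-sided p-value = {p_value:.4f}")
```

In R, fisher.test(matrix(c(195, 979, 46, 370), nrow = 2)) builds the same table (R fills a matrix column by column) and should report the same two-sided p-value.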

(b) The study further classified children by their racial group in Tables 2 and 3. Determine whether there is evidence of a relationship between medical aid status and participation in interviews for

i. white children, using the Pearson chi-square test;

ii. black children, using the likelihood ratio test,

showing how the test statistics and p-values are computed.

                   No interview   Interview
Had medical aid             104          10
No medical aid               22           2

Table 2: White children

                   No interview   Interview
Had medical aid              91          36
No medical aid              957         368

Table 3: Black children
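The statistics in (b) can be checked with the short Python sketch below (standard library only; the function names are illustrative). Expected counts under independence are row total x column total / n; Pearson's X^2 = sum (O - E)^2 / E and the likelihood-ratio statistic G^2 = 2 sum O log(O / E), with zero cells contributing nothing to G^2. A 2 x 2 table has 1 degree of freedom, and for a chi-square(1) variable P(X > x) = erfc(sqrt(x / 2)), so no statistics library is needed. No continuity correction is applied, which corresponds to chisq.test(..., correct = FALSE) in R. One expected count in Table 2 (no medical aid, interview) is only about 2.1, which is worth keeping in mind when relying on the chi-square approximation there.

```python
from math import erfc, log, sqrt


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]


def chi2_1df_pvalue(x):
    """Upper-tail probability P(X > x) for X ~ chi-square with 1 df."""
    return erfc(sqrt(max(x, 0.0) / 2))


def pearson_test(table):
    """Pearson X^2 (no continuity correction) and its p-value for a 2 x 2 table."""
    e = expected_counts(table)
    x2 = sum((o - ei) ** 2 / ei for orow, erow in zip(table, e) for o, ei in zip(orow, erow))
    return x2, chi2_1df_pvalue(x2)


def likelihood_ratio_test(table):
    """G^2 = 2 * sum(O * log(O / E)) and its p-value for a 2 x 2 table."""
    e = expected_counts(table)
    g2 = 2 * sum(o * log(o / ei) for orow, erow in zip(table, e)
                 for o, ei in zip(orow, erow) if o > 0)
    return g2, chi2_1df_pvalue(g2)


# Rows: had medical aid, no medical aid; columns: no interview, interview.
WHITE = [[104, 10], [22, 2]]    # Table 2
BLACK = [[91, 36], [957, 368]]  # Table 3
print("White, Pearson X2 = %.4f, p = %.4f" % pearson_test(WHITE))
print("Black, LR G2 = %.4f, p = %.4f" % likelihood_ratio_test(BLACK))
```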

(c) Compute the marginal odds ratio and the conditional odds ratios given race relating medical aid status and participation in interviews. Compare the marginal association with the conditional associations and explain why they are so different.
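A small Python sketch for (c), assuming the row/column orientation of Tables 1 to 3 (rows: had/no medical aid; columns: no interview/interview), so each odds ratio compares the odds of not being interviewed for children whose mothers had medical aid with the same odds for children whose mothers did not. It rebuilds the marginal table as the cell-wise sum of the two partial tables and, alongside each odds ratio, reports the share of mothers with medical aid and the share of children interviewed within each table, since a third variable related to both factors is the usual reason marginal and conditional odds ratios disagree. The function names are illustrative.

```python
def odds_ratio(table):
    """Sample odds ratio (a * d) / (b * c) for the 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def summarize(table):
    """Odds ratio, share with medical aid (row 1) and share interviewed (column 2)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return odds_ratio(table), (a + b) / n, (b + d) / n


# Rows: had medical aid, no medical aid; columns: no interview, interview.
WHITE = [[104, 10], [22, 2]]    # Table 2
BLACK = [[91, 36], [957, 368]]  # Table 3
# Collapsing over race: the marginal table is the cell-wise sum (Table 1).
MARGINAL = [[w + k for w, k in zip(rw, rk)] for rw, rk in zip(WHITE, BLACK)]

for name, table in [("marginal", MARGINAL), ("white", WHITE), ("black", BLACK)]:
    or_, aid, interviewed = summarize(table)
    print(f"{name:9s} OR = {or_:.3f}   with aid = {aid:.3f}   interviewed = {interviewed:.3f}")
```

With these counts the marginal odds ratio is about 1.60, while the conditional odds ratios are about 0.95 for white children and 0.97 for black children.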

Task 2. The following are data on smoking from a survey of seventh graders (age: 1 = 12 or younger, 2 = 13 or older):

                                             Smoking
Family structure   Race    Gender   Age    None   Some
                   Black   Male     1        27      2
                                    2        12      2
                           Female   1        23      4
                                    2         7      1
                   White   Male     1       394     32
                                    2       142     19
                           Female   1       421     38
                                    2        94     11

                   Black   Male     1        18      1
                                    2        13      1
                           Female   1        24      0
                                    2         4      3
                   White   Male     1        48      6
                                    2        25      4
                           Female   1        55     15
                                    2        13      4

Search for the loglinear model that can best explain the association patterns in the contingency table by treating

(a) smoking and family structure as response variables;

(b) smoking as a response variable;

and the rest as explanatory variables. In each case,

i. State the minimal model.

ii. Write down the symbol of the loglinear model that best describes the data and give a step-by-step description of how it was built.

iii. Represent the conditional independence structure in the loglinear model using an association graph and explain whether it is a graphical model. Give the symbol of another model that has the same association graph, but is not a graphical model.

iv. Interpret the associations in the loglinear model, taking into account conditional independence, collapsibility and odds ratios.

v. Explain whether the zero cell in the contingency table affects your analysis.
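For (iii), a hierarchical loglinear model is graphical exactly when its generators coincide with the maximal cliques of its association graph (the graph joining every pair of variables that appear together in some term). The brute-force Python sketch below checks that condition; it is adequate for five variables but not intended for large problems, and the function names are hypothetical. The classic contrast is [AB][BC][AC], whose association graph is a triangle, the same graph as [ABC], yet which is not graphical.

```python
from itertools import combinations


def association_graph(generators):
    """Edge set: every pair of variables that appear together in some generator."""
    edges = set()
    for gen in generators:
        edges.update(frozenset(pair) for pair in combinations(gen, 2))
    return edges


def maximal_cliques(variables, edges):
    """Brute-force maximal cliques; fine for a handful of variables."""
    def is_clique(nodes):
        return all(frozenset(pair) in edges for pair in combinations(nodes, 2))
    cliques = [frozenset(s) for k in range(1, len(variables) + 1)
               for s in combinations(variables, k) if is_clique(s)]
    return {c for c in cliques if not any(c < d for d in cliques)}


def is_graphical(generators, variables):
    """True when the generators (redundant ones dropped) are exactly the
    maximal cliques of the model's association graph."""
    gens = {frozenset(g) for g in generators}
    gens = {g for g in gens if not any(g < h for h in gens)}
    return gens == maximal_cliques(variables, association_graph(generators))


print(is_graphical([("A", "B"), ("B", "C"), ("A", "C")], "ABC"))  # False: no-three-factor model
print(is_graphical([("A", "B", "C")], "ABC"))                     # True: saturated model
```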

(For consistency, do not drop any term whose p-value is close to, but less than, 0.05 during the model search.)
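The loglinear fits in Task 2 are normally obtained in R (for example with glm(..., family = poisson) or MASS::loglm). As an illustration of what such a fit involves, the Python sketch below fits a hierarchical loglinear model by iterative proportional fitting (IPF), computes the deviance G^2 and its degrees of freedom, compares two nested models with a likelihood-ratio test, and lists any zero entries in the observed margins the model fixes. Several things are assumptions: all five variables are coded 0/1; F is coded by block order only, because the family-structure labels are not given in the table; the degrees-of-freedom count assumes binary variables and makes no adjustment for zero margins; and the two models compared are illustrative, not a recommended answer to either part. All function names are hypothetical.

```python
from itertools import combinations, product
from math import erfc, log, sqrt

# Cell key (F, R, G, A, S), each variable coded 0/1 (hypothetical coding):
# F: family-structure block in table order (the level labels are not given);
# R: 0 = Black, 1 = White; G: 0 = Male, 1 = Female;
# A: 0 = age 12 or younger, 1 = 13 or older; S: 0 = None, 1 = Some.
PAIRS = [(27, 2), (12, 2), (23, 4), (7, 1), (394, 32), (142, 19), (421, 38), (94, 11),
         (18, 1), (13, 1), (24, 0), (4, 3), (48, 6), (25, 4), (55, 15), (13, 4)]
COUNTS = {}
for (f, r, g, a), (none, some) in zip(product((0, 1), repeat=4), PAIRS):
    COUNTS[(f, r, g, a, 0)] = none
    COUNTS[(f, r, g, a, 1)] = some


def margin(table, gen):
    """Collapse a table onto the variables whose indices are listed in gen."""
    out = {}
    for cell, v in table.items():
        key = tuple(cell[i] for i in gen)
        out[key] = out.get(key, 0.0) + v
    return out


def ipf(observed, generators, tol=1e-8, max_iter=500):
    """Fit the hierarchical model with these generators by iterative proportional fitting."""
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(max_iter):
        change = 0.0
        for gen in generators:
            obs_m, fit_m = margin(observed, gen), margin(fitted, gen)
            for cell, v in fitted.items():
                key = tuple(cell[i] for i in gen)
                # Scale each cell so the fitted margin matches the observed margin.
                new = v * obs_m[key] / fit_m[key] if fit_m[key] > 0 else 0.0
                change = max(change, abs(new - v))
                fitted[cell] = new
        if change < tol:
            break
    return fitted


def deviance(observed, fitted):
    """G^2 = 2 * sum(obs * log(obs / fitted)); cells with obs = 0 contribute 0."""
    return 2 * sum(o * log(o / fitted[c]) for c, o in observed.items() if o > 0)


def n_params(generators):
    """Number of loglinear parameters when every variable is binary:
    one per distinct term, i.e. per subset (including the empty set) of a generator."""
    terms = set()
    for gen in generators:
        for k in range(len(gen) + 1):
            terms.update(combinations(sorted(gen), k))
    return len(terms)


def zero_margins(observed, generators):
    """Zero entries of each observed marginal table fixed by the model."""
    return {gen: [k for k, v in margin(observed, gen).items() if v == 0] for gen in generators}


# Illustrative nested pair only: M0 = [FRGA][S] versus M1 = [FRGA][FS].
M0 = ((0, 1, 2, 3), (4,))
M1 = ((0, 1, 2, 3), (0, 4))
fit0, fit1 = ipf(COUNTS, M0), ipf(COUNTS, M1)
g0, g1 = deviance(COUNTS, fit0), deviance(COUNTS, fit1)
df0, df1 = len(COUNTS) - n_params(M0), len(COUNTS) - n_params(M1)
# M1 adds one parameter, so the drop in G^2 is referred to chi-square(1).
p_fs = erfc(sqrt(max(0.0, g0 - g1) / 2))
print(f"M0: G2 = {g0:.2f} on {df0} df; M1: G2 = {g1:.2f} on {df1} df; p(FS) = {p_fs:.4f}")
print("Zero entries in fitted margins of M1:", zero_margins(COUNTS, M1))
```

The same generators, passed one at a time to ipf, can be used to step through any forward or backward search, with each added or dropped term judged by the change in G^2 and degrees of freedom.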
