ASSIGNMENT 1
Spring Semester, 2018
Due date: 10 September, 2018
Statistical Methods in Epidemiology (unit no. 401176)
The assignment is marked out of 100, which will be converted to a mark out of 25. Each question carries 20 marks. Please read the marking rubric at the end of this document. Late submissions are not accepted without a valid reason (see the Learning Guide for instructions). An assignment cover sheet is also attached.
Please answer all questions.
Q1. Categorize each of the following variables as qualitative-nominal, qualitative-ordinal, quantitative-discrete or quantitative-continuous (no explanation is needed).
- Hospital discharge diagnosis
- Exact serum cholesterol measurements
- Exact age
- Age groups coded as 1 = <30, 2 = 30-39, 3 = 40-49, 4 = 50+
- Causes of death
- Sites of a randomized trial
- Education levels coded as 1 = high school not completed, 2 = high school completed, 3 = some post-high school education
- Exact systolic blood pressure levels
- Being treated for hypertension, coded as 1 = no, 2 = yes
- Pack years of cigarette smoking
Each item above carries 2 marks.
Q2. The following stem-and-leaf plot was obtained from the values of BMI (body mass index) for a random sample of 88 persons.
Frequency Stem Leaf
1 19 7
2 20 69
7 21 4788999
7 22 3666799
9 23 112355799
17 24 01222222345555679
18 25 002223344444577789
9 26 002577799
5 27 02689
5 28 01289
7 29 0001668
1 30 2
Stem width: 1.0
Each leaf: 1 case
[Hint on reading the stem-and-leaf plot: the first row, with frequency 1, stem 19 and leaf 7, represents a single value, 19.7; the second row, with frequency 2, stem 20 and leaves 6 and 9, represents two values, 20.6 and 20.9. The remaining rows are read in the same way.]
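The decoding rule in the hint can be sketched in Python as an illustration (the assignment itself uses SAS). The list `ROWS` is transcribed from the plot above and `decode` is a hypothetical helper name; with a stem width of 1.0, each leaf is the tenths digit, so a value is stem + leaf/10.

```python
# Rows of the stem-and-leaf plot as (stem, leaves); stem width 1.0, each leaf = 1 case.
ROWS = [
    (19, "7"), (20, "69"), (21, "4788999"), (22, "3666799"),
    (23, "112355799"), (24, "01222222345555679"),
    (25, "002223344444577789"), (26, "002577799"), (27, "02689"),
    (28, "01289"), (29, "0001668"), (30, "2"),
]


def decode(rows):
    """Return the sorted values encoded by (stem, leaves) rows."""
    values = []
    for stem, leaves in rows:
        for leaf in leaves:
            # the leaf is the tenths digit, e.g. stem 20, leaf 6 -> 20.6
            values.append(round(stem + int(leaf) / 10, 1))
    return sorted(values)


bmi = decode(ROWS)
print(len(bmi), bmi[:3])
```

The decoded list should match the values typed into the SAS `cards` section below, which is a quick way to catch transcription errors.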
Each part below carries 4 marks.
- What are the smallest and largest BMI values among these 88 persons?
- What percentage of BMI values exceed 25.0? [Hint: use the frequency table that SAS produces for the computed binary variable.]
- Obtain the first quartile, median and third quartile of BMI for this sample, and sketch a stem-and-leaf plot and a box-and-whisker plot for BMI.
- Interpret the histogram for BMI.
- Interpret the bar chart of mean BMI for males and females.
The following SAS code can be used to answer all parts:
data a;
input bmi sex;  /* second value on each data line is sex: 0 = female, 1 = male */
cards;
19.7 0
20.6 1
20.9 0
21.4 1
21.7 0
21.8 1
21.8 0
21.9 1
21.9 0
21.9 1
22.3 0
22.6 1
22.6 0
22.6 1
22.7 0
22.9 1
22.9 0
23.1 1
23.1 0
23.2 1
23.3 0
23.5 1
23.5 0
23.7 1
23.9 0
23.9 1
24.0 0
24.1 1
24.2 0
24.2 1
24.2 0
24.2 1
24.2 0
24.2 1
24.3 0
24.4 1
24.5 0
24.5 1
24.5 0
24.5 1
24.6 0
24.7 1
24.9 0
25.0 1
25.0 0
25.2 1
25.2 0
25.2 1
25.3 0
25.3 1
25.4 0
25.4 1
25.4 0
25.4 1
25.4 0
25.5 1
25.7 0
25.7 1
25.7 0
25.8 1
25.9 0
26.0 1
26.0 0
26.2 1
26.5 0
26.7 1
26.7 0
26.7 1
26.9 0
26.9 1
27.0 0
27.2 1
27.6 0
27.8 1
27.9 0
28.0 1
28.1 0
28.2 1
28.8 0
28.9 1
29.0 0
29.0 1
29.0 0
29.1 1
29.6 0
29.6 1
29.8 0
30.2 1
;
data a;
set a;
/* binary indicator: 1 if BMI exceeds 25.0, otherwise 0 */
bmigt25=(bmi>25);
run;
proc freq data=a;
tables bmigt25;
run;
proc sort data=a;
by bmi;
run;
ods listing;
ods graphics off;
/* PLOT requests line-printer stem-and-leaf, box and normal probability plots */
proc univariate data=a plot;
var bmi;
title "quartiles and mean BMI, side-by-side stem-and-leaf plot and boxplot for BMI";
run;
proc univariate data=a plot;
var bmi;
histogram;
title "histogram for a continuous variable BMI";
run;
/* PROC GCHART is part of SAS/GRAPH; QUIT ends its run-group processing */
proc gchart data=a;
vbar sex/group=sex sumvar=bmi type=mean discrete;
title "Vertical bar chart for mean BMI by sex, 0=female, 1=male";
run;
quit;
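As a cross-check on the SAS output, the sketch below recomputes the binary indicator `bmigt25` and the quartiles in Python. It assumes the PROC UNIVARIATE default percentile definition (PCTLDEF=5, the empirical distribution function with averaging); other software, such as Python's `statistics.quantiles`, uses different definitions and may give slightly different quartiles. `decode` and `pctl_def5` are hypothetical helper names.

```python
# Stem-and-leaf rows (stem, leaves) transcribed from Q2; each leaf is the tenths digit.
ROWS = [
    (19, "7"), (20, "69"), (21, "4788999"), (22, "3666799"),
    (23, "112355799"), (24, "01222222345555679"),
    (25, "002223344444577789"), (26, "002577799"), (27, "02689"),
    (28, "01289"), (29, "0001668"), (30, "2"),
]


def decode(rows):
    """Sorted BMI values encoded by the stem-and-leaf rows."""
    return sorted(round(stem + int(leaf) / 10, 1) for stem, leaves in rows for leaf in leaves)


def pctl_def5(sorted_x, p):
    """Percentile under SAS PCTLDEF=5 (0 < p < 1): write n*p = j + g; return x(j+1)
    if g > 0, otherwise the average of x(j) and x(j+1) (1-based order statistics)."""
    n = len(sorted_x)
    j = int(n * p)
    g = n * p - j
    if g > 0:
        return sorted_x[j]                        # x(j+1) in 1-based indexing
    return (sorted_x[j - 1] + sorted_x[j]) / 2    # mean of x(j) and x(j+1)


bmi = decode(ROWS)
bmigt25 = [int(v > 25) for v in bmi]              # mirrors bmigt25=(bmi>25) in SAS
pct_over_25 = 100 * sum(bmigt25) / len(bmi)
quartiles = tuple(pctl_def5(bmi, p) for p in (0.25, 0.50, 0.75))
print(f"{pct_over_25:.1f}% of BMI values exceed 25.0")
print("Q1, median, Q3:", quartiles)
```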
Q3. A variable can be a confounder, an effect modifier, both, or neither. There are statistical tests for detecting effect modification, but there is no statistical test for detecting an operational confounder. For example, if a test comparing the unadjusted and adjusted odds ratios shows no significant difference but one is considerably larger than the other, one would still adjust for the confounder. Conversely, if such a test shows a significant difference but neither is considerably larger than the other, one would not need to adjust for the confounder.
Consider a study assessing the association between smoking and lung cancer. Is sex a confounder or an effect modifier (quantitative or qualitative) in each of the following four scenarios? (10 marks)
Scenario | OR (men) | OR (women) | Crude OR | Sex-adjusted OR |
1 | 2.51 | 2.15 | 2.32 | 2.35 |
2 | 1.06 | 0.95 | 2.02 | 1.01 |
3 | 4.40 | 3.41 | 4.02 | 2.63 |
4 | 2.15 | 0.65 | 1.42 | 1.29 |
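One way to operationalise "considerably larger" is a change-in-estimate comparison of the crude and sex-adjusted odds ratios, alongside a look at how far apart the stratum-specific odds ratios are. The Python sketch below does only that arithmetic for the four scenarios, in table order. The 10% threshold is an assumed rule of thumb, not something given in this assignment, and the ratio of stratum odds ratios is descriptive, not a formal test of effect modification.

```python
# (OR men, OR women, crude OR, sex-adjusted OR) for the four scenarios, in table order
SCENARIOS = {
    1: (2.51, 2.15, 2.32, 2.35),
    2: (1.06, 0.95, 2.02, 1.01),
    3: (4.40, 3.41, 4.02, 2.63),
    4: (2.15, 0.65, 1.42, 1.29),
}
CHANGE_THRESHOLD = 0.10  # assumed rule of thumb for "considerably larger"; not from the assignment


def pct_change(crude, adjusted):
    """Relative difference of the crude OR from the adjusted OR."""
    return (crude - adjusted) / adjusted


for k, (or_men, or_women, crude, adjusted) in SCENARIOS.items():
    change = pct_change(crude, adjusted)
    print(
        f"Scenario {k}: men/women OR ratio = {or_men / or_women:.2f}; "
        f"crude vs adjusted = {change:+.1%} "
        f"(|change| > {CHANGE_THRESHOLD:.0%}: {abs(change) > CHANGE_THRESHOLD})"
    )
```

The numbers only feed the judgement the question asks for; deciding whether the stratum odds ratios differ meaningfully, and whether they lie on opposite sides of 1 for a real effect, still has to be argued.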
The following table presents unadjusted and age-adjusted coronary event rates, and the risk of death following a coronary event, for men in north Glasgow in 1991. The exposure of interest is social deprivation. Is age a confounder of the relationship between social deprivation and the coronary event rate, and between social deprivation and the risk of coronary death? (10 marks)
Table: Coronary event rates and risk of coronary death by deprivation group, north Glasgow men, 1991
Deprivation group | Coronary event rate per 1,000, unadjusted | Coronary event rate per 1,000, age-adjusted | Risk of coronary death, unadjusted | Risk of coronary death, age-adjusted |
I (most advantaged) | 2.95 | 3.28 | 0.57 | 0.59 |
II | 4.32 | 4.20 | 0.50 | 0.50 |
III | 6.15 | 5.30 | 0.51 | 0.52 |
IV (least advantaged) | 5.90 | 5.75 | 0.56 | 0.56 |
Total | 4.83 | 4.88 | 0.53 | 0.54 |
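Confounding by age shows up as a change in the measure of association rather than in any single rate, so one way to read the table is to compare each deprivation group with group I before and after age adjustment. The sketch below assumes group I as the reference (an illustrative choice) and only computes ratios; `ratios_vs_reference` is a hypothetical helper name.

```python
# (unadjusted, age-adjusted) values by deprivation group, transcribed from the table
EVENT_RATE = {"I": (2.95, 3.28), "II": (4.32, 4.20), "III": (6.15, 5.30), "IV": (5.90, 5.75)}
DEATH_RISK = {"I": (0.57, 0.59), "II": (0.50, 0.50), "III": (0.51, 0.52), "IV": (0.56, 0.56)}


def ratios_vs_reference(table, ref="I"):
    """Each group's value divided by the reference group's, unadjusted and age-adjusted."""
    ref_unadj, ref_adj = table[ref]
    return {g: (u / ref_unadj, a / ref_adj) for g, (u, a) in table.items()}


for label, table in (("coronary event rate", EVENT_RATE), ("risk of coronary death", DEATH_RISK)):
    for group, (unadj, adj) in ratios_vs_reference(table).items():
        print(f"{label}, group {group} vs I: unadjusted {unadj:.2f}, age-adjusted {adj:.2f}")
```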
Q4. The data below are modified from Jick, H. et al. (Coffee and Myocardial Infarction, New England Journal of Medicine, Vol. 289, No. 2, pp. 63-67, 1973). These authors used a case-control study to investigate the relationship between coffee consumption and myocardial infarction (MI). Cases were patients hospitalized because of acute chest pain with an admission diagnosis of possible or definite MI. Controls were patients with various other diagnoses. To control for confounding, a multivariate risk score was derived for each patient, taking into account the patient's age, sex, history of MI, smoking status, admission to hospital, season of admission to hospital, history of antianginal drug use, history of digitalis use, presence of diabetes, and religion. The score was computed so that patients with a high score were at greater risk of MI than patients with a low score. The distribution of the scores was divided into quintiles: the first quintile contains the 20% of subjects with the lowest scores and the fifth quintile the 20% of subjects with the highest scores. The table below shows the distribution of cases and controls among subjects drinking 0 cups of coffee a day and subjects drinking 6 or more cups a day, separately within each quintile. The columns are, in order: risk-score quintile, cups of coffee per day, MI status (pres = present, abs = absent), and the cell count in the 2 × 2 table. (20 marks)
Risk-score quintile | Coffee (cups/day) | MI | Frequency |
Quintile 1 | 6 or more | pres | 12 |
Quintile 1 | 6 or more | abs | 670 |
Quintile 1 | 0 | pres | 17 |
Quintile 1 | 0 | abs | 1315 |
Quintile 2 | 6 or more | pres | 5 |
Quintile 2 | 6 or more | abs | 261 |
Quintile 2 | 0 | pres | 12 |
Quintile 2 | 0 | abs | 395 |
Quintile 3 | 6 or more | pres | 4 |
Quintile 3 | 6 or more | abs | 174 |
Quintile 3 | 0 | pres | 13 |
Quintile 3 | 0 | abs | 370 |
Quintile 4 | 6 or more | pres | 2 |
Quintile 4 | 6 or more | abs | 80 |
Quintile 4 | 0 | pres | 14 |
Quintile 4 | 0 | abs | 160 |
Quintile 5 | 6 or more | pres | 1 |
Quintile 5 | 6 or more | abs | 38 |
Quintile 5 | 0 | pres | 14 |
Quintile 5 | 0 | abs | 117 |
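To see the stratified structure, the listing can be rearranged into one 2 × 2 table per quintile and a stratum-specific relative odds computed as ad/bc. A minimal Python sketch; the cell labels a to d are an assumption of this sketch, defined in the comments.

```python
# Cell counts per risk-score quintile, transcribed from the listing above:
#   a = 6+ cups, MI present    b = 6+ cups, MI absent
#   c = 0 cups,  MI present    d = 0 cups,  MI absent
TABLES = {
    1: (12, 670, 17, 1315),
    2: (5, 261, 12, 395),
    3: (4, 174, 13, 370),
    4: (2, 80, 14, 160),
    5: (1, 38, 14, 117),
}


def odds_ratio(a, b, c, d):
    """Relative odds of MI for 6+ cups versus 0 cups within one stratum: ad / bc."""
    return (a * d) / (b * c)


for quintile, (a, b, c, d) in TABLES.items():
    print(f"Quintile {quintile}: n = {a + b + c + d}, relative odds = {odds_ratio(a, b, c, d):.2f}")
```

How much these stratum estimates vary is exactly what the Breslow-Day test below formalises.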
Using the aggregated (grouped) data set given above, answer the following:
- Interpret the Breslow-Day test for homogeneity of the relative odds across the risk quintiles. Are the relative odds homogeneous across the risk quintiles? (6 marks)
- If the relative odds are the same across the risk quintiles, interpret the Mantel-Haenszel test of whether the common relative odds differ significantly from 1. Is there a significant association between coffee drinking and myocardial infarction after adjusting for risk quintile? (7 marks)
- Interpret the Mantel-Haenszel and Woolf (logit) estimators of the common relative odds and the corresponding 95% confidence intervals. (7 marks)
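The two common-relative-odds estimators named above can be sketched directly. The Mantel-Haenszel estimator is Σ(a_k d_k / n_k) / Σ(b_k c_k / n_k); the interval below uses the Robins-Breslow-Greenland variance of its logarithm, the variance estimate described in the PROC FREQ documentation for these limits. The Woolf (logit) estimator is the inverse-variance weighted mean of the stratum log relative odds. This Python version assumes no zero cells (true for these data) and is meant as a check on the SAS output, not a replacement for it.

```python
import math
from statistics import NormalDist

# (a, b, c, d) per risk-score quintile:
#   a = 6+ cups & MI, b = 6+ cups & no MI, c = 0 cups & MI, d = 0 cups & no MI
TABLES = [
    (12, 670, 17, 1315),
    (5, 261, 12, 395),
    (4, 174, 13, 370),
    (2, 80, 14, 160),
    (1, 38, 14, 117),
]
Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def mantel_haenszel(tables, z=Z):
    """MH common relative odds with a Robins-Breslow-Greenland 95% CI."""
    r_tot = s_tot = pr = ps_qr = qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n        # R_k and S_k
        p, q = (a + d) / n, (b + c) / n    # P_k and Q_k
        r_tot += r
        s_tot += s
        pr += p * r
        ps_qr += p * s + q * r
        qs += q * s
    est = r_tot / s_tot
    var_log = pr / (2 * r_tot**2) + ps_qr / (2 * r_tot * s_tot) + qs / (2 * s_tot**2)
    half = z * math.sqrt(var_log)
    return est, (est * math.exp(-half), est * math.exp(half))


def woolf(tables, z=Z):
    """Woolf (logit) estimator: inverse-variance weighted mean of stratum log relative odds."""
    w = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    logs = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    log_est = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
    half = z / math.sqrt(sum(w))
    return math.exp(log_est), (math.exp(log_est - half), math.exp(log_est + half))


or_mh, ci_mh = mantel_haenszel(TABLES)
or_w, ci_w = woolf(TABLES)
print(f"Mantel-Haenszel: {or_mh:.3f} (95% CI {ci_mh[0]:.3f} to {ci_mh[1]:.3f})")
print(f"Woolf (logit):   {or_w:.3f} (95% CI {ci_w[0]:.3f} to {ci_w[1]:.3f})")
```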
SAS code for all parts of the question:
OPTIONS LINESIZE=80 PAGESIZE=60;
data htbact;
/* column input: each field must start in the columns listed on the INPUT statement */
input score $ 1-10 coffee $ 12-20 mi $ 22-25 freq 27-30;
cards;
Quintile 1 6 or more pres 12
Quintile 1 6 or more abs  670
Quintile 1 0         pres 17
Quintile 1 0         abs  1315
Quintile 2 6 or more pres 5
Quintile 2 6 or more abs  261
Quintile 2 0         pres 12
Quintile 2 0         abs  395
Quintile 3 6 or more pres 4
Quintile 3 6 or more abs  174
Quintile 3 0         pres 13
Quintile 3 0         abs  370
Quintile 4 6 or more pres 2
Quintile 4 6 or more abs  80
Quintile 4 0         pres 14
Quintile 4 0         abs  160
Quintile 5 6 or more pres 1
Quintile 5 6 or more abs  38
Quintile 5 0         pres 14
Quintile 5 0         abs  117
;
run;
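proc freq data=htbact order=data;
weight freq;
/* CMH: Cochran-Mantel-Haenszel tests, Mantel-Haenszel and logit (Woolf) common odds ratios, Breslow-Day homogeneity test */
tables score*coffee*mi/cmh;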
run;
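For readers who want to see what the CMH option computes, here is a Python sketch of the two tests: the Mantel-Haenszel chi-square (1 df, without continuity correction) and the Breslow-Day statistic, which compares each observed a_k with the count expected under the Mantel-Haenszel common odds ratio with the table margins fixed. Finding the expected counts by bisection is an implementation choice of this sketch, and the chi-square tail function is written out only for 1 df and for 4 df (five strata).

```python
import math

# (a, b, c, d) per quintile: a = 6+ cups & MI, b = 6+ cups & no MI,
#                            c = 0 cups & MI,  d = 0 cups & no MI
TABLES = [
    (12, 670, 17, 1315),
    (5, 261, 12, 395),
    (4, 174, 13, 370),
    (2, 80, 14, 160),
    (1, 38, 14, 117),
]


def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio: sum(ad/n) / sum(bc/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def cmh_statistic(tables):
    """Mantel-Haenszel chi-square for 2 x 2 x K tables (1 df, no continuity correction)."""
    diff = var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        diff += a - r1 * c1 / n                       # observed minus expected a_k
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))  # hypergeometric variance of a_k
    return diff * diff / var


def fitted_a(a, b, c, d, psi, iterations=200):
    """Expected a_k with the margins fixed and odds ratio psi, found by bisection."""
    n, r1, c1 = a + b + c + d, a + b, a + c
    lo, hi = max(0.0, r1 + c1 - n), float(min(r1, c1))

    def gap(x):  # x*D - psi*B*C, nondecreasing in x on [lo, hi]
        return x * (n - r1 - c1 + x) - psi * (r1 - x) * (c1 - x)

    for _ in range(iterations):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def breslow_day(tables):
    """Breslow-Day homogeneity statistic based on the MH common odds ratio."""
    psi = mh_odds_ratio(tables)
    stat = 0.0
    for a, b, c, d in tables:
        n, r1, c1 = a + b + c + d, a + b, a + c
        e = fitted_a(a, b, c, d, psi)
        cells = (e, r1 - e, c1 - e, n - r1 - c1 + e)
        var = 1.0 / sum(1.0 / x for x in cells)       # asymptotic variance of a_k
        stat += (a - e) ** 2 / var
    return stat


def chi2_sf_1df(x):
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_4df(x):  # valid only for 4 df, i.e. five strata
    return math.exp(-x / 2) * (1 + x / 2)


cmh = cmh_statistic(TABLES)
bd = breslow_day(TABLES)
print(f"Mantel-Haenszel chi-square = {cmh:.3f}, p = {chi2_sf_1df(cmh):.4f}")
print(f"Breslow-Day chi-square = {bd:.3f} on {len(TABLES) - 1} df, p = {chi2_sf_4df(bd):.4f}")
```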
Q5. One hundred men with lung cancer and 100 men without lung cancer were asked whether they had ever smoked; their answers are tabulated below. (20 marks)
Previous smoking | Lung cancer present | Lung cancer absent | Total |
Yes | 38 | 22 | 60 |
No | 62 | 78 | 140 |
Total | 100 | 100 | 200 |
(i) What type of epidemiological study design is this? (2 marks)
(ii) Interpret an approximate 95% confidence interval (Wald method) for the risk ratio measuring the association between previous smoking and lung cancer. (6 marks)
(iii) Interpret the Wald test of the hypothesis that the risk ratio for the association between previous smoking and lung cancer equals 1. (6 marks)
(iv) Interpret an approximate 95% confidence interval (Wald method) for the relative odds measuring the association between previous smoking and lung cancer. (6 marks)
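The Wald quantities asked for above can be sketched in Python on the log scale: for the risk ratio, SE = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)); for the relative odds, SE = sqrt(1/a + 1/b + 1/c + 1/d). Treating smokers (row 1) and lung cancer present (column 1) as the comparison is an assumption of this sketch, and the Wald test shown is z = ln(RR)/SE; PROC FREQ's own test statistic may be defined differently, so compare against the SAS output.

```python
import math
from statistics import NormalDist

# Q5 table: row 1 = smokers, row 2 = non-smokers; column 1 = lung cancer present
A, B = 38, 22   # smokers: present, absent
C, D = 62, 78   # non-smokers: present, absent
Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def wald_rr(a, b, c, d, z=Z):
    """Risk ratio [a/(a+b)] / [c/(c+d)], log-scale Wald CI, z statistic and two-sided p-value."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    ci = (math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se))
    z_stat = math.log(rr) / se
    p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return rr, ci, z_stat, p


def wald_or(a, b, c, d, z=Z):
    """Relative odds ad/bc with its log-scale Wald CI."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se))
    return or_, ci


rr, rr_ci, rr_z, rr_p = wald_rr(A, B, C, D)
or_, or_ci = wald_or(A, B, C, D)
print(f"RR = {rr:.3f}, 95% CI {rr_ci[0]:.3f} to {rr_ci[1]:.3f}, z = {rr_z:.3f}, p = {rr_p:.4f}")
print(f"OR = {or_:.3f}, 95% CI {or_ci[0]:.3f} to {or_ci[1]:.3f}")
```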
SAS code for all parts of Q5 except (i):
data lcancer;
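/* column input: each field must start in the columns listed on the INPUT statement */
input smoking $ 1-10 lcancer $ 12-15 count 18-19;
cards;
Smoker     pres  38
Smoker     abs   22
Non smoker pres  62
Non smoker abs   78
;
run;
/* ORDER=DATA keeps smokers in row 1 and "pres" in column 1, so the estimates
   compare smokers with non-smokers for lung cancer present */
proc freq data=lcancer order=data;
tables smoking*lcancer /relrisk(CL=(Wald) method=Wald equal) OR(CL=(Wald));
weight count;
title 'Chi-Square Test of Association of smoking and lung cancer';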
run;
Marking rubric: Assessment 1
Criteria | No marks | Part marks | Full marks |
Where a single number or single word answer is required | Answer is absent or incorrect. | SAS or R output screenshot, or output simply copied and pasted. | Answer is written in Word and is correct. |
Where a graph is required | Graph is absent or is the wrong graph. | Correct graph, poorly labelled. | Correct graph pasted into Word. Accurate and descriptive axis labels (within the constraints of SAS or R). Accurate and descriptive title. |
Where a table is required | Table is absent or is the wrong table. | SAS or R output or screenshot. Table is poorly labelled. Table contains transcription errors. | Table is formatted in Word. An accurate and descriptive title is given. No transcription errors. |
Where ‘show working’ is requested | Working is absent, incoherent or irrelevant. | SAS or R output or screenshots provided in lieu of working. Working contains errors or omissions. | All calculations and derivations are fully documented and correct. No SAS or R outputs or screenshots. Working laid out in logical order. |
Where interpretation or explanation is requested | No explanation is provided, or the explanation is incorrect or incoherent. | Explanation demonstrates general understanding but contains errors and/or omissions, or is general rather than specific to the question. | Explanation is correct, complete and clear. |
Where you are required to perform a statistical test to answer a question | No hypotheses to be tested are provided, or the description of the hypotheses is incorrect. | Description of the hypotheses to be tested is only partially correct. | Hypotheses to be tested are correct, complete and clear. |
Criteria | Unsatisfactory | Pass | Credit | Distinction | High Distinction |
Overall | Few questions answered correctly. The majority of workings and explanations missing, incorrect or incoherent. | The majority of questions answered correctly. Some working or explanation absent or containing serious errors or omissions. | Most questions answered correctly. Required working and explanations provided with at worst a few serious errors or omissions. | Most questions answered correctly. Required working and explanations provided with at worst a few serious errors or omissions. | All questions answered correctly, completely and clearly. No omissions, errors or spurious information. |
Assignment Cover Sheet
School of Medicine
Student name: | |
Student number: | |
Unit name and number: | 401176 : Statistical Methods in Epidemiology |
Tutorial group: | |
Tutorial day and time: | |
Unit Coordinator: | Haider Mannan |
Title of assignment: | |
Length: | |
Date due: | |
Date submitted: | |
Campus enrolment: | |
Declaration:
[ ] I hold a copy of this assignment in case the original is lost or damaged.
[ ] I hereby certify that no part of this assignment or product has been copied from any other student’s work or from any other source except where due acknowledgement is made in the assignment.
[ ] I hereby certify that no part of this assignment or product has been submitted by me in another (previous or current) assessment, except where appropriately referenced, and with prior permission from the Lecturer/Tutor/Unit Co-ordinator for this unit.
[ ] No part of the assignment/product has been written/produced for me by any other person except where collaboration has been authorised by the Lecturer/Tutor/Unit Co-ordinator concerned.
[ ] I am aware that this work will be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
Signature:______________________________________
Note: An examiner or lecturer/tutor has the right to not mark this assignment if the above declaration has not been signed.