Statistical Methods in Epidemiology (unit no. 401176)

ASSIGNMENT 1

Spring Semester, 2018

Due date: 10 September, 2018

The assignment is marked out of 100, which will be converted to a mark out of 25. Each question carries 20 marks. Please read the marking rubric towards the end of this document. Late submissions are not accepted without a valid reason (see the Learning Guide for instructions). An assignment cover sheet is also attached.

Please answer all questions.

Q1. Categorize each of the following variables as qualitative-nominal, qualitative-ordinal, quantitative-discrete or quantitative-continuous (no explanation is needed).

  • Hospital discharge diagnosis
  • Exact serum cholesterol measurements
  • Exact age
  • Age groups coded as 1 = <30, 2 = 30-39, 3 = 40-49, 4 = 50+
  • Causes of death
  • Sites of a randomized trial
  • Education levels coded as 1 = high school not completed, 2 = high school completed, 3 = some post-high school education

  • Exact systolic blood pressure levels
  • Being treated for hypertension, coded as 1 = no, 2 = yes
  • Pack years of cigarette smoking

Each item above carries 2 marks.

Q2. The following stem-and-leaf plot was obtained from the BMI (body mass index) values of a random sample of 88 persons.

Frequency   Stem   Leaf
    1        19    7
    2        20    69
    7        21    4788999
    7        22    3666799
    9        23    112355799
   17        24    01222222345555679
   18        25    002223344444577789
    9        26    002577799
    5        27    02689
    5        28    01289
    7        29    0001668
    1        30    2

Stem width: 1.0

Each leaf: 1 case

[Hint on reading the stem-and-leaf plot: each leaf is the first decimal digit of a value. For example, frequency 1 with stem 19 and leaf 7 means a single value, 19.7; frequency 2 with stem 20 and leaves 6 and 9 means two values, 20.6 and 20.9. The remaining rows are read in the same way.]

Each part below carries 4 marks.

  • What are the smallest and largest BMI values among these 88 persons?
  • What percentage of BMI values exceed 25.0? [Hint: use the frequency table for the binary variable bmigt25 computed in SAS.]
  • Obtain the first quartile, median and third quartile of BMI for this sample, and sketch a stem-and-leaf plot and a box-and-whisker plot for BMI.
  • Interpret the histogram for BMI.
  • Interpret the bar chart of mean BMI for males and females.
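The percentage and the quartile positions can also be worked out by hand as a check on the SAS output. A minimal sketch, assuming $y_{(1)} \le \dots \le y_{(88)}$ denote the ordered BMI values read from the stem-and-leaf plot, and assuming PROC UNIVARIATE uses its default percentile definition (PCTLDEF=5): with $np = j + g$, the $p$th percentile is $\tfrac12\bigl(y_{(j)} + y_{(j+1)}\bigr)$ when $g = 0$ and $y_{(j+1)}$ when $g > 0$.

```latex
\begin{aligned}
\text{Percentage with BMI} > 25.0 &= \frac{\#\{i : y_i > 25.0\}}{88} \times 100 \\[4pt]
Q_1:\ 88 \times 0.25 = 22,\ g = 0 \ &\Rightarrow\ Q_1 = \tfrac{1}{2}\bigl(y_{(22)} + y_{(23)}\bigr) \\
\text{Median}:\ 88 \times 0.50 = 44,\ g = 0 \ &\Rightarrow\ \text{Median} = \tfrac{1}{2}\bigl(y_{(44)} + y_{(45)}\bigr) \\
Q_3:\ 88 \times 0.75 = 66,\ g = 0 \ &\Rightarrow\ Q_3 = \tfrac{1}{2}\bigl(y_{(66)} + y_{(67)}\bigr)
\end{aligned}
```

Values of exactly 25.0 are not counted in the numerator, matching the strict inequality in bmigt25=(BMI >25) in the SAS code.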

The SAS code below can be used to answer all parts:

data a;

input bmi sex; /* each data line holds BMI and sex (0 = female, 1 = male) */

cards;

19.7 0

20.6 1

20.9 0

21.4 1

21.7 0

21.8 1

21.8 0

21.9 1

21.9 0

21.9 1

22.3 0

22.6 1

22.6 0

22.6 1

22.7 0

22.9 1

22.9 0

23.1 1

23.1 0

23.2 1

23.3 0

23.5 1

23.5 0

23.7 1

23.9 0

23.9 1

24.0 0

24.1 1

24.2 0

24.2 1

24.2 0

24.2 1

24.2 0

24.2 1

24.3 0

24.4 1

24.5 0

24.5 1

24.5 0

24.5 1

24.6 0

24.7 1

24.9 0

25.0 1

25.0 0

25.2 1

25.2 0

25.2 1

25.3 0

25.3 1

25.4 0

25.4 1

25.4 0

25.4 1

25.4 0

25.5 1

25.7 0

25.7 1

25.7 0

25.8 1

25.9 0

26.0 1

26.0 0

26.2 1

26.5 0

26.7 1

26.7 0

26.7 1

26.9 0

26.9 1

27.0 0

27.2 1

27.6 0

27.8 1

27.9 0

28.0 1

28.1 0

28.2 1

28.8 0

28.9 1

29.0 0

29.0 1

29.0 0

29.1 1

29.6 0

29.6 1

29.8 0

30.2 1

;

data a;

set a;

bmigt25=(BMI >25);

run;

proc freq;

tables bmigt25;

run;

proc sort data=a;

by bmi;

run;

ods listing;

ods graphics off;

proc univariate data=a plot;

var bmi;

title "Quartiles and mean BMI, side-by-side stem-and-leaf plot and box plot for BMI";

run;

proc univariate data=a plot;

var bmi;

histogram;

title "Histogram of the continuous variable BMI";

run;

proc gchart data=a;

vbar sex/group=sex sumvar=bmi type=mean discrete;

title "Vertical bar chart of mean BMI by sex (0 = female, 1 = male)";

run;

Q3. A variable can be a confounder, an effect modifier, both, or neither. There are statistical tests for detecting effect modification, but there is no statistical test for detecting an operational confounder. For example, if a test comparing the unadjusted and adjusted odds ratios shows no significant difference but one is considerably larger than the other, one would still adjust for the confounder. Conversely, if such a test shows a significant difference but neither odds ratio is considerably larger than the other, one would not need to adjust for the confounder.

Consider a study assessing the association between smoking and lung cancer. In each scenario below, is sex a confounder, an effect modifier (quantitative or qualitative), both, or neither? (10 marks)

Consider the following four scenarios:

Scenario   OR (men)   OR (women)   Crude OR   Adjusted OR
   1         2.51        2.15        2.32        2.35
   2         1.06        0.95        2.02        1.01
   3         4.40        3.41        4.02        2.63
   4         2.15        0.65        1.42        1.29
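One way to make "considerably larger" concrete is to express the difference between the crude and adjusted odds ratios as a percentage, and to compare the stratum-specific odds ratios with each other and with 1. This is a sketch under stated assumptions: the percent-change measure below (taken relative to the adjusted OR) is a commonly used convention rather than a rule given in this assignment, and any cut-off applied to it (10% is often quoted) is a judgement call. The first scenario is used as the worked example.

```latex
\begin{aligned}
\Delta\% &= \frac{\mathrm{OR}_{\text{crude}} - \mathrm{OR}_{\text{adj}}}{\mathrm{OR}_{\text{adj}}} \times 100
          = \frac{2.32 - 2.35}{2.35} \times 100 \approx -1.3\% \\[4pt]
\text{Quantitative effect modification:}\quad & \mathrm{OR}_{\text{men}} \text{ and } \mathrm{OR}_{\text{women}} \text{ differ appreciably but lie on the same side of } 1 \\
\text{Qualitative effect modification:}\quad & \mathrm{OR}_{\text{men}} \text{ and } \mathrm{OR}_{\text{women}} \text{ lie on opposite sides of } 1
\end{aligned}
```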

The following table presents unadjusted and age-adjusted coronary event rates, and the risk of death following a coronary event, for men in north Glasgow in 1991. The exposure of interest is social deprivation. Is age a confounder of the relationship between social deprivation and the coronary event rate, and of the relationship between social deprivation and the risk of coronary death?

(10 marks)


Table: Coronary event rates and risk of coronary death by deprivation group, north Glasgow men, 1991

                          Coronary event rate (per thousand)   Risk of coronary death
Deprivation group          Unadjusted    Age-adjusted          Unadjusted    Age-adjusted
I (most advantaged)           2.95           3.28                 0.57           0.59
II                            4.32           4.20                 0.50           0.50
III                           6.15           5.30                 0.51           0.52
IV (least advantaged)         5.90           5.75                 0.56           0.56
Total                         4.83           4.88                 0.53           0.54
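A simple check for confounding by age is to compare each deprivation group with a reference group before and after age adjustment: a marked change between the unadjusted and age-adjusted ratios suggests confounding by age, while a negligible change suggests little confounding. A sketch, assuming group I (most advantaged) as the reference, which the assignment does not specify, and using group IV's coronary event rate as the example:

```latex
\begin{aligned}
\mathrm{RR}^{\text{unadj}}_{\text{IV vs I}} &= \frac{5.90}{2.95} = 2.00, &
\mathrm{RR}^{\text{adj}}_{\text{IV vs I}} &= \frac{5.75}{3.28} \approx 1.75
\end{aligned}
```

The same comparison can be repeated for groups II and III, and for the risk of coronary death.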

 

Q4. The data below are modified from Jick, H. et al. (Coffee and Myocardial Infarction, New England Journal of Medicine, Vol. 289, No. 2, pp. 63-67, 1973). These authors used a case-control study to investigate the relationship between coffee consumption and myocardial infarction (MI). Cases were patients hospitalized with acute chest pain and an admission diagnosis of possible or definite MI. Controls were patients with various other diagnoses. To control for confounding, a multivariate risk score was derived for each patient, taking into account the patient's age, sex, history of MI, smoking status, admission to hospital, season of admission to hospital, history of antianginal drug use, history of digitalis use, presence of diabetes, and religion. The score was constructed so that patients with a high score were at greater risk of MI than patients with a low score. The distribution of the scores was divided into quintiles, with the first quintile containing the 20% of subjects with the lowest scores and the fifth quintile the 20% of subjects with the highest scores. The listing below shows the distribution of cases and controls among subjects drinking 0 cups of coffee a day and subjects drinking 6 or more cups a day, separately within each quintile. Each row gives, in order, the risk-score quintile, cups of coffee per day, MI status (pres = present, abs = absent) and the cell count of the corresponding 2 × 2 table. (20 marks)

Quintile 1 6 or more pres 12

Quintile 1 6 or more abs  670

Quintile 1 0         pres 17

Quintile 1 0         abs  1315

Quintile 2 6 or more pres 5

Quintile 2 6 or more abs  261

Quintile 2 0         pres 12

Quintile 2 0         abs  395

Quintile 3 6 or more pres 4

Quintile 3 6 or more abs  174

Quintile 3 0         pres 13

Quintile 3 0         abs  370

Quintile 4 6 or more pres  2

Quintile 4 6 or more abs  80

Quintile 4 0         pres 14

Quintile 4 0         abs  160

Quintile 5 6 or more pres 1

Quintile 5 6 or more abs  38

Quintile 5 0         pres 14

Quintile 5 0         abs  117

 

Using the grouped data set given above, answer the following:

  • Interpret the Breslow-Day test for homogeneity of the relative odds across the risk quintiles. Are the relative odds homogeneous across the risk quintiles? (6 marks)
  • If the relative odds are the same across the risk quintiles, interpret the Mantel-Haenszel test of whether the common relative odds differ significantly from 1. Is there a significant association between coffee drinking and myocardial infarction after adjusting for risk quintile? (7 marks)
  • Interpret the Mantel-Haenszel and Woolf (logit) estimators of the common relative odds and their 95% confidence intervals. (7 marks)
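The two estimators of the common relative odds reported by PROC FREQ with the CMH option can be written out to show working. A sketch using standard textbook notation, which is an assumption of this note rather than part of the assignment: in quintile $i$, $a_i$ and $b_i$ are the MI-present and MI-absent counts among subjects drinking 6 or more cups a day, $c_i$ and $d_i$ are the corresponding counts among subjects drinking 0 cups, and $n_i = a_i + b_i + c_i + d_i$.

```latex
\begin{aligned}
\widehat{\mathrm{OR}}_{\mathrm{MH}} &= \frac{\sum_{i=1}^{5} a_i d_i / n_i}{\sum_{i=1}^{5} b_i c_i / n_i} \\[4pt]
\ln \widehat{\mathrm{OR}}_{\mathrm{W}} &= \frac{\sum_{i} w_i \ln \widehat{\mathrm{OR}}_i}{\sum_{i} w_i},
\qquad w_i = \Bigl(\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}\Bigr)^{-1} \\[4pt]
95\%\ \text{CI (Woolf)} &= \exp\Bigl(\ln \widehat{\mathrm{OR}}_{\mathrm{W}} \pm 1.96 \big/ \sqrt{\textstyle\sum_{i} w_i}\Bigr) \\[4pt]
\text{e.g. quintile 1:}\quad \widehat{\mathrm{OR}}_1 &= \frac{12 \times 1315}{670 \times 17} = \frac{15780}{11390} \approx 1.39
\end{aligned}
```

The confidence interval SAS reports for the Mantel-Haenszel estimate is based on a separate variance formula (Robins, Breslow and Greenland), which is not shown here.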

SAS code for all parts of the question:

OPTIONS LINESIZE=80 PAGESIZE=60;

data htbact;

input score $ 1-10 coffee $ 12-20 mi $ 22-25 freq 27-29;

cards;

Quintile 1 6 or more pres 12

Quintile 1 6 or more abs  670

Quintile 1 0         pres 17

Quintile 1 0         abs  1315

Quintile 2 6 or more pres 5

Quintile 2 6 or more abs  261

Quintile 2 0         pres 12

Quintile 2 0         abs  395

Quintile 3 6 or more pres 4

Quintile 3 6 or more abs  174

Quintile 3 0         pres 13

Quintile 3 0         abs  370

Quintile 4 6 or more pres  2

Quintile 4 6 or more abs  80

Quintile 4 0         pres 14

Quintile 4 0         abs  160

Quintile 5 6 or more pres 1

Quintile 5 6 or more abs  38

Quintile 5 0         pres 14

Quintile 5 0         abs  117

;

run;

proc freq order=data;

weight freq;

tables score*coffee*mi/cmh;

run;

Q5. One hundred men with lung cancer and 100 men without lung cancer were asked whether they had ever smoked; their answers are tabulated below. (20 marks)

                         Lung cancer
Previous smoking    Present   Absent   Total
Yes                    38       22       60
No                     62       78      140
Total                 100      100      200

  • What type of epidemiological study design is this? (2 marks)
  • Interpret an approximate 95% confidence interval (Wald method) for the risk ratio of the association between previous smoking and lung cancer. (6 marks)
  • Interpret the Wald test of whether the risk ratio for the association between previous smoking and lung cancer equals 1. (6 marks)
  • Interpret an approximate 95% confidence interval (Wald method) for the relative odds of the association between previous smoking and lung cancer. (6 marks)
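The Wald intervals and test requested above can be written out from the 2 × 2 table to show working. A sketch, with $a = 38$, $b = 22$ (smokers with and without lung cancer) and $c = 62$, $d = 78$ (non-smokers with and without lung cancer), treating row proportions as "risks" in the way PROC FREQ's RELRISK output does when smokers form the first row; the log-scale Wald statistic $z$ is assumed, not confirmed, to match the EQUAL test in the SAS output.

```latex
\begin{aligned}
\widehat{\mathrm{RR}} &= \frac{a/(a+b)}{c/(c+d)} = \frac{38/60}{62/140} \approx 1.43, &
\mathrm{SE}(\ln \widehat{\mathrm{RR}}) &= \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}} \\[4pt]
\widehat{\mathrm{OR}} &= \frac{ad}{bc} = \frac{38 \times 78}{22 \times 62} \approx 2.17, &
\mathrm{SE}(\ln \widehat{\mathrm{OR}}) &= \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \\[4pt]
95\%\ \text{CI} &= \exp\bigl(\ln \hat\theta \pm 1.96\,\mathrm{SE}(\ln \hat\theta)\bigr), &
z &= \frac{\ln \hat\theta}{\mathrm{SE}(\ln \hat\theta)}
\end{aligned}
```

Here $\hat\theta$ stands for either $\widehat{\mathrm{RR}}$ or $\widehat{\mathrm{OR}}$.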

 

SAS code for all parts of Q5 except the first (study design):

data lcancer;

input smoking $ 1-10 lcancer $ 12-15 count 18-19;

cards;

Smoker     pres  38

Smoker     abs   22

Non smoker pres  62

Non smoker abs   78

;

run;

proc freq data=lcancer order=data; /* order=data keeps Smoker and pres first, so ratios compare smokers with non-smokers */

tables smoking*lcancer /relrisk(CL=(Wald) method=Wald equal) OR(CL=(Wald));

weight count;

title 'Association between previous smoking and lung cancer';

run;


Marking rubric: Assessment 1

Criteria, with the standard required for no marks, part marks and full marks:

Where a single number or single-word answer is required
  • No marks: Answer is absent or incorrect.
  • Part marks: SAS or R output screenshot, or output simply copied and pasted.
  • Full marks: Answer is written in Word and is correct.

Where a graph is required
  • No marks: Graph is absent or the wrong graph.
  • Part marks: Correct graph, poorly labelled.
  • Full marks: Correct graph pasted into Word. Accurate and descriptive axis labels (within the constraints of SAS or R). Accurate and descriptive title.

Where a table is required
  • No marks: Table is absent or the wrong table.
  • Part marks: SAS or R output or screenshot. Table is poorly labelled. Table contains transcription errors.
  • Full marks: Table is formatted in Word. An accurate and descriptive title is given. No transcription errors.

Where 'show working' is requested
  • No marks: Working is absent, incoherent or irrelevant.
  • Part marks: SAS or R output or screenshots provided in lieu of working. Working contains errors or omissions.
  • Full marks: All calculations and derivations are fully documented and correct. No SAS or R outputs or screenshots. Working laid out in logical order.

Where interpretation or explanation is requested
  • No marks: No explanation is provided. Explanation is incorrect or incoherent.
  • Part marks: Explanation demonstrates general understanding but contains errors and/or omissions. Explanation is general rather than specific to the question.
  • Full marks: Explanation is correct, complete and clear.

Where you are required to perform a statistical test to answer a question
  • No marks: No hypotheses to be tested are provided. Description of hypotheses is incorrect.
  • Part marks: Description of hypotheses to be tested is only partially correct.
  • Full marks: Hypotheses to be tested are correct, complete and clear.

Overall
  • Unsatisfactory: Few questions answered correctly. The majority of working and explanations missing, incorrect or incoherent.
  • Pass: The majority of questions answered correctly. Some working or explanation absent or containing serious errors or omissions.
  • Credit: Most questions answered correctly. Required working and explanations provided with at worst a few serious errors or omissions.
  • Distinction: Most questions answered correctly. Required working and explanations provided with at worst a few serious errors or omissions.
  • High Distinction: All questions answered correctly, completely and clearly. No omissions, errors or spurious information.


Assignment Cover Sheet

School of Medicine

Student name:  
Student number:  
Unit name and number: 401176 : Statistical Methods in Epidemiology
Tutorial group:  
Tutorial day and time:  
Unit Coordinator: Haider Mannan
Title of assignment:  
Length:  
Date due:  
Date submitted:  
Campus enrolment:  

 

Declaration:

[ ] I hold a copy of this assignment in case the original is lost or damaged.

[ ] I hereby certify that no part of this assignment or product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.

[ ] I hereby certify that no part of this assignment or product has been submitted by me in another (previous or current) assessment, except where appropriately referenced, and with prior permission from the Lecturer/Tutor/Unit Co-ordinator for this unit.

[ ] No part of the assignment/product has been written/produced for me by any other person except where collaboration has been authorised by the Lecturer/Tutor/Unit Co-ordinator concerned.

[ ] I am aware that this work will be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).

 

Signature:______________________________________

Note: An examiner or lecturer/tutor has the right not to mark this assignment if the above declaration has not been signed.

 

 
