


National Longitudinal Mortality Study


Extract and Analysis Files

Reference Manual, Version 4.0

Mortality Follow-up 1979—2011
July 1, 2014

TABLE OF CONTENTS


INTRODUCTION 3

NLMS Index and Record Format as of July 1, 2014 11




EXTRACT FILE 16

ANALYSIS FILE 271

APPENDICES 398

  1. September 1985 Health and Tobacco Use File

  2. State Concatenated Codes

  3. SMSA Rank

  4. United Nations Codes for Countries and Areas

  5. 1990 Industrial Classification System

  6. 2007 Industrial Classification System

  7. 1990 Occupational Classification System

  8. 2000 Occupational Classification System

  9. CDC 113 Causes of Death from ICD-10



Variable Reference Manual

for

National Longitudinal Mortality Study

Extract and Analysis Files

Version 4.0

INTRODUCTION

Documentation Date: July 1, 2014


The National Longitudinal Mortality Study (NLMS) is a national longitudinal study of mortality sponsored by the National Heart, Lung, and Blood Institute, the National Cancer Institute, the National Institute on Aging, the National Center for Health Statistics, and the U.S. Census Bureau. Its purpose is to study how differentials in demographic and socio-economic characteristics affect U.S. mortality rates.
The NLMS is a unique research database because it is based on a complex, stratified sample of the non-institutionalized population of the United States. It combines U.S. Census Bureau data from Current Population Surveys, Annual Social and Economic Supplements, and a subset of the 1980 Census with death certificate information that identifies mortality status and cause of death. The study currently consists of approximately 3.8 million records with nearly 560,000 identified deaths. Its socio-economic variables allow researchers to answer questions about mortality differentials for a variety of important socio-economic and demographic subgroups not covered as extensively in other databases. The project has generated over 85 publications in prominent scholarly, scientific, and public health journals.
The NLMS currently consists of 39 cohorts: selected Annual Social and Economic Supplements from March 1973 through March 2011; Current Population Surveys for February 1978, April 1980, August 1980, December 1980, and September 1985; and one 1980 Census cohort. Mortality information is obtained from death certificates, available for deceased persons through the National Center for Health Statistics. Important variables available for analysis include standard demographic and socio-economic variables such as education, income, and employment, as well as information collected from death certificates, including cause of death.
This documentation identifies variables selected from the National Longitudinal Mortality Study (NLMS) master files for use in study analysis files. Study sponsors selected these variables because they are of immediate interest to scientific and public health research and have been important in a wide variety of study publications.

Description of the NLMS File Structure
The complete database of all NLMS variables available for analysis is stored in files called "Master Files," one for each NLMS cohort. The content of each cohort's master file varies with the interests of the file's original sponsor. Each master file has its own format and contains both edited and unedited data. Because variable-format files are difficult to use and because variable definitions change over time, the NLMS has developed two additional levels of files for analysis: "Extract" and "Analysis" files. Extract Files are ASCII, fixed-format files containing a specifically defined, sponsor-selected set of variables, taken unaltered from the source Master File. Analysis Files are SAS data files developed directly from the Extract Files; during their development, variables are standardized across cohorts. This Reference Manual documents both NLMS Extract and Analysis variables, and variable modifications are documented in its Analysis File section.
Extract Files are constructed by selecting the same subset of information from each NLMS Master File, and all Extract Files share the same fixed format. Each record consists of 117 variables in a 339-character string. Information relevant to death is left blank for all nondeceased persons. Extract Files include fail-edit records, and the selected variables are kept as they are defined on the source NLMS Master File. A four-digit numerical code and an alphabetic letter code are the basic file identifiers for each cohort.
In addition to an Extract File for each cohort, a special Extract File combining data across all cohorts has been created; it contains the same extract file information, but only for deceased persons. This file, called the "Numerator File," has the same format, variable content, and variable locations as the study's Extract Files.
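The relationship between the cohort Extract Files and the Numerator File can be sketched in Python. This is an illustration only, assuming each Extract record is a fixed-length string and that a record belongs to a deceased person when a death-related field is non-blank (death information is left blank for nondeceased persons). DEATH_FIELD_START, DEATH_FIELD_SIZE, is_deceased, and build_numerator are hypothetical names; the real field position comes from the Index.

```python
from typing import Dict, Iterable, List

# Hypothetical location of one death-related field: 1-based start position and size.
# Replace with the real LOCATION and SIZE from the Index.
DEATH_FIELD_START = 300
DEATH_FIELD_SIZE = 4


def is_deceased(record: str) -> bool:
    """Death information is blank for nondeceased persons, so non-blank means deceased."""
    start = DEATH_FIELD_START - 1  # 1-based location -> 0-based index
    return record[start:start + DEATH_FIELD_SIZE].strip() != ""


def build_numerator(extracts: Dict[str, Iterable[str]]) -> List[str]:
    """Concatenate the deceased records of every cohort, leaving each record unchanged."""
    numerator: List[str] = []
    for cohort in sorted(extracts):  # sorted only to make the output order deterministic
        numerator.extend(r for r in extracts[cohort] if is_deceased(r))
    return numerator
```

Because records are copied unchanged, the combined file keeps the same layout as every cohort Extract File.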
For the specific purpose of matching to the National Death Index, a set of files called "Match Files" has been developed. These files contain only the variables required for matching to the National Death Index, plus the identifiers needed to link identified matches back to the relevant NLMS records. In a few instances, additional records with pseudo birth dates are generated to permit matches that would otherwise be impossible because the original file collected only quarter of birth, not month of birth.
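One way to picture the pseudo birth date records is to expand a reported quarter of birth into one candidate record per month in that quarter. The manual does not say how pseudo dates are chosen, so this Python sketch, including MatchRecord and expand_pseudo_birth_dates, is a hypothetical illustration rather than the NLMS procedure.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class MatchRecord:
    control_number: str   # NLMS control number used to link matches back
    birth_year: int
    birth_quarter: int    # 1-4; month of birth was not collected
    birth_month: int = 0  # 0 = unknown; filled in on pseudo records
    pseudo: bool = False


def expand_pseudo_birth_dates(rec: MatchRecord) -> List[MatchRecord]:
    """Return one pseudo record per month of the reported quarter (hypothetical scheme)."""
    if not 1 <= rec.birth_quarter <= 4:
        raise ValueError("birth_quarter must be 1-4")
    first_month = 3 * (rec.birth_quarter - 1) + 1
    return [
        replace(rec, birth_month=month, pseudo=True)
        for month in range(first_month, first_month + 3)
    ]
```

Each pseudo record keeps the original control number, so an NDI match on any of them can be linked back to the source NLMS record.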
For the 2013 match to the NDI, specialized files were created containing all mortality information taken from death certificates, together with NLMS control numbers, so that these data can be linked to other NLMS analytical files. These files are called "Master Death Files." Their characteristics are not summarized in tabular form. For persons not identified as deceased by NLMS processes, Master Death Files contain only the person's control number and the mortality outcome indicators IND213, IND206, IND201, IND291, IND289, IND287, IND285, and IND185.
Table 1 below lists, for each cohort, the letter code and four-digit numerical code associated with each file, a description of the survey or Census from which the data were obtained, and the NLMS official date used as the start of follow-up for the cohort. Cohorts 8603, 8703, 8803, 8903, 9003, 9203, and 9303 have multiple start dates on the file. See the description of FACTOR for a discussion of this feature and the alternative start dates that should be used.
Table 1. Cohort Notation


Study  Cohort     Code  Source                 Start of Follow-up
01     Cohort A   7303  March 1973 CPS         January 1, 1979
02     Cohort B   7802  February 1978 CPS      January 1, 1979
03     Cohort C   7903  March 1979 CPS         March 18, 1979
04     Cohort D   8014  April 1980 CPS         April 13, 1980
05     Cohort E   8024  1980 Census E Sample   January 1, 1981
06     Cohort F   8008  August 1980 CPS        August 17, 1980
07     Cohort G   8012  December 1980 CPS      December 14, 1980
08     Cohort H   8103  March 1981 CPS         March 15, 1981
09     Cohort I   8203  March 1982 CPS         March 14, 1982
10     Cohort J   8303  March 1983 CPS         March 13, 1983
11     Cohort K   8403  March 1984 CPS         March 18, 1984
12     Cohort L   8503  March 1985 CPS         March 17, 1985
13     Cohort M   8509  September 1985 CPS     September 21, 1985
14     Cohort N   8603  March 1986 CPS         March 25, 1986
15     Cohort O   8703  March 1987 CPS         March 24, 1987
16     Cohort P   8803  March 1988 CPS         March 22, 1988
17     Cohort Q   8903  March 1989 CPS         March 28, 1989
18     Cohort R   9003  March 1990 CPS         March 27, 1990
19     Cohort S   9103  March 1991 CPS         March 26, 1991
20     Cohort T   9203  March 1992 CPS         March 24, 1992
21     Cohort U   9303  March 1993 CPS         March 23, 1993
22     Cohort V   9413  March 1994 CPS         March 22, 1994
23     Cohort W   9503  March 1995 CPS         March 28, 1995
24     Cohort X   9603  March 1996 CPS         March 26, 1996
25     Cohort Y   9703  March 1997 CPS         March 25, 1997
26     Cohort Z   9803  March 1998 CPS         March 24, 1998
27     Cohort AA  9903  March 1999 CPS         March 23, 1999
28     Cohort BB  0003  March 2000 CPS         March 28, 2000
29     Cohort CC  0103  March 2001 CPS         March 27, 2001
30     Cohort DD  0203  March 2002 CPS         March 26, 2002
31     Cohort EE  0303  March 2003 CPS         March 25, 2003
32     Cohort FF  0403  March 2004 CPS         March 23, 2004
33     Cohort GG  0503  March 2005 CPS         March 22, 2005
34     Cohort HH  0603  March 2006 CPS         March 28, 2006
35     Cohort II  0703  March 2007 CPS         March 27, 2007
36     Cohort JJ  0803  March 2008 CPS         March 25, 2008
37     Cohort KK  0903  March 2009 CPS         March 24, 2009
38     Cohort LL  1003  March 2010 CPS         March 23, 2010
39     Cohort MM  1103  March 2011 CPS         March 22, 2011
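Follow-up time for a record starts at its cohort's official date in Table 1. A minimal Python sketch using a few Table 1 dates follows. The end date END_OF_FOLLOW_UP (December 31, 2011) is an assumption based on the 1979-2011 follow-up period in the title, the sketch ignores the multiple start dates discussed under FACTOR, and the function name follow_up_days is hypothetical. A death before the cohort start date, such as a 1980 death in Cohort E, is treated as outside follow-up.

```python
from datetime import date
from typing import Optional

# Official start of follow-up for a few cohorts, copied from Table 1.
START_OF_FOLLOW_UP = {
    "7303": date(1979, 1, 1),   # Cohort A
    "7802": date(1979, 1, 1),   # Cohort B
    "8024": date(1981, 1, 1),   # Cohort E (1980 Census)
    "8509": date(1985, 9, 21),  # Cohort M
    "1103": date(2011, 3, 22),  # Cohort MM
}

# Assumed end of follow-up; confirm against the mortality indicator in use.
END_OF_FOLLOW_UP = date(2011, 12, 31)


def follow_up_days(cohort: str, death_date: Optional[date] = None) -> Optional[int]:
    """Days from the cohort start date to death or to the end of follow-up.

    Returns None for a death before the start date, which is outside follow-up.
    """
    start = START_OF_FOLLOW_UP[cohort]
    if death_date is not None and death_date < start:
        return None
    end = END_OF_FOLLOW_UP if death_date is None else min(death_date, END_OF_FOLLOW_UP)
    return (end - start).days
```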

Tables 2 and 3 list basic frequencies for each cohort extract file. Table 2 gives the number of records on each file and the number of fail-edits (i.e., records lacking sufficient information to match to the NDI). Table 3 shows the cumulative number of deaths identified in each file as of each NDI match year.


Table 2. File Specific Frequencies for the 2013 NDI Match


Study         Records  Fail-Edits
01 7303 A     131,213       1,027
02 7802 B      94,662       1,900
03 7903 C      43,098       1,518
04 8014 D     184,871       1,891
05 8024 E     124,345         143
06 8008 F     182,373       1,052
07 8012 G     177,765       2,881
08 8103 H      61,804       1,707
09 8203 I      81,480       1,599
10 8303 J      81,281       2,070
11 8403 K      80,732       3,234
12 8503 L      80,071       3,368
13 8509 M     144,698          25
14 8603 N      46,022           7
15 8703 O      45,271           7
16 8803 P      45,479           6
17 8903 Q      41,925           7
18 9003 R      44,285           6
19 9103 S      49,037           9
20 9203 T      43,550           5
21 9303 U      62,899       2,304
22 9413 V     103,286         728
23 9503 W      75,485         579
24 9603 X      65,555         666
25 9703 Y      66,394         654
26 9803 Z      66,173         732
27 9903 AA     69,175          98
28 0003 BB     69,322         633
29 0103 CC     78,937         105
30 0203 DD     79,522         110
31 0303 EE    154,405      26,401
32 0403 FF    151,309      26,773
33 0503 GG    161,257      29,419
34 0603 HH    134,463      11,803
35 0703 II    133,742      12,129
36 0803 JJ    130,055      11,228
37 0903 KK    132,629      18,749
38 1003 LL    131,468      18,441
39 1103 MM    130,061      16,318
Total       3,780,099     200,332
NOTES:

1. "Records" gives the total number of all records on the NLMS cohort Master File.


2. "Fail-edits" are the number of records on the file that failed edit and are, therefore, ineligible for a match to the NDI. These records are not considered to be part of the NLMS for analytical purposes since mortality cannot be determined.
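The analytic size of each cohort is its number of records minus its fail-edits. A short Python sketch, with a few rows copied from Table 2, computes this count and the fail-edit percentage; the dictionary and function names are illustrative only.

```python
# Records and fail-edits for a few cohorts, copied from Table 2.
TABLE2 = {
    "0503": (161_257, 29_419),  # Cohort GG
    "8024": (124_345, 143),     # Cohort E
    "9303": (62_899, 2_304),    # Cohort U
}
TOTAL = (3_780_099, 200_332)


def eligible(records: int, fail_edits: int) -> int:
    """Records that can be matched to the NDI, i.e. the analytic NLMS count."""
    return records - fail_edits


def fail_edit_pct(records: int, fail_edits: int) -> float:
    """Share of a file's records that failed edit, in percent."""
    return 100.0 * fail_edits / records


for code, (n, f) in TABLE2.items():
    print(f"{code}: {eligible(n, f):,} eligible, {fail_edit_pct(n, f):.1f}% fail-edit")
print(f"All cohorts: {eligible(*TOTAL):,} eligible, {fail_edit_pct(*TOTAL):.1f}% fail-edit")
```

Applied to the totals row, the sketch gives 3,579,767 records eligible for the NDI match, with about 5.3% of all records failing edit.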
Table 3. Mortality by Year of Match
Study           1983     1985     1987     1989     1991     2001     2006     2013
01 7303 A      3,000    5,250    7,410    9,684   11,967   22,790   28,289   38,790
02 7802 B      2,540    4,468    6,397    8,406   10,379   20,020   24,815   34,398
03 7903 C        712    1,331    1,916    2,509    3,178    6,306    7,800   10,844
04 8014 D      2,102    4,756    7,647   10,607   13,733   28,336   35,266   49,503
05 8024 E      1,168    3,585    6,128    8,517   11,147   23,468   30,715   41,760
06 8008 F      1,653    4,276    7,165   10,036   12,926   27,225   33,559   47,165
07 8012 G      1,165    3,576    6,147    8,855   11,690   25,191   31,173   43,681
08 8103 H        297    1,166    2,048    2,915    3,853    8,553   10,821   15,520
09 8203 I          0      949    2,076    3,280    4,486   10,678   13,647   19,614
10 8303 J          0      357    1,464    2,566    3,749    9,579   12,394   18,230
11 8403 K          0        0      872    1,956    3,117    8,850   11,710   17,386
12 8503 L          0        0      378    1,459    2,529    8,131   10,974   16,613
13 8509 M          0        0        0        0    3,658   12,593   16,548   24,885
14 8603 N          0        0        0        0        0    6,411    8,676   13,746
15 8703 O          0        0        0        0        0    5,606    7,902   12,857
16 8803 P          0        0        0        0        0    5,246    7,569   12,721
17 8903 Q          0        0        0        0        0    4,305    6,408   10,997
18 9003 R          0        0        0        0        0    4,000    6,144   10,868
19 9103 S          0        0        0        0        0    3,355    5,896   11,072
20 9203 T          0        0        0        0        0    2,908    4,883    9,634
21 9303 U          0        0        0        0        0    2,765    5,192   10,249
22 9413 V          0        0        0        0        0    3,086    6,160   12,498
23 9503 W          0        0        0        0        0    1,827    4,142    8,892
24 9603 X          0        0        0        0        0    1,105    2,929    7,005
25 9703 Y          0        0        0        0        0      722    2,489    6,483
26 9803 Z          0        0        0        0        0      259    1,888    5,885
27 9903 AA         0        0        0        0        0        0    1,350    5,628
28 0003 BB         0        0        0        0        0        0      999    6,046
29 0103 CC         0        0        0        0        0        0      685    5,083
30 0203 DD         0        0        0        0        0        0      320    4,824
31 0303 EE         0        0        0        0        0        0        0    5,410
32 0403 FF         0        0        0        0        0        0        0    4,571
33 0503 GG         0        0        0        0        0        0        0    4,229
34 0603 HH         0        0        0        0        0        0        0    3,598
35 0703 II         0        0        0        0        0        0        0    2,969
36 0803 JJ         0        0        0        0        0        0        0    2,268
37 0903 KK         0        0        0        0        0        0        0    1,658
38 1003 LL         0        0        0        0        0        0        0    1,048
39 1103 MM         0        0        0        0        0        0        0      379
Total         12,637   29,714   49,648   70,790   96,412  253,315  341,343  559,007


NOTE: "Mortality by Year of Match" gives the number of deaths identified by the mortality indicator for each match, using data complete through the indicated year. Frequencies are cumulative totals through the match year shown.
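Because the Table 3 counts are cumulative, the deaths newly identified at each match are the differences between adjacent columns. A Python sketch using the Cohort A row, assuming counts from successive matches are directly comparable; the names are illustrative.

```python
from typing import Dict, List

MATCH_YEARS = [1983, 1985, 1987, 1989, 1991, 2001, 2006, 2013]

# Cumulative deaths for Cohort A (7303), copied from Table 3.
COHORT_A = [3_000, 5_250, 7_410, 9_684, 11_967, 22_790, 28_289, 38_790]


def deaths_per_interval(cumulative: List[int]) -> List[int]:
    """Convert cumulative death counts into deaths newly identified at each match."""
    pairs = list(zip(cumulative, cumulative[1:]))
    if any(later < earlier for earlier, later in pairs):
        raise ValueError("cumulative counts must not decrease")
    return [cumulative[0]] + [later - earlier for earlier, later in pairs]


def by_match_year(cumulative: List[int]) -> Dict[int, int]:
    """Label each increment with the NDI match year that identified it."""
    return dict(zip(MATCH_YEARS, deaths_per_interval(cumulative)))
```

For Cohort A, the increments sum back to the 2013 cumulative total of 38,790.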
Features of the Documentation
The purpose of this Reference Manual is to document the NLMS and describe the variables available to researchers for immediate analysis. The Index to this manual gives a brief description of each variable and summarizes its location on the Extract File. For each variable, the Index lists a description, the variable name in eight or fewer characters, the page in this Reference Manual where the full discussion of the variable appears in both the Extract and Analysis sections, the variable's location on the Extract File, and its edit status. The eight-character name is used in all NLMS software to refer to the variable and serves as the variable name in the Analysis File SAS datasets. The location of each variable on the Extract File is given in the Index as its character position on the record, under the heading "LOCATION." The size of a variable is the number of characters reserved for it on the record.
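Reading an Extract record with the Index's LOCATION and SIZE columns amounts to fixed-width slicing. A minimal Python sketch, assuming LOCATION is the 1-based position of the variable's first character; the layout entries VAR_A, VAR_B, and VAR_C are placeholders, not real NLMS variables or positions. The default record length of 339 follows the Extract File description; a file with an appended section, such as 8509, would need a different length.

```python
from typing import Dict, Tuple

Layout = Dict[str, Tuple[int, int]]

# Placeholder layout: variable name -> (LOCATION, SIZE) as printed in the Index.
# These names and positions are hypothetical, not the real NLMS record layout.
EXAMPLE_LAYOUT: Layout = {
    "VAR_A": (1, 6),
    "VAR_B": (7, 2),
    "VAR_C": (9, 1),
}


def parse_record(line: str, layout: Layout = EXAMPLE_LAYOUT,
                 record_length: int = 339) -> Dict[str, str]:
    """Slice one fixed-format Extract record into named fields.

    Assumes LOCATION is the 1-based position of the first character, so a
    variable occupies characters LOCATION through LOCATION + SIZE - 1.
    """
    line = line.rstrip("\r\n")  # drop the line terminator but keep trailing blanks
    if len(line) != record_length:
        raise ValueError(f"expected {record_length} characters, got {len(line)}")
    return {name: line[loc - 1:loc - 1 + size] for name, (loc, size) in layout.items()}
```

Fields are returned as raw strings, so blank death fields and leading zeros survive until a later step decides how to interpret them.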

Throughout the text there are links that point from the index to the Extract and Analysis versions of every variable and then back to the index. There are also links between the Extract and Analysis file versions of each variable. Running headers indicate the general category of variables as well as whether the reader is in the Extract or Analysis portion of the manual.


The body of the Reference Manual contains a full description of each Extract File variable. The descriptive portion of the text is identified by the heading "DESCRIPTION." Possible codes and their frequencies on the relevant files are given for each variable, along with any restrictions or special conditions. Where possible, the original master file codes were standardized during construction of the Analysis Files so that variable values are consistent across cohorts. If a variable is not documented in the Analysis File section of the Reference Manual, it has not been standardized, and its Analysis File frequencies are as shown in the Extract File section of this manual.
File-specific variable frequencies are included with the variable descriptions for every variable with few enough levels to fit in a display table. Frequencies are shown for both legitimate and illegitimate codes. Entries in this documentation are intended to reflect all entries on each file for all basic variables in the study.
Comments on File Definition
Not all records in the respective CPS surveys or the Census sub-sample are part of the NLMS, because some records lack the information required to match to the National Death Index. These are identified as failed-edit records. For each match of NLMS records to the NDI, one or two indicator variables are created; they are listed in the "Fail-Edit and Mortality Indicators" section of the Index (p. 10). A value of 0 or 1 for one of these indicators marks a fail-edit record for that match to the NDI. Fail-edit frequencies for each cohort for IND213 are shown in Table 2.
For a small percentage of records in cohorts A, B, C, D, and F, no link could be made between the data file record and the control file, the source of the NLMS Master File information. As a result, these records contain control file information only and may lack demographic or socio-economic information needed for an analysis. The number of records that did not link is not shown as a separate column in Table 2. The frequency of "unknown" responses for variables described in this manual may include records that did not link.
Three Files with Special Restrictions
Cohort A, based on the March 1973 CPS, consists of persons identified as alive approximately six years before the National Death Index begins in 1979. The NLMS cannot determine which persons in this cohort died during the interim period between the end of interviewing in March 1973 and January 1, 1979, the starting date of the National Death Index. For most analyses, this file should either be excluded or be restricted to persons no older than 64 years of age in 1979, which corresponds to removing persons whose age on the file is more than 58 years. This restriction removes from analysis the persons with the greatest chance of dying during the six-year period before mortality follow-up begins.
A similar problem applies to the February 1978 cohort, Cohort B, although here the interim period without follow-up, from February 1978 to January 1979, is only about 10 months. In 2008, the 7303 and 7802 cohorts were compared with Social Security Administration NUMIDENT files to identify deaths that occurred before 1979. Deaths identified in this way are marked by an IND213 value of "5"; these records should be treated as failed edits and dropped from analyses. Records in these two cohorts whose IND213 value is not 0, 1, or 5 are eligible for analysis, with follow-up beginning in 1979.
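The record-selection rules above can be combined into a single filter. A Python sketch, assuming IND213 is read as an integer and age is the age recorded on the file; it applies the Cohort A age restriction (dropping persons whose age on the file is more than 58) rather than excluding Cohort A entirely. The function and constant names are hypothetical.

```python
from typing import Optional

FAIL_EDIT_CODES = {0, 1}           # IND213 values marking a fail-edit record
PRE_1979_DEATH_CODE = 5            # Cohorts A and B: death found via SSA NUMIDENT before 1979
PRE_1979_COHORTS = {"7303", "7802"}
COHORT_A = "7303"
COHORT_A_MAX_AGE = 58              # keep ages up to 58 on the file (64 or younger in 1979)


def analyzable(cohort: str, ind213: int, age: Optional[int] = None) -> bool:
    """Return True if a record is kept for a standard mortality analysis."""
    if ind213 in FAIL_EDIT_CODES:
        return False
    if cohort in PRE_1979_COHORTS and ind213 == PRE_1979_DEATH_CODE:
        return False
    if cohort == COHORT_A:
        # Records without a usable age (e.g. records that did not link) are dropped.
        return age is not None and age <= COHORT_A_MAX_AGE
    return True
```

The non-fail-edit IND213 value used when trying this out (for example 2) is only a stand-in; the meanings of other codes are documented in the Index.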
Because completing the 1980 Census required a lengthy follow-up period, records for persons in the NLMS 1980 Census cohort (Cohort E) who died in 1980 were considered invalid, and the start of follow-up for this cohort was set to January 1, 1981.
The September 1985 "Tobacco Use" File
At the conclusion of the 1989 match to the NDI, a new file, the September 1985 CPS file (Cohort M), was added to the NLMS study cohorts. This file was constructed specifically to study the tobacco use and health status variables obtained in the September 1985 CPS interview. It consists of the full 155-character Extract File record, as for all other cohorts in the NLMS, followed by an appended 42-character section containing the tobacco use and health status information. Tallies and descriptions of the variables in the first 155 characters are incorporated into the main portion of the Reference Manual. The tobacco use and health status variables are documented in Appendix A as a continuation of this Reference Manual, with warnings that this information is available only on the 8509 file.
Creation of the Analysis File SAS Datasets
SAS datasets derived from each of the NLMS Extract Files listed in Table 1 have been created for analysis purposes. These files are referred to as "Analysis Files." In creating them, a variety of edits were performed to standardize variables and expedite analyses.
The results of these edits are found in the Analysis File section of this documentation. Three basic types of edit may be performed on a variable: simple edits, which recode a variable to a missing-value code because the data could not be collected; invalid entry edits, which recode obvious data-keying errors to missing; and recoding edits, which standardize variable values across files. For simple and invalid entry edits, frequency counts for the meaningful levels of the variable are as shown in the body of the Reference Manual. For recoding edits, the final frequency distribution is given in the Analysis File section. A set of columns in the Reference Manual index, labeled "EDITS," indicates any edits performed, and these indications are repeated in the Extract and Analysis File entries for each variable.
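The three edit types can be illustrated with a small Python function. This is a sketch only: the function name apply_edits and the code sets passed to it are hypothetical, and Python's None stands in for the SAS missing value used in the Analysis Files.

```python
from typing import Dict, Optional, Set

MISSING: Optional[str] = None  # stand-in for the Analysis File missing-value code


def apply_edits(value: str, not_collected: Set[str], valid: Set[str],
                recode: Dict[str, str]) -> Optional[str]:
    """Apply the three edit types to one raw Extract value.

    - simple edit: codes meaning "not collected" become missing
    - invalid entry edit: codes outside the valid set (keying errors) become missing
    - recoding edit: cohort-specific codes are mapped to a common standard
    """
    if value in not_collected:
        return MISSING
    if value not in valid:
        return MISSING
    return recode.get(value, value)
```

Passing a separate recode mapping per cohort keeps the standardization rules explicit, which mirrors how the manual reports recoding edits file by file.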
Confidentiality of NLMS Data
Title 13 of the United States Code (U.S.C.) provides assurance of the confidentiality of Census Bureau data. NLMS operational procedures therefore follow well-defined practices designed to maintain the confidentiality of personal records as required by Title 13. These practices include preventing disclosure by eliminating sparse cells in publications, prohibiting the release of small-area geographic information on the NLMS public-use file, identifying records by an individually assigned NLMS control number instead of personal identifiers, and restricting who has direct access to the NLMS database. Violations of Title 13 carry severe penalties: any individual found guilty of releasing confidential information faces a prison term of up to 5 years, a fine of up to $250,000, or both. In addition, any data acquired for NLMS purposes from an external agency are acquired under strict confidentiality protections and agreements that govern their use and subsequent release.
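Sparse-cell elimination is commonly done by suppressing any table cell whose count falls below a minimum. A Python sketch with a hypothetical threshold of 10; the manual does not state the threshold the NLMS uses, and suppress_sparse is an illustrative name.

```python
from typing import Dict, Optional

# Hypothetical threshold; the manual does not state the NLMS cutoff.
MIN_CELL_SIZE = 10


def suppress_sparse(table: Dict[str, int],
                    threshold: int = MIN_CELL_SIZE) -> Dict[str, Optional[int]]:
    """Blank out (None) any cell whose count is below the threshold before publication."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}
```

This shows primary suppression only; a suppressed cell may still be recoverable from row or column totals unless complementary cells are suppressed as well.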
