**** File overview ***

This package produces a master O*NET file in Stata wide format, combining most of the information available in the Occupational Information
Network (O*NET) database 16.0.

For each O*NET occupation code, the file contains a set of job characteristics (each with its own ID) and one or more associated measures (scales) of each.
Each measure in turn has several associated variables. Some of these variables are excluded, as described below, but the do-file used to construct the master file
is easily modified to include them. For files containing multiple measures of a given characteristic (for instance, both average and categorical measures),
some measures may be omitted to save space; these can also be included by modifying the construction do file.

Additional do files for merging the O*NET file with other common microdata surveys will be made available on this page later.





**** How to compile the O*NET in Stata 11 ***

Download the zipped folder of ASCII files and the construction do file. Unzip the folder of readable ASCII files, which are modified from the original txt files provided
in O*NET database 16.0. Open the construction do file in Stata and enter the paths to the folder you have saved on your computer. You should also download database 16.0 from the
O*NET page, as it contains all scale reference files and the O*NET data dictionary, as well as original versions of all the txt files and more complete labels.

Finally, run the construction do file in Stata 11, which will create a master file identical to the O*NET file at the first link on my O*NET page. The file
should take between three and five minutes to run.





**** Included files ***

The following O*NET files are compiled into the master file. The measures of job characteristics kept from each file are noted alongside:

Abilities file (AS): all measures in mean form
Education, training and experience requirements file (ETEx): all measures in categorical share form
Interests file (In): all measures in both mean share reported and "high points" (most commonly reported) form.
Knowledge file (Kn): all measures in mean form
Job zones file (JZ): all measures in mean form
Occupation Data (occupation names)
Occupation Metadata (MD): all measures in categorical share form
Skills file (Sk): all measures in mean form
Task ratings file (with task statements included as labels): relevance measure only. See below.
Work activities file (WA): all measures in mean form
Work context file (WC): some measures omitted: see below
Work styles file (WS): all measures
Work values file (WV): all measures





**** Excluded files ***

Green occupations/ Green task statements





**** Included and excluded variables ***

From each file, and for each measure of job characteristics, the variables included are: "value" (the average rating or share of workers
in the given category); the sample size "N" used to calculate "value"; the standard error "st_err" of "value"; an indicator for whether or not the O*NET
administrators recommend suppressing the information in "value" and "st_err" due to imprecision ("r_suppress"); and the "source" of the information used to compute
"value" (e.g. occupation incumbent, analyst, occupational expert).

To save space, I exclude the following variables associated with each measure: the upper and lower bounds of the 95% confidence interval and the date the file was last updated.
The infile commands in the construction do file are easily modified to read these additional variables into the dataset for any or all files.





**** Variable names ***


With the exception of the job zones file, the variable names in the compiled wide Stata file consist of four parts, separated by underscores ("_"):

1) the type of information contained. This can be:
   value (in most contexts the mean of the measure),
   st_err (the standard error of the measure),
   N (the number of observations on which the measure is based),
   r_suppress (whether or not (Y/N or n/a) the O*NET staff recommend suppressing the measure due to low precision),
   Source (the source of the data for calculating "value").

2) the scale ID. Not all scales for all variables are kept in the data file. The most common scales are IM (importance)
   and LV (level of task), or measures of the share of occupational incumbents who fall into a given category.
   The scales reference and level scale anchors files available in the O*NET database 16.0 are needed to interpret these scales.

3) the element ID. This is the O*NET's ID for each job characteristic. The name/description associated with the ID is kept as a variable label.

4) the abbreviated name of the source file (e.g. WS for Work Styles; see abbreviations above).
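
Because the four parts always appear in the same order, Stata's varlist wildcards can pick out groups of variables. The lines below are a minimal sketch, not part of the package. They assume the master file is already loaded and that names follow the four-part pattern exactly as described. The scale and file combinations shown (IM with WS, LV with Sk) are illustrative only, and the loop further assumes that r_suppress is stored as a string holding "Y", "N" or "n/a".

```stata
* List the importance-scale (IM) values that came from the Work Styles file (WS)
describe value_IM_*_WS

* Summarize the level-scale (LV) values from the Skills file (Sk)
summarize value_LV_*_Sk

* Blank out Work Styles importance values that O*NET recommends suppressing.
* Assumption: each value_ variable has a matching r_suppress_ string variable.
foreach v of varlist value_IM_*_WS {
    local stub = substr("`v'", 7, .)    // drop the leading "value_" (6 characters)
    replace `v' = . if r_suppress_`stub' == "Y"
}
```

If a wildcard matches no variables, Stata stops with a "variable not found" error, which is a quick way to check whether a given scale was kept for a given file.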





**** Additional Information on individual files ***


**** Job zones file ***

This file differs from the others in that only one variable (job_zone) is provided and imported. The associated labels are from the Job Zone Reference file available in the original O*NET database.
The labels are truncated; more information on the meaning of each job zone is available in the Job Zone Reference file in the O*NET database 16.0.


**** Work contexts file ***

This file contains two types of measure, with additional measures omitted to save space.
For most variables, the mean of the categorical variable (scale ID CX) is kept as the "value", with associated variables "st_err", "N", "r_suppress" and "Source".
Categorical variables ("CPX" variables) reporting the share of occupation holders falling in each category are omitted from the file. The construction do file can be
modified to include these categorical variables using syntax similar to that for importing categorical information from the Education/Training file.
For the "CPT" variables, the categories (1-3) are kept separately and "value" contains the share of workers in the occupation falling into the given category, with mean
category levels (scale ID "CT" variables). Definitions of the categories for both the "CPT" and "CX" variables are in the Work Context Categories file.


**** Occupational metadata file ***

Data from this file are in categorical share form, and each measure is represented by only two variables: "percent", which is the share of occupational respondents
in a given category (similar to the information in "value" in other categorical share files), and "N", the sample size. (As in the other files, dates when the files were updated
are not included.) The file provides information on the shares of
each occupation which answered the O*NET questionnaire by web vs. paper, the completion rate by each occupation (employee and establishment level), the shares of
each occupation across both SIC and NAICS industry classifications, and the shares of occupational respondents at different job tenures. All available measures
are included in the master file.


**** Occupation file ***

This file contains the name and a detailed statement about each occupation. I include only the name as a string variable; the detailed statements are omitted
since the task statements give detailed information on what occupational respondents do.


**** Task file ***

In the Task file, several task statements are associated with each occupation (up to 38 per occupation). Three measures of each task are available: its subjectively
reported "importance", its subjectively reported "relevance", and categorical variables ranking how often the task is performed on average. To save space, only the "relevance" (RV)
measure is included in the master file. This can easily be changed to "importance" by replacing the statement keep if Scale_ID=="RV" with keep if Scale_ID=="IM". To keep the categorical
rankings, the do-file can be modified using syntax similar to that used to import the education and training file.
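
As a sketch of the one-line change described above: Scale_ID and the codes "RV" and "IM" are taken from the statement quoted in this section, but where that line sits in the construction do file is not shown here, so locate it in your copy before editing.

```stata
* Original line: keeps only the relevance ratings
* keep if Scale_ID == "RV"

* Replacement: keeps only the importance ratings
keep if Scale_ID == "IM"
```

Only one of the two lines should be active at a time. Any variable names built from the scale ID would presumably then carry IM rather than RV in the resulting master file.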