Database Guidelines

After reading and agreeing to our database guidelines, you will be directed to a form enabling you to request a consultation.

Guidelines to Follow

  • Excel is preferred.
  • Codebooks should be provided with the data sets.
  • All data should be entered into a single sheet, with the codebook on a separate sheet.
  • Any other information (such as graphs or initial calculations) should be removed from the data sheet.
  • Each row should correspond to a single unit (e.g., patient, participant, etc.) on which you have made observations.
  • Each unit should have a unique identifier that corresponds to the individual patient, participant, etc. (e.g., subject ID).
  • If possible, do NOT include patient names.
  • Each column should correspond to a single measured variable/field.
  • Formatting should be consistent within each variable/column; do not mix text and numerical data.
  • Only one variable per column (e.g., blood pressure would be split into two columns/variables: diastolic and systolic).
  • Do not include units in the cells themselves; rather, include the unit of measurement in the variable name or in the codebook that should accompany your data.
  • Coded numeric data is preferred over text (e.g., “1=married”, “2=divorced”, “3=single”, etc.).
  • For Yes/No variables, code No=0 and Yes=1, if possible.
  • Both the database and the variables should have meaningful names (e.g., “projectname_date.xlsx” rather than “mydata.xlsx”; and “Gender” rather than “var1”).
  • Only use the FIRST row/header of the database for variable names.
  • Variable names should be as short as possible and restricted to one cell (i.e., do not merge across cells). Additionally, underscores are preferred to blank spaces between words.
  • Try to avoid using punctuation (e.g., apostrophes, inverted commas, accents, etc.) in the variable names.
  • Missing data should be represented using a blank cell.
  • Dates should be entered consistently. They may be entered as DD-MM-YYYY, or as YYYY-MM-DD, or even as (e.g.) 25 January 2011, provided that ALL dates are entered in the same format.
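
Several of the guidelines above can be checked automatically before a data set is submitted. The following is only a minimal sketch in Python, not an official validation tool: the sample data, the column names (subject_id, sbp_mmHg, etc.), and the function names check_table and is_number are all hypothetical. It assumes the data sheet has been exported to CSV text, and it checks three things: variable names without spaces or punctuation, unique identifiers, and columns that mix text and numerical data.

```python
import csv
import io
import re

# Hypothetical sample data set; blank cells stand for missing values.
SAMPLE_CSV = """subject_id,age_years,sbp_mmHg,dbp_mmHg,smoker
001,34,120,80,0
002,51,,85,1
003,47,135 mmHg,90,1
003,29,118,76,0
"""

# Letters, digits, and underscores only (no spaces or punctuation).
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def is_number(text):
    """Return True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_table(csv_text, id_column="subject_id"):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    names = reader.fieldnames or []
    problems = []

    # Variable names: first row only, no blank spaces or punctuation.
    for name in names:
        if not NAME_PATTERN.match(name):
            problems.append(f"variable name {name!r} contains spaces or punctuation")

    # Each unit should have a unique identifier.
    seen = set()
    for row in rows:
        unit_id = (row.get(id_column) or "").strip()
        if unit_id in seen:
            problems.append(f"identifier {unit_id!r} appears more than once")
        seen.add(unit_id)

    # A column should not mix text and numerical data; blank cells are skipped.
    for name in names:
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v]
        numeric = [v for v in present if is_number(v)]
        if numeric and len(numeric) != len(present):
            problems.append(f"column {name!r} mixes text and numerical data")

    return problems


for problem in check_table(SAMPLE_CSV):
    print(problem)
```

On the sample above, the sketch reports the repeated identifier 003 and the sbp_mmHg column, where one cell contains a unit ("135 mmHg") alongside plain numbers. The blank cell in the same column is treated as missing data and is not flagged.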

Codebook Guidelines

Codebooks should contain the following information:

  • Variable names and short descriptions.
  • Codes: labeled categories for variables which require them and the corresponding values (e.g., for a Likert scale: “0-Strongly Disagree”, “1-Disagree”, “2-Neutral”, “3-Agree”, “4-Strongly Agree”; “1-Male”, “2-Female”).
  • Units/Range.
  • Any and all calculations used to derive “created” variables.
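
To illustrate how these pieces fit together, a codebook can be thought of as one entry per variable holding its description, codes, units/range, and derivation. The Python sketch below is only an illustration under assumed names: the variables, ranges, and the helper functions label_for and in_range are hypothetical, and the codebook you submit would normally be a separate sheet in the Excel file rather than code.

```python
# Hypothetical codebook: one entry per variable in the data sheet.
CODEBOOK = {
    "subject_id": {"description": "Unique participant identifier"},
    "marital_status": {
        "description": "Marital status",
        "codes": {1: "Married", 2: "Divorced", 3: "Single"},
    },
    "smoker": {
        "description": "Current smoker",
        "codes": {0: "No", 1: "Yes"},
    },
    "weight_kg": {
        "description": "Body weight",
        "units": "kg",
        "range": (30, 250),  # assumed plausible range, for illustration only
    },
    "height_m": {
        "description": "Standing height",
        "units": "m",
        "range": (1.0, 2.3),  # assumed plausible range, for illustration only
    },
    "bmi": {
        "description": "Body mass index (created variable)",
        "units": "kg/m^2",
        "derivation": "weight_kg / height_m ** 2",
    },
}


def label_for(variable, value):
    """Translate a coded value into its label; uncoded values pass through."""
    codes = CODEBOOK[variable].get("codes", {})
    return codes.get(value, value)


def in_range(variable, value):
    """Check a measurement against the range recorded in the codebook."""
    low, high = CODEBOOK[variable]["range"]
    return low <= value <= high


print(label_for("marital_status", 2))
print(in_range("weight_kg", 70))
```

Recording the derivation of a created variable (here, bmi) alongside its units lets an analyst recompute and verify it from the raw columns.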