Database Guidelines

  • Excel is preferred
  • Codebooks should be provided with the data sets
  • All information should be entered into a single sheet
  • Any other information (such as graphs or initial calculations) should be removed from this sheet
  • Each row should correspond to a single unit (e.g. a person or an animal) on which you have made observations
  • Each unit should have a unique identifier (which in some way corresponds to the hard copy); e.g., subject ID
  • Do NOT include patient names, if possible
  • Each column should correspond to a measured variable/field
  • Formatting should be consistent within each variable/column; do not mix text and numerical data
  • Only one variable per column; e.g., blood pressure would be split into two columns/variables: diastolic and systolic
  • Do not include units in the cells; instead, give the units in the variable name or in the codebook that should accompany your data
  • Coded numeric data is preferred over text (e.g., “1=married”, “2=divorced”, “3=single”, etc.)
  • For Yes/No variables, code No=0 and Yes=1, if possible
  • Both the database and the variables should have meaningful names, e.g. “projectname_date.xlsx” rather than “mydata.xlsx”, and “Gender” rather than “var1”.
  • Variables (columns) should only be named using the FIRST row/header.
  • Variable names should be as short as possible, and restricted to one cell (i.e. do not merge across cells). Additionally, underscores are preferred to blank spaces between words.
  • Do not use punctuation or special characters in variable names (e.g. apostrophes, inverted commas, accents, etc.); underscores are the exception
  • Missing information should be represented using a blank cell
  • Dates should be entered consistently: DD-MM-YYYY, YYYY-MM-DD, or even a written form such as 25 January 2011 is acceptable, provided that ALL dates are entered in the same format.
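
Several of the rules above can be checked automatically before a data set is handed over. The following is a minimal sketch in Python, assuming the sheet has been exported to CSV; the sample data, the column names, and the helper check_sheet are hypothetical and serve only as illustration. It checks three of the rules: variable names without spaces or punctuation, unique identifiers, and no mixing of text and numbers within a column (blank cells are treated as missing).

```python
import csv
import io
import re

# Hypothetical sample: one row per subject, header in the first row only,
# units in the variable names, Yes/No coded as 1/0, blank cell for missing.
SAMPLE = """subject_id,sbp_mmHg,dbp_mmHg,married,visit_date
001,120,80,1,2011-01-25
002,135,85,0,2011-01-26
003,,90,1,2011-01-27
"""

# Assumed naming rule: letters, digits and underscores only.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def is_number(value):
    """Return True if the cell text can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def check_sheet(text, id_column):
    """Return a list of problems found in CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    problems = []

    # Variable names: no blank spaces or punctuation.
    for name in reader.fieldnames:
        if not NAME_PATTERN.match(name):
            problems.append(f"bad variable name: {name!r}")

    # Each unit should have a unique identifier.
    ids = [row[id_column] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append(f"duplicate values in {id_column!r}")

    # A column should not mix text and numerical data; blanks are missing.
    for name in reader.fieldnames:
        values = [row[name] for row in rows if row[name] not in ("", None)]
        kinds = {is_number(v) for v in values}
        if len(kinds) > 1:
            problems.append(f"mixed text and numbers in {name!r}")

    return problems


print(check_sheet(SAMPLE, "subject_id"))
```

A check like this only catches mechanical problems; whether the names are meaningful and the codes are documented still has to be reviewed by hand.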

Codebook Guidelines

Codebooks should contain information on the following:

  • Variable names and short description
  • Codes: for each variable that requires them, the coded values and their category labels (e.g. for a Likert scale: “0-Strongly Disagree”, “1-Disagree”, “2-Neutral”, “3-Agree”, “4-Strongly Agree”; or “1-Male”, “2-Female”)
  • Units/Range
  • Any and all calculations used to derive “created” variables
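
As an illustration of the four items above, a codebook can also be kept in a machine-readable form alongside the data. The sketch below is one possible layout in Python, not a required format; every variable name, range, and helper function in it is hypothetical.

```python
# Hypothetical codebook: one entry per variable, holding a description,
# codes, units/range, and the calculation for any derived variable.
# All names and ranges here are illustrative assumptions.
CODEBOOK = {
    "agree_q1": {
        "description": "Response to question 1",
        "codes": {
            0: "Strongly Disagree",
            1: "Disagree",
            2: "Neutral",
            3: "Agree",
            4: "Strongly Agree",
        },
    },
    "sbp_mmHg": {
        "description": "Systolic blood pressure",
        "units": "mmHg",
        "range": (50, 300),
    },
    "dbp_mmHg": {
        "description": "Diastolic blood pressure",
        "units": "mmHg",
        "range": (20, 200),
    },
    "pulse_pressure_mmHg": {
        "description": "Pulse pressure (created variable)",
        "units": "mmHg",
        "derivation": "sbp_mmHg - dbp_mmHg",
    },
}


def label_for(variable, value):
    """Return the label for a coded value, or None if it is not in the codebook."""
    return CODEBOOK[variable].get("codes", {}).get(value)


def in_range(variable, value):
    """Return True if the value lies within the documented range (inclusive)."""
    low, high = CODEBOOK[variable]["range"]
    return low <= value <= high


print(label_for("agree_q1", 3))
print(in_range("sbp_mmHg", 120))
```

Keeping the derivation as text next to the created variable means the calculation can be repeated by whoever receives the data.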