Configuration System

The pygcam scripts and libraries rely on a configuration file to:

  • define the location of essential and optional files,
  • allow the user to set defaults for many command-line arguments to scripts, and
  • define both global default and project-specific values for all parameters.

The pygcam.config module provides access to configuration parameters. The configuration file and the API to access it are described below.

See also

Usage of the config sub-command is described on the gt config page. See pygcam.config for documentation of the API to the configuration system.

The configuration files

There are up to 4 configuration files read, two of which are user-modifiable:

  1. First, pygcam/etc/system.cfg is read from within the pygcam package. This defines all known config variables and provides their default values as described below. The values in this file are the appropriate values for Linux and similar systems. This file should not be modified by the user.
  2. Next, a platform-specific file is read, if it exists. Currently, the only such files are pygcam/etc/Windows.cfg and pygcam/etc/Darwin.cfg, read on Windows and Macintosh systems, respectively. (N.B. “Darwin” is the official platform name for the Macintosh operating system.) These files should not be modified by the user.
  3. Next, if the environment variable PYGCAM_SITE_CONFIG is defined, it should refer to a configuration file that defines site-specific settings. This file is optional; it allows an administrator to consolidate site-specific values to simplify configuration for users.
  4. Finally, the user’s own $HOME/.pygcam.cfg is read if it exists; otherwise the file is created with the initial contents being a commented-out version of pygcam/etc/system.cfg. This provides a handy reference to the available parameters and their default values.
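The read order above can be sketched with Python's standard configparser module, which the system defaults file mentions by name. This is a minimal sketch under stated assumptions, not pygcam's actual loader: the helper names candidate_paths and load_layered, the temporary directory layout, and the sample values are hypothetical, and the sketch does not create a missing user file.

```python
import configparser
import os
import platform
import tempfile


def candidate_paths(etc_dir, home, environ=os.environ):
    """Return config file paths in the read order described above (hypothetical helper)."""
    paths = [
        os.path.join(etc_dir, "system.cfg"),
        # platform.system() returns names such as "Linux", "Darwin", or "Windows"
        os.path.join(etc_dir, platform.system() + ".cfg"),
    ]
    site_config = environ.get("PYGCAM_SITE_CONFIG")
    if site_config:
        paths.append(site_config)
    paths.append(os.path.join(home, ".pygcam.cfg"))
    return paths


def load_layered(paths):
    """Read the files in order; values read later override earlier ones."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read(paths)        # files that do not exist are silently skipped
    return parser


with tempfile.TemporaryDirectory() as tmp:
    etc_dir = os.path.join(tmp, "etc")
    home = os.path.join(tmp, "home")
    os.makedirs(etc_dir)
    os.makedirs(home)
    with open(os.path.join(etc_dir, "system.cfg"), "w") as f:
        f.write("[DEFAULT]\nGCAM.TextEditor = vi\nGCAM.LogLevel = WARNING\n")
    with open(os.path.join(home, ".pygcam.cfg"), "w") as f:
        f.write("[DEFAULT]\nGCAM.TextEditor = emacs\n")

    # An empty environ means no site-specific file is added in this demo.
    layered = load_layered(candidate_paths(etc_dir, home, environ={}))
    editor = layered.get("DEFAULT", "GCAM.TextEditor")
    log_level = layered.get("DEFAULT", "GCAM.LogLevel")

print(editor, log_level)
```

In this demo the user file overrides GCAM.TextEditor, while GCAM.LogLevel keeps the value from the system file because no later file sets it.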

The values in each successive configuration file override the values for variables of the same name that are set in files read earlier. Values can also be set in project-specific sections whose names should match project names defined in the project.xml file. Thus when a user specifies a project to operate on, either on the command line to the GCAM tool (gt) or as the value of GCAM.DefaultProject in $HOME/.pygcam.cfg, the project-specific values override any values set in [DEFAULT] sections.

For example, consider the following values in $HOME/.pygcam.cfg:

[DEFAULT]
GCAM.Root = %(Home)s/GCAM

[Project1]
GCAM.Root = /other/location/GCAM

[OtherProject]
# no value set here for GCAM.Root

The default value for GCAM.Root is %(Home)s/GCAM. This value is used for the project OtherProject since no project-specific value is defined, but the project Project1 overrides this with the value /other/location/GCAM.
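The same lookup can be reproduced with the standard configparser module, where a named section falls back to [DEFAULT] for any variable it does not set. This is only a sketch: the value given to Home here is an assumption made so that the example is self-contained.

```python
import configparser

CONFIG_TEXT = """
[DEFAULT]
Home = /home/user
GCAM.Root = %(Home)s/GCAM

[Project1]
GCAM.Root = /other/location/GCAM

[OtherProject]
# no value set here for GCAM.Root
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string(CONFIG_TEXT)

# Project1 defines its own value; OtherProject inherits the default.
root_project1 = parser.get("Project1", "GCAM.Root")
root_other = parser.get("OtherProject", "GCAM.Root")

print(root_project1)
print(root_other)
```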

The available parameters and their default values are described below.

Editing the user configuration file

You can edit the configuration file, $HOME/.pygcam.cfg, with any editor capable of working with plain text, i.e., not a word-processor such as Word. Use the command gt config -e to invoke an editor on the configuration file.

The command invoked by gt config -e to edit the config file is the value of the configuration parameter GCAM.TextEditor, which defaults to a system-appropriate value (see “Default values” below). Set this value in the configuration file to invoke your preferred editor. For example, if you prefer the emacs editor, you can add this line to ~/.pygcam.cfg:

GCAM.TextEditor = emacs

Then, invoking the command:

gt config -e

will cause the command emacs $HOME/.pygcam.cfg to be run.
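As a rough illustration of how an editor setting such as “emacs” or “open -e” could be turned into the command that is run, the sketch below splits the setting into arguments and appends the config file path. The function name editor_command is hypothetical, and the use of shlex.split is an assumption; pygcam may build and launch the command differently.

```python
import os
import shlex


def editor_command(text_editor, config_path):
    """Build an argument list from a GCAM.TextEditor value (hypothetical helper)."""
    # shlex.split handles values that carry their own arguments, e.g. "open -e"
    return shlex.split(text_editor) + [config_path]


config_path = os.path.join(os.path.expanduser("~"), ".pygcam.cfg")
command = editor_command("open -e", config_path)
print(command)
```

The resulting list could be passed to subprocess.call to launch the editor.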

Referencing configuration variables

A powerful feature of the configuration system is that variables can be defined in terms of other variables. The syntax for referencing the value of a variable is to precede the variable name with %( and follow it with )s. Thus, to reference variable GCAM.QueryDir, you would write %(GCAM.QueryDir)s. Note that variable names are case-sensitive.

Note that variable values are substituted when a variable’s value is requested, not when the configuration file is read. The difference is that if variable A is defined in terms of variable B (e.g., A = %(B)s/something/else), you can subsequently change B and the value of A will reflect this when A is accessed by pygcam.
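The deferred substitution described above is also how the standard configparser module behaves, so it can be shown in a few lines. The variable names A and B follow the paragraph above; the path values are made up for the example.

```python
import configparser

parser = configparser.ConfigParser()
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string("[DEFAULT]\nB = /first\nA = %(B)s/something/else\n")

# A is expanded at the moment it is requested...
before = parser.get("DEFAULT", "A")

# ...so changing B afterwards changes what A expands to.
parser.set("DEFAULT", "B", "/second")
after = parser.get("DEFAULT", "A")

print(before)
print(after)
```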

Note

When de-referencing a variable in the config file, you must include the trailing ‘s’ after the closing parenthesis, or a Python exception will be raised.
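With the standard configparser module, a reference that omits the trailing ‘s’ is accepted when the file is read and fails only when the value is requested. The sketch below shows this; the option names Good and Bad are hypothetical, and whether pygcam surfaces the same exception type to the user is an assumption.

```python
import configparser

CONFIG_TEXT = """
[DEFAULT]
GCAM.QueryDir = /projects/queries
Good = %(GCAM.QueryDir)s/Main_Queries.xml
Bad = %(GCAM.QueryDir)/Main_Queries.xml
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string(CONFIG_TEXT)  # reading succeeds; nothing is expanded yet

good = parser.get("DEFAULT", "Good")

try:
    parser.get("DEFAULT", "Bad")
    error_name = None
except configparser.InterpolationSyntaxError as exc:
    # Raised because "%(GCAM.QueryDir)" is not followed by "s"
    error_name = type(exc).__name__

print(good)
print(error_name)
```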

Validating configuration settings

pygcam requires that certain configuration variables be set. The table below shows key variables, indicating whether they are required or optional, and whether their value must be a file or directory.

Variable name          Required   Type
GCAM.SandboxRoot       yes        directory
GCAM.ProjectRoot       yes        directory
GCAM.QueryDir          yes        directory
GCAM.MI.Dir            yes        directory
GCAM.RefWorkspace      yes        directory
GCAM.TempDir           yes        directory
GCAM.ProjectXmlFile    yes        file
GCAM.RefConfigFile     yes        file
GCAM.MI.JarFile        yes        file
GCAM.UserTempDir       no         directory
GCAM.RegionMapFile     no         file
GCAM.RewriteSetsFile   no         file

The config sub-command provides a limited amount of validation by checking that all required and optional variables are set to reasonable values. To check the config file, run the command gt config -t. You can specify a project to check that project’s variables. For example, I can test the values set for project paper1 with the following command, shown with command output:

$ gt -P paper1 config -t
OK: GCAM.SandboxRoot = /Users/rjp/ws
OK: GCAM.SandboxDir = /Users/rjp/ws/paper1
OK: GCAM.ProjectRoot = /Users/rjp/bitbucket
OK: GCAM.ProjectDir = /Users/rjp/bitbucket/paper1
OK: GCAM.QueryDir = /Users/rjp/bitbucket/paper1/queries
OK: GCAM.MI.Dir = /Users/rjp/GCAM/current/ModelInterface
OK: GCAM.RefWorkspace = /Users/rjp/GCAM/current/Main_User_Workspace
OK: GCAM.TempDir = /tmp
OK: GCAM.UserTempDir = /Users/rjp/tmp
OK: GCAM.ProjectXmlFile = /Users/rjp/bitbucket/paper1/etc/project.xml
OK: GCAM.RefConfigFile = /Users/rjp/GCAM/current/Main_User_Workspace/exe/configuration_ref.xml
OK: GCAM.MI.JarFile = /Users/rjp/bitbucket/gcam-proj/ModelInterface/ModelInterface.jar
OK: GCAM.RegionMapFile = /Users/rjp/bitbucket/paper1/etc/Regions.txt
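The kind of check reported above can be approximated as follows. This is a sketch, not the implementation behind gt config -t: the function check_value and the MISSING, SKIP, and ERROR labels are hypothetical, and only the “OK: name = value” format is taken from the output shown above. Three rows from the table are used for brevity.

```python
import os
import tempfile

# (variable name, required, type) rows taken from the table above
CHECKS = [
    ("GCAM.SandboxRoot", True, "directory"),
    ("GCAM.ProjectXmlFile", True, "file"),
    ("GCAM.RegionMapFile", False, "file"),
]


def check_value(name, value, required, kind):
    """Return (ok, message) for one variable (hypothetical helper)."""
    if not value:
        if required:
            return False, f"MISSING: {name}"
        return True, f"SKIP: {name} (optional, not set)"
    if kind == "directory":
        exists = os.path.isdir(value)
    else:
        exists = os.path.isfile(value)
    status = "OK" if exists else "ERROR"
    return exists, f"{status}: {name} = {value}"


with tempfile.TemporaryDirectory() as tmp:
    xml_file = os.path.join(tmp, "project.xml")
    open(xml_file, "w").close()  # create an empty stand-in file
    values = {
        "GCAM.SandboxRoot": tmp,
        "GCAM.ProjectXmlFile": xml_file,
        "GCAM.RegionMapFile": "",
    }
    results = [check_value(n, values.get(n, ""), req, kind)
               for n, req, kind in CHECKS]

all_ok = all(ok for ok, _ in results)
for _, message in results:
    print(message)
```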

Location of GCAM program and data files

The configuration variable GCAM.RefWorkspace must point to a directory structured like the standard GCAM Main_User_Workspace, with sub-directories for input, output, libs, and exe. These files are the reference files used by GCAM tool (gt) to set up “sandbox” workspaces in which to run GCAM.
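A quick way to confirm that a candidate directory has the expected layout is to look for the four sub-directories named above. The helper missing_subdirs is hypothetical and is not part of pygcam; the demo builds a deliberately incomplete workspace in a temporary directory.

```python
import os
import tempfile

# Sub-directories named in the paragraph above
EXPECTED_SUBDIRS = ("input", "output", "libs", "exe")


def missing_subdirs(workspace):
    """Return the expected sub-directories that are absent (hypothetical helper)."""
    return [name for name in EXPECTED_SUBDIRS
            if not os.path.isdir(os.path.join(workspace, name))]


with tempfile.TemporaryDirectory() as tmp:
    # Create only two of the four directories to show what is reported.
    for name in ("input", "exe"):
        os.mkdir(os.path.join(tmp, name))
    missing = missing_subdirs(tmp)

print(missing)
```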

pygcam does not run GCAM in the reference workspace’s exe directory; it uses the files there to create new workspaces as required. Creating separate workspaces for each scenario allows multiple scenarios to be run simultaneously without contention for the XML database which is created at the end of the model run. This is essential when running on a computing cluster.

The variable GCAM.MI.Dir should point to a directory holding the ModelInterface program. This is used to execute batch queries to extract results from GCAM.

Default values

The system default values are provided in the pygcam package in the file pygcam/etc/system.cfg, which is listed under “The system defaults file” below. In addition to these values, several values are read from platform-specific files, as noted above. The platform-specific values are shown here.

For Windows:

# Windows-specific default values
[DEFAULT]
GCAM.Executable = Objects-Main.exe
GCAM.TextEditor = notepad.exe
GCAM.MI.UseVirtualBuffer = False

For Macintosh OS X:

# Macintosh-specific default values
[DEFAULT]
GCAM.MI.JarFile     = %(GCAM.MI.Dir)s/ModelInterface.app/Contents/Resources/Java/ModelInterface.jar
GCAM.Executable = Release/objects
GCAM.TextEditor = open -e
GCAM.MI.UseVirtualBuffer = False

Default configuration variable dependencies

The following figure shows variable dependencies according to the default definitions. Variables lower in the figure depend on those above them. Thus, if you change a variable with “descendants”, you affect the definition of everything below it in the figure.

_images/ConfigVarStructure.jpg
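The effect of changing a variable that has “descendants” can be seen with a chain of four definitions copied from the system defaults file. This sketch uses the standard configparser module; the values given to Home and to the replacement GCAM.Root are assumptions made for the example.

```python
import configparser

CONFIG_TEXT = """
[DEFAULT]
Home = /home/user
GCAM.Root = %(Home)s/GCAM
GCAM.Current = %(GCAM.Root)s/current
GCAM.RefWorkspace = %(GCAM.Current)s/Main_User_Workspace
GCAM.RefConfigFile = %(GCAM.RefWorkspace)s/exe/configuration_ref.xml
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string(CONFIG_TEXT)

default_ref = parser.get("DEFAULT", "GCAM.RefConfigFile")

# Changing the top of the chain changes every variable derived from it.
parser.set("DEFAULT", "GCAM.Root", "/opt/gcam")
changed_ref = parser.get("DEFAULT", "GCAM.RefConfigFile")

print(default_ref)
print(changed_ref)
```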

The system defaults file

[DEFAULT]

GCAM.VersionNumber = 4.2

# This project is used if '-P' flag not given to the 'gt' command
GCAM.DefaultProject =

# Linux default is defined here. Can be overridden in
# platform-specific config file (e.g., etc/Darwin.cfg,
# etc/Windows.cfg).
GCAM.Executable = gcam.exe

# This defines the variable for documentation purposes. The value
# is set automatically in each project section to the name of that
# section -- unless a non-blank value already exists in the section.
GCAM.ProjectName =

# Root directory for where the user keeps project folders
GCAM.ProjectRoot = %(Home)s/projects
GCAM.ProjectDir  = %(GCAM.ProjectRoot)s/%(GCAM.ProjectName)s

# Where to find plug-ins. Internal plugin directory is added
# automatically. Use this to add custom plug-ins outside the pygcam
# source tree. The value is a semicolon-delimited (on Windows) or
# colon-delimited (on Unix) string of directories to search for files
# matching the pattern '*_plugin.py'
GCAM.PluginPath = %(GCAM.ProjectDir)s/plugins

# Sets the folder holding the symlink "current" which refers
# to a folder holding Main_User_Workspace and ModelInterface.
# (This is one way of setting up the code, but not required.)
GCAM.Root = %(Home)s/GCAM

# Refers to the GCAM folder holding the version of the model
# you want to use. It is convenient to make this a symbolic link.
GCAM.Current = %(GCAM.Root)s/current

# The location of the Main_User_Workspace to use. This can refer
# to any folder; GCAM.Current is just an optional convention.
GCAM.RefWorkspace = %(GCAM.Current)s/Main_User_Workspace

# Files to link from the reference workspace to run-time workspace.
# If linking fails on Windows or GCAM.CopyAllFiles = True, files are
# copied instead.
GCAM.WorkspaceFilesToLink = input

# Same as above, for files to link from the run-time workspace to sandboxes.
GCAM.SandboxFilesToLink = input exe/%(GCAM.Executable)s

# The reference config file to use as a starting point for "setup"
GCAM.RefConfigFile = %(GCAM.RefWorkspace)s/exe/configuration_ref.xml

# QueryPath is a string with one or more colon-delimited elements that
# identify directories or XML files in which to find batch query
# definitions.
GCAM.QueryDir  = %(GCAM.ProjectDir)s/queries
GCAM.QueryPath = %(GCAM.QueryDir)s/Main_Queries.xml

# File that defines query rewrites by name for use by query command.
# GCAM.RewriteSetsFile = %(GCAM.ProjectDir)s/etc/rewriteSets.xml
GCAM.RewriteSetsFile =

# The location of GCAM source code (for the purpose of reading
# the .csv file that defines the current regional aggregation).
GCAM.SourceWorkspace =

# The name of the XML Starlet program. Use full path if it's not
# found on your usual PATH.
GCAM.XmlStarlet = xml

# If using the XML "setup" system, this is the root folder for
# setup source files
GCAM.XmlSrc = %(GCAM.ProjectDir)s/xmlsrc

# The default input file for the runProj sub-command
GCAM.ProjectXmlFile = %(GCAM.ProjectDir)s/etc/project.xml

# Whether GCAM should generate a debug file (no value => no change)
GCAM.WriteDebugFile =

# Whether GCAM should generate a price file
GCAM.WritePrices =

# Whether GCAM should generate the large XML file with the combined data
# from all input files.
GCAM.WriteXmlOutputFile =

# Whether GCAM should generate outFile.csv
GCAM.WriteOutputCsv =

# Path to an XML file describing land protection scenarios
GCAM.LandProtectionXmlFile =

# Default location in which to look for scenario directories
GCAM.ScenariosDir =

# The pathname of the XML scenario setup file. If empty (no path
# provided) the file %(GCAM.XmlSrc)s/scenarios.py, will be used.
GCAM.ScenarioSetupFile =

# Where to save expanded XML when using "iterators" in XML setup.
# This is optional and provided to aid in debugging setups.
GCAM.ScenarioSetupOutputFile =

# Set this to identify the subclass of XMLEditor to use as a
# superclass to generate a class to process your XML-based
# scenario setup. Uses XMLEditor as a superclass by default.
# The format of this option is:
#   {path to module directory}:{module.dot.specification}, e.g.
#
# GCAM.ScenarioSetupClass = %(Home)s/somewhere/pygcam:pygcam.sectorEditors.BioenergyEditor
#
GCAM.ScenarioSetupClass =

# The ModelInterface directory for the version to use.
GCAM.MI.Dir = %(GCAM.Current)s/ModelInterface

# This is defined dynamically in config.py. On the Mac, it's here:
# %(GCAM.MI.Dir)s/ModelInterface.app/Contents/Resources/Java/ModelInterface.jar
GCAM.MI.JarFile = %(GCAM.MI.Dir)s/ModelInterface.jar

# This is set dynamically to True on Linux (deprecated in next release)
GCAM.MI.UseVirtualBuffer = False

# The location of the libraries needed by ModelInterface
GCAM.MI.ClassPath = %(GCAM.RefWorkspace)s/libs/basex/BaseX.jar:%(GCAM.MI.JarFile)s

# Arguments to java to ensure that ModelInterface has enough heap space.
GCAM.MI.JavaArgs = -Xms512m -Xmx2g

GCAM.MI.BatchCommand = java %(GCAM.MI.JavaArgs)s -cp %(GCAM.MI.ClassPath)s ModelInterface/InterfaceMain -b "{batchFile}"

# Name of log file to catch verbose ModelInterface output. A relative
# path is written to the batch output dir. An absolute pathname is
# used as is. Set this to /dev/null on Unix to trash the output.
GCAM.MI.LogFile = mi.log

# The name of the database file (or directory, for BaseX)
GCAM.DbFile = database_basexdb

# Columns to drop when processing results of XML batch queries
GCAM.ColumnsToDrop = scenario,Notes,Date

# Change this if desired to increase or decrease diagnostic messages.
# A default value can be set here, and a project-specific value can
# be set in the project's config file section.
# Possible values (from most to least verbose) are:
# DEBUG, INFO, WARNING, ERROR, CRITICAL
GCAM.LogLevel = WARNING

# The default location in which to find or create GCAM runtime sandboxes
GCAM.SandboxRoot = %(GCAM.Root)s/ws

GCAM.SandboxProjectDir = %(GCAM.SandboxRoot)s/%(GCAM.ProjectName)s

# Identifies the location of the workspace copy used to create
# new sandboxes. The workspace is created on demand.
GCAM.SandboxRefWorkspace = %(GCAM.SandboxProjectDir)s/Workspace

# N.B. These are set at run-time in project.py and are available
# with the "run" sub-command. SandboxDir is the directory under
# which sandboxes are created.
# ScenarioGroup is set in project.py if <ScenarioGroup> sets
# useGroupDir="1". Thus the variable is available only via the
# run sub-command.
GCAM.ScenarioGroup =

# Directory in which new sandboxes are created for current project
# and scenario group. ScenarioGroup may be empty, but this doesn't
# affect path construction.
GCAM.SandboxDir = %(GCAM.SandboxProjectDir)s/%(GCAM.ScenarioGroup)s

# If set, application logger messages are written here. Note that
# this is different than the GCAM.BatchLogFile for batch job output.
GCAM.LogFile = %(GCAM.SandboxRoot)s/log/gt.log

# If GCAM.BatchLogFile or the --logFile arg to gt is not an absolute
# path (i.e., the path portion of the logFile does not start with '/',
# with Windows paths converted to Unix format), then batch log files
# are created relative to this directory.
GCAM.BatchLogDir = %(GCAM.SandboxDir)s/log

# Save batch log messages in the indicated file. Default is set up
# for SLURM, which replaces "%j" with the jobid. Because "%" is a
# special character in configparser, it cannot be specified
# directly. Instead, use "$", which is translated to "%" after all
# interpolation is done, unless GCAM.BatchLogFileDollarToPercent is
# False.
GCAM.BatchLogFile = gt-$j.out

GCAM.BatchLogFileDollarToPercent = True

# Set to True to keep XML database in memory, running queries defined
# in the generated XMLDBDriver.properties file based on list of queries
# defined in project. Setting GCAM.InMemoryDatabase to True implies that
# both GCAM.BatchMultipleQueries and GCAM.RunQueriesInGCAM are True.
GCAM.InMemoryDatabase = False

# Set to False to use old method of running ModelInterface anew for each query.
# Default creates a single batch file with multiple queries, invoking MI once.
# Setting GCAM.InMemoryDatabase to True implies that GCAM.BatchMultipleQueries
# is True since this is the only way to extract results from GCAM.
GCAM.BatchMultipleQueries = True

# If True, we expect GCAM to run batch queries before exiting. This is
# typically used with the in-memory database, but works otherwise, too.
# When False, an XMLDBDriver.properties file is written with an empty
# batch-query element. Setting GCAM.InMemoryDatabase to True implies
# that GCAM.RunQueriesInGCAM is True, but queries can still be run in
# GCAM when the XML database is written to disk. Setting this parameter
# to True implies that GCAM.BatchMultipleQueries is True since this is
# the only way to run multiple queries internally in GCAM.
GCAM.RunQueriesInGCAM = False

# Show log messages on the console (terminal)
GCAM.LogConsole = True

# The name of the queue used for submitting batch jobs on a cluster.
# On SLURM, you can request multiple queues, taking the first one available.
GCAM.DefaultQueue = short,slurm

#GCAM.QueuePBS = qsub -q {queueName} -N {jobName} -l walltime={walltime} \
#  -d {exeDir} -e {logFile} -m n -j oe -l pvmem=6GB -v %(GCAM.OtherBatchArgs)s \
#  QUEUE_GCAM_CONFIG_FILE='{configs}',QUEUE_GCAM_WORKSPACE='{workspace}',QUEUE_GCAM_NO_RUN_GCAM={noRunGCAM}

GCAM.QueuePBS = qsub -q {queueName} -N {jobName} -l walltime={walltime} -e {logFile} -m n -j oe -l pvmem=6GB -v %(GCAM.OtherBatchArgs)s

# N.B. --signal=USR1@15 => send SIGUSR1 15s before walltime expires
#
#GCAM.QueueSLURM = sbatch -p {queueName} --nodes=1 -J {jobName} -t {walltime} \
#  -D {exeDir} --get-user-env=10L -s --mem=6000 --tmp=6000 %(GCAM.OtherBatchArgs)s \
#  --export=QUEUE_GCAM_CONFIG_FILE='{configs}',QUEUE_GCAM_WORKSPACE='{workspace}',QUEUE_GCAM_NO_RUN_GCAM={noRunGCAM}

GCAM.QueueSLURM = sbatch -p {queueName} --nodes=1 -J {jobName} -t {walltime} --get-user-env=10L -s %(GCAM.OtherBatchArgs)s -o {logFile} -e {logFile} {scriptFile}

# Arbitrary arguments to add to the selected batch command
GCAM.OtherBatchArgs =

GCAM.BatchCommand = %(GCAM.QueueSLURM)s

# Set this to a command to run when the -l flag is passed to gcamtool's
# "run" sub-command. The same options are available for substitution as
# for the GCAM.BatchCommand.
GCAM.LocalCommand =

# Arguments to qsub's "-l" flag that define required resources
GCAM.QsubResources = pvmem=6GB

# Environment variables to pass to qsub. (Not needed by most users.)
GCAM.QsubEnviroVars =

# For qsub, the default number of minutes to allocate per task.
GCAM.Minutes = 20

# A file that maps GCAM regions to rename them or to aggregate
# them. Each line consists of a GCAM region name, some number of
# tabs, and the name to map the region to.
GCAM.RegionMapFile =

# Where to create temporary files
GCAM.TempDir = /tmp

# Where to create temporary batch scripts to run gcamtool.py on a
# compute node. Note that this should not be set to a directory
# that is machine-specific (such as "/tmp"), since the file needs
# to be visible from compute nodes. The directory is created if needed.
GCAM.UserTempDir = %(Home)s/tmp

# For Windows users without permission to create symlinks
GCAM.CopyAllFiles = False

# For debugging purposes: gcamtool.py can show a stack trace on error
GCAM.ShowStackTrace = False

# If set, this format is applied to columns holding values for years
# when combining CSV files into XLSX files.
GCAM.ExcelNumberFormat = 0.000

# TextEditor to open via the --edit option to the 'config' sub-command
GCAM.TextEditor = vi
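The comment on GCAM.BatchLogFile in the listing above describes writing “$” in place of “%” because configparser treats “%” as the start of a variable reference. A minimal sketch of that translation step follows; the function batch_log_file is hypothetical, and pygcam's own handling may differ in detail.

```python
import configparser

CONFIG_TEXT = """
[DEFAULT]
GCAM.BatchLogFile = gt-$j.out
GCAM.BatchLogFileDollarToPercent = True
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string(CONFIG_TEXT)


def batch_log_file(parser, section="DEFAULT"):
    """Return the log file name with "$" translated to "%" if enabled (hypothetical helper)."""
    # Interpolation happens inside get(); the translation is applied afterwards.
    name = parser.get(section, "GCAM.BatchLogFile")
    if parser.getboolean(section, "GCAM.BatchLogFileDollarToPercent"):
        name = name.replace("$", "%")
    return name


log_name = batch_log_file(parser)
print(log_name)
```

The translated name contains the “%j” pattern that SLURM replaces with the job id.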