General Setup
The General Setup section of the model specification (modelspec) file
starts at the very beginning of the file. Each entry in the section consists
of the keyword for the desired option, followed by a space and then any
arguments (numbers or filenames) the option needs.
The following option keywords may be included in this section. Most options
are set to a default value if no definition is given in the modelspec file.
Most keywords may appear in any order, but they must be separated by spaces
or line breaks, and they must appear in the General Setup section. The
General Setup section ends with the keyword model; see the example model
specification files.
The keywords datafile, observedvars and latentvars are required and must be
given values, because they have no default values. The ngroups option must
be specified if multigroup processing is to occur; in this case ngroups must
be specified first, before the other three options.
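The rules above (whitespace-separated keywords and arguments, the section
ending at model, three required keywords, ngroups first and matching the
number of files) can be checked mechanically. The following is a minimal
Python sketch, not GENBLIS's own parser: the function name
parse_general_setup is hypothetical, and treating every token that is not a
known keyword as an argument of the preceding keyword is an assumption.

```python
# Hypothetical sketch of reading a General Setup section. The keyword list is
# taken from this section; treating every non-keyword token as an argument of
# the preceding keyword is an assumption, not GENBLIS's actual parser.
KEYWORDS = {
    "ngroups", "datafile", "observedvars", "latentvars", "bootseed",
    "bootstraps", "bootstrapdetails", "covmatrix", "genconverge", "genmax",
    "genoudseed", "listfile", "noboots", "nobig", "nojacks",
    "readrecordfile", "recordfile", "onboundary", "usecorr", "usecov",
    "usecrossp", "use_out_of_bounds", "weights", "genadd", "cases",
}
REQUIRED = ("datafile", "observedvars", "latentvars")


def parse_general_setup(text):
    """Return {keyword: [arguments]} for the tokens before 'model'."""
    options = {}
    current = None
    for token in text.split():  # spaces and line breaks both separate tokens
        if token == "model":  # the General Setup section ends here
            break
        if token in KEYWORDS:
            current = token
            options[current] = []
        elif current is None:
            raise ValueError(f"argument {token!r} appears before any keyword")
        else:
            options[current].append(token)
    else:
        raise ValueError("the General Setup section must end with 'model'")

    missing = [key for key in REQUIRED if key not in options]
    if missing:
        raise ValueError(f"missing required keywords: {missing}")

    order = list(options)  # keywords in order of first appearance
    if "ngroups" in options and any(
        order.index(key) < order.index("ngroups") for key in REQUIRED
    ):
        raise ValueError("ngroups must precede datafile, observedvars, latentvars")

    # ngroups (default 1) must match the number of files and variable counts
    ngroups = int(options.get("ngroups", ["1"])[0])
    for key in ("datafile", "observedvars", "weights"):
        if key in options and len(options[key]) != ngroups:
            raise ValueError(f"{key} needs {ngroups} argument(s)")
    return options
```

For example, a section reading "ngroups 2 datafile g1.dat g2.dat observedvars
6 6 latentvars 2 usecorr model" yields two filenames, two variable counts,
and an empty argument list for usecorr.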
- ngroups
Usage: ngroups integer
Default: 1
This keyword must be specified if data from multiple groups are
to be processed in parallel. The integer argument specifies the number
of groups. This number must equal the number of files named in the
datafile option and the number of integers listed in the observedvars
option. If there is only one group, this option may be omitted and the
default value of 1 will be used.
- datafile
Usage: datafile path-to-file [path-to-file ...]
Default: none
This keyword precedes the path to the dataset of raw observations. A path
name used in GENBLIS may contain at most 4096 characters, which is longer
than many operating systems allow. The data are read from the dataset one
observation at a time, using the integer given to the observedvars option:
if that integer is p, the first p values in the dataset become the values of
the observed variables for the first observation, the next p values become
the values of the observed variables for the second observation, and so on.
Each value is converted from its ASCII (character) representation to a
double-precision floating-point number (C type double; the C-language format
used to scan [fscanf] the ASCII representations is %lf). Numbers in the raw
data files must be separated by whitespace. GENBLIS sets the number of
observations equal to the number of observations read in this manner from
the dataset. If multigroup processing is desired, the data for each group
must be placed in a separate file, and the files must be named in succession
after the datafile keyword. The number of filenames must equal the value of
the argument to the ngroups option. Each dataset is processed in parallel,
in the fashion described for a single dataset.
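The way a raw data file is split into observations can be sketched in
Python as follows. This is an illustration under assumptions, not GENBLIS
code: the function names are hypothetical, Python's float() stands in for
the C %lf conversion, and rejecting a partial final observation is an
assumption, since the manual does not say how that case is handled.

```python
def observations_from_text(text, p):
    """Group whitespace-delimited numbers into observations of p values each.

    p is the integer given to observedvars for this group's data file.
    """
    # float() parses ordinary decimal and exponent notation, as %lf does
    values = [float(token) for token in text.split()]
    if len(values) % p != 0:
        # Assumption: the manual does not say how GENBLIS treats a partial
        # final observation, so this sketch simply rejects it.
        raise ValueError(f"{len(values)} values is not a multiple of {p}")
    return [values[i:i + p] for i in range(0, len(values), p)]


def read_observations(path, p):
    """Read one raw data file named after the datafile keyword."""
    with open(path) as f:
        return observations_from_text(f.read(), p)
```

Because only whitespace matters, line breaks in the file need not coincide
with observations: with p = 3, the text "1 2 3\n4.5 -6e1\n7" gives two
observations, [1.0, 2.0, 3.0] and [4.5, -60.0, 7.0].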
- observedvars
Usage: observedvars integer [integer ...]
Default: none
This option must be set to the number of observed variables in the data,
which equals the number of variables read in from the data file for each
observation. This specifies the number of rows in the model matrix
named
(see the model equation). If multigroup processing is desired, the number of
observed variables for each group must be listed, in the same order as the
raw datasets named in the datafile option. A number must be specified for
each file, even if all the numbers are equal. In the GENBLIS model section,
the observed variables are numbered consecutively across all groups. For
instance, if there are six variables in each of two groups, the option would
read "observedvars 6 6", and in the model section the observed variables in
the first group would have numbers 1-6 while the observed variables in the
second group would have numbers 7-12.
The
matrix has 12 rows in this case.
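The consecutive numbering across groups amounts to a running offset. A
small Python sketch of the mapping between a (group, variable-within-group)
pair and the number used in the model section follows; both function names
are hypothetical helpers for illustration only.

```python
def global_number(observedvars, group, local):
    """Global number of variable `local` (1-based) in `group` (1-based)."""
    # Offset by the number of observed variables in all earlier groups
    return sum(observedvars[:group - 1]) + local


def group_and_local(observedvars, number):
    """Inverse of global_number: (group, local) for a global variable number."""
    for group, count in enumerate(observedvars, start=1):
        if number <= count:
            return group, number
        number -= count
    raise ValueError("number exceeds the total number of observed variables")
```

With "observedvars 6 6", the first variable of the second group is
global_number([6, 6], 2, 1), i.e., 7, and the last is 12.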
- latentvars
Usage: latentvars integer
Default: none
This option sets the number of latent variables present in the model to be
estimated. This specifies the number of rows and columns in the model
matrices named B and
,
and the number of columns in the matrix named
(see the model equation).
In a multigroup model, this number must count all the latent variables in
all the groups. For example, in a model with two groups and a one-factor
model in each group, there are two latent variables.
The remaining options need not be set by the user.
- bootseed
Usage: bootseed integer
Default: 707070
There are two random number generators used by GENBLIS, and hence two random
number seeds: genoudseed and bootseed. This option sets the random number
seed for generating the bootstrap resamples. This number must be an integer.
Each seed generates a unique series of random numbers, which allows one to
replicate results: if neither bootseed nor genoudseed is changed, every
execution of GENBLIS with the same data and model will generate and use
exactly the same random numbers and will therefore produce exactly the same
results.
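The reproducibility described above comes from giving the bootstrap its own
seeded generator. The Python sketch below illustrates the idea with
Python's random module (a Mersenne Twister), which is not the generator
GENBLIS uses; bootstrap_indices is a hypothetical name, and the draws it
produces will not match GENBLIS's resamples.

```python
import random


def bootstrap_indices(n, bootstraps=1000, bootseed=707070):
    """Draw `bootstraps` resamples of n observation indices, with replacement."""
    # A dedicated generator seeded only by bootseed, so its stream does not
    # depend on (or disturb) the generator used for the optimization itself.
    rng = random.Random(bootseed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(bootstraps)]
```

Calling bootstrap_indices twice with the same arguments returns identical
resamples; a second random.Random(genoudseed) instance would play the role
of the optimizer's separate stream.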
- bootstraps
Usage: bootstraps integer
Default: 1000
This option sets the number of bootstrap resamples used to calculate the
bootstrap confidence intervals for the parameters and for the bootstrap
goodness-of-fit test.
- bootstrapdetails
Usage: bootstrapdetails
Default: The details of the BCa bootstrap are not presented.
By default GENBLIS prints the BCa confidence intervals. If this option is
selected, GENBLIS will print, along with the BCa bootstrap confidence
intervals, the values of the bias-correction (ẑ0) and acceleration (â)
adjustments used for the BCa bootstrap. For details, see Efron and
Tibshirani, An Introduction to the Bootstrap (New York: Chapman & Hall),
pp. 325ff.
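In Efron and Tibshirani's formulation, ẑ0 comes from the share of bootstrap
replicates below the original estimate, and â comes from the jackknife
replicates, which is why the jackknife matters for the BCa intervals. The
following Python sketch computes both and the resulting interval for one
parameter. It is a textbook illustration, not GENBLIS's code: bca_interval
is a hypothetical name, and the simple rounded order-statistic rule is an
assumption that may differ from GENBLIS in detail.

```python
from statistics import NormalDist


def bca_interval(theta_hat, boot, jack, alpha=0.05):
    """Return (z0, a, (lower, upper)) for a BCa interval of coverage 1 - alpha.

    theta_hat: estimate in the original sample
    boot:      bootstrap replicates of the estimate
    jack:      leave-one-out (jackknife) replicates of the estimate
    """
    norm = NormalDist()
    B = len(boot)

    # Bias-correction z0: normal quantile of the share of replicates below theta_hat
    share = sum(t < theta_hat for t in boot) / B
    if not 0.0 < share < 1.0:
        raise ValueError("z0 is undefined: all replicates fall on one side")
    z0 = norm.inv_cdf(share)

    # Acceleration a: skewness-type statistic of the jackknife replicates
    jbar = sum(jack) / len(jack)
    num = sum((jbar - t) ** 3 for t in jack)
    den = 6.0 * sum((jbar - t) ** 2 for t in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(p):
        # Map a nominal tail probability p to the BCa-adjusted one
        z = norm.inv_cdf(p)
        return norm.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    ordered = sorted(boot)

    def order_stat(p):
        # The (B * p)-th smallest replicate, rounded and kept within range
        k = min(B, max(1, round(p * B)))
        return ordered[k - 1]

    lower = order_stat(adjusted(alpha / 2))
    upper = order_stat(adjusted(1 - alpha / 2))
    return z0, a, (lower, upper)
```

When the bootstrap replicates are centered on the estimate and the jackknife
values are symmetric, ẑ0 and â are both 0 and the interval reduces to the
plain percentile interval.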
- covmatrix
Usage: covmatrix
Default: The asymptotic covariance matrix is not printed.
If this option is listed, the estimated asymptotic covariance matrix of
the parameter estimates is printed.
- genconverge
Usage: genconverge integer
Default value: 5.
This option sets the number of generations GENBLIS keeps going after it
thinks it has converged. This number must be an integer. GENBLIS thinks it
has converged if the gradients at the best solution found so far are below a
criterion defined below. It is often good to keep GENBLIS working after this
point because the evolutionary program (EP) portion of GENBLIS continues to
provide useful non-local information. The higher this number is, the greater
the assurance that a global optimum has been found.
- genmax
Usage: genmax integer
Default: 100
This option sets the maximum number of generations. Recall that theory
suggests that the size of the population of genetic operators is of greater
practical importance than the number of generations. The asymptotics are
primarily in operator population size: running a very large number of
generations will not help if the operator population is too small. But
optimization will also fail if the generation limit is set too low. Use the
genetic operator controls to change the population of operators.
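Taken together, genconverge and genmax define when the search stops. The
Python sketch below shows one reading of that rule, under stated
assumptions: stopping_generation and grad_tol are hypothetical names,
grad_tol stands in for the gradient criterion the manual defines elsewhere,
and the sketch assumes that once convergence is first detected the count of
extra generations is never reset.

```python
def stopping_generation(max_abs_gradients, grad_tol, genconverge=5, genmax=100):
    """Generation (1-based) at which a genconverge/genmax rule would stop.

    max_abs_gradients[g - 1] is the largest absolute gradient at the best
    solution after generation g; grad_tol stands in for the criterion that
    the manual defines elsewhere.
    """
    converged_at = None
    for gen, grad in enumerate(max_abs_gradients[:genmax], start=1):
        if converged_at is None and grad < grad_tol:
            converged_at = gen  # first generation that looks converged
        if converged_at is not None and gen >= converged_at + genconverge:
            return gen  # kept going genconverge more generations
    # Ran out of recorded generations or hit the genmax limit
    return min(len(max_abs_gradients), genmax)
```

If the gradients first fall below the criterion in generation 3, the default
genconverge of 5 stops the search at generation 8, unless genmax is smaller.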
- genoudseed
Usage: genoudseed integer
Default value: 0
There are two random number generators used by GENBLIS, and hence two random
number seeds: genoudseed and bootseed. This option sets the random number
seed for GENOUD, the evolutionary program at the heart of GENBLIS. This
number must be an integer. Each seed generates a unique series of random
numbers, which allows one to replicate results: if neither bootseed nor
genoudseed is changed, every execution of GENBLIS with the same data and
model will generate and use exactly the same random numbers and will
therefore produce exactly the same results.
- listfile
Usage: listfile path-to-file
Default: the name of the model specification file plus the .lst
extension.
In addition to the output that GENBLIS sends to standard output by default,
GENBLIS creates a list file containing a summary of the results. The list
file describes the estimated model, the options chosen, and the results
requested; it does not provide the details of program execution. By default
GENBLIS writes this list output to a file it creates with the same name as
the model specification file plus the extension .lst. For example, if the
model specification file is named 1run, the default list file will be
1run.lst, and it will be created in the directory from which GENBLIS is
executed. A different name for the list file can be given as the argument
to the listfile option.
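The default naming rule can be written in a few lines of Python. This is a
sketch, with default_listfile a hypothetical name; the sketch assumes that
any directory part of the modelspec path is dropped, since the manual says
the list file is created in the directory from which GENBLIS is executed.

```python
import os


def default_listfile(modelspec_path):
    """Default list-file path: modelspec file name + '.lst' in the current directory."""
    # Assumption: any directory part of the modelspec path is dropped, since
    # the manual says the file is created where GENBLIS is executed.
    return os.path.join(os.getcwd(), os.path.basename(modelspec_path) + ".lst")
```

Under this reading, running GENBLIS on runs/1run from the current directory
would produce 1run.lst in the current directory.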
- noboots
Usage: list noboots in the modelspec file to turn
off the bootstrap routine.
Default: the bootstrap routine is turned on.
By default GENBLIS provides bootstrap confidence intervals for the linear
structure model parameter estimates and provides a bootstrap goodness-of-fit
test.
- nobig
Usage: list nobig in the modelspec file to turn off
using control.big.
Default: control.big is used.
GENBLIS has four different control setups: control (used for the
original sample), control.big (used in the original sample and when
there is a convergence failure in either the jackknives or the bootstraps),
control.jack (used for the jackknives), and control.boot
(used for the bootstraps). When the nobig option is used, GENBLIS estimates
the model in the original sample using only the control file and then stops.
Neither jackknives nor bootstraps are done.
- nojacks
Usage: list nojacks in the modelspec file to turn off
the jackknife routine.
Default: the jackknife is turned on.
By default GENBLIS performs jackknives, because they are required to
estimate the bootstrap confidence intervals that GENBLIS produces. The
jackknives are usually done after the linear structure model is estimated in
the original sample (i.e., in the observed dataset) and before the model is
estimated in the bootstrap resamples. If one wants neither jackknives nor
any bootstrap resamples, then one must set nojacks and set bootstraps to 0.
These two options are usually used together except for diagnostic purposes,
because the bootstraps will not provide useful information without the
jackknives.
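The interplay of nobig, nojacks, noboots and bootstraps with the four
control setups can be summarized as a run plan. The Python sketch below
encodes one reading of the descriptions above; run_plan is a hypothetical
name, and listing control before control.big in the original sample (with
control.big as the fallback elsewhere) is an assumption about ordering that
the manual does not spell out.

```python
def run_plan(nobig=False, nojacks=False, noboots=False, bootstraps=1000):
    """Return (stages, notes): which estimation stages run, with control setups.

    Each stage lists its control setup first and control.big as the fallback
    on convergence failure, following the description of nobig above.
    """
    if nobig:
        # Only the original sample, only the control file, then stop
        return [("original sample", ["control"])], []
    stages = [("original sample", ["control", "control.big"])]
    if not nojacks:
        stages.append(("jackknives", ["control.jack", "control.big"]))
    if not noboots and bootstraps > 0:
        stages.append(("bootstraps", ["control.boot", "control.big"]))
    notes = []
    if nojacks and any(name == "bootstraps" for name, _ in stages):
        notes.append("bootstraps without jackknives give no useful BCa intervals")
    return stages, notes
```

With nojacks and bootstraps set to 0, only the original-sample stage
remains, matching the combination recommended above.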
- readrecordfile
Usage: readrecordfile path-to-file
Default: none
This keyword precedes the path to a GENBLIS record file which will be
read by GENBLIS. See the recordfile option for a full
discussion.
- recordfile
Usage: recordfile path-to-file
Default: none
This keyword precedes the path of the file to which GENBLIS will write its
record. The record file is an ASCII file that records all of GENBLIS's
results, i.e., in the original sample and in the jackknife and bootstrap
resamples. This file is not needed to run GENBLIS, but it is useful if one
stops GENBLIS in the middle of a run and then wants to start it again: the
readrecordfile option allows GENBLIS to finish a run started by another
execution. These options are especially useful when convergence is
difficult to obtain in one or more bootstrap resamples. One can obtain
convergence in the relatively easy cases and then restart GENBLIS with a
large genetic population; see the Genetic Operator Controls section.
- onboundary
Usage: onboundary floating-point number
Default value: 0.000001
An estimated variance must be above this number, or GENBLIS considers that
variance to be on the boundary, i.e., 0. This is known as a "Heywood case."
When a variance is on the boundary, the remaining parameters, including the
non-zero variances, are estimated as usual, but with the on-boundary
parameter treated as if it were fixed at the boundary value. In many cases
the results from this approach will not be meaningful, so if an on-boundary
situation occurs it is extremely important to recheck the data and model
specification, especially if it occurs in the original sample of data. If a
variance goes below the onboundary threshold, GENBLIS prints
"boundary-hit threshold" and goes on to estimate the remaining parameters.
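The boundary test itself is a strict comparison against the threshold. The
Python sketch below, with the hypothetical name apply_boundary, flags
variances that are not above onboundary and fixes them at 0; it does not
show the subsequent re-estimation of the remaining parameters.

```python
def apply_boundary(variances, onboundary=1e-6):
    """Return (indices on the boundary, variances with those fixed at 0)."""
    # A variance must be strictly above the threshold to stay free
    hits = [i for i, v in enumerate(variances) if not v > onboundary]
    fixed = [0.0 if i in hits else v for i, v in enumerate(variances)]
    return hits, fixed
```

A variance exactly equal to the threshold counts as on the boundary, since
it is not above it.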
- usecorr
Usage: usecorr
Default: off (the covariance matrix is analyzed)
Specify this keyword to analyze the sample product-moment correlation
matrix. If none of the keywords usecov, usecorr or
usecrossp are specified, GENBLIS will analyze the covariance
matrix. If more than one of the three keywords is specified, the one
occurring last in the General Setup section will take effect.
- usecov
Usage: usecov
Default: analyze the sample covariance matrix
Specify this keyword to analyze the sample covariance matrix. If none of
the keywords usecov, usecorr or usecrossp are
specified, GENBLIS will analyze the covariance matrix. If more than one of
the three keywords is specified, the one occurring last in the General Setup
section will take effect.
- usecrossp
Usage: usecrossp
Default: off (the covariance matrix is analyzed)
Specify this keyword to analyze the sample crossproduct matrix (the simple
crossproduct matrix is divided by the number of observations, or by the sum
of the weight values specified by the weights option). If none of
the keywords usecov, usecorr or usecrossp are
specified, GENBLIS will analyze the covariance matrix. If more than one of
the three keywords is specified, the one occurring last in the General Setup
section will take effect.
- use_out_of_bounds
Usage: list use_out_of_bounds in the modelspec file to turn this
option on.
Default: option is off.
This option tells GENBLIS whether it should use parameter values that are
outside the bounds set in the modelspec file (the setting of bounds is
discussed later). When this option is on, GENBLIS uses the bounds
information to focus its search but allows searching outside the bounds; if
this occurs, a message is printed to standard output informing the user
that GENBLIS is using values outside the bounds. When this option is off, as
it is by default, GENBLIS constrains optimization to the bounds but prints a
message that it wishes to go outside them. It is sometimes wise, however, to
ignore this message if one wishes to search one region of the parameter
space extensively.
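The difference between the two settings can be sketched for a single
proposed parameter value. In this Python illustration, bounded_value is a
hypothetical name, the message strings are placeholders rather than
GENBLIS's actual output, and clamping to the nearest bound is a
simplification of however GENOUD actually keeps proposals inside the
bounds.

```python
def bounded_value(x, lower, upper, use_out_of_bounds=False):
    """Return (value the search would use, message or None) for a proposal x."""
    if lower <= x <= upper:
        return x, None
    if use_out_of_bounds:
        # Option on: keep the out-of-bounds value but tell the user
        return x, f"using value {x} outside bounds [{lower}, {upper}]"
    # Option off (default): stay inside the bounds, but report the wish to leave
    clamped = min(max(x, lower), upper)
    return clamped, f"wishes to use {x}, outside bounds [{lower}, {upper}]"
```

In both cases a message is produced; only the value actually used differs.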
- weights
Usage: weights path-to-file [path-to-file ...]
Default: none
This keyword precedes the path to where a dataset of numbers to use as
observation weights is located. If this option is specified, each
observation is multiplied by the corresponding weight value whenever there
is summing over the data to compute sample means, covariances or other
sample moments. The resulting sum is divided by the sum of the weights,
rather than by the unweighted number of observations. The weights are used
for sample moments computed in each bootstrap resample; each weight always
remains correctly associated with the original sample observation data
vector. If multigroup processing is being used, the weighting data for each
group must be placed in a separate file, each of which is named in
succession after the weights keyword. The number of filenames must
equal the value of the argument to the ngroups option. The number
of weight values in each file must equal the number of observations in the
dataset named in the corresponding position in the datafile option.
GENBLIS may not report an error if the numbers do not match, but the results
of such a mismatch will in general be meaningless. When weights are used, the
sample covariance matrix is computed with denominator equal to the sum of
the weights. When weights are not used the denominator is n-1, where n
is the number of observations (or, with multigroup processing, the number of
observations in the relevant group).
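The moment matrices selected by usecov, usecorr and usecrossp, with and
without weights, can be sketched in plain Python as below. The function
name sample_moments and its kind argument are hypothetical; the
denominators (n-1 for the unweighted covariance, the sum of the weights when
weights are given, n or the sum of the weights for mean crossproducts)
follow the descriptions above, while computing the weighted covariance about
the weighted mean is an assumption consistent with them.

```python
def sample_moments(data, kind="cov", weights=None):
    """Moment matrix for usecov ('cov'), usecorr ('corr') or usecrossp ('crossp').

    data is a list of observations, each a list of p values; weights, if
    given, holds one value per observation (the weights option).
    """
    n, p = len(data), len(data[0])
    w = [1.0] * n if weights is None else list(weights)
    if len(w) != n:
        raise ValueError("need exactly one weight per observation")
    wsum = sum(w)

    def wsum_over(f):
        # Weighted sum over observations of f(x)
        return sum(wi * f(x) for wi, x in zip(w, data))

    if kind == "crossp":
        # Mean crossproducts: divided by n, or by the sum of the weights
        return [[wsum_over(lambda x: x[j] * x[k]) / wsum for k in range(p)]
                for j in range(p)]

    mean = [wsum_over(lambda x: x[j]) / wsum for j in range(p)]
    denom = wsum if weights is not None else n - 1  # weighted: sum of weights
    cov = [[wsum_over(lambda x: (x[j] - mean[j]) * (x[k] - mean[k])) / denom
            for k in range(p)] for j in range(p)]
    if kind == "cov":
        return cov
    if kind == "corr":
        sd = [cov[j][j] ** 0.5 for j in range(p)]
        return [[cov[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]
    raise ValueError(f"unknown kind {kind!r}")
```

With unit weights, the weighted covariance differs from the unweighted one
only by the factor (n-1)/n, because the denominator becomes n instead of
n-1.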
Although the genadd and cases variables are no longer
used by GENBLIS, including them in the model specification file will not
cause an error. These variables are simply ignored.
Jas S. Sekhon
1998-08-25