Computer-based Simulation Modelling for Anthropologists
Michael D. Fischer

Design of Simulations

Simulations require programming of some sort, although there are specialised simulation
packages that minimise the effort required.  Because any computing language
that we might use has a limited set of resources available for representing data,
information, relationships and transformations that can be applied to these, we must
carefully design our simulation. Because we are first interested in a specific problem it is
important that the design develop according to the needs of the research, and not from the
needs of the computer. Although we will ultimately be limited by the resources available to
us on the computer, it is better to make simplifications and compromises in the later
stages of the design rather than the earlier ones. In this way we will have considered our
requirements in detail, and can better judge how to solve the more mundane problems of
implementation. Even if you will not ultimately write the programming code for the
simulation yourself, you need to be able to describe it in the terms of stages A, B and C
below for the programmer.

A) At the earliest stages of the design we want a simple statement of the problem and the
objectives for which we want to construct simulations. This should be a prose
statement in human-readable form which clearly states the meaning and intent of the
elements that will be represented in the proposed simulations. The purpose of this step is
to begin to lay out clear criteria for judging the adequacy of later stages, since these
should ultimately be a representation of this initial statement. For example's sake we
begin with the following simple problem: we want to assess the effect of different
constraints on mate selection on the structure of monogamous marriage in a closed
population. Constraints will include age, status and group or kinship relations. The
kinds of structures we are interested in include age structure, relatedness and the
proportion unmarried. For purposes of example we will follow a cohort for ten years.

B) The second step is to select the conditions for a specific simulation that will contribute to
the statement of purpose. For the first simulation, we will select the effect of age
constraints on marriage structure. Here we need to state precisely what we
will need in order to write the simulation. In general we need a list of the agents or objects to
which actions or transformations will be attributed, the relationships that can exist
between different agents/objects, rules of interaction and transformation, and the kinds
of information that will be required. Perhaps more important, we need to specify what
conditions will constitute the stages of the simulation, what information we need to
extract from the simulation, and when. For the example we can use:

  Agents: people. Relations between people: marriage.
  Constraints on marriage: male should be 18 or older, and female between 13 and 18,
  but at least five years younger than male. A male and female can only marry if both
  are unmarried.
  Information required about persons: age, sex, marital status. Information derived about
  persons: proportion married.

C) The third stage is to decide how to represent the elements described in stage B). Here
we are getting a step closer to the translation into a computing language, but we still
want to remain a bit aloof at this stage with respect to which programming language.
However, all existing programming languages share a great deal in their general strategy
for representation.

Explicit object definitions are usually represented as a set of categories, and instances of
objects are usually represented as a set of values for these categories. For example we
might define the object type PERSON as SEX, AGE, and MARITAL STATUS, and a
specific instance of PERSON as male, 14, unmarried. In other words we define a data
type PERSON, which will describe how to access and interpret information about specific
PERSONs. These explicit or simple objects are the basic means programming languages
use to represent data. Some objects like PERSON consist of several categories, some
consist of a single category (which is usually its name), and some have complex
categories, which contain not simple values, but a reference value that permits us to locate
the information in another object. We will look at references in the next example, but we
could have a reference to spouse in the MARITAL STATUS category, instead of the
simple married/unmarried value. In this way we would have access to information about
spouse as well as ego. However, because of our simple objective in this case, to find the
proportion of married to unmarried, we do not require this information. Only information
that is required need be considered, since our PERSON in this case is a model of a person
as required for the objective.
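
As a minimal sketch of this kind of representation, assuming Python as the
language (the text itself does not commit to one), the PERSON type and the
instance "male, 14, unmarried" might look as follows. The class and field names
are illustrative choices, and the second class shows the alternative in which
marital status holds a reference to the spouse rather than a simple value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    """PERSON as three simple categories: sex, age, marital status."""
    sex: str               # "male" or "female"
    age: int               # age in years
    married: bool = False  # simple married/unmarried value


@dataclass
class PersonWithSpouse:
    """Variant (hypothetical) where marital status is a reference to another object."""
    sex: str
    age: int
    # repr/compare are disabled so two spouses pointing at each other
    # do not cause infinite recursion when printed or compared.
    spouse: Optional["PersonWithSpouse"] = field(
        default=None, repr=False, compare=False
    )

    @property
    def married(self) -> bool:
        # Marital status is derived from the presence of a spouse reference.
        return self.spouse is not None


# The specific instance from the text: male, 14, unmarried.
ego = Person(sex="male", age=14)
```

For the present objective, the proportion married, the first form is enough;
the second only pays for itself once information about the spouse is needed.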

Because of the simple means of representing the married/unmarried relation, we will not
have any interperson relationships, so the representation of the marriage relation is simple.

The constraints on marriage choice will be given as rules. In this case something like: IF
the age of the male is greater than or equal to 18 AND the female is between 5 and 7 years
younger than the male THEN marriage is OK, otherwise not.
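
The rule above can be written directly as a small function. This is only a
sketch in Python: the function name is hypothetical, and it assumes that
"between 5 and 7 years younger" includes both end points. It checks ages only;
whether both parties are unmarried would be checked separately.

```python
def marriage_ok(male_age: int, female_age: int) -> bool:
    """Return True if the age rule permits the marriage."""
    gap = male_age - female_age  # how many years younger the female is
    # Assumption: the 5-to-7 year window is inclusive at both ends.
    return male_age >= 18 and 5 <= gap <= 7


# A few pairs checked against the rule:
print(marriage_ok(18, 13))  # gap of 5, male is 18
print(marriage_ok(20, 15))  # gap of 5, male is 20
print(marriage_ok(19, 15))  # gap of only 4
print(marriage_ok(17, 12))  # male is under 18
```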

Finally we have to consider what kind of information we want to get out of the simulation.
In this case it is proportion of married to unmarried males and females, and probably more
specifically the proportion of eligible males and females. The former is fairly easy to
accomplish: we need only count the number of unmarrieds before we attempt to marry
them off, and compare this to the number of unmarrieds afterwards. Specifying the
proportion of eligible males and females is a bit more complicated, especially if we are
strict about eligibility, since eligibility could be construed to be dependent on the prior
existence of a male or female of the proper age. A weaker constraint would be to select 18
for males as the eligibility age, and 11 for females. This weakening is reasonable, since we
are in a sense doing the simulation to find out the eligibility rate. There is value in the
stronger constraint as well, however: 18, 19 and 20 year old males can all marry a 13 year
old female, but a 20 year old could also marry a 15 year old female, while the 18 and 19 year
olds could not. Thus the 20 year old can eliminate a possible mate for the younger males.
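
The difference between the weaker and the stricter notion of eligibility can be
sketched as follows. This is an illustration in Python under stated assumptions:
the function names and the six-person population are invented for the example,
and the age rule is taken to be male 18 or older with the female 5 to 7 years
younger, inclusive.

```python
def age_rule(male_age, female_age):
    """Age rule: male 18 or older, female 5 to 7 years younger (inclusive)."""
    return male_age >= 18 and 5 <= male_age - female_age <= 7


def weakly_eligible(sex, age):
    """Weaker constraint: a fixed eligibility age of 18 for males, 11 for females."""
    return age >= (18 if sex == "male" else 11)


def strictly_eligible(sex, age, population):
    """Stricter constraint: some person of the other sex with a permissible age exists."""
    for other_sex, other_age in population:
        if other_sex == sex:
            continue
        male_age, female_age = (age, other_age) if sex == "male" else (other_age, age)
        if age_rule(male_age, female_age):
            return True
    return False


# Hypothetical population as (sex, age) pairs.
population = [
    ("male", 18), ("male", 19), ("male", 20),
    ("female", 13), ("female", 15), ("female", 16),
]

weak = [p for p in population if weakly_eligible(*p)]
strict = [p for p in population if strictly_eligible(*p, population)]
print(len(weak), len(strict))
```

In this population the 16 year old female passes the weak test but not the
strict one, since no male present is 5 to 7 years older than she is.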

This latter point raises an important issue about the process of making the marriage
decision. How are we to decide the order of choice among the males? This is an issue that
emerges over and over in even the simplest simulations, and perhaps especially in the simplest
simulations, since they often have the most simplified decision models. Sometimes we can
decide to use a simple principle, such as oldest (or highest status etc.) choose first. It is
best if there is some ethnographic evidence for these kinds of principles, but they can be
used without it if you are willing to accept the bias that is introduced. Another choice,
especially if one is intending to run the simulation several times, is to randomly apply the
decision, perhaps with a bias towards some principle, say older males are more likely to
choose before younger, but not necessarily. In our simple example we will choose the
oldest first principle, but will also examine the randomised approach.
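
The effect of the order of choice can be sketched in a few lines of Python.
Everything named here is an assumption made for illustration: the Person class,
the run_round and make_population functions, the five-person population, and in
particular the rule that each male takes the first permissible female in list
order, which the text does not specify. The age rule is taken as male 18 or
older, female 5 to 7 years younger, both unmarried.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    sex: str
    age: int
    married: bool = False


def marriage_ok(male, female):
    """Both unmarried, male 18 or older, female 5 to 7 years younger."""
    gap = male.age - female.age
    return (not male.married and not female.married
            and male.age >= 18 and 5 <= gap <= 7)


def run_round(people, order="oldest", rng=None):
    """Let each male choose in turn; return the proportion married afterwards."""
    males = [p for p in people if p.sex == "male"]
    females = [p for p in people if p.sex == "female"]
    if order == "oldest":
        males.sort(key=lambda p: p.age, reverse=True)  # oldest chooses first
    else:
        (rng or random).shuffle(males)                 # randomised order
    for male in males:
        for female in females:
            # Assumption: a male takes the first permissible female he meets.
            if marriage_ok(male, female):
                male.married = female.married = True
                break
    return sum(p.married for p in people) / len(people)


def make_population():
    """Hypothetical cohort: males of 18, 19, 20 and females of 13, 15."""
    return [Person("male", 18), Person("male", 19), Person("male", 20),
            Person("female", 13), Person("female", 15)]


print(run_round(make_population(), order="oldest"))
print(run_round(make_population(), order="random", rng=random.Random(1)))
```

With this population the oldest-first run marries only the 20 year old male and
the 13 year old female, leaving the 15 year old without a permissible partner.
A random order in which a younger male chooses before the 20 year old marries
two couples instead, which is exactly the bias the choice of principle introduces.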

Building Blocks for Simulations

There are usually a number of different models at work in a given simulation, and indeed
this could be taken as a functional definition of simulation: solving a problem by the
interaction of at least two models. A simulation is then composed of a number of building
blocks, whose properties we more or less understand. These models are themselves
arranged in a larger interacting model, which represents the larger context of these models.
A simulation provides a proving ground for these sub-models. This complicates the
general usefulness of simulations, since the results that arise from a simulation are only as
good as all of the models that the simulation is built from. This can make the validation of
the simulation very problematic indeed.

Validation of a simulation centres on two different aspects: the sub-models and an
evaluation model. The sub-models must be independently validated to establish that they
behave as specified. The evaluation model is used to establish that the simulation does or
does not have specific validity.

The result of any simulation is, of course, only a descriptive model. Any explanatory
power that it might have must be argued based on points that are outside the simulation
proper. In anthropology this is generally ethnographic data. As with a model, any
simulation is a simplification of the system under study, and in many cases does not even
represent any 'real' system at all; rather, the simulation is intended to generate model data
for an 'ideal' world, which we can then compare our data to, noting where it corresponds
to and departs from the ideal world. This is a useful technique, especially in the early stages
of analysis, since it can be used to establish a sense of how important specific aspects of
the context are to the analysis of the data.

The nature of the models used in the simulation depends very much on what the
objectives of the simulation are. If all we require is a model that behaves correctly with no
explanatory pretensions whatsoever, then our job is relatively easy. This is generally the
case for any sub-model whose behaviour is independent, or can be modelled as
independent, of the rest of the simulated context. This includes structures such as rain and
weather in general, the passage of time, and the presence or absence of game, fish, honey,
or other 'natural' resources. It can also include other sub-models, if they are not the
principal object of study. Thus we can include crop yields in this category if we are not
interested in the micro-mechanics of corn growth. What we care about in the simulation is
that given specific environmental conditions, specific inputs, and human labour we will
expect a corn yield. In many simulations all of the sub-models can be descriptive
behaviour generators, if we are principally interested in their interaction.
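
A descriptive behaviour generator of this kind can be very small. The following
Python sketch is purely illustrative: the function name, the inputs, the linear
form and every coefficient are hypothetical and not drawn from any data. It
shows only the shape of such a sub-model, which takes conditions, inputs and
labour and returns a yield without modelling how corn actually grows.

```python
import random


def corn_yield(rainfall_mm, seed_kg, labour_days, rng):
    """Black-box yield generator (hypothetical coefficients, no growth mechanics)."""
    # Expected yield as a simple weighted sum of the three inputs.
    expected = 0.02 * rainfall_mm + 3.0 * seed_kg + 0.5 * labour_days
    # Year-to-year variation the sub-model does not try to explain.
    noise = rng.gauss(0.0, 5.0)
    return max(0.0, expected + noise)  # a yield cannot be negative


rng = random.Random(0)  # seeded so that a run can be repeated
for year in range(3):
    print(year, round(corn_yield(500, 10, 40, rng), 1))
```

Such a sub-model should still be validated independently, for instance by
checking that its output falls in a plausible range for the inputs given.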