Representing Anthropological Knowledge: Calculating Kinship
Michael D. Fischer
Analyzing and Understanding Cultural Codes


Kinship Introduction
Learning Kinship with
the Kinship Editor
Use the Kinship Editor
Using data about Kinship

Kinship Contents

Kinship Contents

Kinship Contents

Kinship Contents

Kinship Contents


One of the general problems with representation using a computer are the tools available for that purpose. Programming languages are expressions of the language designer's ideas of how a computer should be instructed to solve the kinds of problems the designer thinks are important. Hence languages such as FORTRAN are good for programming numerical problems, but not very efficient for solving problems involving non-numeric symbols. Prolog is a programming language that originated in the theoretical work of Colmerauer at the University of Marseilles in the early 1970's. The name stands for PROgramming in LOGic, and is an attempt to allow problem statement be descriptive and the solution process to be non-deterministic, rather than prescriptive and deterministic like FORTRAN and most other programming languages. A procedural ('prescriptive') language requires an explicit list of operations required to arrive at a solution. This requires the programmer first to model the domain of objects in some abstract way, to define what the set of operations will be, and to describe a method of sequencing the operations; an algorithm, or procedure.

A declarative ('descriptive') language requires the programmer to state the relations between objects, define a set of operations (rules), but not the sequencing of the operations. The aspects of process are implicit rather than explicit. The solution is non-deterministic in the sense that the programmer does not explicitly determine the method of solution out of a number that might be provided, and that the problem may have several solutions, each of which may be formed using different derivations. Although Prolog is not entirely successful as a logic language, it facilitates the statement of many problems that would be very difficult to state in mainstream procedural programming languages.

Programming languages have typical introductory examples that are given as a brief description. For FORTRAN the example is converting Fahrenheit to Centigrade, for BASIC inputting and printing the users name, for LISP computing n-factorial, and for Prolog it is building and interrogating genealogical databases. Prolog requires little 'setup' or preparation to get a result; it is an interactive language. It is capable of handling very complex, arbitrary, data types, such as lists of items, trees, or graphs. It is practically limited to certain problem domains, but many of these are ideal for anthropologically interesting problems. The language is widely available for mainframes and micros, especially in the UK, where much of the development of the language has taken place. We are using a version of Prolog written in Java, so that we can use it in java applets on web pages. Although not a full implementation, it is sufficient for the purpose of representing our kinship models.

A couple of excellent books on Prolog are given in the bibliography (see Bratko 1986 for a full introduction. Kononenko and Lavrac 1989 provides a quick introduction with examples ).

Representing relations in Prolog: a ten minute introduction

The basic building block in Prolog is called a clause, which can take forms which include:

P(X,Y), Q(Y,Z).
P(X,Y) :- Q(Y,X).

where P is any predicate and X and Y are variables which range over some domain. The predicate P asserts a relationship between the arguments, if any, and can evaluated by Prolog as either true or false. The relationship asserted by a predicate can be of any sort: 'male(X)' /X is male/;' mother(X,Y)' /X is the mother of Y/; 'younger(X,Y)' /X is younger than Y/. The programmer, by their authoring of the predicates, states the relationships and is responsible for ensuring that conclusions drawn from evaluating the predicates has a reasonable interpretation outside the program. For obvious reasons, names are used which are meaningful to the programmer. These names are not meaningful to Prolog. For example, we might have two named predicates, spouse and gender:

spouse(X,Y) /X is a spouse of Y/
gender(X,Y) /the gender of X is Y/

Prolog must have some basis for assigning values of true or false to statements consisting of or including these predicates. The most basic method of supplying this information is to insert concrete instances of the predicates, by replacing the variables with data:

spouse(mary, john).
spouse(carl, janet).
gender(mary, female).
gender(john, male).
gender(carl, male).
gender(janet, female).

This form of predicate instance is, obviously, data. The anthropologist selects the predicate names and types in the information. At this early stage the anthropologist can access the information in four ways, by instructing Prolog to evaluate:

spouse(mary, john). /is mary the spouse of john?/
spouse(john,mary). /is john the spouse of mary?/
spouse(F, john). /who is the spouse of john?/
spouse(mary, K). /mary is the spouse of who?/
spouse(V,L). /who is the spouse of whom?/

The first query simply evaluates to true, because this is an instance previously supplied to Prolog. The second query evaluates to false. There is no instance which matches 'spouse(john,mary)'. Prolog, in evaluating this query, can make no use of the knowledge that spouse is a reciprocal term, because this knowledge has not been made available. A simple (but wasteful) means of establishing this relationship is to add the instance 'spouse(john,mary)' to the program.

The terms F, K, V and L in the latter three examples are variables. Any term which begins with an uppercase letter (A-Z) is treated as a variable by Prolog. When variables are present, information from each matching instance of the predicate is assigned to the variable from the corresponding position. The latter three will evaluate to true, but the presence of variables in the query will give more information. The third query supplies the additional information that the value of the variable F is 'mary'. The fourth that K is 'john'. The fifth evaluates to true twice, the first time with 'V=mary' and 'L=john', and the second time to 'V=carl' and 'L=janet'.

This is mildly useful, particularly if a large number of spouses are under investigation. Much more useful are compound clauses. If, rather than spouses, we want to list wives, make the query:

spouse(Wife,Ego), gender(Wife,female).

Prolog will evaluate 'spouse(Wife,Ego)' to be twice true. The first time Wife will be bound to 'mary', and Ego to 'john'. The comma instructs Prolog to evaluate the second clause with these same variable bindings. Treating 'gender(Wife, female)' as 'gender(mary, female)', this clause will evaluate to true, and the variable bindings can be examined to determine who this wife is and to whom. The second time 'spouse(Wife, Ego)' will evaluate to true, with Wife bound to 'carl' and Ego to 'janet'. Evaluating the second clause will these bindings results in evaluating 'gender(carl,female)' to be false, therefore the compound clause is false, and no variable bindings are available for examination.

Quite complex compound clauses can be formed, but if you intend to use the same clause more than once, or use it as the basis of an even more complex clause, the clause can be added to the program as a rule. We can define a predicate wife with the following instance in the program:

wife(Wife, Ego) :- spouse(Wife,Ego), gender(Wife,female).

This can be read: 'wife(Wife, Ego)' is true when the compound clause is true. If the compound clause is true, then the variables in 'wife(Wife,Ego)' will be bound to the values which led to this evaluation. If you make the query, 'wife(X,Y)', this will be evaluated precisely as the previous example, reporting that X is 'mary' and Y is 'john'. The variables in the query need not be those in the definition. Prolog attaches no meaning to variable names other than the establishment of uniqueness in a clause or compound clause, within a clause all variable instances with the same name will be bound to the same value.

The present definition of wife only finds wives when these are in the first position of the spouse clause. This can be corrected by adding a second definition of wife:

wife(Wife, Ego) :- spouse(Ego,Wife), gender(Wife,female).

Any predicate can have any number of instances, either of a concrete form or defined in terms of compound clauses. When evaluating the query, wife(Wife,Ego), the first instance will be tried, and then the second.