Representing Anthropological Knowledge: Calculating Kinship
Michael D. Fischer
Analyzing and Understanding Cultural Codes



Kinship Contents


Specification models

There are many different forms a specification can take, and different models are used for specification (Burnard 1987). Specifying a program is essentially the task of explaining exactly what a program must do to accomplish the research objectives required of it. Specifications for computing applications are usually a conjunction of assertions in the form of data (ethnographic, imaginary or simulated) and operations on that data. A specification and a program are two different things: any number of different programs can satisfy a single computing applications specification. If properly written, however, the specification for a computing application can form the basis of a specification for a program.

For the present purpose we will use an informal version of the interpreted predicate logic model (ibid: 66; Kowalski 1984). This model translates well into rules of the sort that are common and explicable to anthropologists (though sometimes much criticised). These sorts of rules are familiar to both anthropologists and their programming partners, and are relatively easy to implement in existing computer programming languages and relational database languages, especially the programming language Prolog which is used to implement these specifications as programs in the next section.

The anthropologist provides both the material for analysis and the requirements for the program which performs the analysis. Specifying the research objectives required of the program can only be done by the person who possesses such knowledge. Many a project has foundered because it was left to the unfortunate programming partner to make crucial decisions about a problem area of which they had only a marginal understanding.

The main problem inexperienced computer users have with specifying a program is the level of detail required. For example, in defining a basic relationship between two people chosen from the population (i.e. the data for a population input to the program), we might write:

If A is Parent of B then B is Child of A.

where A and B represent any two people from the population. Many people would be satisfied with this as a valid specification of the link between Parent and Child. But it is not logically complete relative to our conceptual schema, in which the only case where B can be derived as a Child of A is when A is Parent of B. We have specified only one of possibly many conditions under which B is a Child of A. To make this exclusive we can revise the statement to:

Iff A is Parent of B then B is Child of A.

where 'iff' is read, if and only if. This statement is now logically complete relative to our intention.

Programs have a 'positive' bias which is commonly used to imply closure: rather than stating exclusivity explicitly, they simply do not supply data or rules which would violate it. Thus at the level of the program, if the only rule implemented for deriving Child is:

If A is Parent of B then B is Child of A.

then exclusivity is implicit. It is still best, at the level of specification, to indicate exclusiveness explicitly, to differentiate these cases from those where more than one specification rule has the same conclusion, such as:

If A is Father of B then B is Child of A.
If A is Mother of B then B is Child of A.
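The sketch below illustrates this point in Python. The chapter's own programs use Prolog, and the names `FATHER`, `MOTHER`, `child_pairs` and `is_child`, along with the sample people, are assumptions made for illustration. Two `If` rules with the same conclusion each contribute pairs to Child, so the derived relation is their union. The closed-world reading described above is modelled by treating any pair that no rule derives as false.

```python
# Hypothetical facts, stored as (A, B) pairs meaning "A is Father of B", etc.
FATHER = {("Abdul", "Karim")}
MOTHER = {("Fatima", "Karim")}


def child_pairs(father, mother):
    """Return (B, A) pairs meaning 'B is Child of A'.

    Each 'If' rule contributes its own conclusions; because the two rules
    share the conclusion Child, the derived relation is their union.
    """
    from_father = {(b, a) for (a, b) in father}  # If A is Father of B ...
    from_mother = {(b, a) for (a, b) in mother}  # If A is Mother of B ...
    return from_father | from_mother


def is_child(b, a):
    """Closed-world test: a pair that no rule derives is treated as false."""
    return (b, a) in child_pairs(FATHER, MOTHER)


print(sorted(child_pairs(FATHER, MOTHER)))  # [('Karim', 'Abdul'), ('Karim', 'Fatima')]
print(is_child("Karim", "Abdul"), is_child("Abdul", "Karim"))  # True False
```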

A first specification

In making the specification you write down the set of rules and data available which are consistent with your goals in an analysis and which do not 'compromise' the meaning and interpretations you intend. You should note the implied assumptions (iff) and identify terms which will be defined from initial data and terms which will be derived from these.

For convenience of exposition (and to avoid too many abstract symbols), the following example specification uses the English terms (without italics) for basic genealogical relationships, and upper-case letters for variables which can denote any person from the population to be examined. Ego is used to denote the person to whom the derived relationship is relative. Ego ranges over the same values as variables denoted by upper-case letters.

I. The purpose of the program is to establish a list of people who stand in one or more genealogical 'positions' relative to a specified ego. The conceptual schema underlying the program intends to denote people and the kinship relationships they recognise as existing at some moment in time and space. The model also includes a genealogical space within which an etic position is established for each person.

II. Data will be supplied for Gender, which ranges over the values male and female, and for the relations Child and Spouse. The basic genealogical terms Father, Mother, Son, Daughter, Brother, Sister, Husband and Wife will be derived from these relations. More complex relations will be derived from these derived basic genealogical terms.

iff Ego is a Child of B then B is Parent of Ego.

iff A is a Child of Ego and A is male then A is Son of Ego.
iff A is a Child of Ego and A is female then A is Daughter of Ego.
iff A is a Parent of Ego and A is female then A is Mother of Ego.
iff A is a Parent of Ego and A is male then A is Father of Ego.
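The rules in II can be sketched in Python over a small hypothetical population. `GENDER` and `CHILD` stand for the supplied data, and each derived term is a set-returning function of Ego. The function names and sample people are assumptions, not the chapter's Prolog code. Because each function is the only definition of its term, it also behaves exclusively, in the sense of `iff`.

```python
# Hypothetical sample population (names are illustrative only).
GENDER = {"Abdul": "male", "Fatima": "female", "Karim": "male", "Amina": "female"}

# Child data as (A, B) pairs meaning "A is a Child of B".
CHILD = {("Karim", "Abdul"), ("Karim", "Fatima"),
         ("Amina", "Abdul"), ("Amina", "Fatima")}


def parents(ego):
    """iff Ego is a Child of B then B is Parent of Ego."""
    return {b for (a, b) in CHILD if a == ego}


def children(ego):
    """People A such that A is a Child of Ego."""
    return {a for (a, b) in CHILD if b == ego}


def with_gender(people, gender):
    """Keep only the people recorded with the given Gender value."""
    return {p for p in people if GENDER.get(p) == gender}


def sons(ego):
    return with_gender(children(ego), "male")


def daughters(ego):
    return with_gender(children(ego), "female")


def fathers(ego):
    return with_gender(parents(ego), "male")


def mothers(ego):
    return with_gender(parents(ego), "female")


print(fathers("Karim"), daughters("Fatima"))  # {'Abdul'} {'Amina'}
```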

For the relation sibling we could try the following:

iff A is Father of Ego and B is Mother of Ego and A is Father of C and B is Mother of C then C is Sibling of Ego.

However, this specification of Sibling is not complete. When we use variable terms, such as A, B, C and Ego above, whose values are drawn from the set of people under consideration, there is nothing to prevent the same person being assigned to two (or more) variables. This is not a problem for relationships which are not of the same type, but this definition of Sibling will result in a program which agrees that Ego is Sibling of Ego. To correct this, use the amended statement:

iff C is not Ego and A is Father of Ego and B is Mother of Ego and A is Father of C and B is Mother of C then C is Sibling of Ego.

iff A is Sibling of Ego and A is male then A is Brother of Ego.
iff A is Sibling of Ego and A is female then A is Sister of Ego.

iff A is Spouse of Ego and A is male then A is Husband of Ego.
iff A is Spouse of Ego and A is female then A is Wife of Ego.
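The same style extends to Sibling, Brother, Sister, Husband and Wife. In this hypothetical Python sketch, the `exclude_ego` flag switches the `C is not Ego` clause off. It is an illustrative parameter, not part of the specification, and it shows that the first draft of the rule really does make Ego his own sibling. Spouse pairs are assumed to be stored in both directions, and all names are illustrative.

```python
# Hypothetical sample data.
GENDER = {"Abdul": "male", "Fatima": "female", "Karim": "male", "Amina": "female"}
CHILD = {("Karim", "Abdul"), ("Karim", "Fatima"),
         ("Amina", "Abdul"), ("Amina", "Fatima")}  # (A, B): A is a Child of B
SPOUSE = {("Abdul", "Fatima"), ("Fatima", "Abdul")}  # stored in both directions


def parents_of_gender(ego, gender):
    return {b for (a, b) in CHILD if a == ego and GENDER.get(b) == gender}


def siblings(ego, exclude_ego=True):
    """C shares a Father A and a Mother B with Ego.

    With exclude_ego=False the 'C is not Ego' clause is dropped,
    reproducing the flaw in the first draft of the rule.
    """
    fathers = parents_of_gender(ego, "male")
    mothers = parents_of_gender(ego, "female")
    people = {a for (a, _) in CHILD}
    found = {c for c in people
             if fathers & parents_of_gender(c, "male")
             and mothers & parents_of_gender(c, "female")}
    if exclude_ego:
        found.discard(ego)  # C is not Ego
    return found


def of_gender(people, gender):
    return {p for p in people if GENDER.get(p) == gender}


def brothers(ego):
    return of_gender(siblings(ego), "male")


def sisters(ego):
    return of_gender(siblings(ego), "female")


def husbands(ego):
    return of_gender({b for (a, b) in SPOUSE if a == ego}, "male")


def wives(ego):
    return of_gender({b for (a, b) in SPOUSE if a == ego}, "female")


print(sorted(siblings("Karim", exclude_ego=False)))  # ['Amina', 'Karim']
print(siblings("Karim"), sisters("Karim"), wives("Abdul"))  # {'Amina'} {'Amina'} {'Fatima'}
```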

This specifies a means of deriving the basic etic links from a given data model, which might seem quite a bit of work just to specify what was known already. Except for fairly specialised analyses (for example, comparing the household structure of large populations), we would be unlikely to employ a computer just to represent the basic 'nuclear' relationships. The point of the specification so far is simply to provide the basic knowledge required to derive more distant genealogical links, which are not known, or are known only for a fraction of the population. We continue along the same lines as before:

iff A is Father of Ego and B is Father of A then B is Father's Father of Ego.
iff A is Father of Ego and B is Mother of A then B is Father's Mother of Ego.

iff A is Mother of Ego and B is Father of A then B is Mother's Father of Ego.
iff A is Mother of Ego and B is Mother of A then B is Mother's Mother of Ego.

iff A is Father of Ego and B is Brother of A then B is Father's Brother of Ego.
iff A is Father of Ego and B is Sister of A then B is Father's Sister of Ego.

iff A is Mother of Ego and B is Brother of A then B is Mother's Brother of Ego.
iff A is Mother of Ego and B is Sister of A then B is Mother's Sister of Ego.
iff A is Mother of Ego and B is Mother of A and C is Sister of B then C is Mother's Mother's Sister of Ego.
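Each compound rule above is a chain of basic links followed outward from Ego. Here is a minimal Python sketch, assuming a hypothetical `trace` helper that applies one basic relation per step. The sample people are illustrative. A chain simply yields nothing when a link is missing, as with Mother's Mother here, because Fatima's parents are not recorded.

```python
# Hypothetical data. (A, B) pairs: A is a Child of B. Fatima's parents are not recorded.
GENDER = {"Omar": "male", "Zainab": "female", "Abdul": "male", "Mustaffa": "male",
          "Fatima": "female", "Karim": "male"}
CHILD = {("Abdul", "Omar"), ("Abdul", "Zainab"), ("Mustaffa", "Omar"),
         ("Mustaffa", "Zainab"), ("Karim", "Abdul"), ("Karim", "Fatima")}


def parents(ego, gender):
    return {b for (a, b) in CHILD if a == ego and GENDER.get(b) == gender}


def fathers(ego):
    return parents(ego, "male")


def mothers(ego):
    return parents(ego, "female")


def siblings(ego):
    people = {a for (a, _) in CHILD}
    return {c for c in people if c != ego
            and fathers(c) & fathers(ego) and mothers(c) & mothers(ego)}


def brothers(ego):
    return {s for s in siblings(ego) if GENDER.get(s) == "male"}


def sisters(ego):
    return {s for s in siblings(ego) if GENDER.get(s) == "female"}


def trace(ego, *steps):
    """Follow a chain of basic relations from Ego, one link per step."""
    current = {ego}
    for step in steps:
        current = {b for person in current for b in step(person)}
    return current


print(trace("Karim", fathers, brothers))  # Father's Brother: {'Mustaffa'}
print(trace("Karim", fathers, fathers))   # Father's Father: {'Omar'}
print(trace("Karim", mothers, mothers))   # Mother's Mother: set() (not recorded)
```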

Although this can become rather tedious (computer applications are often tedious), it is one simple method to specify precisely the relationships and links you need to represent, and to ignore those in which you are not interested. Only relationships for which specifications exist will be identified, so the exact extent of tracing relationships through the dimensions of generation, collaterality and affinity can be controlled. In situations where you are interested in more complex and far-ranging relationships, more abstract specifications can be written.

III. A procedure to list the set of kin corresponding to the definitions set out in II, relative to a given ego can proceed as follows:

Initialise the data specified in II for Gender, Child and Spouse relationships for a specific population of people.

Set Ego to a specific person value.

For each specification rule in II, except Child, Parent, Sibling and Spouse, trace the links in the rule until either it is impossible to continue with the rule (no person exists in the population for a specific genealogical position relative to some link in the rule) OR all the links in the rule are found to exist between specific people in the population. If the rule is satisfied, store the following information in a list: the derived relationship in the rule (e.g. Father's Brother), and a list of the people who satisfied the rule (e.g. 'Abdul, Mustaffa' for the Father's Brother rule if Abdul is the Father and Mustaffa is the Brother of Abdul). If the rule can apply more than once (there may be more than one Brother, Sister, Wife, Husband etc.), continue until it can no longer be satisfied.

After all rules have been applied, output the list of stored people and relationships (on a computer terminal or perhaps to a computer data file).
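Procedure III might be sketched in Python as follows. The sketch assumes rules are represented as named chains of basic relations. `RULES`, `apply_rule` and `list_kin` are hypothetical names, and only a few rules from II are included. Each successful path keeps the linking people, as in the 'Abdul, Mustaffa' example, and a rule that can be satisfied in several ways yields several entries.

```python
# Hypothetical data. (A, B) pairs: A is a Child of B.
GENDER = {"Omar": "male", "Zainab": "female", "Abdul": "male", "Mustaffa": "male",
          "Fatima": "female", "Karim": "male"}
CHILD = {("Abdul", "Omar"), ("Abdul", "Zainab"), ("Mustaffa", "Omar"),
         ("Mustaffa", "Zainab"), ("Karim", "Abdul"), ("Karim", "Fatima")}


def parents(ego, gender):
    return {b for (a, b) in CHILD if a == ego and GENDER.get(b) == gender}


def fathers(ego):
    return parents(ego, "male")


def mothers(ego):
    return parents(ego, "female")


def siblings(ego):
    people = {a for (a, _) in CHILD}
    return {c for c in people if c != ego
            and fathers(c) & fathers(ego) and mothers(c) & mothers(ego)}


def brothers(ego):
    return {s for s in siblings(ego) if GENDER.get(s) == "male"}


def sisters(ego):
    return {s for s in siblings(ego) if GENDER.get(s) == "female"}


# Each specification rule as a named chain of basic relations (a subset of II).
RULES = {
    "Father's Father": (fathers, fathers),
    "Father's Mother": (fathers, mothers),
    "Father's Brother": (fathers, brothers),
    "Father's Sister": (fathers, sisters),
    "Mother's Mother": (mothers, mothers),
    "Mother's Brother": (mothers, brothers),
}


def apply_rule(ego, steps):
    """Trace every way the rule can be satisfied from Ego.

    Returns one list of linking people per success; an empty result
    means some link in the rule could not be found in the population.
    """
    paths = [[ego]]
    for step in steps:
        paths = [path + [b] for path in paths for b in sorted(step(path[-1]))]
    return [path[1:] for path in paths]  # drop Ego from each stored path


def list_kin(ego):
    """Apply every rule and collect (derived relationship, people) entries."""
    return [(name, path) for name, steps in RULES.items()
            for path in apply_rule(ego, steps)]


for relationship, people in list_kin("Karim"):
    print(relationship + ":", ", ".join(people))
```

Run on Karim, it prints three lines, for Father's Father (Abdul, Omar), Father's Mother (Abdul, Zainab) and Father's Brother (Abdul, Mustaffa). The Mother's rules fail because Fatima's parents are not in the data.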

Problems with specifications to be applied to data

The (quite informal) specification can be translated into a computer program relatively easily after some discussion with a programming partner, since plenty of detail is assumed. It has some flexibility for extensions. For example, modifications to which relationships should be reported can be easily effected. However, this specification is tied not only to a particular conceptual model, but also to a particular view or model of the data that might be available to it.

Although another specification and program can be devised to reformulate a number of varieties of 'real' data into the abstract data format of the example, the abstract model must be consistent with our objective: the description of kinship relations within a (finite) population.

Although this specification is generally adequate for representing genealogical links within some ideal populations, there are some practical and logistical problems with its application to most data derived from ethnographic collections.

a) Genealogical data is often incomplete, and necessarily so because of the 'horizon' of a finite population. For example, we specified that siblings were people who shared two parents. This is fine so long as we never have the situation where one or both parents fail to be recorded. Such a situation is, of course, inevitable, because a genealogy must stop somewhere, and this is unlikely to be at a point where every person is an 'only child'.
b) Kinship data is not usually collected in the format set out in the specification.

The method recommended by Barnard and Good (1984:26-33) specifies a minimum set of 39 relationships to be elicited from a single consultant. Most ethnographers do not try the patience of their consultants by recording genealogical information from each member of a family or household unit unless there is a compelling reason to do so. It is therefore necessary to derive, from the collected genealogies, genealogies for the non-consultants who appear in them.
c) The specification presented is not adequate to deal with many kinds of genealogical data which anthropologists might need to represent. There is, for example, no reference to the relative birth order of the children of a union, the order of spousal unions, or any means of referring to the status of a particular union (e.g. married, divorced, widowed, informal, adulterous). These can be corrected fairly easily by adding these attributes to the specification. However, more serious problems emerge, for example, when more than one union has occurred between the same couple and this is an emic factor in indigenous judgements.
d) Where the genealogical data is gathered from documents, or even from consultants, there are problems in reliably establishing the identity and details of some (or all) of the people indicated in the data. In this case it may be necessary to allow more flexible evaluation rules to suggest contingent candidates for links. The specification in § 6.3.4 has no provision for contingent data.

These issues are not insurmountable, although d) is intrinsically problematic. Incomplete data from 'horizon' effects is often resolved by linking to imaginary people who are not part of the data set. Other solutions have required that bogus records be inserted in the data set in order to maintain the integrity of linkages. The mismatch between the format of data required within the model and the form of the input data can be addressed by writing another program segment which translates the input data into the 'normalised' form required by the specification. The inadequacy of the specification can be rectified by examining in greater detail the requirements of representing genealogical data, and producing a specification with greater power. Genealogical data derived from documentary sources (and other sources of contingent data) remains intrinsically problematic, but can be approached with a more flexible and broader range of abstract data models.

An example

Gilbert (1971) presents a framework which suggests reasonable solutions to three of the problems in the previous section. Limitations of both hardware and software at the time dictated much of this framework. Currently, expanded memory capability on inexpensive computers and the development of relational models in software offer alternative solutions to some of the problems with his approach. An extended discussion of Gilbert's 1971 work presents a good example of a specification model and of the interaction of hardware and software in research solutions. Current hardware and software developments suggest a different approach, one which will be discussed after Gilbert's material.

His model required the data to be input in the form:

Ego  Sex  Father  Mother  Sibling  Spouse1  First Child  Spouse2  First Child  ...
Input data record format suggested by Gilbert (1971:132)

where, other than Sex, all variables (e.g. Ego, Mother, First Child) are unique references to people with the appropriate relationships. He addresses the incomplete data and 'horizon' problem by pointing all 'unknown' cases to a single computer-generated bogus record he calls 'Mr. Zero' (there is also a 'Ms. Zero'). What he is suggesting is more of a missing-data indicator, which happens to take the form of a data record with all zeros. This solution imposes no real conditions on the user preparing a data set, and all such references are fairly easy to ignore later when the genealogical database is interrogated.

Sibling refers to the next sibling in some order (usually birth), unless there is none, in which case this field refers to the first sibling in the order (which may be Ego). These references can be followed to construct the set of siblings. Spouse1 ... Spousen refer to the spouses, in order, that this Ego has at the moment or (possibly) has had in the past. There is no status (e.g. divorced, married) for the marriage, although a status variable could be added after each spouse. First Child is the reference to the first child of each union; his data model determines the other children by following the Sibling reference. Gilbert is thus using a 'trick' of structure to encode both children and siblings, and their relative order, with relatively low redundancy. Each data record points to the First Child of each union, and each Ego points to the next sibling in relative order (with the last pointing to the first). With this scheme it becomes possible to recover the following additional information, using the informal sketch method included:

a) Set of children of a union in relative order. Start with Eldest Child. Follow Sibling reference until Eldest Child repeats.
b) Number of children. As a) with count.
c) Set of siblings relative to ego. Follow Sibling reference until Ego.
d) Number of siblings. As c) with count.
e) Set of siblings in relative order. Look at Father record (if present). Find Mother (if present). Start with First Child. Follow reference, excluding Ego, until First Child repeats.
f) Elder siblings. As e) but stop when reference is Ego.
g) Younger Siblings. As c) but stop when Eldest Child is reached. Eldest Child is determined using method in e).
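Gilbert's pointer scheme can be sketched in Python to make the traversals a) to g) concrete. The record layout and the helper names are assumptions. Each person is a dictionary, `"0"` stands in for Mr./Ms. Zero, and each union is stored as a spouse and a first child. Items e) to g) also assume the Father's record is present. The guard in `follow` raises an error rather than looping forever if a misplaced Sibling reference breaks the ring.

```python
# Hypothetical Gilbert-style records; "0" stands in for Mr./Ms. Zero (unknown).
# Each union is (spouse, first child of the union).
RECORDS = {
    "Abdul":  {"sex": "m", "father": "0", "mother": "0", "sibling": "Abdul",
               "unions": [("Fatima", "Karim")]},
    "Fatima": {"sex": "f", "father": "0", "mother": "0", "sibling": "Fatima",
               "unions": [("Abdul", "Karim")]},
    # Sibling refers to the next sibling; the youngest points back to the eldest.
    "Karim": {"sex": "m", "father": "Abdul", "mother": "Fatima", "sibling": "Amina", "unions": []},
    "Amina": {"sex": "f", "father": "Abdul", "mother": "Fatima", "sibling": "Hasan", "unions": []},
    "Hasan": {"sex": "m", "father": "Abdul", "mother": "Fatima", "sibling": "Karim", "unions": []},
}


def follow(start, stop):
    """Walk Sibling references from start's successor until stop is reached."""
    out, nxt = [], RECORDS[start]["sibling"]
    for _ in range(len(RECORDS)):
        if nxt == stop:
            return out
        out.append(nxt)
        nxt = RECORDS[nxt]["sibling"]
    raise ValueError("sibling references do not close into a ring")


def children_of_union(first_child):   # a); b) is len() of this
    return [first_child] + follow(first_child, first_child)


def siblings(ego):                    # c); d) is len() of this
    return follow(ego, ego)


def eldest_child(ego):
    """Find the First Child via the Father's record, matching the Mother."""
    rec = RECORDS[ego]
    father = RECORDS.get(rec["father"])
    if father is None:                # Mr. Zero: no record to consult
        return None
    return next((c for s, c in father["unions"] if s == rec["mother"]), None)


def siblings_in_order(ego):           # e)
    return [c for c in children_of_union(eldest_child(ego)) if c != ego]


def elder_siblings(ego):              # f)
    first = eldest_child(ego)
    return [] if first == ego else [first] + follow(first, ego)


def younger_siblings(ego):            # g)
    return follow(ego, eldest_child(ego))


print(children_of_union("Karim"))     # ['Karim', 'Amina', 'Hasan']
print(elder_siblings("Hasan"), younger_siblings("Amina"))  # ['Karim', 'Amina'] ['Hasan']
```

Note that c) returns siblings starting from Ego's successor and wrapping round, so only e) gives them in relative order.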

Adding a person to a database in Gilbert's format is relatively straightforward, if a little complex. Ego's data record is inserted. If Ego is the eldest child of a union, Ego is referenced on both the Father's record and the Mother's record. Otherwise Ego is referenced in the Sibling field of the next elder sibling's record. If Ego is married to one or more spouses, Ego is referenced in each of these records, and vice versa.
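Adding a person shows why the scheme is fragile. This hypothetical Python sketch uses the illustrative names `add_child` and `ring` and assumes the union already exists in both parents' records. Inserting a new eldest child means updating both parents' First Child fields. It also means re-pointing the youngest child's Sibling field so that the ring still closes.

```python
# Hypothetical Gilbert-style records; unions are [spouse, first child].
RECORDS = {
    "Abdul":  {"father": "0", "mother": "0", "sibling": "Abdul",
               "unions": [["Fatima", "Karim"]]},
    "Fatima": {"father": "0", "mother": "0", "sibling": "Fatima",
               "unions": [["Abdul", "Karim"]]},
    "Karim":  {"father": "Abdul", "mother": "Fatima", "sibling": "Karim", "unions": []},
}


def add_child(ego, father, mother, elder_sibling=None):
    """Insert Ego into the sibling ring of an existing union.

    elder_sibling=None means Ego becomes the eldest child of the union.
    """
    RECORDS[ego] = {"father": father, "mother": mother, "sibling": ego, "unions": []}
    if elder_sibling is not None:
        # Ego is referenced from the next elder sibling's Sibling field
        # and takes over that sibling's old reference.
        RECORDS[ego]["sibling"] = RECORDS[elder_sibling]["sibling"]
        RECORDS[elder_sibling]["sibling"] = ego
        return
    old_first = None
    for parent, partner in ((father, mother), (mother, father)):
        for union in RECORDS[parent]["unions"]:
            if union[0] == partner:  # Ego is now referenced on both parents' records
                old_first, union[1] = union[1], ego
    if old_first is not None:
        # The youngest child must now point to Ego instead of the old eldest.
        last = old_first
        while RECORDS[last]["sibling"] != old_first:
            last = RECORDS[last]["sibling"]
        RECORDS[last]["sibling"] = ego
        RECORDS[ego]["sibling"] = old_first


def ring(start):
    """List the sibling ring beginning at start."""
    order, nxt = [start], RECORDS[start]["sibling"]
    while nxt != start:
        order.append(nxt)
        nxt = RECORDS[nxt]["sibling"]
    return order


add_child("Amina", "Abdul", "Fatima", elder_sibling="Karim")
add_child("Layla", "Abdul", "Fatima")  # a new eldest child
print(RECORDS["Abdul"]["unions"], ring("Layla"))  # [['Fatima', 'Layla']] ['Layla', 'Karim', 'Amina']
```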

Gilbert's model was representative of the time in which it was devised, and it addresses most of our previous problems (excepting the fit of Gilbert's input data model with the collection methods of ethnographers).

The main problems with it are its complexity and fragility. It has a relatively complex structure for the data. The complexity cannot be alleviated to any significant degree, since the conceptual structure the data represents is itself complex. The location of the complexity can be shifted, however, from idiosyncratic structures in the format of data records to a more regular form, which will also make the structure less fragile.

The structure depends upon sibling references, and it is relatively easy to misplace a sibling reference. Much of Gilbert's difficulty stemmed from working exclusively with what is often called a 'flat' data structure: all data relating to a case is aggregated in a single record relating to that case. Aggregated data is also less flexible. When data is aggregated in the form of cases, it is almost impossible to disaggregate the data and reform it in new social units, which limits the manipulations possible. (Anyone who has worked with census information and tried to go from household records to family records, or vice versa, knows how very difficult this can be.)

I will propose an alternative form of recording genealogical information which is more robust, less fragile and radically disaggregated, leading to greater ease in forming new social units with the data.

A more robust example

The fragility of the referential structure can be reduced by using a data model which is now relatively common, and which is compatible both with the interpreted predicate logic model and with the relational model in common use for database management. This compatibility is not coincidental, since the relational calculus upon which the relational model is based is an elaborated subset of the predicate calculus. An advantage of relations as a model of data is that data is disaggregated in a form which is easy to relate to its interpretation, and which can be present for some cases but need not be present for all. New data types (relations) can be added to an existing database with no changes being necessary (or desirable!) to data already present. The data can be structured in one form, and logically specifiable expressions can be used to create different 'views' of the data, if these views are a possible interpretation of the expression when applied to the database. For example, in 6.3.5 a view of the data was created as the abstract data model from the basic predicates (or relations) in the data model, Child, Spouse and Gender, using a number of expressions which specified the basic relationships Father, Mother, Son, Daughter, Brother, Sister, Husband and Wife.

Using this disaggregated structure for the input data model we can, for example, represent all the information in Gilbert's record structure, including the special constraints on sibling references, with the following set of three relations (or predicates):

Relation name   Key   Value 1                   Value 2
Gender          Ego   {male, female, unknown}
Spouse_Set      Ego   Spouse_Set ID             ordinal of union
Sib_Set         Ego   Parent's Spouse_Set ID    ordinal of sibling position

This schema aims to record all information from the point of view of a single ego, making no direct reference to other people. Instead it uses indirect references: to a Spouse_Set, through a Spouse_Set ID, which is simply a unique identifier for a particular spousal union (or other kind of union if desired); and to a Sib_Set, through the Parent's Spouse_Set ID. Using this scheme, only data for a given ego need be inserted into the database at any particular time. Children can be found by looking for Sib_Set records which use one of Ego's Spouse_Set IDs. Siblings can be found independently of the presence of parents by locating all people who share Ego's Parent's Spouse_Set ID. This can be particularly useful if, for analytic or ethnographic reasons, it is necessary to separate children allocated to the same parents in different unions, or where the union status has changed: the two unions (or union statuses) are simply assigned different Spouse_Set IDs. For example, children born to a union before its formal recognition can be separated from those born after the union was recognised. The primary benefit in all cases is that bogus parent or spouse records need not be constructed; only an ID need be assigned.

A minimum record for a given unrelated ego (in the data) involves the input of two data elements: Ego and the Gender value for Ego. A once-married Ego with parents involves eight data elements: Ego three times, the Gender of Ego, the Spouse_Set ID (a name given to this particular union), the Parent's Spouse_Set ID (the ID of the parents' union), and ordinal indicators for the Spouse_Set and Sib_Set. Each additional spouse requires three more data elements. Gilbert's model required only seven per case, plus two elements per additional spouse, but he was encoding both the order of spouses and the order of siblings by the position of the data records, an error-prone process in which the errors are difficult to detect. All 'horizon' cases would also require the creation of bogus records to provide sibling links.
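The three relations can be sketched as sets of tuples. In this hypothetical Python example, children and siblings are found as described above, by matching Spouse_Set IDs. Hasan's parents appear only as the ID `U7`, with no bogus parent records. Names, IDs and helper functions are illustrative assumptions.

```python
# Hypothetical data in the three-relation input model.
GENDER = {"Abdul": "male", "Fatima": "female", "Karim": "male",
          "Amina": "female", "Hasan": "male"}
# Spouse_Set: (Ego, Spouse_Set ID, ordinal of union)
SPOUSE_SET = {("Abdul", "U1", 1), ("Fatima", "U1", 1)}
# Sib_Set: (Ego, Parent's Spouse_Set ID, ordinal of sibling position)
SIB_SET = {("Karim", "U1", 1), ("Amina", "U1", 2),
           ("Hasan", "U7", 1)}  # U7: parents never recorded, only an ID


def spouse_set_ids(ego):
    return {sid for (p, sid, _) in SPOUSE_SET if p == ego}


def parent_set_id(ego):
    return next((pid for (p, pid, _) in SIB_SET if p == ego), None)


def children(ego):
    """People whose Sib_Set uses one of Ego's Spouse_Set IDs."""
    ids = spouse_set_ids(ego)
    return {p for (p, pid, _) in SIB_SET if pid in ids}


def siblings(ego):
    """People sharing Ego's Parent's Spouse_Set ID; no parent records needed."""
    pid = parent_set_id(ego)
    return {p for (p, q, _) in SIB_SET if q == pid and p != ego}


print(sorted(children("Fatima")), sorted(siblings("Amina")))  # ['Amina', 'Karim'] ['Karim']
```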

An objection that might be lodged against the newly proposed input data model is that it is not 'natural' and does not reflect ordinary analytic intuitions. There are two possible defences. Firstly, there is analytic support for the structure, since it mirrors fairly accurately the conventions used in genealogical diagrams, establishing a set of spouses and a set of siblings and a link between them. Secondly, it is only the input data model. It is not necessary to provide raw data in this form. Rather, the input data model must be adequate to represent the information in the raw data, and compatible with the abstract data model used internally, that of the eight basic relationships.

The abstract data model of the specification for basic relationships could be rewritten (briefly) to incorporate this new input data schema as:

parent iff Spouse_ID of A is Parent's Spouse_ID of Ego then A is a Parent of Ego.
father iff A is a Parent of Ego and Gender of A is Male then A is Father of Ego.
mother iff A is a Parent of Ego and Gender of A is Female then A is a Mother of Ego.
sibling iff A is not Ego and Parent's Spouse_ID of A is Parent's Spouse_ID of Ego then A is a Sibling of Ego.
brother iff A is a Sibling of Ego and Gender of A is Male then A is Brother of Ego.
sister iff A is a Sibling of Ego and Gender of A is Female then A is Sister of Ego.
child iff Ego is a Parent of A then A is a Child of Ego.
son iff A is a Child of Ego and Gender of A is Male then A is a Son of Ego.
daughter iff A is a Child of Ego and Gender of A is Female then A is a Daughter of Ego.
spouse iff A is not Ego and Spouse_ID of A is Spouse_ID of Ego then A is a Spouse of Ego.
husband iff A is a Spouse of Ego and Gender of A is Male then A is a Husband of Ego.
wife iff A is a Spouse of Ego and Gender of A is Female then A is a Wife of Ego.
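The rules above translate almost line for line into set-returning functions. This Python sketch is an assumption, since the chapter's own programs use Prolog, and its people and IDs are hypothetical. For brevity, Spouse_Set and Sib_Set are flattened into `SPOUSE_ID` and `PARENTS_SPOUSE_ID` lookups. The `spouse` function tests `a != ego` for the same reason given earlier for Sibling: Ego trivially shares its own Spouse_ID.

```python
# Hypothetical data; Spouse_Set and Sib_Set flattened into two lookups.
GENDER = {"Abdul": "Male", "Fatima": "Female", "Karim": "Male", "Amina": "Female"}
SPOUSE_ID = {"Abdul": {"U1"}, "Fatima": {"U1"}}     # Ego -> Spouse_Set IDs
PARENTS_SPOUSE_ID = {"Karim": "U1", "Amina": "U1"}  # Ego -> Parent's Spouse_Set ID
PEOPLE = set(GENDER)


def gendered(people, gender):
    return {a for a in people if GENDER.get(a) == gender}


def parent(ego):
    # Spouse_ID of A is Parent's Spouse_ID of Ego
    return {a for a in PEOPLE
            if PARENTS_SPOUSE_ID.get(ego) in SPOUSE_ID.get(a, set())}


def father(ego):
    return gendered(parent(ego), "Male")


def mother(ego):
    return gendered(parent(ego), "Female")


def sibling(ego):
    # A is not Ego and both share a (known) Parent's Spouse_ID
    pid = PARENTS_SPOUSE_ID.get(ego)
    return {a for a in PEOPLE
            if a != ego and pid is not None and PARENTS_SPOUSE_ID.get(a) == pid}


def brother(ego):
    return gendered(sibling(ego), "Male")


def sister(ego):
    return gendered(sibling(ego), "Female")


def child(ego):
    # Ego is a Parent of A
    return {a for a in PEOPLE if ego in parent(a)}


def son(ego):
    return gendered(child(ego), "Male")


def daughter(ego):
    return gendered(child(ego), "Female")


def spouse(ego):
    # A is not Ego and Spouse_ID of A is Spouse_ID of Ego
    mine = SPOUSE_ID.get(ego, set())
    return {a for a in PEOPLE if a != ego and SPOUSE_ID.get(a, set()) & mine}


def husband(ego):
    return gendered(spouse(ego), "Male")


def wife(ego):
    return gendered(spouse(ego), "Female")
```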

These definitions are at least as 'natural' as the previous definitions, and in the new specification we can incorporate all the additional information of Gilbert's specification:

a) Set of children of a union in relative order. Set of people where Parent's Spouse_ID of their Sib_Set is Spouse_ID of Ego's Spouse_Set, in order of their Sib_Set ordinal value.
b) Number of children. The number of people where Parent's Spouse_ID of their Sib_Set is Spouse_ID of Ego's Spouse_Set.
c) Set of siblings relative to ego. Set of people with same Parent's Spouse_ID as Ego, except Ego.
d) Number of siblings. Number of people with same Parent's Spouse_ID as Ego, except Ego.
e) Set of siblings in relative order. Set of people with same Parent's Spouse_ID as Ego, except Ego, in order of their Sib_Set ordinal value.
f) Elder siblings. Set of people with same Parent's Spouse_ID as Ego, with Sib_Set ordinal value less than the Sib_Set ordinal value of Ego (ordinals counting from the eldest).
g) Younger siblings. Set of people with same Parent's Spouse_ID as Ego, with Sib_Set ordinal value greater than the Sib_Set ordinal value of Ego.
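Queries a) to g) become filters and sorts over Sib_Set rows. This Python sketch assumes that sibling ordinals count from the eldest (1 = eldest), so elder siblings have smaller ordinals. The data and helper names such as `sib_rows` and `own_row` are hypothetical.

```python
# Sib_Set rows: (Ego, Parent's Spouse_Set ID, ordinal of sibling position).
# Assumption: ordinals count from the eldest, so 1 = eldest.
SIB_SET = {("Karim", "U1", 1), ("Amina", "U1", 2), ("Hasan", "U1", 3),
           ("Layla", "U2", 1)}


def sib_rows(union_id):
    """(ordinal, person) pairs for one parental union, eldest first."""
    return sorted((n, p) for (p, pid, n) in SIB_SET if pid == union_id)


def own_row(ego):
    return next((pid, n) for (p, pid, n) in SIB_SET if p == ego)


def children_of_union(union_id):     # a)
    return [p for (_, p) in sib_rows(union_id)]


def number_of_children(union_id):    # b)
    return len(sib_rows(union_id))


def siblings_in_order(ego):          # c) and e)
    pid, _ = own_row(ego)
    return [p for p in children_of_union(pid) if p != ego]


def number_of_siblings(ego):         # d)
    return len(siblings_in_order(ego))


def elder_siblings(ego):             # f)
    pid, mine = own_row(ego)
    return [p for (n, p) in sib_rows(pid) if n < mine]


def younger_siblings(ego):           # g)
    pid, mine = own_row(ego)
    return [p for (n, p) in sib_rows(pid) if n > mine]


print(children_of_union("U1"), elder_siblings("Hasan"))  # ['Karim', 'Amina', 'Hasan'] ['Karim', 'Amina']
```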

Compared with the descriptions required as a consequence of Gilbert's data model, these are defined entirely in terms of structural relationships between data elements, not in terms of the procedures required to identify the relationships. Procedural descriptions require greater familiarity with the way a particular computer language works; structural descriptions are defined in terms of the data model, which the designer (researcher) must understand in any case.

Another advantage of the schema is that it is fairly easy to include additional data for all cases or selected cases, without altering the other items in the database. For example, the following information on the beginning and ending of unions can be included:

Union_Begin   Spouse_Set ID   date of union   {married, informal, etc.}
Union_End     Spouse_Set ID   date of end     {widowed, divorced, etc.}

By specifying the union and its status (with marriage one possible value), traditional anthropological issues such as matrifocality can be ignored or further explored. This structure provides information on each Spouse_Set, assuming that this information will be agreed by all partners in the union. If there is likely to be a difference of opinion about the status or dates, then Ego must also be included in the structure. Among the Nayar, for example, partners to the union would assign a different status to that union (Dumont 1964). This is not a business database, where things must reconcile. (Lack of reconciliation is, in fact, one of the most interesting problems in the field, and must certainly be accommodated.)

Union Ego Sid Doe Status
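Recording the union once per partner, rather than once per Spouse_Set, preserves disagreement instead of forcing reconciliation. The following is a minimal Python sketch that ignores dates. It uses hypothetical identifiers (P1 to P4, U1, U2) and a hypothetical helper `unreconciled`, which lists unions whose partners report different statuses.

```python
from collections import defaultdict

# Hypothetical per-ego union reports: (Ego, Spouse_Set ID) -> status.
UNION_BY_EGO = {
    ("P1", "U1"): "married", ("P2", "U1"): "married",
    ("P3", "U2"): "married", ("P4", "U2"): "informal",
}


def unreconciled(reports):
    """Return {union ID: set of statuses} for unions whose partners disagree."""
    views = defaultdict(set)
    for (_ego, union_id), status in reports.items():
        views[union_id].add(status)
    return {u: statuses for u, statuses in views.items() if len(statuses) > 1}


print({u: sorted(s) for u, s in unreconciled(UNION_BY_EGO).items()})  # {'U2': ['informal', 'married']}
```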

Diachronic comparisons are possible if reasonably accurate information is available and year of birth, year of death, and the beginning and end dates of unions are included. The scheme can be extended with additional information on each Ego, such as land ownership, or with other structures necessary for the particular analysis, such as clan, lineage or section attributes.
