
 

Overview

The UCDS is a multi-wave study of the participants from 60 groups first surveyed in 1974.  This data set is complicated by the fact that in the first three waves the group determined the sampling frame for individuals, while in the later waves the individuals in the sampling frame were retained whatever their relation to the original group.  This document describes the sampling frame over time.

There were 7 waves of data collection, the last still in progress.  These may be grouped into three “long cycle” waves, each with “short cycle” waves within.  We denote the three “long cycles” as I (1974-6), II (1984-5) and III (2000).  Within the first, IA was 1974, IB was 1975, and IC was 1976.  (Boston data collection lagged the other groups by 6 months for all of wave I.)  In some cases, data was collected for more than one I wave, and in other cases it was not.  Thus for some purposes it is preferable to construct a single wave I, and for others to keep the “short cycles” separate.  Wave II also had two sub-portions, though here there were differences in the items asked.  Wave III has two sub-portions, IIIA being a mini-wave with minimal data collection designed to track respondents for wave IIIB.

The first wave of data (IA) is currently being made public; we hope to clean and publish other data if there is sufficient interest and resources.  We go on to discuss the larger sampling frame; someone interested only in this first wave of data may skip to the GENERAL REMARKS.

This document focuses on the data structure as it appears to the secondary analyst.  If you would like more information about the construction of the sampling frame and the instruments used, follow this link to THE DATA SET AND SAMPLING FRAME.

In wave IA, 60 groups were chosen and their membership assessed.  All those who were members of a group at the time entered the sampling frame.  By wave IB, three types of changes may have occurred:

  1. A group may have dissolved, and its members hence left the sampling frame.

  2. A member in a group may have left the group, and hence the sampling frame.

  3. Someone not a member at wave IA may have joined the group, and hence entered the sampling frame.

The exact same considerations hold for the difference between wave IB and IC.  In general, those who had left the groups were not in the standard sampling frame, though in some cases, they were recruited for a special questionnaire given only to ex-members.

In all of the I waves, the survey of the Boston groups took place 6 months later than the survey of the other groups.  Consequently, there are sometimes changes made to the survey instruments in between these two halves of the year, and Boston groups were used to try out improvements in wording or new items.  For this reason, we distinguish between Boston and the other “Five Cities” in codebooks:  an item may be marked “Boston only” or “Five cities only.”

By the II waves, the majority of groups had dissolved, and all those who had been members in any of the I waves were considered part of the sampling frame.  This also holds for the III waves.  Thus the following grid shows a number of possible trajectories that groups and members might have taken through the data collection process, where group 1 dissolved between waves IB and IC, while group 2 did not dissolve until between IC and IIA.

Group   Person   IA    IB    IC    IIA   IIB   IIIA  IIIB
1       1        IN    OUT   OUT   IN    IN    IN    IN
1       2        IN    IN    OUT   IN    IN    IN    IN
1       3        OUT   IN    OUT   IN    IN    IN    IN
2       4        IN    IN    OUT   IN    IN    IN    IN
2       5        OUT   OUT   IN    IN    IN    IN    IN

 Person 1 left group 1 before it had dissolved, while person 2 was a member until dissolution.  Person 3 only joined the group after data collection was started.  Person 4 left group 2 before it dissolved, while person 5 only joined two years into data collection.  There is a difference between persons 2 and 4, though each has the same pattern of collected data, in that person 4’s group is still in the sampling frame at IC, while person 2’s group is not.
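The frame rule described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the wave labels come from this document, but the function names and the membership dictionaries are hypothetical, and the special ex-member questionnaire is not modelled.

```python
# Sketch only: wave labels follow this document; everything else is hypothetical.
I_WAVES = ("IA", "IB", "IC")
LATER_WAVES = ("IIA", "IIB", "IIIA", "IIIB")
ALL_WAVES = I_WAVES + LATER_WAVES


def in_frame(member_in, wave):
    """Return True if a person is in the standard sampling frame at `wave`.

    `member_in` maps each I wave to True/False (was the person a group
    member at that wave?).  In the I waves the frame follows current
    membership; in the II and III waves it covers anyone who was a member
    in any I wave.
    """
    if wave in I_WAVES:
        return bool(member_in.get(wave, False))
    if wave in LATER_WAVES:
        return any(member_in.get(w, False) for w in I_WAVES)
    raise ValueError(f"unknown wave: {wave!r}")


def trajectory(member_in):
    """Return the IN/OUT pattern across all seven waves, as in the grid."""
    return ["IN" if in_frame(member_in, w) else "OUT" for w in ALL_WAVES]


# Persons 1 and 5 from the grid above.
person_1 = {"IA": True, "IB": False, "IC": False}
person_5 = {"IA": False, "IB": False, "IC": True}
print(trajectory(person_1))
print(trajectory(person_5))
```

Note that the pattern alone cannot distinguish persons 2 and 4: telling them apart requires knowing whether the group itself still existed at IC.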

 

 General Remarks on the Data Structure

Person and Group Identification

            Each person in the data set is given a unique ID number; in general, a block of numbers tends to be found within the same group, but this is not always the case.  The original numbers were three digits; the current numbering system is four digits (to allow for the incorporation of additional information discussed below).  This variable is in every record at the individual or dyadic level, and is named ID for an individual level file and EGO (or ALTER) for dyadic level data.  The numbers themselves remain three digits, except that for some special cases a fourth leading digit has been added that contains information as to the special nature of this person (usually a person dropped from the sampling frame for some reason; see below on special cases).  Each group was given a two-digit ID number.  This variable is found in every record and is named GROUP.  The original GROUP and individual ID numbers have been randomized to help guarantee the anonymity of our respondents.  These two numbers may, for any record, be combined into a single six-digit ID number in which the first two digits are the group ID and the last four the person ID.

In some cases, a person moved from one group to another during the course of the study.  In such cases, any record for this person has the group which she or he was in at the time of answering.  For this reason, it is necessary to preserve a distinction between six-digit and four-digit identifications, to allow for the matching of persons with the same four-digit identification across groups.
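As a minimal sketch of how the two identifiers might be combined and separated, assuming that the six-digit number is simply the two-digit GROUP followed by the person number zero-padded to four digits (the padding and the function names are assumptions, not something the data files guarantee):

```python
def combined_id(group, person):
    """Build a six-digit string: two-digit group, then four-digit person.

    Zero-padding is an assumption made for this illustration.
    """
    if not 0 <= group <= 99:
        raise ValueError("GROUP must fit in two digits")
    if not 0 <= person <= 9999:
        raise ValueError("person ID must fit in four digits")
    return f"{group:02d}{person:04d}"


def split_id(six_digit):
    """Recover (group, person) from a six-digit combined ID string."""
    if len(six_digit) != 6 or not six_digit.isdigit():
        raise ValueError("expected a six-digit string")
    return int(six_digit[:2]), int(six_digit[2:])


# Person 663 has data in groups 60 and 63 (see Special Cases below).
ids = [combined_id(60, 663), combined_id(63, 663)]
# The combined IDs differ, but the person part matches across groups.
same_person = split_id(ids[0])[1] == split_id(ids[1])[1]
print(ids, same_person)
```

Matching on the person part alone follows an individual across groups; matching on the full combined number keeps the group context.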

  

Wave notation.

Because many programs limit variable names to 8 characters, we were reluctant to use two of them to denote which wave the data at hand are from.  Hence for any variable, the last character is a number indicating the wave, which can be retranslated to the terms above as follows:

  1. Wave IA

  2. Wave IB

  3. Wave IC

  4. Wave IIA

  5. Wave IIB

  6. Wave IIIA

  7. Wave IIIB
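The suffix convention can be decoded mechanically.  The sketch below uses the mapping listed above; the variable names passed to it (such as AGE1) are hypothetical examples, not names taken from the codebooks.

```python
# Mapping from the final character of a variable name to the wave label.
WAVE_BY_SUFFIX = {
    "1": "IA",
    "2": "IB",
    "3": "IC",
    "4": "IIA",
    "5": "IIB",
    "6": "IIIA",
    "7": "IIIB",
}


def wave_of(variable_name):
    """Return the wave label encoded in the last character of a name."""
    suffix = variable_name[-1:]
    if suffix not in WAVE_BY_SUFFIX:
        raise ValueError(f"{variable_name!r} carries no wave suffix")
    return WAVE_BY_SUFFIX[suffix]


# Hypothetical variable names, for illustration only.
print(wave_of("AGE1"), wave_of("AGE4"))
```

Identifier variables such as ID, EGO, ALTER and GROUP carry no wave digit, so the function rejects them rather than guessing.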

 

Special Cases:

            In a number of cases, data was collected from persons in 1974 who were then dropped from the sampling frame.  However, we considered the data sufficiently interesting that it should be preserved, though not all analysts will want to use them.  There are three types of special cases.

1)  The first is whole groups that were eliminated from the sampling frame.  There were two reasons for such elimination:

a)     The group had a membership over the cut-off point for inclusion when the data collection began (five unrelated persons) but then dropped below this point; or

b)    Membership in the group turned out to be largely involuntary (rehabilitational groups which a judge might offer as an alternative to jail time for drug addicts).

2)     Persons who gave data but were later determined not to qualify as members because

a)     They were children; or

b)    They had only been staying briefly in the group;

3)     Persons who had left the group and did not give data (or at least not vis-à-vis the group in question), but whom other group members treated as present when completing dyadic level questionnaires.

 

Since some members treated as “not in the group” one year might join the next, and some persons who were determined to have left the group were later interviewed as ex-members, these “illegitimate” data may be of great interest.  The GRANDMAP file (and the discussion of it below) contains further information on these persons and how to locate them.

 

Some persons moved from one group to another between IA and IC; these persons appear for each wave in the group they were in at the time; this does not affect analysis of the wave IA data currently made public.  One person (663), however, moved during wave IA, and hence has valid data for two groups (60 and 63), which affects the merging of dyadic and individual level data.  See the GRANDMAP file for details.

 

Specific Files

There are a number of types of files that you may download from this site.  They are as follows:

1)     Data files: all files are stored in SPSS 9.0 for Windows format.  If you use neither SPSS nor a conversion utility such as STAT-TRANSFER, please contact us.

2)     Statistical programs for the analysis of these data.

3)     Mapping Files—these are in the form of Excel files.  Only one is currently needed, which concentrates all information that might be useful in mastering the data set; this is called the GRANDMAP file.

4)     Text Files—these contain

a)     descriptions of the data set,

b)     descriptions of each group, including identification (by ID number only) of key members. 

* NOTE:  this file is not on the web site.  While no groups are identified by name, we still consider this information sufficiently sensitive that it will be given only to researchers with both clear departmental affiliations and clear research goals.

c)     codebooks for specific data files;

d)     manuals for the programs.

Note that text files are bundled with the data or program they accompany, and are downloaded as ZIPPED archives.

 

Here we go on to describe the data files; first we make a number of general notes that apply across data files:

1)     MISSING VALUES ARE DECLARED.  In the files you receive, the values in the codebook indicated as “missing” are set to missing.  You must change this if you wish to examine these responses.  Note that in some cases, the missing values still contain a great deal of information as to why a response is missing (respondent did not fill out questionnaire, respondent did but did not answer this question, respondent gave uncodable response, etc.).  More specific information about missing value treatment is contained in each codebook.

2)     Cases are unweighted.  If an analysis of the dyadic level is examining symmetric variables, each dyad will be counted twice.

3)     We have attempted to keep the data files as small as possible, and have hence refrained from adding various indicators that might prove useful, such as whether some person gave any valid relationship data for some type of question.  However, such indicators can be easily constructed, and we note in the codebooks where we think they might prove useful.
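The double counting mentioned in note 2 can be avoided by keeping one record per unordered pair.  The following is a sketch only: EGO, ALTER and GROUP are the identifiers described earlier, while MARRIED and the ID values are hypothetical stand-ins for a symmetric variable and real respondents.

```python
def one_record_per_pair(records):
    """Keep the first record seen for each unordered pair within a group.

    Appropriate only for symmetric variables, where the ego-to-alter and
    alter-to-ego records carry the same value.
    """
    seen = set()
    kept = []
    for rec in records:
        pair = (rec["GROUP"], frozenset((rec["EGO"], rec["ALTER"])))
        if pair not in seen:
            seen.add(pair)
            kept.append(rec)
    return kept


# Hypothetical records: persons 101 and 102 are married, 101 and 103 are not.
records = [
    {"GROUP": 1, "EGO": 101, "ALTER": 102, "MARRIED": 1},
    {"GROUP": 1, "EGO": 102, "ALTER": 101, "MARRIED": 1},
    {"GROUP": 1, "EGO": 101, "ALTER": 103, "MARRIED": 0},
    {"GROUP": 1, "EGO": 103, "ALTER": 101, "MARRIED": 0},
]

married_directed = sum(r["MARRIED"] for r in records)
married_pairs = sum(r["MARRIED"] for r in one_record_per_pair(records))
print(married_directed, married_pairs)
```

For asymmetric variables (for example, whether ego names alter as influential) both directed records are meaningful and should be kept.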

 

Wave 1 Dyadic Data

            Each group has N members for purposes of this file.  Note that this number may differ from the number of persons with data in the Individual level file for this year.  For each group, there are N(N-1) records, one for every ego/alter combination excluding self-reference.  Even if there is no data from some person, all N-1 records for that person will be present in the data file, though all variables will be missing.  Note that for each pair of persons there are two records; hence all dyads are considered directed.
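The record layout can be illustrated by enumerating every ordered ego/alter pair for one group.  This is a sketch with made-up ID numbers; only the identifier names GROUP, EGO and ALTER come from this document.

```python
from itertools import permutations


def dyad_records(group, members):
    """Return one record per ordered (ego, alter) pair, excluding self-pairs."""
    return [
        {"GROUP": group, "EGO": ego, "ALTER": alter}
        for ego, alter in permutations(members, 2)
    ]


# A hypothetical group with N = 4 members yields N(N-1) = 12 records.
members = [101, 102, 103, 104]
records = dyad_records(7, members)
print(len(records))
```

A useful check on a downloaded file is that the record count for each group equals N(N-1) for that group's N.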

             As mentioned above, certain persons are included in this file, both as EGO and as ALTER, who are excluded from the larger sampling frame.  A dummy variable INCLUDE is 1 only for those who were judged to fit all criteria for inclusion in the sampling frame and zero otherwise; filtering or weighting by this removes all the others.  Further, there are a few cases in which the same person had valid data in two groups (having resided in each for a portion of the year).  This person is found in each group, having the same EGO / ALTER number, but a different COMMUNE number.  Further information about these cases is found in the GRANDMAP file.
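Filtering on INCLUDE might look like the sketch below.  It assumes that INCLUDE is stored on each dyadic record as 0 or 1; the records themselves are hypothetical, and the codebook should be consulted for how the flag is actually defined for ego and alter.

```python
def frame_only(records):
    """Keep only records flagged as belonging to the sampling frame."""
    return [rec for rec in records if rec.get("INCLUDE") == 1]


# Hypothetical records: the second involves a person outside the frame.
records = [
    {"EGO": 101, "ALTER": 102, "INCLUDE": 1},
    {"EGO": 101, "ALTER": 1005, "INCLUDE": 0},
    {"EGO": 102, "ALTER": 101, "INCLUDE": 1},
]
kept = frame_only(records)
print(len(kept))
```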

 

Wave 1 Individual Data

            These data include basic social variables (age, sex, religion) and an array of attitude and belief items.  Other data at this level that has not been cleaned is discussed below.

 

Wave 1 Group Data

            These data come from the codings of the original observers.  As these codings are laborious to document, we include only the most important ones for secondary analysts.

 

Other data files

            Two files exist to orient the secondary analyst and facilitate complex uses of the data (such as combining data at different levels of analysis, or longitudinal research as other waves are made available).  The first of these is termed the “MERGE KEY” file, and exists as an SPSS system file.  This is an individual level file, indicating what group each person was in for each wave, with flags for the existence or non-existence of various forms of data.  With this information, it is possible to merge one file with another and ensure that each person is counted only once (if this is desired) or that any individual’s characteristics go to all the groups that she was associated with (for persons in more than one group).  The codebook for this file contains further information.

            The second file is an Excel for Windows file, and is described above under Mapping Files.

            After this general information, there is a listing of all persons (given only as ID numbers) and any bits of important information associated with that person, such as whether he was dropped from the sampling frame for some reason, whether there has been confusion as to this person’s identity in the past, and such like.  Finally, significant relations that would be cumbersome to express as a set of dummy variables are often noted (e.g. these two people are cousins or are married).
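The two merging strategies that the MERGE KEY file is meant to support (count each person once, or carry a person's characteristics to every group she was in) can be sketched as follows.  All records, the AGE variable and ID 101 are hypothetical; only the identifier names and the case of person 663 in groups 60 and 63 come from this document.

```python
def index_by_person(individuals):
    """Keep the first record per ID, so that each person is counted once."""
    index = {}
    for row in individuals:
        index.setdefault(row["ID"], row)
    return index


def attach_ego_trait(dyads, person_index, trait):
    """Copy an individual-level trait onto every dyadic record of that ego.

    Because the lookup uses the person ID alone, a person who appears in
    two groups carries the same trait value into both.
    """
    merged = []
    for dyad in dyads:
        person = person_index.get(dyad["EGO"])
        value = person[trait] if person is not None else None
        merged.append({**dyad, "EGO_" + trait: value})
    return merged


# Hypothetical individual-level rows; person 663 appears in two groups.
individuals = [
    {"ID": 663, "GROUP": 60, "AGE": 25},
    {"ID": 663, "GROUP": 63, "AGE": 25},
    {"ID": 101, "GROUP": 60, "AGE": 30},
]
dyads = [
    {"GROUP": 60, "EGO": 663, "ALTER": 101},
    {"GROUP": 63, "EGO": 663, "ALTER": 702},
    {"GROUP": 60, "EGO": 101, "ALTER": 663},
]

people = index_by_person(individuals)
merged = attach_ego_trait(dyads, people, "AGE")
print(len(people), [m["EGO_AGE"] for m in merged])
```

If group context matters (for example, a trait measured separately in each group), the index would instead be keyed on the pair of GROUP and ID.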

 

Further Remarks on the Structure of the UCDS

Levels of Data

The structure of the data set is complicated by the following three considerations:

1)     There are three levels on which data can be collected, namely the individual (I), the dyadic (D), and the group (G).  Further, data collected on one level may be manipulated to produce data on another.  We will call data that is collected on a given level “original” data (O), as opposed to “constructed” data (C).  For example, original individual (OI) level data include beliefs and attitudes, life history, and current commune position.  OD data include how much time the members of a dyad spend together, whether they think well of one another, and whether they are married.  OG data include observers’ codings as to the commune’s ideology, formal structure, and age.  CI data might include how many others choose some person as influential (constructed from OD data); CD data might include whether two people entered the group at the same time (constructed from OI data); CG data might include the degree to which all persons agree on some belief (constructed from OI data).

2)     We provide a key using the notation discussed in item (1) as to what data was collected for which groups in which periods.  Bearing in mind the difference between O and C data, the reader will note that CD data is present even when OD is not; this may affect whether waves are combined or kept separate.  This information is stored in two SPSS files, one at the individual level, and one at the group level.  We also have two WORD files that contain information about each respondent, one organized by respondent ID, and the other by group.  These contain information that might be of interest to secondary analysts, but is better presented verbally than through codings.  For example, in the file organized by number, it is possible to quickly note when a particular person was first assumed to be a member and later dropped from the sampling frame (such cases are extremely interesting for understanding recruitment); the file organized by group also provides descriptive information about each group, again oriented to providing the secondary analyst with key information otherwise difficult to recreate.  For example, group 1 was formed by a married couple; these are identified in the WORD file, as well as other crucial interpersonal dynamics that may have affected this group.

3)     The initial sample can be considered to consist of persons or of groups.  It was originally seen as both.  By the second wave, some original persons were no longer in the original groups, and they had been replaced with others.  The overall 25-year panel follows all persons first identified in long cycle I.  So persons who entered in IC are also in the sampling frame, as well as persons who entered in IA but left before IC.  Hence there are persons in the sampling frame from the same group who could not have had relations with one another.
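The three kinds of constructed data named in item (1) can each be derived in a few lines.  The sketch below is illustrative only: the variable names INFLUENCE, ENTRY and BELIEF, the ID numbers, and the choice of "share giving the modal answer" as an agreement measure are all assumptions, not the definitions used in the UCDS files.

```python
from collections import Counter
from itertools import permutations


def influence_counts(dyads):
    """CI from OD: how many egos name each alter as influential."""
    counts = Counter()
    for dyad in dyads:
        if dyad["INFLUENCE"] == 1:
            counts[dyad["ALTER"]] += 1
    return counts


def joined_together(individuals):
    """CD from OI: 1 for each ordered pair with the same entry year, else 0."""
    entry = {p["ID"]: p["ENTRY"] for p in individuals}
    return {
        (ego, alter): int(entry[ego] == entry[alter])
        for ego, alter in permutations(entry, 2)
    }


def modal_agreement(individuals, item):
    """CG from OI: share of respondents giving the most common answer."""
    answers = [p[item] for p in individuals if p[item] is not None]
    if not answers:
        return None
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


# Hypothetical data for one three-person group.
people = [
    {"ID": 1, "ENTRY": 1973, "BELIEF": 1},
    {"ID": 2, "ENTRY": 1973, "BELIEF": 1},
    {"ID": 3, "ENTRY": 1974, "BELIEF": 2},
]
dyads = [
    {"EGO": 1, "ALTER": 2, "INFLUENCE": 1},
    {"EGO": 3, "ALTER": 2, "INFLUENCE": 1},
    {"EGO": 2, "ALTER": 1, "INFLUENCE": 0},
]

ci = influence_counts(dyads)
cd = joined_together(people)
cg = modal_agreement(people, "BELIEF")
print(ci[2], cd[(1, 2)], cd[(1, 3)], cg)
```

Because constructed dyadic data such as `joined_together` needs only individual level inputs, it can exist for waves in which no original dyadic data was collected.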

 

Other Data

Here we describe other sections of data that are not public but may be of interest to persons conducting certain types of projects.

 Ex-members:  Interviews were conducted with persons who had left the group between wave IA and wave IB; these interviews included information on why respondents had left.  The wave IB relationship questionnaire also asked members to report on contact with ex-members.

 Transcripts of long interviews:  In-depth interviews were conducted with key informants in each group.  These interviews were tape recorded, and many were transcribed, although they have not been edited to remove names and identifying information.