These instructions provide an example of how to run RAPID both to
estimate the number of alleles shared IBD and to do linkage analysis.
The files that will be used for input are:

    ex.pedigree  - A sample pedigree file
    ex.geno      - An example genotype file
    ex.geno.freq - A frequency file with the true allele frequencies
    ex.study     - A list of "affecteds"
    ex.idcoefs   - Identity coefficients for all quasi founders and
                   descendants

In addition, you need to specify an output file, which I will call
'ex.output' in this example.

When doing an analysis, you can either specify the allele frequencies
yourself, if you have a good estimate of the frequencies in the
founding population, or you can have RAPID estimate the frequencies
from the data. To make RAPID estimate the allele frequencies, input '0'
for the name of the frequency file. The example below assumes we do not
have an a priori estimate of the founder allele frequencies and we want
RAPID to estimate them.

In general, you can tell RAPID the names of the required input files
listed above via command line arguments, or by placing the names on
separate lines, in the order they are listed above, in a separate input
file and giving this file to RAPID to read. In the examples below, I
will use this latter method with the file 'ex.input'. If you prefer to
use command line arguments, you would use the following arguments and
file names (note: there must be a space between the 'flag' (e.g. -p)
and the file name):

    -p ex.pedigree
    -g ex.geno
    -f ex.geno.freq (or '0' to use estimated frequencies)
    -s ex.study
    -d ex.idcoefs
    -o ex.output

Below, the command line prompt is represented as '%' and everything
following the prompt is something you would type in.

Step 1
======

The first thing to do is to get all the necessary identity
coefficients. To do this, you may use the 'idcoefs' software package
downloadable from my website (http://www.genes.uchicago.edu/abney.html)
or any other method that you see fit.
I will assume that you have installed the 'idcoefs' package.

A. Get the list of id's for which you will need identity coefficients:

    % rapid -q < ex.input

This will create a file called 'ex.output.clist' which has one id per
line. You need to get the identity coefficients for all possible pairs
of these id's:

    % idcoefs -p ex.pedigree -s ex.output.clist -o idcoefs -r 500

The number '500' tells idcoefs that it can use up to about 500 MB of
RAM. For this example it is not necessary to use more, and you may use
less if your computer has less free RAM available. The output is the
file 'idcoefs', which should be identical to the 'ex.idcoefs' file
already supplied.

Step 2
======

There are two types of computations that may be done. To compute the
number of alleles shared IBD for each marker, for the pairs specified
in 'ex.study', go to step 2.A. To do linkage analysis using the S_pairs
statistic, go to step 2.B. In all the analyses an output file called
'ex.output.mend_errs' is created, listing the markers for which a
Mendelian error was detected. In this example this file should be
empty.

Step 2.A
========

We want the number of alleles shared IBD at each marker for all the
pairs specified in 'ex.study'. We will also ask RAPID to estimate the
allele frequencies at each marker. To do this, use the '-k' command
line option:

    % rapid -k < ex.input

This creates files 'ex.output.nshared' and 'ex.output.spairs' which
should be identical to the files of the same names in the Output_files
directory. If you want to use the true allele frequencies, edit
'ex.input' and replace the line that specifies the allele frequency
file, currently a '0', with 'ex.geno.freq' (without quotation marks),
then execute the above command again.

Step 2.B
========

Here we want to do linkage analysis with the individuals in 'ex.study'
considered as affecteds. We first scan over all the markers to find
approximate p-values.
We then choose the two markers with the smallest p-values and compute
an empiric p-value for these markers.

2.B.i Approximate p-values:

We need to specify the number of simulations used to create the
reference distribution. Since these are just approximate p-values,
10,000 should be plenty. We will also use the default value of 50
simulations to estimate the bias and variance and allow RAPID to
estimate the allele frequencies.

    % rapid -u 10000 < ex.input

This will create files 'ex.output' and 'ex.output.spairs' which should
be either identical to or nearly the same as the files
'ex.output.apprx' and 'ex.output.spairs' in the Output_files directory.
Ideally, the results would be identical because of a file called 'seed'
in the Example directory which specifies the random number seed to use.
Without this file, RAPID would use the system clock to seed the random
number generator, resulting in somewhat different estimates of the
p-value. Different platforms, however, may come up with different
random values even with the same seed, so you may see slightly
different results. Hopefully, at least the ranking of markers based on
the approximate p-value (the second column) will be the same (or nearly
so). Regardless of the random numbers generated, the computation of the
S_pairs statistic should be unaffected. The value of the statistic is
in the column labeled 'raw_spairs' of 'ex.output' and should be the
same as the values in 'ex.output.apprx'.

2.B.ii Empiric p-values:

Looking at the approximate p-values in 'ex.output' (in the second
column, labeled aprx_pval) we see that marker1 and marker7 have the
smallest values. So, we created a new genotype file called
'ex.geno.emp' which has the genotype data only for these markers. We
also use a different input file 'ex.input.emp' which lists this new
genotype file and a new output file 'ex.output.emp' (so that our
original output file is not overwritten).
We also need to specify the number of simulations on which to base the
empiric p-values. Since we chose from only ten markers and have rather
modest p-values, we will only do 1,000 simulations. If these markers
were selected from a genome-wide scan, it would be more sensible to do
something like 100,000 or more simulations.

    % rapid -u 1000 -e < ex.input.emp

This results in a file 'ex.output.emp' which should be either identical
to or nearly the same as the file of that name in the Output_files
directory.
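
As a recap of the input file format described at the start of these
instructions (file names on separate lines, in the order listed), here
is a minimal shell sketch that writes two such files. It is only an
illustration, not part of RAPID: 'write_rapid_input' is a hypothetical
helper, 'my.input' and 'my.input.emp' are made-up names chosen so the
supplied 'ex.input' and 'ex.input.emp' are not overwritten, and I am
assuming that the output file name goes on the last line (mirroring the
order of the -p -g -f -s -d -o flags) and that the empiric run also
uses '0' for the frequency file.

```shell
#!/bin/sh
# Sketch only: write RAPID-style input files, one file name per line.
# ASSUMPTION: the output file name is the last line, following the
# order of the command line flags (-p -g -f -s -d -o).

# write_rapid_input is a hypothetical helper, not part of RAPID.
# Usage: write_rapid_input INPUT_FILE PEDIGREE GENO FREQ STUDY IDCOEFS OUTPUT
write_rapid_input() {
    input_file=$1
    shift
    # Print each remaining argument on its own line.
    printf '%s\n' "$@" > "$input_file"
}

# Input for runs where RAPID estimates the allele frequencies
# ('0' takes the place of the frequency file name).
write_rapid_input my.input \
    ex.pedigree ex.geno 0 ex.study ex.idcoefs ex.output

# Input for the empiric p-value run: new genotype file, new output file.
# ASSUMPTION: the frequency line stays '0' here as well.
write_rapid_input my.input.emp \
    ex.pedigree ex.geno.emp 0 ex.study ex.idcoefs ex.output.emp
```

If the assumed line order is right, these files could be given to RAPID
the same way as above, for example '% rapid -k < my.input'. Compare them
with the supplied 'ex.input' and 'ex.input.emp' before relying on them.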