Program:  maxhap
Purpose:  To estimate rho (4Nr) and gene conversion parameter f (=g/r)
	from phased haplotype polymorphism data.
Usage:  maxhap ... <datafile
where the "..." are command line arguments that are described in readme_dip,
the documentation for maxdip.

datafile is a file with one line of data for each pair of polymorphic sites.
The line for a pair of sites should have the following format:
site_id_i site_id_j  distance_apart u_or_ad n00 n01 n10 n11 n0q n1q nq0 nq1 nqq 
 
where the site_id's are arbitrary labels (no white space within them) which are not
used by the program.  distance_apart is the distance between the pair of sites,
in whatever units you choose.  (The rho estimate will be in the units you used for
distance_apart.)

u_or_ad is either u or ad.  "u" means that the ancestral-derived status of the
alleles is unknown, so which allele is designated "0" and which is "1" is
arbitrary.  "ad" means that the ancestral-derived status is known; in that case
the allele labeled "0" must be the ancestral allele and "1" the derived allele.

"n00" is the number of gametes with the "0" allele at site one and the "0" allele
at site two.  "n0q" is the number of gametes in the sample where the "0" allele is
present at site one and the allele at site two is unknown (missing data).  The
other counts are defined in the same way.  If the number of gametes specified
(the sum of the nij's) is less than the sample size of the likefile specified,
the program will augment nqq so that the total matches the sample size of the
likefile.
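
The line format and the nqq adjustment described above can be sketched in a few
lines of Python.  This is only an illustration and is not part of maxhap: the
names parse_pair_line and pad_to_sample_size are made up here, and raising an
error when the counts exceed the sample size is an assumption, since that case
is not covered above.

```python
from typing import NamedTuple

# Order of the nine counts on a datafile line, as given in the format above.
COUNT_FIELDS = ("n00", "n01", "n10", "n11", "n0q", "n1q", "nq0", "nq1", "nqq")


class PairRecord(NamedTuple):
    site_i: str
    site_j: str
    distance: float
    status: str      # "u" or "ad"
    counts: dict     # count name -> number of gametes


def parse_pair_line(line):
    """Split one datafile line into its labels, distance, status and counts."""
    fields = line.split()
    if len(fields) != 4 + len(COUNT_FIELDS):
        raise ValueError("expected 13 fields, got %d" % len(fields))
    site_i, site_j, distance, status = fields[:4]
    if status not in ("u", "ad"):
        raise ValueError("u_or_ad must be 'u' or 'ad'")
    counts = dict(zip(COUNT_FIELDS, (int(x) for x in fields[4:])))
    return PairRecord(site_i, site_j, float(distance), status, counts)


def pad_to_sample_size(record, sample_size):
    """Add the shortfall to nqq so the counts sum to sample_size."""
    total = sum(record.counts.values())
    if total > sample_size:
        # Assumption: this case is not described in the text, so reject it.
        raise ValueError("counts exceed the sample size")
    padded = dict(record.counts)
    padded["nqq"] += sample_size - total
    return record._replace(counts=padded)


if __name__ == "__main__":
    rec = parse_pair_line("1 2 1200. u  10 12 5 3 0 0 0 0 0")
    # 32 is a hypothetical sample size, chosen only to show the padding.
    print(pad_to_sample_size(rec, 32).counts)
```

Keeping the counts in a dictionary keyed by the field names makes it easy to
check a line by eye against the format line.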

For example, for two pairs of sites, datafile might look like:

1 2 1200. u  10 12 5 3 0 0 0 0 0 
3 4 3000. u  15 8 0 6 0 0 0 0 1

To estimate rho from these two pairs of sites, put these lines in a file called,
say, twopairs, and then run maxhap as follows:

maxhap 1 h30rho  .01 50. .001 0. 10. 11 400  <twopairs 

The output should look something like this:

 npairs: 2
(fill this in).

I have written a crude utility to convert a set of haplotypes to a data file with
the pairs data.  The program is called exhap.  I wrote the program to handle
haplotype data stored in Excel files, which were exported as text.  (The format of
the data for this program is the same as that for exdip, except that the first
number is the number of chromosomes, not the number of diploids.)  The format
of haplotype files is as follows:

#gams #sites
pos_1 pos_2 ... pos_#sites
anc_id anc_1 anc_2 ... anc_#sites
gam_id_1 allele1_1 allele1_2 ... allele1_#sites
gam_id_2 allele2_1 allele2_2 ... allele2_#sites
...


#gams is the number of gametes surveyed.
#sites is the number of sites surveyed.
pos_i is the position of the ith site (say in base pairs, or whatever units you like).
anc_i is the allele which is ancestral at the ith site (or ? if unknown).

gam_id_j is an arbitrary label (no white space within it).

allelei_j is the allele in the ith gamete at the jth site.  It can be any
string (no white space within it).  If the first character is a "?", "n" or "N",
the allele at this site is assumed to be unknown (missing data).  If more than
two distinct alleles are listed at a site, the program prints an error message and
exits.
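
To make the conversion from haplotypes to pairs data concrete, here is a rough
Python sketch of the idea.  It is not the exhap source and may differ from what
exhap does.  The function names are made up, and the sketch assumes the
following: sites with fewer than two observed alleles are skipped, a pair is
marked "ad" only when the ancestral allele is known at both sites, the first
allele seen is coded "0" when the ancestral allele is unknown, and the site
labels written out are just the site numbers.

```python
from itertools import combinations

MISSING_PREFIXES = ("?", "n", "N")
KEYS = ["00", "01", "10", "11", "0q", "1q", "q0", "q1", "qq"]


def is_missing(allele):
    """An allele is missing if its first character is ?, n or N."""
    return allele[0] in MISSING_PREFIXES


def read_haplotypes(text):
    """Parse a haplotype file into positions, ancestral alleles and gametes."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_gams, n_sites = int(lines[0][0]), int(lines[0][1])
    positions = [float(p) for p in lines[1]]
    ancestral = lines[2][1:]                       # drop anc_id
    gametes = [ln[1:] for ln in lines[3:3 + n_gams]]  # drop gam_id
    if len(positions) != n_sites or len(ancestral) != n_sites:
        raise ValueError("header does not match the number of sites")
    if len(gametes) != n_gams or any(len(g) != n_sites for g in gametes):
        raise ValueError("header does not match the gamete lines")
    return positions, ancestral, gametes


def code_site(column, anc):
    """Code one site as "0"/"1"/"q"; return None if it is not polymorphic."""
    alleles = []
    for a in column:
        if not is_missing(a) and a not in alleles:
            alleles.append(a)
    if len(alleles) > 2:
        raise ValueError("more than two distinct alleles at a site")
    if len(alleles) < 2:
        return None
    anc_known = anc in alleles
    zero = anc if anc_known else alleles[0]   # arbitrary choice when unknown
    codes = ["q" if is_missing(a) else ("0" if a == zero else "1")
             for a in column]
    return codes, anc_known


def pair_lines(positions, ancestral, gametes):
    """Return one datafile line per pair of polymorphic sites."""
    coded = []
    for j, anc in enumerate(ancestral):
        result = code_site([g[j] for g in gametes], anc)
        if result is not None:
            coded.append((j, result[0], result[1]))
    out = []
    for (i, ci, ki), (j, cj, kj) in combinations(coded, 2):
        counts = dict.fromkeys(KEYS, 0)
        for a, b in zip(ci, cj):
            counts[a + b] += 1
        status = "ad" if (ki and kj) else "u"
        dist = abs(positions[j] - positions[i])
        nums = " ".join(str(counts[k]) for k in KEYS)
        out.append(f"{i + 1} {j + 1} {dist:g} {status} {nums}")
    return out
```

Calling pair_lines(*read_haplotypes(text)) on the contents of a haplotype file
gives lines in the pairs format, with the counts in the order n00 n01 n10 n11
n0q n1q nq0 nq1 nqq.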


Example:
exhap <R4ithap | maxhap h30rho .01 50. .001 0. 10. 11  400  >R4it.out

should produce:


[rhudson@sparky hapdat]$ more R4it.out

exhap -u (explain)

