Corpus Linguistics

SLAV 20500/30500 (=LING 27340/37340)

MW 1:30-2:50, Cobb 201B

Dr. Steven Clancy <sclancy@uchicago.edu>

Потому что все оттенки смысла
Умное число передает.
— Николай Гумилев

Because the intelligent number conveys
all shades of meaning.
— Nikolai Gumilev

Course Description

This course introduces the use of language corpora (large-scale electronic collections of authentic written and spoken language) in linguistic research from both soft (qualitative) and hard (quantitative) perspectives. Students will receive hands-on experience in corpus processing and data analysis and will learn how to work with existing corpora for their languages of interest as well as how to construct corpora of their own. Particular attention will be paid to the role of the corpus as a source of linguistic data and to the potential of corpus methods to enrich research in other theoretical frameworks, such as usage-based linguistics, construction grammar, cognitive linguistics, and functional linguistics. S. Clancy. Spring.

Syllabus

Available in PDF format.

Links

Corpus Linguistics Overviews

David Lee's Devoted to Corpora
A Survey of Corpora
Corpus Linguistics, a textbook by Tony McEnery and Andrew Wilson and a course site built around the book


Journals

Corpus Linguistics and Linguistic Theory
Corpora
International Journal of Corpus Linguistics
ICAME Journal
Computational Linguistics
Literary and Linguistic Computing
Language Resources and Evaluation (formerly known as Computers and the Humanities)
Computer Speech and Language
Empirical Language Research; cf. also Citeseer
ACL Anthology

English Corpora and Resources

British National Corpus (BNC)
American National Corpus
BYU Corpus of American English
Phrases in English (BNC)
Variation in English words and phrases (BNC)
Collins Bank of English
Corpus Concordance Sampler: Free Demo
Business Letter Corpus (BLC)
Corpus of Late Modern English Texts
ICAME (incl. Brown and Frown, LOB and FLOB, Helsinki, and others)
ICE
International Corpus of Learner English (ICLE)
Just the Word
MICASE
MICUSP
(Parsed) Corpus of Early English Correspondence
The Switchboard Corpus
Word Neighbours

Corpora and Resources in Other Languages

Slavic Language Corpora

Croatian: Croatian National Corpus
Czech: Czech National Corpus
Polish: IPI PAN corpus of Polish and Polish subcorpus of the ICLE
Russian: Russian National Corpus

German: Cosmas German corpora, the NEGRA Corpus, and the Leipzig Corpora Collection

Greek: Greek National Corpus

Hungarian: Hungarian National Corpus

Italian: La Repubblica Corpus

Portuguese: Corpus do Português

Scottish: Scottish Corpus of Texts and Speech

Spanish: Corpus del Español

Other Corpora and Resources

The CELEX Database
CHILDES
JRC Acquis Multilingual Parallel Corpus
Linguistic Data Consortium (LDC) (UofC has subscribed to many of these corpora)
Corpus-based Multilingual Dictionaries
TalkBank
WaCKy corpora

Build-your-own Corpora from large text collections

Moshkow's Library of Russian Texts
Old Russian Texts at the Pushkinskij Dom in St. Petersburg, Russia
Etext center at the University of Virginia
FullBooks.com
Oxford Text Archive (OTA)
Project Gutenberg
ReadPrint

Software, Statistics, Etc.

See links at Stefan Gries' site