What data do we have?

PakCaselaw includes all book-published Pakistan case law: every volume designated as an official report of decisions by a court within Pakistan.

Our scope includes all provincial courts, federal courts, and special courts, as well as landmark international court decisions.

Each volume has been converted into structured, case-level data broken out by majority and dissenting opinion, with human-checked metadata for party names, cause list number, citation, and date.

Scope limits

PakCaselaw does not include:

  • Cases not designated as officially published, such as most lower court decisions.
  • Non-published trial documents such as party filings, orders, and exhibits.

Digitization Process

We created this data by digitizing millions of pages of court decisions contained in roughly 15,000 bound volumes.

Members of our team created metadata for each volume, including a unique barcode, reporter name, title, jurisdiction, publication date, and other volume-level information. We then used a high-speed scanner to produce JP2 and TIF images of every page. A vendor used OCR to extract the text of every case, creating case-level XML files. Key metadata fields, like case name, citation, court, and decision date, were corrected for accuracy, while the text of each case was left as raw OCR output. In addition, for cases from volumes not yet in the public domain, our vendor redacted any headnotes.

Data quality

Our data inevitably includes countless errors introduced by the digitization process. The public launch of this project is only the start of discovering errors, and we hope you will help us find and fix them.

Some parts of our data are higher quality than others. Case metadata, such as the party names, cause-list number, citation, and date, has received human review. Case text and general head matter have been generated by machine OCR and have not received human review.
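
For readers who process the data programmatically, the split between reviewed and unreviewed fields can be sketched as a simple record type. This is only an illustration: the class name, field names, and helper below are hypothetical and do not reflect the actual schema of our case-level files.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical case record; field names are illustrative only."""

    # Metadata that has received human review.
    party_names: str
    cause_list_number: str
    citation: str
    decision_date: str
    # Machine OCR output that has NOT received human review.
    head_matter: str
    text: str


# Fields described above as human-reviewed (assumed naming).
HUMAN_REVIEWED = frozenset(
    {"party_names", "cause_list_number", "citation", "decision_date"}
)


def is_human_reviewed(field_name: str) -> bool:
    """Return True if the named field has received human review."""
    known = {f.name for f in fields(CaseRecord)}
    if field_name not in known:
        raise KeyError(f"unknown field: {field_name}")
    return field_name in HUMAN_REVIEWED
```

A check like `is_human_reviewed("text")` returning `False` is a reminder to treat case text as raw OCR when building analyses on top of it.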

We particularly welcome metadata corrections, feature requests, and suggestions for large-scale algorithmic changes. We are not currently able to process individual OCR corrections, but welcome general suggestions on the OCR correction process.

Data citation

Data made available through the Pakistan Caselaw Project API and bulk download service is citable. View our suggested citation in these standard formats:

APA
Pakistan Caselaw Project. (2018). Retrieved [date], from [url].

Chicago / Turabian
Pakistan Caselaw Project. "Pakistan Caselaw Project." Last modified [date], [url].
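
If you generate citations automatically, the two templates above can be filled in with a small helper. The following is a minimal sketch: the function names are our own invention for illustration, the date is passed as an already-formatted string (the templates do not prescribe a date format), and the URL in the usage note is a placeholder.

```python
PROJECT = "Pakistan Caselaw Project"


def apa_citation(retrieved: str, url: str) -> str:
    """Fill the APA template; `retrieved` is the retrieval date as text."""
    return f"{PROJECT}. (2018). Retrieved {retrieved}, from {url}."


def chicago_citation(last_modified: str, url: str) -> str:
    """Fill the Chicago / Turabian template; `last_modified` is a date as text."""
    return f'{PROJECT}. "{PROJECT}." Last modified {last_modified}, {url}.'
```

For example, `apa_citation("January 5, 2024", "https://example.org/")` yields the APA line with both bracketed slots replaced; substitute the date and URL you actually used.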

Have you used Pakistan Caselaw Project data in your research? Tell us about it.

Usage & access

The PakCaselaw data is free for the public to use and access.

Case metadata, such as the case name, citation, court, and date, is freely and openly accessible without limitation. Full case text can be freely viewed or downloaded, but you must register for an account to do so, and you may currently view or download no more than 500 cases per day. In addition, research scholars can qualify for bulk data access by agreeing to certain use and redistribution restrictions. You can request a bulk access agreement by creating an account and then visiting your account page.
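
If you script your downloads, it may help to track the 500-case daily allowance on your side so a job stops cleanly before reaching it. The sketch below is a hypothetical client-side counter, not part of our service: the class and method names are assumptions, and it assumes the allowance resets when the calendar date changes, which may not match how the limit is actually enforced.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

DAILY_CASE_LIMIT = 500  # full-text cases per day, per the access policy above


@dataclass
class DailyQuota:
    """Hypothetical local tracker for the daily full-text allowance."""

    limit: int = DAILY_CASE_LIMIT
    day: date = field(default_factory=date.today)
    used: int = 0

    def _roll_over(self, today: date) -> None:
        # Assumption: the count resets whenever the calendar date changes.
        if today != self.day:
            self.day = today
            self.used = 0

    def remaining(self, today: Optional[date] = None) -> int:
        """Return how many cases can still be fetched today."""
        self._roll_over(today or date.today())
        return self.limit - self.used

    def try_consume(self, n: int = 1, today: Optional[date] = None) -> bool:
        """Reserve n cases; return False (and reserve nothing) if over the limit."""
        if n < 0:
            raise ValueError("n must be non-negative")
        self._roll_over(today or date.today())
        if self.used + n > self.limit:
            return False
        self.used += n
        return True
```

A download loop would call `try_consume()` before each full-text request and pause until the next day once it returns `False`. Metadata requests need no such accounting, since metadata is accessible without limitation.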

The Pakistan Caselaw Project team cannot help with personal legal research problems or legal representation. Our data is valuable for scholarship, but it is a work in progress and is not kept up to date. Please do not rely on our data set to solve personal legal problems.