Data and Methods


DATA SOURCES

Property Tax Data

A FOIA request was submitted to the Cook County Treasurer to obtain property taxpayer data for 2023. These data include the property PIN, address, owner name, and mailing address as recorded on the property tax bill for 1.87 million properties in Cook County.

Business Registry

The Illinois Secretary of State's business registry was used to gather data on Limited Liability Companies (LLCs) and corporations. The datasets included in the current report are the “Master” and “Company Name” files for both LLCs and corporations and the “Managers” file for LLCs only.

Property Parcel Characteristics

The Cook County Assessor's PTAXSIM database (https://ccao-data.github.io/ptaxsim/index.html) was used to obtain property characteristics for all of Cook County. The characteristics included whether a Homeowner Exemption was applied to the property tax bill, the property classification, and the property's assessed value. Parcels in PTAXSIM are identified by property PIN, making it possible to link these data to the Property Tax Data obtained through the FOIA request.
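The PIN-based link described above can be illustrated with a minimal sketch. It assumes both sources have already been loaded as lists of dictionaries; the field names (pin, owner_name, class, av, homeowner_exemption), the helper names, and the sample records are hypothetical and do not reflect the actual FOIA or PTAXSIM schemas. It also assumes PINs are 14 digits and may or may not contain dashes.

```python
def normalize_pin(pin):
    """Keep digits only and left-pad to 14 characters (assumed PIN length)."""
    digits = "".join(ch for ch in str(pin) if ch.isdigit())
    return digits.zfill(14)


def link_by_pin(tax_records, parcel_records):
    """Attach parcel characteristics to each tax record sharing the same PIN."""
    parcels = {normalize_pin(p["pin"]): p for p in parcel_records}
    linked = []
    for rec in tax_records:
        pin = normalize_pin(rec["pin"])
        # Tax records without a matching parcel are kept, just without
        # the parcel characteristics (a left join).
        merged = {**parcels.get(pin, {}), **rec, "pin": pin}
        linked.append(merged)
    return linked


# Made-up records for illustration only.
tax_records = [
    {"pin": "17-09-410-014-1001", "owner_name": "EXAMPLE HOLDINGS LLC"},
]
parcel_records = [
    {"pin": "17094100141001", "class": "299", "av": 25000,
     "homeowner_exemption": False},
]

linked = link_by_pin(tax_records, parcel_records)
```

Normalizing the PIN on both sides before joining guards against formatting differences between the two sources, which is the most likely reason an otherwise valid key would fail to match.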

DATA CLEANING

A standardized data cleaning pipeline was applied to all datasets. Many of its design decisions were adapted from Anthony Moser's cleaning pipeline for the Dese Guys project, which uses some of the same data (https://github.com/anthonymoser/deseguys). Each raw dataset was downloaded and cleaned separately with this pipeline before any merging across datasets occurred. The corporation president data required additional steps before cleaning: the raw data stored each president's name and address together in a single record with no clear separator, so the two had to be split algorithmically.
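As an illustration of what such steps might look like, the sketch below standardizes a free-text field and splits a combined name-and-address string. Both functions are hypothetical: the report does not specify the cleaning rules or the separation algorithm, and the split heuristic shown here (break at the first token that starts with a digit, assumed to be the street number) is only one plausible approach, not the method actually used.

```python
import re


def clean_text(value):
    """Uppercase, drop periods, turn other punctuation into spaces,
    and collapse repeated whitespace."""
    text = str(value or "").upper().replace(".", "")
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_name_address(raw):
    """Split 'NAME STREET-NUMBER ...' at the first token starting with a digit.

    Assumption: the address begins with a street number and the name
    contains no token that starts with a digit.
    """
    tokens = clean_text(raw).split()
    for i, tok in enumerate(tokens):
        if i > 0 and tok[0].isdigit():
            return " ".join(tokens[:i]), " ".join(tokens[i:])
    # No street number found: treat the whole string as a name.
    return " ".join(tokens), ""


# Made-up example record.
name, address = split_name_address(
    "John Q. Public 123 N. Main St., Chicago IL 60601"
)
```

A heuristic like this fails on names containing numerals and on addresses without a street number (for example, PO boxes), so records it cannot split cleanly would need to be flagged for separate handling.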

DATA LINKING

To link records both within and across datasets, exact string matching and fuzzy (i.e., probabilistic) matching were applied to all names and addresses. We use the term entity to refer to any of the following: an LLC, an LLC manager, a corporation, a corporation president, or a taxpayer. The matching algorithm was released as a separate Python package, mi-chainlink.
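The difference between the two matching strategies can be sketched as follows. This example does not use mi-chainlink or reproduce its algorithm; it substitutes the similarity ratio from Python's standard difflib module as a stand-in for a fuzzy score. The function name match_type and the 0.9 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def match_type(a, b, threshold=0.9):
    """Classify a pair of cleaned strings as an exact match, a fuzzy match,
    or no match. Returns (label, score); label is None when there is no match."""
    if not a or not b:
        return None, 0.0
    if a == b:
        return "exact", 1.0
    # Stand-in similarity score in [0, 1]; not the mi-chainlink scoring.
    score = SequenceMatcher(None, a, b).ratio()
    if score >= threshold:
        return "fuzzy", score
    return None, score


# Made-up entity names.
exact = match_type("ACME HOLDINGS LLC", "ACME HOLDINGS LLC")
typo = match_type("ACME HOLDINGS LLC", "ACME HOLDNGS LLC")
different = match_type("ACME HOLDINGS LLC", "ZENITH PROPERTIES INC")
```

Keeping the score alongside the label is useful because it can later serve as a measure of match strength when linked entities are assembled into a network.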

NETWORK ANALYSIS

After linking, we constructed a combined graph in which connections are weighted by the strength of the match between entities. We then applied the Louvain community detection algorithm to this graph, followed by a custom recursive algorithm, to detect communities. The search displays the community containing the searched entity, if one was found, and otherwise falls back to showing that entity's neighbors.
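The search fallback can be sketched as below. The sketch assumes community detection has already been run and its result stored as a mapping from entity to community id; it does not implement Louvain or the custom recursive step. The function names, the result format, and the sample entities are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected weighted adjacency map from (a, b, weight) tuples."""
    adjacency = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency


def search(entity, communities, adjacency):
    """Return the entity's community if it has one, else its direct neighbors."""
    community_id = communities.get(entity)
    if community_id is not None:
        members = sorted(e for e, c in communities.items() if c == community_id)
        return {"kind": "community", "entities": members}
    # Fallback: no community was detected for this entity.
    return {"kind": "neighbors", "entities": sorted(adjacency.get(entity, {}))}


# Made-up match edges: (entity, entity, match strength).
edges = [
    ("A LLC", "B LLC", 1.0),
    ("B LLC", "C CORP", 0.93),
    ("D TAXPAYER", "E LLC", 0.91),
]
adjacency = build_adjacency(edges)

# Assumed output of a prior community detection run.
communities = {"A LLC": 0, "B LLC": 0, "C CORP": 0}

in_community = search("B LLC", communities, adjacency)
fallback = search("D TAXPAYER", communities, adjacency)
```

In practice the communities mapping could be produced by an existing Louvain implementation, such as the louvain_communities function in networkx, run on the weighted graph.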