Background – Election Documents, Austin City Clerk, PDF, OCR, and more
Though Austin mayoral and council candidate campaign finance reports (CFRs) are available on the City Clerk’s website, the files are not easy to work with. Candidates bring printed copies to the Austin City Clerk, who scans the printouts and puts the resulting PDF files online for the public to view. The PDF files can be large, slow-scrolling, and difficult to read.
Unfortunately, there is not yet an electronic entry and submission process for Austin campaign data, though it has been discussed for years. One can find articles on that topic, going back well over a decade, from election transparency advocate and local Austin attorney, Fred Lewis. Such a process would allow for the creation of CSV or Excel files for the public to use, or an online database searchable by field name. It would eliminate the huge problems that come with relying on OCR’d documents.
In 2012, the Austin City Council passed a resolution to create an online searchable and downloadable database for election documents and data by 2013 – a requirement that has still not been met. Check out this article for more information on that effort. Ultimately, the resolution was repealed and replaced by an alternative idea to get the State to open up its software, though that idea stalled without any progress.
Though there is no election database that allows searching by individual fields, there is a way to search all election data – campaign finance reports, treasurer appointments, and so on – using the city website. I keep reading articles saying you cannot search Austin’s election data, so I suspect many people are unaware of this capability. Later in this article I will explain how to do it.
Contribution Data Now Available In CSV Format
Even with the searchable interface the city provides, there is still a need for Excel files (or CSV files) that an individual can download and manipulate – sort, search, etc. While I have not been able to completely solve that problem, I have been able to create a partial solution that works for contribution data only, and only for some CFRs.
Below are links to .txt files that I have created for the 2014 election. Each file contains the contribution data (and only actual contribution data – no pledges, no expenditures) for all the readable (i.e., not handwritten) CFRs for all candidates during the given time period. (Note that in this article, I refer to these files as either “the .txt files” or the “CSV files” or “CSV data” or some variation thereof.)
Here are the links to the .txt files:
Update: I have produced an xls of this data, saving people the headache of importing into Excel: all 3 contribution data files into a single .xls file (each data file as its own sheet)
In almost all cases, if a candidate starts off with a printed/typed report (which usually means he’s using the Ethics Commission software to produce his report), then he stays with that and does not switch over to handwritten reports at a later point. One notable exception is Jay Wiley, who is included in the July 15th .txt file, but then he went to handwritten and is not included in the later files. Perhaps he did that to obscure his data some, perhaps he just got lazy, or perhaps his computer broke – who knows.
CSV Data File Format (.txt file format)
Each of the three .txt files I have provided is actually a CSV file (but using the vertical bar “|” as the separator rather than a comma). Most of the fields should be self-explanatory. The DOCID field is the document ID# that the Clerk assigns. You can find this number by hovering your mouse over a document link on the Clerk’s CFR page. The PAGE field is the page number in the CFR PDF file that the data came from. CFRs contain more than just Schedule A contribution data, so there will be pages in the PDF file that are not referenced by my .txt files (for example, the Schedule A data might be pages 4 through 29 of a 40-page PDF).
I attempt to grab all the data fields for each contribution, except for the two checkboxes, which are too small to get (one is about out-of-state PACs and the other is about travel).
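If you would rather read the files with a script than import them into Excel, here is a minimal Python sketch. It assumes the first line of each file is a header row, and the sample data below is made up: only DOCID and PAGE are field names taken from this article, while NAME and AMOUNT are hypothetical stand-ins, so check the real header before relying on them.

```python
import csv
import io

def read_contributions(handle):
    """Yield one dict per data row of a '|'-separated contribution file.

    Assumes the first line is a header row naming the fields.
    """
    # QUOTE_NONE: treat quote characters as ordinary text, since OCR output
    # can contain stray quotes that would otherwise confuse the parser.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    header = [name.strip() for name in next(reader)]
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        yield dict(zip(header, (field.strip() for field in fields)))

# Made-up sample standing in for one of the .txt files.
# NAME and AMOUNT are hypothetical column names.
sample = io.StringIO(
    "DOCID|PAGE|NAME|AMOUNT\n"
    "123456|4|Jane Example|100.00\n"
    "123456|5|John Sample|250.00\n"
)
rows = list(read_contributions(sample))
print(rows[0]["DOCID"], rows[0]["PAGE"])
```

To read a real file, pass in something like open(path, newline="", encoding="utf-8") instead of the sample, though the encoding is a guess on my part.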
Errors, Stray Characters, and Gibberish In The Data
In order to get the data out of the Clerk’s PDFs and into a more usable format I reprocessed the PDF files. First, I re-OCR’d every CFR PDF file, in order to improve the quality of the Contributor Name data. I applied a number of filters to correct skewed documents and other issues. Even with those adjustments, there are many, many errors in some parts of the data, and periodic errors in all of the data. Sometimes the separator lines and boxes from the form itself can get OCR’d and mixed in with the candidate data. When this happens you will often see characters like these at the beginning or end of data: l 1 I i . , – ~ ‘
Despite the various data errors, most of the data is good enough to be useful, and a few minutes of cleanup can significantly improve the data for any given report.
In order to help you deal with fields that have stray characters or are otherwise indecipherable, I have made available all of the CFR PDFs as individual pages. In other words, there is a small PDF file for every page in the original PDF. I have placed these PDF files in a directory hierarchy that you can browse and drill down into:
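Part of that cleanup can be scripted. The sketch below is one possible approach, not the process I used to build the files: it drops leading and trailing tokens made up entirely of the stray characters listed above. It only removes tokens separated from the real data by a space, so a name like "Bill" keeps its final "l". It is best suited to name fields; on an address, a legitimate leading "1" would be removed too.

```python
# Characters this article lists as typical residue from OCR'd form lines and boxes.
STRAY_CHARS = set("l1Ii.,–~‘")

def is_stray(token):
    """True if every character in the token is a known stray character."""
    return all(ch in STRAY_CHARS for ch in token)

def strip_stray_tokens(value):
    """Remove stray-only tokens from the start and end of a field."""
    tokens = value.split()
    while tokens and is_stray(tokens[0]):
        tokens.pop(0)
    while tokens and is_stray(tokens[-1]):
        tokens.pop()
    return " ".join(tokens)

print(strip_stray_tokens("l Jane Example ."))  # prints: Jane Example
```

Punctuation glued directly onto a word (for example "Example,") is left alone here, since removing it safely takes more judgment than a few lines of code can supply.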
Thoughts On Using The Data Files
You need to be a little thoughtful and careful in how you evaluate the data in a given .txt file. For example, you cannot just total up all the contribution amounts for a given candidate and expect it to be roughly accurate, because some of the submitted CFR documents are corrected versions of previously submitted files. Thus, you would have duplicates of many contributions. Note that sometimes candidates submit corrected versions that contain all the entries from the prior report, while other times candidates submit only the corrected one or two items (the delta). There is no consistency, so pay attention.
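One rough way to guard against double counting is to treat an identical contribution that shows up in more than one document as a single contribution. The Python sketch below is a heuristic under several assumptions: the column names CANDIDATE, NAME, DATE, and AMOUNT are hypothetical (only DOCID comes from the files as described above), the rows are made up, and OCR errors in any of those fields will defeat the matching. A correction that changes an amount will also still count twice, so treat the result as an estimate to check by hand.

```python
from collections import Counter, defaultdict
from decimal import Decimal, InvalidOperation

def parse_amount(text):
    """Return a Decimal for strings like '$1,000.00', or None if unreadable."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None

def estimate_totals(rows):
    """Estimate per-candidate totals, counting cross-report repeats once."""
    # counts[key][docid] = times this exact contribution appears in that report
    counts = defaultdict(Counter)
    for row in rows:
        amount = parse_amount(row["AMOUNT"])
        if amount is None:
            continue  # unreadable amount; review these rows manually
        key = (row["CANDIDATE"], row["NAME"].lower(), row["DATE"], amount)
        counts[key][row["DOCID"]] += 1
    totals = defaultdict(Decimal)
    for (candidate, _name, _date, amount), per_doc in counts.items():
        # Repeats inside one report are kept; repeats across reports are not.
        totals[candidate] += amount * max(per_doc.values())
    return dict(totals)

# Made-up rows: report 1002 is a corrected filing that repeats one entry.
rows = [
    {"CANDIDATE": "Candidate A", "NAME": "Jane Example", "DATE": "06/01/2014",
     "AMOUNT": "$100.00", "DOCID": "1001"},
    {"CANDIDATE": "Candidate A", "NAME": "John Sample", "DATE": "06/02/2014",
     "AMOUNT": "$200.00", "DOCID": "1001"},
    {"CANDIDATE": "Candidate A", "NAME": "Jane Example", "DATE": "06/01/2014",
     "AMOUNT": "100.00", "DOCID": "1002"},
]
print(estimate_totals(rows))
```

A naive sum of these rows would give $400, while the estimate comes to $300 because the repeated Jane Example entry is counted once.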
Here are a few examples of what you could easily do now using the .txt files, that might not have been so easy before:
- Just by quickly paging through the July 15th file and eyeing the amounts, I caught a couple of $49 contributions. $49 is an extremely odd contribution, so I stopped and looked closer and saw that Melissa Zone had listed a couple of “anonymous” contributions of $49 — apparently trying to get under a $50 reporting requirement that she has mistakenly interpreted to mean anonymous contributions are allowed as long as they’re under $50.
- While paging quickly through the data, I noticed Sheryl Cole’s latest 8-day report has many contributions over $200 that are missing the required occupation and employer info. This is something that almost everyone is guilty of once or twice, but nothing like the number of times Cole has violated that rule in that particular report — better fix it quick, Sheryl.
- Sorting by contributor name (and perhaps combining all 3 .txt files) may help you find violations of the $350/individual contribution limit. For example, you can easily see that Jim Arnold contributed $150 on the 0715 report and $227 on the 8day report, for a total of $377, to Katrina Daniel. Or, in the 8-day report is a $700 contribution from Mr. & Mrs. William Reagan, followed by a $350 contribution from William Reagan at the same address, which looks like it might be an over-limit violation. (Note that it is difficult to be 100% certain on over-limit violations – there is always the possibility that a parent and a child share the same first and last names, have the same address, and each contributed to a candidate, which is legal, but would look like an over-limit violation in the report.)
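The over-limit check in the last item can also be scripted once the files are combined. This is a sketch only: CANDIDATE, NAME, and AMOUNT are hypothetical column names, amounts are assumed to be clean numbers, and the "Jane Example" row is made up. The Jim Arnold figures are the ones mentioned above. Anything flagged is a lead to verify against the PDFs, not proof of a violation, for the reasons already given about shared names and corrected reports.

```python
from collections import defaultdict
from decimal import Decimal

LIMIT = Decimal("350")  # per-individual limit mentioned in this article

def normalize_name(name):
    """Lowercase and collapse whitespace so small variations still group."""
    return " ".join(name.lower().split())

def find_over_limit(rows, limit=LIMIT):
    """Return {(candidate, contributor): total} for totals above the limit."""
    totals = defaultdict(Decimal)
    for row in rows:
        key = (row["CANDIDATE"], normalize_name(row["NAME"]))
        totals[key] += Decimal(row["AMOUNT"])
    return {key: total for key, total in totals.items() if total > limit}

rows = [
    {"CANDIDATE": "Katrina Daniel", "NAME": "Jim Arnold", "AMOUNT": "150"},
    {"CANDIDATE": "Katrina Daniel", "NAME": "Jim  ARNOLD", "AMOUNT": "227"},
    {"CANDIDATE": "Katrina Daniel", "NAME": "Jane Example", "AMOUNT": "100"},
]
for (candidate, contributor), total in find_over_limit(rows).items():
    print(candidate, contributor, total)  # prints: Katrina Daniel jim arnold 377
```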
Election Data In The Future and Other/Alternative Efforts
I will continue to make these CSV files for each election until they are no longer needed. However, I think a better option is to incentivize campaigns to export and publish their own campaign data, which is easily done from the Ethics Commission software.
A group called Open Austin questioned several 2014 candidates about exporting and self-publishing, so the idea is out there; we just need candidate agreement and compliance. As an incentive, I propose creating an “Election Transparency” badge that campaigns can place on their websites, much like the my-website-is-secure type badges seen on some websites.
How to Search Election Documents on the City Website
Earlier, I mentioned that you can actually search Austin Mayoral and City Council candidate and election documents on the city website. First, go to the search page. (If that link doesn’t work, try this one.) Scroll to the bottom and select the Municipal Election Documents radio button. Next, click Start Advanced Search. The resulting screen will be an advanced search form allowing you to search filed election documents.
On the advanced search form, you can narrow your search by form type and/or by candidate. Click the Help button to the right of each field to learn the syntax. For example, in the keyword field you can put something like Kath* <Near/3> Tov* which gets you variations of words within 3 words of each other, like Kathy Tovo, or Katherine Beth Tovo.
It is important to understand that when you are searching, you are searching the OCR’d values of words from the PDF documents. But when you view the PDF, you are seeing the scanned image (the OCR’d text is not visible without copy/paste or searching). You need to get creative with your searching and use the advanced word-stemming/wildcard features. For example, “Hillco Partners” is actually read in as “Hilico Partners” by the OCR process.
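The city search form is the only tool for the city's own index, but if you are working with downloaded data (such as the .txt files) you can hunt for this kind of misread with approximate string matching. Here is a small Python sketch using the standard library's difflib. The list of employer values is made up, and the 0.85 cutoff is a guess that you would need to tune.

```python
import difflib

def find_ocr_variants(target, values, cutoff=0.85):
    """Return values that closely resemble target, best match first."""
    # Compare case-insensitively, but return the original spellings.
    originals = {}
    for value in values:
        originals.setdefault(value.lower(), value)
    matches = difflib.get_close_matches(
        target.lower(), list(originals), n=10, cutoff=cutoff
    )
    return [originals[match] for match in matches]

# Made-up field values; the "Hilico" misread is the one described above.
employers = ["Hilico Partners", "City of Austin", "Hillco Partners", "Self"]
print(find_ocr_variants("Hillco Partners", employers))
```

A lower cutoff catches more mangled spellings but also pulls in more unrelated names, so expect to review the matches by eye.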
Till the Next Post
This is a long post and I still have a bunch more I want to say – particularly about this idiotic pledge that the more conservative/Republican/Tea Party Council candidates have been signing. That pledge is straight out of Grover Norquist’s playbook. If you want Austin to be run like the federal government, or one of the Republican Texas cities like Houston or Dallas, then vote for someone who signed that stupid pledge. But, I will save that rant, and a discussion about the Effective Rate, for another post…