The comprehensive Pennsylvania Historical Counties Dataset shapefile holds the polygons, metadata, and attribute data for every different configuration of every county or county equivalent in Pennsylvania, dated to the day, from the creation of the first county on 22 October 1669, through 31 December 2000. The Historical Counties Dataset, together with a number of supplementary cartographic data files and text files, enables users to employ a geographic information system easily for the analysis and display of county-related historical data.
First among the non-cartographic data files is the Pennsylvania Comprehensive Database (a tab-delimited text file that can be imported into a database or spreadsheet program), which provides descriptions of all known changes in state and county boundaries, changes in county organization and attachments, and changes in status and name, together with citations to the sources. These data include unmappable boundary changes, which usually means changes too small to plot as polygons at compilation scale, changes whose shapes could not be plotted at compilation scale (e.g., shift of a boundary line from the centerline of a road to one shoulder or the other), and changes that could not be mapped for other reasons (e.g., the location of the change could not be determined). In the Comprehensive Database, there is a separate entry for each county involved in each event. That facilitates assembling all the events pertaining to a single county.
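Because the Comprehensive Database has one entry per county per event, the events for a single county can be assembled by simple grouping. The following is a minimal sketch in Python, not the compilers' own tooling. The column names (county_id, date, change) and the sample rows are hypothetical placeholders; the real file's layout should be checked before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the spirit of the Comprehensive Database.
# Column names and rows are invented for illustration only.
SAMPLE = (
    "county_id\tdate\tchange\n"
    "PAS_Alpha\t1800-03-01\tCreated from Beta (illustrative).\n"
    "PAS_Beta\t1800-03-01\tLost territory to creation of Alpha (illustrative).\n"
    "PAS_Alpha\t1810-06-15\tGained territory from Beta (illustrative).\n"
)


def events_by_county(tsv_text):
    """Group tab-delimited rows by county identifier, sorted by date."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        grouped[row["county_id"]].append(row)
    for rows in grouped.values():
        # ISO-style dates (YYYY-MM-DD) sort correctly as plain text.
        rows.sort(key=lambda r: r["date"])
    return dict(grouped)


history = events_by_county(SAMPLE)
for event in history["PAS_Alpha"]:
    print(event["date"], event["change"])
```

The same grouping could be done in a spreadsheet or database after import; the sketch only shows why the one-entry-per-county layout makes the task straightforward.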
In addition to the Comprehensive Database, there are five supplemental texts. These are: (1) a comprehensive County Index (includes proposed and extinct counties and non-county areas and provides cross references for name changes, with hyperlinks to corresponding individual county chronologies), (2) a Consolidated Chronology that organizes all the data by date, combining all the counties involved in an event into a single, composite entry, (3) a set of Individual County Chronologies, each one covering all the changes in a single county or equivalent, (4) a Bibliography that lists the primary and secondary sources found useful in the historical research, and (5) a Commentary on the research problems and materials that were remarkable or unusual in the process of historical compilation (not every state requires a commentary). A "Read Me" file introduces all these files and indicates how to get started with them.
Counties and their equivalents (e.g., parishes in Louisiana and independent cities in four other states) cover all the territory of the United States, function as repositories of valuable records, and have long been used as the geographic base units for the gathering of essential social, political, and economic data. The authority to create, change, or eliminate counties and to specify their functions lies with the states and their predecessors. In detail, the role of counties varies from state to state, but in every state they administer the judicial system and provide a great number of services. In the process, counties collect and preserve large quantities of information, for example, records of marriages, births, and deaths; probated wills; militia training; real-estate transfers; tax collections; welfare benefits; school programs; and voter registrations. Outside densely populated cities, counties have served as colonial, territorial, and state legislative districts and as the building blocks of congressional districts. In the nineteenth century they became the grassroots centers for the development of political parties. Moreover, counties have been the principal geographic units for the collection and aggregation of data from colonial/territorial, state, and federal censuses.
Unfortunately for researchers, the average county has changed size, shape, or location between four and five times. Therefore, knowing the present county of the place where a past event occurred may not be sufficient to find its official records. If county boundaries changed in the meantime, it is necessary to learn what county had jurisdiction at the time of the event to identify the courthouse where the record is stored today. If the reported population of a county changed from one census to another, was that because of an increase or a decrease in the number of people, or an annexation or loss of populated territory, or a combination of both? Trying to analyze county-based historical data without controlling for boundary changes is almost certain to yield errors and lead to false conclusions.
Plotting boundary changes of all counties together and in sequence, not merely reconstructing the counties at different points in time (e.g., dates of censuses) or concentrating on a single county at a time (thereby taking it out of the context of what happened to its neighbors), is an important aspect of the historical compilation process. Doing so gives the compiler valuable insight into how the counties developed and whether the intentions of legislators were realized in their enactments. For example, a law may say its purpose is to transfer territory from County A to County B, but the actual effect, visibly evident from the plot, may be to transfer territory from both A and C to B. When boundaries are plotted this way, gores (gaps between counties) and overlaps created accidentally by the legislature are readily apparent, and errors in plotting are discovered almost immediately. It is nearly impossible to detect such developments unless the counties are plotted together. Descriptive entries in the comprehensive database and in the chronologies reflect actual changes because they are written from the compilation plots, not from the laws alone or from secondary works.
One additional benefit of this approach is that it provides an automatic checking mechanism. When the historical compiler reaches the end of the development of the county network, the final version should be identical with the boundaries of the present county. If there is a difference between the completed compilation and the standard, current map, the compiler knows there is a mistake somewhere. Such a discrepancy is rare, but when one is discovered, the compiler reviews the compilation to find the source of the problem. Usually it is a matter of the compiler erring in the plot of a boundary or accidentally omitting some change, either of which can easily be corrected, but occasionally the fault is found on the current, federal map. When the error appears on the federal map, the boundary is plotted accurately and a brief explanation of the difference is added to the supplemental Commentary.
Problematic Data. Every so often, a state's lawmakers mistakenly overlapped the lines of two or more counties. Once such an overlap was detected, it seldom lasted long because dual jurisdictions generate only trouble, and states acted swiftly to eliminate them. This atlas treats areas of overlapping jurisdiction as distinct polygons and provides the usual data (e.g., start dates and end dates) for each one.
Much more common than overlaps are non-county areas, that is, areas not within the jurisdiction of any county. Sometimes legal boundary descriptions left small areas, known as gores, outside the bounds of any county. Such inadvertent omissions most often occurred in the early days of a state's history when boundary makers lacked knowledge of the state's topography. Sometimes, legislators purposely did not extend county jurisdiction over all of their state's territory as early as possible, but waited until they had a better understanding of the lay of the land and until the prospect of European settlement was closer. Under those circumstances, they often provided a minimum of legal and administrative services for each non-county area by formally attaching it to a fully operational county; later, when the area was ready for settlement or was already under development, the state created one or more counties from the non-county area.
This atlas aims to be absolutely comprehensive and, with a few exceptions (see next paragraph), to leave no "holes" in its historical and geographic coverage of a state. In practice, each state compilation includes all the territory within its bounds in 2000, regardless of what authority created or altered a county there, plus all other territory that may have been within the state's jurisdiction at an earlier time. Also, there are no empty spaces, no areas outside a named polygon. Each non-county area, whether an accidental gore or a region purposely set aside for future settlement, is represented by a polygon, the polygon is named (often merely as a non-county area with a number, such as NCA1), and a full set of data about it is entered in the database and the attribute file.
The exceptions to the "no-holes" policy described above are the large non-county areas in western Virginia, New York, and the New England states during much of the seventeenth century. In London and the other European capitals, officials had access to so little accurate information about inland territory that imperial claims and land grants, including colonial charters, often were incomplete or imprecise or asserted limits (e.g., the Pacific Ocean or "South Sea") that were so extreme as to be impractical to plot. Compilers treated those large, indefinitely bounded, and inadequately described, non-county areas as empty territory and made no attempt to represent them as coherent, historically complete polygons. Because the ArcGIS program requires that all polygons be closed, the compilers supplied estimated boundary lines to close polygons representing indefinitely extensive frontier counties and noted their action in the "Change" field.
Some changes have not been mapped because the change is too small to map, or the location is unknown, or both; for example, a law that transferred ten acres belonging to farmer Smith from one county to another would be unmappable because the parcel is too small to be mapped at the standard compilation scale or because the location of Smith's farm cannot be discovered. When the location of a change too small to map is known, the historical compiler marks the location and the digital compiler digitizes it as a point. All such tiny changes are collected in a separate shapefile, usually labeled [YEAR]_pt.shp.
Using the historical compiler's plotting overlays and associated material (e.g., notes, copies of the laws), the GIS compiler draws the counties in digital form. For digitizing, the program is ArcGIS 9.1, and the electronic modern "base map" is from the Digital Chart of the World (DCW) provided with ArcGIS by Environmental Systems Research Institute (ESRI), plus, as needed, such other data (often from another source) as the grid of the Public Land Survey System (PLSS). By repeating much of the procedure of the historical compiler, the digital compiler implicitly checks the work of her predecessor and occasionally finds line segments that must be corrected. As digitizing proceeds, data from the comprehensive database are entered into the attribute table.
After perfecting the boundary lines, the GIS digitizer assembles copies of all county polygons and attribute data into a single shapefile, the Historical Counties Dataset shapefile.
The locations of places and landmarks cited in the boundary descriptions are gathered from the modern, federal base maps or from secondary publications (e.g., gazetteers, county histories, articles in historical journals), old maps, or local experts.
Several steps are taken to ensure the accuracy of the boundaries as they are manually plotted, and to maintain the precision of those plots as they are manually digitized. The digitizing process involves faithfully drawing the sketched counties using landmarks such as rivers, roads, and places. These positional data were obtained from ESRI's Data and Maps collection (1:100,000 scale). Once the initial digitizing is complete, a master file is created and uploaded to IMS. When the digitizing is complete, the digitized polygons and their attribute data are once again checked for accuracy against the chronology for the state.
No regular or systematic updating of the pre-2001 data is anticipated because (a) the historical data cannot change and (b) the compilers believe their methods and materials are sufficient to produce data that are complete and correct. (That is not to say no error can slip through. Suggestions for ad hoc changes or additions to the historical data, together with an explanation of why the change should be made and supporting evidence, should be directed to scholl@newberry.org or Dr. William M. Scholl Center for American History and Culture, The Newberry Library, 60 W. Walton St., Chicago, IL 60610.) County boundary changes that occur after 31 December 2000 will routinely be digitized by both the state of Pennsylvania and the federal government and, therefore, will be available from agencies of those governments in separate files in the indefinite future.
The base map for this operation was the Pennsylvania map from the U.S.G.S. State Base series at the scale of 1:500,000. (The 1:1,000,000 version of the map was employed whenever smaller scale was appropriate or needed to plot large or simple changes.) The original strategy for the Atlas was to publish all states in book form before venturing to digitize the data, and the 1:500,000 scale maps were used in making the books. Before switching to all digital products and methods, about 80% of the states had been researched and compiled using this series of base maps, including 24 states published in 19 printed volumes. It was not practical to re-compile those data at a larger scale like 1:100,000. (See below, the next two process steps.)
As digitizing proceeds, data from the Comprehensive Database are entered into the attribute table. The process of entering attribute data entails an implicit review of the database and, if the greater map detail involved in working at digitization scale (see below) is different from the original descriptions, that may lead to updates of the database, including dates and version numbers and even descriptions of changes.
The compiler works "heads up," facing the monitor and using the mouse to draw lines against a background of the digital base map. The historical compiler's plots are not scanned and overlaid on the digital base map, nor does the digital compiler trace the earlier work on a digitizing tablet, because neither technique is as efficient or accurate as drawing the lines anew. One reason is that the scale for most of the historical compilations is 1:500,000 and the scale for digitization is 1:100,000. It is most unusual to draw a map at a larger scale than its source or early version, but in this case it was unavoidable because digitization did not commence until after nearly all the states had already been compiled at the smaller scale. In effect, the historical plots are a preliminary stage, and the plots from that work become the chief sources or guides (supported by the historical notes and copies of the legal descriptions, and other material) for the digital compiler who renders the final, detailed version of the boundary lines.
Based on the original historical county data, a table was created to specify, for each component polygon, the different counties to which it belonged and the time frames. The table was programmatically checked to verify that each component polygon was correctly assigned to historical counties throughout its life, with no unexpected gaps or overlaps.
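The kind of programmatic check described above can be illustrated with a short Python sketch. It is not the compilers' actual program: the row layout (polygon identifier, county identifier, inclusive start date, inclusive end date) and the sample values are assumptions made for illustration. For each component polygon it sorts the assignments by date and reports any day left uncovered (a gap) or covered twice (an overlap).

```python
from datetime import date, timedelta


def check_assignments(rows):
    """Find temporal gaps and overlaps per component polygon.

    rows: iterable of (polygon_id, county_id, start, end), where start and
    end are inclusive datetime.date values (a hypothetical layout).
    Returns a list of (polygon_id, problem, covered_through, next_start).
    """
    by_polygon = {}
    for poly, county, start, end in rows:
        by_polygon.setdefault(poly, []).append((start, end, county))

    problems = []
    for poly, spans in by_polygon.items():
        spans.sort()  # order by start date, then end date
        latest_end = spans[0][1]
        for start, end, _county in spans[1:]:
            expected = latest_end + timedelta(days=1)
            if start > expected:
                problems.append((poly, "gap", latest_end, start))
            elif start < expected:
                problems.append((poly, "overlap", latest_end, start))
            latest_end = max(latest_end, end)
    return problems


# Invented sample data: P1 is continuous, P2 has one uncovered day.
rows = [
    ("P1", "PAS_Alpha", date(1800, 1, 1), date(1809, 12, 31)),
    ("P1", "PAS_Beta", date(1810, 1, 1), date(2000, 12, 31)),
    ("P2", "PAS_Alpha", date(1800, 1, 1), date(1805, 6, 30)),
    ("P2", "PAS_Beta", date(1805, 7, 2), date(2000, 12, 31)),
]
problems = check_assignments(rows)
print(problems)
```

The sketch flags every gap and overlap. A real check would also need a list of the known, historically genuine gaps and overlaps so that only unexpected ones are reported.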
The component polygons were then reassembled back into the historical counties, and converted to a shapefile. The resulting historical county shapefile consists of a large number of overlapping polygons; however, as a result of the topology check process, the subset of counties in effect at any selected date is topologically correct, with no unexpected gaps or overlaps. There are a number of known gaps and overlaps, however, due to legislative or surveying errors, and to conflicting territorial claims.
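Because the shapefile stores every historical version as its own polygon, a user normally begins by selecting the versions in effect on a chosen date. A minimal Python sketch of that selection follows. It works on plain records rather than on the shapefile itself, and the field names (id, version, start, end) and sample values are hypothetical stand-ins for the start-date and end-date attributes.

```python
from datetime import date

# Invented records standing in for rows of the attribute table.
RECORDS = [
    {"id": "PAS_Alpha", "version": 1,
     "start": date(1800, 1, 1), "end": date(1809, 12, 31)},
    {"id": "PAS_Alpha", "version": 2,
     "start": date(1810, 1, 1), "end": date(2000, 12, 31)},
    {"id": "PAS_Beta", "version": 1,
     "start": date(1790, 1, 1), "end": date(2000, 12, 31)},
]


def in_effect(records, when):
    """Return the records whose inclusive date range contains `when`."""
    return [r for r in records if r["start"] <= when <= r["end"]]


selected = in_effect(RECORDS, date(1805, 5, 1))
print([(r["id"], r["version"]) for r in selected])
```

In a GIS the equivalent step is an attribute query on the two date fields; the result is the non-overlapping set of counties for that day, apart from the known historical gaps and overlaps.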
Because the FIPS system (see below) provides no codes for some extinct counties, no codes for non-county areas, and no codes for the colonies and territories that were predecessors of the states, it has been necessary to create a more comprehensive, alternative system of identifiers. The system adopted by the Atlas identifies each state and colony or territory with three letters, the first two based on the system of two-letter codes employed by the U.S. Post Office and the third indicating the status of the organization. (In most cases that is simply a C for colony, a T for territory, or an S for state.) For example, IAT stands for Iowa Territory and IAS for the state of Iowa. Some precursors of states need special ID codes, most of which are intuitively easy to read and to apply, especially in the context of a particular state's dataset. Examples are NWT (Northwest Territory, formally named Territory Northwest of the River Ohio), SWF (Spanish West Florida), FRS (State of Franklin), DKT (Dakota Territory), CRC (Colony of Carolina), and TXR (Republic of Texas).
Counties are identified by appending their names to the state codes, as in "KYS_Adair" for Adair County in the state of Kentucky. Non-county areas are abbreviated NCA; within a specific state they are differentiated from each other by adding a numeral to the abbreviation, as in "MOS_NCA1" for non-county area number 1 in the state of Missouri. Occasionally special codes are needed to deal with unusual historical situations, as in Vermont where the original Washington County, identified as "VTS_Washington01," became extinct and later the name was applied to another county ("VTS_Washington") that continues today. The county identifiers also have been created with an eye towards users who may wish to download and work with more than one state file for regions and want a comprehensive way to sort and select shapefiles or to link the attribute table to the comprehensive database.
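The identifier scheme is regular enough to build and take apart mechanically, which helps when sorting or linking records across several state files. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, and the set of special precursor codes contains only the examples given above, so it is not a complete list.

```python
# Status letters described in the text: third letter of the state code.
STATUS = {"C": "colony", "T": "territory", "S": "state"}

# Special precursor codes cited as examples in the text (not exhaustive).
# They do not follow the "postal code + status letter" rule.
SPECIAL = {"NWT", "SWF", "FRS", "DKT", "CRC", "TXR"}


def make_county_id(state_code, name):
    """Join a three-letter state code and a county name with an underscore."""
    return f"{state_code}_{name}"


def parse_county_id(identifier):
    """Split an identifier such as 'KYS_Adair' into its parts."""
    state_code, sep, name = identifier.partition("_")
    if len(state_code) != 3 or not sep or not name:
        raise ValueError(f"not a county identifier: {identifier!r}")
    special = state_code in SPECIAL
    return {
        "state_code": state_code,
        # Postal abbreviation and status are only inferred for regular codes.
        "postal": None if special else state_code[:2],
        "status": None if special else STATUS.get(state_code[2]),
        "name": name,
        # Non-county areas are named NCA plus a numeral, e.g. NCA1.
        "non_county_area": name.startswith("NCA") and name[3:].isdigit(),
    }


print(parse_county_id("KYS_Adair"))
print(parse_county_id("MOS_NCA1"))
```

A suffix such as the "01" in "VTS_Washington01" is left inside the name here; treating it as a separate field would need a rule the text does not spell out.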
Historically, almost every colony and territory transformed smoothly into statehood with no complications that might have required separate datasets for the state and its predecessors. The exception is Dakota Territory, which split into a pair of states and therefore has its own dataset.