GIS and PyShp
A so-called shapefile is a directory or zipped directory that contains standard files used to represent a map and overlays in a Geographic Information System (GIS).
GIS displays are a great way to present complex data: various overlays can be applied to a map, and usually you can drill down to see further details.
GIS tools allow one to update shapefiles, but this could be extremely tedious for any significant amount of data. Here’s where python comes to the rescue: there is a free, open source, python package, PyShp, that can read and update shapefiles.
If you have an Internet connection and pip, installing PyShp is easy – simply execute
pip install PyShp
For this demo, I used a shapefile from the National Institute of Justice’s Real Time Crime Forecasting Challenge. A response to the contests includes a zip file of 20 shapefiles of Portland, Oregon’s police districts. The NIJ provided an example submission to use as a template. To help with a demo of PyShp, I used their shapefile 3MO/NAME_SC_3MO.
PyShp in Action
The best tutorial I have found for PyShp is the README.md file at its GitHUB site. But even with that help, it took me awhile to puzzle out how to access field names, how to associate them with shapes on the map, and how to update them. The goal here is to spell out those answers.
For these examples, I’ve extracted directory 3MO from the SC (“street crimes”) directory of the NIJ example submission.
To get started, import PyShp and create an Editor instance for the shapefile:
>>> import shapefile >>> >>> # The shapefile name is the filename part, sans extention, of the >>> # various files in the shapefile directory. >>> >>> e = shapefile.Editor('3MO/NAME_SC_3MO')
In PyShp, shapes are represented as a list of shape and record objects. The correlation between record and shape is the index of the lists, and the index itself is referred to as an “Object ID”. In this shapefile, there are 17164 shapes, each with its own instance in e.records and the list returned by e.shapes().
>>> len(e.records) 17164 >>> len(e.shapes()) 17164
Each element in the records list is a list of field values. Metadata for field values – the position, name, type, and length – is in shapefile.Editor.fields:
>>> e.fields [('DeletionFlag', 'C', 1, 0), ['Id', 'N', 6, 0], ['hotspot', 'N', 4, 0], ['area', 'N', 12, 0]] >>> e.records [0, 0, 221879]
In the above snippet, we see that the defined fields for shapes are Id, hotspot, and area. For each record, Id is field 0, hotspot field 1, and area field 2. Object 1000 (the 1001st object) has an Id of 0, hotspot 0, and area 221879. (A value for the DeletionFlag does not occur in the record instances.)
The following snippet gets all the object IDs of shapes for which hotspot is 1:
>>> for i in range(0, len(e.records)): ... if e.records[i] == 1: ... print 'Hotspot set in object %d' % i ... Hotspot set in object 45 Hotspot set in object 1027 Hotspot set in object 1200 Hotspot set in object 2199 Hotspot set in object 2243 Hotspot set in object 2688 Hotspot set in object 3340 Hotspot set in object 3342 Hotspot set in object 3928 Hotspot set in object 4148 Hotspot set in object 4734 Hotspot set in object 4956 Hotspot set in object 6068 Hotspot set in object 6690 Hotspot set in object 6936 Hotspot set in object 7439 Hotspot set in object 8012 Hotspot set in object 8909 Hotspot set in object 9471 Hotspot set in object 9588 Hotspot set in object 10034 Hotspot set in object 10791 Hotspot set in object 10792 Hotspot set in object 11730 Hotspot set in object 12723 Hotspot set in object 13287 Hotspot set in object 14085 Hotspot set in object 14701 Hotspot set in object 14702 Hotspot set in object 15547 Hotspot set in object 16165 Hotspot set in object 16912 >>>
An instance of Editor can be used to set field values as well as read them.
>>> for r in e.records: ... r == 0 >>> >>> for i in range(0, len(e.records)): ... if e.records[i] == 1: ... print 'Hotspot: %d' % i ... >>>
The Editor instance can commit changes in the shapefile by executing save :
As an example, let’s set the hotspot field to 1 for every Object ID less than 500 and save the change:
>>> for i in range(0, 500): ... e.records[i] = 1 ... >>> >>> e.save('3MO/NAME_SC_3MO') >>>
The result of loading this shape file into ArcGIS and selecting for hotspot equal 1 is shown below. The yellow highlights mark the 500 teal-colored cells in which hotspot has been set to 1.
Believe me, writing that short for-loop in python is easier and more accurate than changing the hotshot fields with the UI.