Python and Shapefiles

GIS and PyShp

A so-called shapefile is a directory or zipped directory that contains standard files used to represent a map and overlays in a Geographic Information System (GIS).

GIS displays are a great way to present complex data: various overlays can be applied to a map, and usually you can drill down to see further details.

GIS tools allow one to update shapefiles, but this could be extremely tedious for any significant amount of data.  Here’s where python comes to the rescue: there is a free, open source, python package, PyShp, that can read and update shapefiles.

If you have an Internet connection and pip, installing PyShp is easy – simply execute

pip install PyShp

Demo Data

For this demo, I used a shapefile from the National Institute of Justice’s Real Time Crime Forecasting Challenge.  A response to the contests includes a zip file of 20 shapefiles of Portland, Oregon’s police districts.  The NIJ provided an example submission to use as a template.  To help with a demo of PyShp, I used their shapefile 3MO/NAME_SC_3MO.

PyShp in Action

The best tutorial I have found for PyShp is the file at its GitHUB site.  But even with that help, it took me awhile to puzzle out how to access field names, how to associate them with shapes on the map, and how to update them.  The goal here is to spell out those answers.

For these examples, I’ve extracted directory 3MO from the SC (“street crimes”) directory of the NIJ example submission.

To get started, import PyShp and create an Editor instance for the shapefile:

>>> import shapefile
>>> # The shapefile name is the filename part, sans extention, of the
>>> # various files in the shapefile directory.
>>> e = shapefile.Editor('3MO/NAME_SC_3MO')

In PyShp, shapes are represented as a list of shape and record objects. The correlation between record and shape is the index of the lists, and the index itself is referred to as an “Object ID”.  In this shapefile, there are 17164 shapes, each with its own instance in e.records and the list returned by e.shapes().

>>> len(e.records)
>>> len(e.shapes())

Each element in the records list is a list of field values. Metadata for field values – the position, name, type, and length – is in shapefile.Editor.fields:

>>> e.fields
[('DeletionFlag', 'C', 1, 0), ['Id', 'N', 6, 0], ['hotspot', 'N', 4, 0], ['area', 'N', 12, 0]]
>>> e.records[1000]
[0, 0, 221879]

In the above snippet, we see that the defined fields for shapes are Id, hotspot, and area. For each record, Id is field 0, hotspot field 1, and area field 2. Object 1000 (the 1001st object) has an Id of 0, hotspot 0, and area 221879. (A value for the DeletionFlag does not occur in the record instances.)

The following snippet gets all the object IDs of shapes for which hotspot is 1:

>>> for i in range(0, len(e.records)):
...     if e.records[i][1] == 1:
...         print 'Hotspot set in object %d' % i
Hotspot set in object 45
Hotspot set in object 1027
Hotspot set in object 1200
Hotspot set in object 2199
Hotspot set in object 2243
Hotspot set in object 2688
Hotspot set in object 3340
Hotspot set in object 3342
Hotspot set in object 3928
Hotspot set in object 4148
Hotspot set in object 4734
Hotspot set in object 4956
Hotspot set in object 6068
Hotspot set in object 6690
Hotspot set in object 6936
Hotspot set in object 7439
Hotspot set in object 8012
Hotspot set in object 8909
Hotspot set in object 9471
Hotspot set in object 9588
Hotspot set in object 10034
Hotspot set in object 10791
Hotspot set in object 10792
Hotspot set in object 11730
Hotspot set in object 12723
Hotspot set in object 13287
Hotspot set in object 14085
Hotspot set in object 14701
Hotspot set in object 14702
Hotspot set in object 15547
Hotspot set in object 16165
Hotspot set in object 16912

An instance of Editor can be used to set field values as well as read them.

>>> for r in e.records:
...     r[1] == 0
>>> for i in range(0, len(e.records)):
...     if e.records[i][1] == 1:
...         print 'Hotspot: %d' % i

The Editor instance can commit changes in the shapefile by executing save :


As an example, let’s set the hotspot field to 1 for every Object ID less than 500 and save the change:

>>> for i in range(0, 500):
...     e.records[i][1] = 1

The result of loading this shape file into ArcGIS and selecting for hotspot equal 1 is shown below.  The yellow highlights mark the 500 teal-colored cells in which hotspot has been set to 1.


Believe me, writing that short for-loop in python is easier and more accurate than changing the hotshot fields with the UI.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s