Tuesday, August 14, 2007

Area Weighted Join vs. Standard Join

Students from an advanced suitability analysis course this summer needed to create a report that specified the percent of the join layer that intersected the target layer. For example, they needed to calculate the percent of each use from a landuse shapefile that intersects each zip code in Texas. I wrote a quick script for the class that generated the report they needed, but the implications are astounding to me.

Here is what I mean by astounding. As a test, I calculated the median household income and the total population within a 1-mile radius around each dance club in Arlington, TX comparing the following methods: (1) Total population using ArcMap's standard spatial join tool, (2) Total population using an area-weighted summation, (3) Average median household income using standard spatial join, (4) Average median household income using an area-weighted average.

The following table displays the results.



Field A displays the dance club's name. Fields B, C, and D above display the difference between using the standard ArcMap spatial join tool and a weighted-average spatial join tool when calculating average household income. Fields E, F, and G display the differences when calculating the total population.

The differences in both cases are quite high. I am thinking of myself and all of the students who I have seen naively rely on the standard spatial join tool for these types of calculations. Wow...

Why is there such a difference?

An area-weighted spatial join between two polygons comes in two flavors, depending on whether it is calculating an average or a sum.

If it is calculating an average, the formula is [area-percent] * [value] + [area-percent] * [value]... The most important consideration is the percentage of each target feature that is covered by each join feature. For example, in a particular zip code, there might be 3 block groups. Let's further suppose that block group 1 comprises 50% of the zip code, block group 2 comprises 35%, and block group 3 comprises 15%. The area-weighted average for the zip code is then 0.50 * [value 1] + 0.35 * [value 2] + 0.15 * [value 3], whereas the standard tool weights all three block groups equally.
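To make the block group example concrete, here is a minimal sketch in plain Python. It is not the downloadable script and it does not use ArcGIS. The 50/35/15 split comes from the example above; the income values and the function names are hypothetical, chosen only for illustration.

```python
def area_weighted_average(parts):
    """Average the values, weighting each by its share of the target area.

    parts is a list of (percent_of_target_area, value) pairs. The result
    is divided by the total percent, so shares that do not add up to
    exactly 100 are still handled sensibly.
    """
    total_percent = sum(percent for percent, _ in parts)
    if total_percent <= 0:
        raise ValueError("no join feature overlaps the target feature")
    weighted = sum(percent * value for percent, value in parts)
    return weighted / total_percent


def simple_average(parts):
    """Unweighted average: every intersecting join feature counts equally."""
    return sum(value for _, value in parts) / len(parts)


# Hypothetical median household incomes for the three block groups,
# paired with the 50% / 35% / 15% shares from the example.
zip_code_parts = [(50, 40000), (35, 60000), (15, 80000)]

weighted_income = area_weighted_average(zip_code_parts)
unweighted_income = simple_average(zip_code_parts)
print(weighted_income, unweighted_income)
```

With these made-up numbers the weighted figure sits well below the unweighted one, because the block group covering half of the zip code also has the lowest income.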

If it is calculating a sum, the most important consideration is the percent of each join feature that actually intersects the target feature. The formula is [% area intersects target feature] * [value] + [% area intersects target feature] * [value]..., summed over all intersecting join features with no further division. This is why you will see a much larger error when using the standard spatial join tool for summations than for averages. If 2% of a block group intersects a zip code, the standard tool will include the entire population of the block group instead of only 2%.
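The 2% case can be sketched the same way. This is a plain Python illustration, assuming each join feature contributes its value scaled by the percent of its own area that falls inside the target feature. The population figures and function names are hypothetical, and the "standard" function is only a stand-in for a join that counts every intersecting feature in full.

```python
def area_weighted_sum(parts):
    """Sum the values, scaling each by the percent of the join feature
    that lies inside the target feature.

    parts is a list of (percent_of_join_feature_inside_target, value).
    """
    return sum(percent * value for percent, value in parts) / 100.0


def whole_feature_sum(parts):
    """Stand-in for a standard join: any intersecting join feature
    contributes its entire value, however small the overlap."""
    return sum(value for percent, value in parts if percent > 0)


# Hypothetical block groups touching one zip code: one fully inside,
# one half inside, and one with only 2% of its area inside.
block_groups = [(100, 2000), (50, 1000), (2, 1500)]

weighted_population = area_weighted_sum(block_groups)
unweighted_population = whole_feature_sum(block_groups)
print(weighted_population, unweighted_population)
```

In this made-up case the third block group adds 30 people to the weighted total but 1,500 to the unweighted one, which is the kind of gap described above.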

Is This a Perfect Solution?

No. This assumes a perfectly even distribution within each join feature. It is, however, a huge improvement.

Where Can I Get the Script?

Download it here. Extract the compressed archive and you will see three Python scripts and an ArcGIS toolbox. Open ArcMap or ArcCatalog, ensure ArcToolbox is visible, and add the Spatial Join Tools.tbx (single-click).

Caveat: These scripts are first drafts and have not been tested on any systems other than the ArcINFO Desktop 9.1 & 9.2 systems here at UT Arlington. There is no documentation. Also, the scripts run on the slow side. Eventually these will be optimized, but at the current time they are presented as is.

Description of the three tools:
  1. Average Area Weighted: Use this tool to calculate an area-weighted average spatial join between two polygons.
  2. Sum Area Weighted Join: Use this tool to calculate an area-weighted summation spatial join between two polygons.
  3. Percent Area Report: Use this tool to generate a report that specifies the percent of the join layer that intersected the target layer.
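
As a rough idea of what a percent area report contains, here is a toy version in plain Python. It is not the tool from the toolbox: it assumes every feature is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax), whereas the real tools work on arbitrary polygons inside ArcGIS. The layer contents and function names are hypothetical.

```python
def rect_area(rect):
    """Area of an (xmin, ymin, xmax, ymax) rectangle."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_intersection_area(a, b):
    """Area shared by two axis-aligned rectangles (0.0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def percent_area_report(join_features, target_features):
    """Return (join_id, target_id, percent) rows, where percent is the
    share of the join feature's area that falls inside the target feature."""
    rows = []
    for join_id, join_rect in sorted(join_features.items()):
        join_area = rect_area(join_rect)
        if join_area == 0:
            continue  # skip degenerate features
        for target_id, target_rect in sorted(target_features.items()):
            overlap = rect_intersection_area(join_rect, target_rect)
            if overlap > 0:
                rows.append((join_id, target_id, 100.0 * overlap / join_area))
    return rows


# Hypothetical landuse polygons (join layer) and zip codes (target layer).
landuse = {"residential": (0, 0, 4, 2), "commercial": (4, 0, 6, 2)}
zip_codes = {"zip_A": (0, 0, 3, 2), "zip_B": (3, 0, 6, 2)}

report = percent_area_report(landuse, zip_codes)
for row in report:
    print(row)
```

The percentages in each row are the same weights that an area-weighted sum would apply to the join feature's value.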

5 comments:

Anonymous said...

Yeah, and no point MAUPing around worrying about that last problem :)

For some analyses introducing additional weighting factors such as zoning can help to re-distribute the data within the aggregated polygons.

loki said...

Do you have a script for the median? The scripts you gave were for the average and the sum. I have been having problems even using the standard tool in ArcToolbox to compute the median. Can you offer any suggestions?

Unknown said...

Just found this post and used the average area weighted join tool to figure out percent impervious cover - worked beautifully. I had to make a couple of modifications; it looks like 9.3 chokes if you don't aggressively delete variables, in particular cursor objects and row objects. Once I added those though - great tool.

Ken said...

Hey, I'm not sure if you are still around, but I really could use your script/tool here for a GIS research project. The link is broken =(

I'm reachable at kyeoh@brandeis.edu