How to find the interior centroid of US counties using BigQuery GIS

How to handle the problem of centroids being in the Great Lakes

with Eric Schmidt

While plotting county-by-county confirmed COVID-19 cases from a just-launched BigQuery dataset, we noticed a problem. Overall, the map seems fine:

In the case of maps, though, details are important. Every user of this map will zoom in to where they live. Any user who lives on the coastline of Upper Michigan will immediately notice a problem:

Note that their county’s data is in the water! Why is that?

It’s because US county boundaries go to the state or country boundary. Normally, that’s not a problem, but here, the border is the maritime boundary and is located halfway in the Great Lakes. Here’s another example, this time on the US-Canadian border:

When you create a map like this one of confirmed COVID-19 cases, you want each marker to be where people live, not out in the water. In this article, I will show you how to create a map marker that is located at the centroid of the land part of the county.
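To see why a centroid can end up in the water, here is a minimal Python sketch (not one of the article's BigQuery queries). It computes a planar polygon centroid with the shoelace formula; the two rectangles are made-up toy shapes, and BigQuery itself works on spherical geographies rather than flat coordinates.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return (cx / (3 * area2), cy / (3 * area2))


# Toy shapes (hypothetical): the legal boundary reaches far into the lake,
# but only the strip with y <= 2 is land.
county_boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
land_only = [(0, 0), (10, 0), (10, 2), (0, 2)]

boundary_marker = polygon_centroid(county_boundary)  # falls in the water
land_marker = polygon_centroid(land_only)            # falls on the land strip
```

In this toy case the boundary centroid has y = 5, well past the shoreline at y = 2, while the land-only centroid stays on shore.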

The first idea is to do a spatial join against a dataset of coastline boundaries and then find the intersection (common area) between the county boundaries and the coastline. This should give us the land areas only.

BigQuery has a public dataset of US geo boundaries, so we can do that:

We can visualize the result using BigQuery GeoViz, and this tells us that the method didn’t work:

Notice that the technique worked beautifully for the islands (because the islands are polygons), but not for the other counties. In general, the US coastline consists of multilines, not polygons, and intersecting a county polygon with a line yields a line rather than a land area.

The second idea is to use zipcodes. US zipcodes are actually a bounding box for a collection of postal routes. Since the post office will not deliver mail in the water, we can treat zipcode boundaries as covering the populated areas.

Let’s look at what the zipcodes look like for each county:

We notice that some zipcodes (e.g. 49684) cover several counties:

So, we need to split the county field and put 49684 into every county it covers. There is another problem. In some cases, the county name is written as Otsego, Otsego county, or otsego County. So, here’s a function that will do the necessary cleanup:
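The cleanup logic can be sketched in Python as follows. This is an illustration of the idea, not the article's BigQuery function; the name `clean_county_names` and the choice to lowercase everything are assumptions made for this sketch.

```python
import re


def clean_county_names(county_field):
    """Split a comma-separated county field and normalize each name.

    Each name is trimmed, stripped of a trailing "county" in any
    capitalization, and lowercased so that spelling variants compare equal.
    """
    names = []
    for raw in county_field.split(","):
        name = raw.strip()
        name = re.sub(r"\s+county$", "", name, flags=re.IGNORECASE)
        name = name.lower()
        if name and name not in names:  # skip blanks and duplicates
            names.append(name)
    return names


# A zipcode that spans two counties yields two cleaned names.
spanning = clean_county_names("Grand Traverse County, Leelanau")
```

Lowercasing is a blunt choice: it makes a reliable join key, but a display name would need to come from the counties table instead.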

Then, create a table that maps counties to zipcodes and takes the union of all the zipcode geometries in each county:

Now we can create one marker for each county by intersecting the county boundary with its zipcode bounds:
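As a rough illustration of this step, the Python sketch below clips zipcode areas to a county and takes the area-weighted centroid of what remains. It is a toy under strong assumptions: every shape is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`, the zipcode boxes do not overlap each other, and the function names are hypothetical. Real county and zipcode geometries need a proper GIS engine such as BigQuery GIS.

```python
def intersect(a, b):
    """Intersection of two axis-aligned rectangles, or None if they are disjoint."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def land_marker(county, zip_boxes):
    """Area-weighted centroid of the parts of `county` covered by zipcode boxes.

    Assumes the zipcode boxes do not overlap one another.
    """
    total = sx = sy = 0.0
    for box in zip_boxes:
        piece = intersect(county, box)
        if piece is None:
            continue
        area = (piece[2] - piece[0]) * (piece[3] - piece[1])
        total += area
        sx += area * (piece[0] + piece[2]) / 2
        sy += area * (piece[1] + piece[3]) / 2
    if total == 0:
        return None  # no zipcode coverage, so no marker
    return (sx / total, sy / total)


# Toy data: the county boundary extends into the lake (y up to 10), while the
# zipcodes hug the shore. The second zipcode spills into a neighboring county.
county = (0, 0, 10, 10)
zip_boxes = [(0, 0, 6, 2), (6, 0, 12, 2)]
marker = land_marker(county, zip_boxes)
```

Clipping to the county matters because a zipcode can extend past the county line, and the part outside should not pull the marker across the border.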

To extend this beyond Michigan to the whole US, start with the county cleanup function:

Note the addition of the state FIPS code to the GROUP BY and the removal of Michigan from the WHERE clause:
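The reason the state FIPS code has to join the grouping key is that county names repeat across states. The Python sketch below shows the collision with a few illustrative rows; the rows and the `group_zipcodes` helper are assumptions for this example, not the article's table or query.

```python
from collections import defaultdict

# Illustrative rows: (zipcode, state FIPS code, cleaned county name).
rows = [
    ("49735", "26", "otsego"),          # Michigan
    ("13326", "36", "otsego"),          # New York
    ("49684", "26", "grand traverse"),  # one zipcode listed under two counties
    ("49684", "26", "leelanau"),
]


def group_zipcodes(rows, by_state=True):
    """Group zipcodes by county name, optionally qualified by state FIPS code."""
    groups = defaultdict(list)
    for zipcode, state_fips, county in rows:
        key = (state_fips, county) if by_state else county
        groups[key].append(zipcode)
    return dict(groups)


merged = group_zipcodes(rows, by_state=False)  # the two Otsego counties collapse
separate = group_zipcodes(rows)                # one group per (state, county)
```

Grouping by name alone would union zipcode geometries from two different states into a single county, which would put that county's marker somewhere between them.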

Apply the same changes to the landareas query:

Finally, audit the table:

It turns out that there are 7 counties that do not have geo information, either because they lack a zip code or because of missing information in the counties table.
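An audit of this kind amounts to an anti-join: list every county that has no usable marker. Here is a minimal Python sketch; the county keys, coordinates, and the `audit` helper are hypothetical and do not reflect which counties were actually missing.

```python
def audit(all_counties, markers):
    """Return county keys with no usable marker, sorted for stable output."""
    return sorted(key for key in all_counties if markers.get(key) is None)


# Hypothetical inputs keyed by (state FIPS code, cleaned county name).
all_counties = {("26", "otsego"), ("26", "leelanau"), ("26", "keweenaw")}
markers = {
    ("26", "otsego"): (-84.6, 45.0),  # made-up coordinates
    ("26", "leelanau"): None,         # intersection came back empty
}
missing = audit(all_counties, markers)
```

Treating an empty result the same as an absent row catches both failure modes mentioned above: no zipcode for the county, and no matching entry in the counties table.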

Enjoy!
