Housing Land Audit – Data Mining Proposal Field

This is a practical example of how you can, if you are using Postgres (or any other database), use the power of SQL to speed up the analysis of information. Here I am using it to both tag housing sites and to identify exact numbers of housing and flats from a list of planning applications proposals.

Most won’t have access to a csv of all planning applications (without resorting to scraping of public access sites) and their proposals but anyone can request a copy of the data from the Spatial Hub.

If you are within a planning authority you will or someone within the organisation will be able to get this information. Unfortunately the raw list of planning application proposals I have access to does not store the number of houses or number of flats coming forward from an application, this, if listed being buried within the proposal string.

How then can we identify and separate the scale in terms of houses and flats on planning applications?

Answer : Data mining using SQL Statements

First of all take your table of planning applications, import this into a database of your choosing here I am using postgres but I know similar syntax works in SQL Server and I would expect it to work in SQL Lite and My SQL as well.

nohouses
noflats

Next create a table called t1001sqlrun

SQL

CREATE TABLE IF NOT EXISTS public.t10001sqlrun
(
    pkid bigint NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1 ),
    sqltorun character varying(2000) COLLATE pg_catalog."default",
    runindividual boolean DEFAULT false,
    runorder integer,
    CONSTRAINT t10001sqlrun_pkey PRIMARY KEY (pkid)
)

CREATE TABLE IF NOT EXISTS public.t10001sqlrun
(
    pkid bigint NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1 ),
    sqltorun character varying(2000) COLLATE pg_catalog."default",
    runindividual boolean DEFAULT false,
    runorder integer,
    CONSTRAINT t10001sqlrun_pkey PRIMARY KEY (pkid)
)

Next create a function – here I call it __udf0002runsql() this is what you call to run all of the queries.

SQL

CREATE OR REPLACE FUNCTION public.__udf0002runsql(
	)
    RETURNS void
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$
DECLARE
    stmt text;
BEGIN
    FOR stmt IN
        SELECT sqltorun FROM t10001sqlrun order by runorder
    LOOP
        EXECUTE stmt;
    END LOOP;
END;
$BODY$;

CREATE OR REPLACE FUNCTION public.__udf0002runsql(
	)
    RETURNS void
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$
DECLARE
    stmt text;
BEGIN
    FOR stmt IN
        SELECT sqltorun FROM t10001sqlrun order by runorder
    LOOP
        EXECUTE stmt;
    END LOOP;
END;
$BODY$;

Now populate t10001sqlrun with appropriate values each value of sqltorun will have a structure similar to the following

SQL

UPDATE t0016uniformplanappsanalyse SET residential = true,  nohouses =4 where proposal ILIKE '% 4 house%';

UPDATE t0016uniformplanappsanalyse SET residential = true,  nohouses =4 where proposal ILIKE '% 4 house%';

Here I am keeping all the strings in a table called t0016.

Here I made the strings in Excel and then I will import them into postgres server..

Below is a list of 4000 sql statements each statement runs against each line within the table so be aware that you will probably be running millions of queries. I ran this against the application list which was available to me of some 30k applications that’s 120 million queries. For me this runs in less than 5 minutes but its the kind of thing which might depend on your hardware and the complexity of the update statement that you choose to run.

t10001sqlrun Download

And of course you can import them with the help of this tool

Postgres CSV Import Tool

Post category:All / Data mining / Housing Land Audit
Post author:Mark