Author archive: RealIvanSanchez

EMT Madrid, or Open Data antipatterns


Today, February 21st 2015, is Open Data Day. And given that I’m far away from my favourite Open Data nerds down at the Medialab Prado, I decided to work on giving the old ¿Cuánto Tarda Mi Autobús? website a facelift.

The story behind ¿Cuánto tarda mi autobús? is rather simple. A couple of years ago I got my first smartphone, and one of the things I wanted to do was check the bus times in Madrid. Back then, EMT Madrid (the public company which runs the buses) was heavily promoting its new website in the Spanish GIS scene. The major problem was that the website for checking the times was made with Flash (actually, with ESRI Flex) and there is simply no way to use that on a smartphone.

So I reverse-engineered the protocol (if you call “reading a WSDL document” reverse engineering), did a bit of PHP plus Leaflet, and I was able to check the bus times with a web app.


 

Fast-forward to the present. EMT had released the API details under the banner of «Open Data EMT», I had a Saturday to devote to Open Data Day, and my little web app needed some love after two years of no updates at all.

But, oh boy, every time I try to develop anything with interfaces made by big subcontractors, I cannot stop cringing at the amount of WTF found around.

The antipatterns

Antipattern 1: «Open Data» which isn’t actual Open Data.

In the same way that Open Source software can only be Open Source if it meets the Open Source Definition, Open Data is only Open Data if it meets the Open Definition. Period. These definitions are evolved versions of the DFSG and Free Software Definition, curated with years of experience and discussions about what is open and what is not.

So, the Open Definition states:

2.1.8 Application to Any Purpose

The license must allow use, redistribution, modification, and compilation for any purpose. The license must not restrict anyone from making use of the work in a specific field of endeavor.

From «OpenData EMT»’s terms and conditions:

1. The re-user agent is explicitly prohibited from distorting the nature of the information, and is obliged to:
a. Not to manipulate nor falsify the information.
b. Ensure that any information displayed in your system is always up to date.
c. Not to use the information to undermine or damage EMT’s public image.
d. Not to use the information in sites that might lead to a relation with illegal acts or attempts to sabotage EMT or any other entity, organization or person.

So apparently I cannot:

  • Display historical information (because my data must be up-to-date).
  • Say that the system is a complete piece of steaming crap from a technological point of view.
  • Use the information in sites such as Facebook or Twitter, because those sites might be used for an attempted sabotage to «any entity or person». Who the fuck wrote this blanket clause, folks?

Please, don’t call it «Open Data».

I have to give praise to the EMT, though. The previous version of the agreement obliged the reuser to not disclose that he/she signed an open data agreement. At least they fixed that.

Antipattern 2: Your SOAP examples contain raw XML.

The whole point of SOAP was to abstract data access. And, still, every public SOAP interface I’ve ever seen includes examples with raw XML fragments that you’re supposed to build up.

If you cannot write code that accesses SOAP without writing XML, you’re doing it wrong.

Think about how WMS interfaces work nowadays: you just publish the WMS endpoint, and your GIS software takes care of the capabilities and the

Antipattern 3: Keep default fake values in your production code.

From the docs:

(image: tempuri)

Note «tempuri.org». A quick search will tell you that the system was designed using Visual Studio, and some lazy so-called software engineer didn’t bother to change the defaults.

Antipattern 4: Fuck up your coordinate systems

Note to non-Spanish readers: The city of Madrid is located roughly at latitude 40.38 north, longitude 3.71 west.

Now look at this example from the EMT docs about how bus coordinates are returned:

(image: positionbus)

Antipattern 5: Mix up your coordinate systems

Write things like “UTM” and “geodetic” in your documentation, but do not write which UTM zone you’re referring to (hint: it’s 30 north, and the SRS is EPSG:23030). Make some of your API methods return coordinates in EPSG:23030 and some others return coordinates in EPSG:4326.

And for extra fun, have your input coordinate fields accept both of those SRSs as strings with comma-defined decimal point, and then do this in the documentation:

(image: coordinateint)
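In practice the client ends up normalizing all of this by hand. What follows is a minimal Python sketch of that chore, not anything EMT provides: `parse_coordinate` and `guess_srs` are hypothetical helper names, the sample values are made up, and the range-based guess simply assumes that anything fitting inside ±180/±90 is EPSG:4326 degrees while anything larger is EPSG:23030 metres.

```python
def parse_coordinate(text: str) -> float:
    """Parse a number written with a decimal comma, e.g. '40,4348'."""
    return float(text.strip().replace(",", "."))


def guess_srs(x: float, y: float) -> str:
    """Guess the SRS from magnitude alone (a heuristic, not an EMT rule)."""
    if abs(x) <= 180 and abs(y) <= 90:
        return "EPSG:4326"   # degrees: x is longitude, y is latitude
    return "EPSG:23030"      # metres: UTM zone 30N on the ED50 datum


# Made-up sample values in both flavours.
lon, lat = parse_coordinate("-3,6608"), parse_coordinate("40,4348")
print(guess_srs(lon, lat))

x, y = parse_coordinate("443950,5"), parse_coordinate("4476500,2")
print(guess_srs(x, y))
```

A heuristic like this is only workable because Madrid is nowhere near the UTM origin; an API that stated the SRS of every field would make it unnecessary.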

Antipattern 6: Self-signed SSL certificates

Srsly?

(image: sslcert)

Antipattern 7: Wrap everything up in HTTP + JSON and call it “REST”

REST is a beautiful technology because it builds up pretty much on top of raw HTTP. Every object (“resource”) has its own URI (“uniform resource identifier”), and the HTTP verbs are used semantically (GET actually gets information, POST modifies information, DELETE deletes a resource, and so on).

Also, HTTP status codes are used for the return status. An HTTP status code 200 means that the data is OK, a 404 means that the resource doesn’t exist, a 401 means that you are not authorized to get/post/delete the resource. Reusing the semantics of HTTP is way cool.

So, in a REST interface for bus stops, the stop number 1234 (a resource) would be located at its URI, e.g. http://whatever/stops/1234. It’s beautiful because the URI has a meaning, and the HTTP verb GET (which is the default when a web browser is fetching something) has a meaning. The server would answer with a “200 OK” and then the resource.

Low-level, it should look like:

GET /stops/1234 HTTP/1.1

-----

HTTP/1.1 200 OK
Content-Type: application/json

{
"stopId": 1234, 
"latitude": 40.3,
"longitude": -3.71,
"stopName": "Lorem Ipsum Street",
"lines": ["12", "34"]
}
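To make the idea concrete, here is a minimal sketch of such a resource in Python, using only the standard library. Everything in it is an assumption made for illustration (the `STOPS` dictionary, the `lookup` helper, the port); it is not how EMT or anyone else implements it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data; a real service would query a database.
STOPS = {
    1234: {"stopId": 1234, "latitude": 40.3, "longitude": -3.71,
           "stopName": "Lorem Ipsum Street", "lines": ["12", "34"]},
}


def lookup(path: str):
    """Map a URI path to (HTTP status, JSON-serializable body)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "stops" and parts[1].isdecimal():
        stop = STOPS.get(int(parts[1]))
        if stop is not None:
            return 200, stop
    return 404, {"error": "no such resource"}


class StopHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The HTTP status line carries the outcome; the body is just the resource.
        status, body = lookup(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Serve the stops on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), StopHandler).serve_forever()
```

Calling `serve()` and fetching http://127.0.0.1:8000/stops/1234 should return the JSON resource, while an unknown stop gets a 404 from the HTTP layer itself, with no second status code buried in the body.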

Now compare the theoretical RESTful way to fetch one bus stop with the real-life mess:

POST /emt-proxy-server/last/bus/GetNodesLines.php HTTP/1.1
idClient=user&passKey=12345678-1234-1234-12345678&Nodes=1234

-----

HTTP/1.1 200 OK
Content-Type: text/JSON

{
"resultCode":0,
"resultDescription":"Resultado de la operacion Correcta",
"resultValues":{
  "node":1234,
  "name":"ROBERTO DOMINGO-AV.DONOSTIARRA",
  "lines":["21\/1"],
  "latitude":40.434861209797,
  "longitude":-3.6608600156554
  }
}

So, meaningless URI, POST request to fetch a resource, duplicated return status codes (for the HTTP layer and some other underlying layer).

Let me say this very, very clearly: In REST, you do not wrap a resource in a call to get that resource. Geez.
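On the client side, the duplicated status means every call needs two checks before the payload can be trusted. A small Python sketch of that unwrapping, under the assumption that `resultCode` 0 means success (which is what the sample response suggests); `unwrap` and `ApiError` are names invented for this example:

```python
import json


class ApiError(Exception):
    """Raised when either status layer reports a failure."""


def unwrap(http_status: int, body: str) -> dict:
    """Return resultValues after checking the HTTP status and the in-body code."""
    if http_status != 200:
        raise ApiError(f"HTTP layer failed with status {http_status}")
    envelope = json.loads(body)
    # Assumption: resultCode 0 means success, as in the sample response.
    if envelope.get("resultCode") != 0:
        raise ApiError(envelope.get("resultDescription", "unknown error"))
    return envelope["resultValues"]


sample = '''{"resultCode":0,
"resultDescription":"Resultado de la operacion Correcta",
"resultValues":{"node":1234,"name":"ROBERTO DOMINGO-AV.DONOSTIARRA",
"lines":["21\\/1"],"latitude":40.434861209797,"longitude":-3.6608600156554}}'''

stop = unwrap(200, sample)
print(stop["name"], stop["lines"])
```

With a RESTful interface the first check would be the only one needed.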

My takeaway

Readers would do well to keep in mind that I’m using my spare time in order to participate in Open Data Day, for fun. The problem is that working with such an architecture is no fun at all. I can deal with this kind of enterprisey, Dilbertesque software stack on a day-to-day basis, but only if I’m getting paid to endure the cringing and teeth-grinding.

I think it’s because the mind of an Open Source / Open Data nerd like me is difficult for the classic proprietary software people to understand. I develop software just for the fun of doing so, just because I can, just because I like the feeling of empowerment of creating my own tools.

I write a lot of open source stuff. I fix Wikipedia from time to time, I map stuff on OpenStreetMap if something’s wrong. I like to do it, and I empower other people to build upon my work.

Building upon good Open Source/Data doesn’t feel like standing on the shoulders of giants. It feels like standing on the shoulders of a mountain of midgets… and if you’re lucky, you’ll be one of those midgets and someone will stand upon your shoulders.

For me, that involves a great deal of humility. When I watch the closed-source crowd talk about their latest product or software, they are congratulating themselves, but when I watch the open-source crowd, they’re constantly criticizing themselves. And I can criticize them and they can criticize me and we all learn and that’s why they’re awesome.

</rant>

The Null Island Algorithm


We geomaticians like to gather around a mythical place called Null Island. This island has everything: airports, train stations, hotels, postcodes, all kinds of shops, a huge lot of geocoded addresses, and whatever geographical feature gets null coordinates from some buggy geoprocessing pipeline and ends up at the (0,0) coordinates.

But earlier this year, some geonerds such as @mizmay and @schuyler realized that there is no one Null Island, but one Null Island per datum / coordinate system (depending on who you ask). And @smathermather had the spare time to find out what the “Null Archipelago” looks like:

(Null archipelago image by @smathermather, containing Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA)

Fast forward a few months. I received an e-mail from one of my university peers, asking for help with a puzzle:

A friend of mine received a puzzle with some coordinates. He has to find a place on earth represented by 861126.41, 941711.64.

It’s supposed to be a populated place. Any ideas?

Well, off the top of my head, those looked slightly like UTM coordinates – two digits after the decimal point, suggesting centimeter precision… but the easting is way off the valid range for UTM coordinates.
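That range check is easy to script. A rough Python sketch, assuming the usual approximate UTM limits (eastings between about 166 km and 834 km at the equator, northings between 0 and 10,000 km); the constants are round figures chosen for this example, and the second test point is a hypothetical pair in the range typical of Madrid:

```python
# Approximate limits of valid UTM coordinates, in metres (assumed round figures).
EASTING_RANGE = (166_000.0, 834_000.0)     # zones are widest at the equator
NORTHING_RANGE = (0.0, 10_000_000.0)


def plausible_utm(easting: float, northing: float) -> bool:
    """Return True if the pair falls inside the approximate UTM limits."""
    return (EASTING_RANGE[0] <= easting <= EASTING_RANGE[1]
            and NORTHING_RANGE[0] <= northing <= NORTHING_RANGE[1])


print(plausible_utm(861126.41, 941711.64))   # the puzzle: easting past the upper limit
print(plausible_utm(443950.0, 4476500.0))    # hypothetical pair near Madrid
```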

And I realized this is the Null Archipelago problem, all over again; but instead of plotting (0,0) on a map, let’s plot all points having (861126.41, 941711.64) coordinates in any reference system.

Cue PostGIS. We can create a point in every CRS like so:

select srid, ST_GeomFromText('POINT(861126.41 941711.64)',srid) as geom
 from spatial_ref_sys;

Note the complete absence of PL/SQL in there.

But it will be much easier to work with the data if all the points are in our beloved EPSG:4326 latitude-longitude coordinate system. And while we’re at it, let’s materialize that data into a table:

create table archipielago as
select srid, ST_Transform(ST_GeomFromText('POINT(861126.41 941711.64)',srid),4326) as geom
 from spatial_ref_sys;

But there is a problem with this – the PostGIS query will crash due to some CRSs having an empty Proj4 string. This took me a while to trace and fix:

create table archipielago as
select srid, ST_Transform(ST_GeomFromText('POINT(861126.41 941711.64)',srid),4326) as geom
 from spatial_ref_sys where proj4text!='';

And now we can take this data out into a file… but once again, there’s a catch: some of the coordinates are out of bounds and represented by an (∞, ∞) coordinate pair. Even though file formats can handle ∞/-∞ values (good thing we know how IEEE floating point format works, right folks?), some mapping software cannot accommodate these values. And I’m looking at you, CartoDB upload page.

In this particular case, the only out-of-bounds points are at (∞, ∞), so the data can be cleaned up in just one pass:

delete from archipielago where ST_X(geom)>180;
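The single comparison is enough because an IEEE 754 positive infinity compares greater than every finite number. A small Python illustration of that behaviour, with made-up sample points; note that a NaN would slip through the same comparison, so `math.isfinite` is the safer general-purpose filter:

```python
import math

inf = float("inf")
nan = float("nan")

print(inf > 180)   # True: +infinity is greater than any finite value
print(nan > 180)   # False: every ordered comparison with NaN is false

# Made-up points; keep only those whose coordinates are all finite.
points = [(-3.71, 40.38), (inf, inf), (12.5, 41.9)]
finite_points = [p for p in points if all(math.isfinite(c) for c in p)]
print(finite_points)
```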

Then just add a tiny bit of CartoDB magic, and publish a map:

(image: nullarchipielago)

https://ivansanchez.cartodb.com/viz/1ac4a786-805a-11e4-bc48-0e853d047bba/public_map

I still don’t know if the original puzzle has anything to do with any obscure used-in-the-real-world CRS, but at least it’s worth a try.

Installing MapProxy on Windows, step by step


Last week I had the pleasure of being one of the trainers for the EUROSHA volunteers, a group of 25 young people who will be mapping several African countries as part of HOT’s activities. One of the problems these volunteers face is a not very reliable internet connection.

It is perfectly possible to edit OSM data offline (saving the data to a file, editing, and resolving version conflicts afterwards), but what you cannot do is check background imagery to compare against. Something had to be done about it. And the solution was to install MapProxy, which can take raster images from several sources and serve them as WMS, locally. On a laptop with Linux (and python, python-pil and python-pip), installing it and testing the default configuration was a matter of minutes.

However, the computers that HOT is going to deploy in Africa run Windows, mainly because there was not enough time to do a full installation with the right tools for the situation. So let’s improvise, and install MapProxy just as the manual suggests:

We advise you to install MapProxy into a virtual Python environment.

Well, don’t do this. If you install Python from scratch, you will most likely run into problems when installing the required libraries, PIL (Python Imaging Library) in particular. The easy way to install a Python that MapProxy can run on is OSGeo4W. So we download the installer, choose an advanced install, and make sure that at least the python and python-pil packages are going to be installed.

The next step is to download distribute-setup.py and run it inside an OSGeo4W shell as administrator.

In that same console, we run easy_install mapproxy, and right after that easy_install pyproj.

At this point, the MapProxy executables are installed. We can check by running mapproxy-util.

Now, MapProxy is useless without a configuration file telling it which services to cache. So we copy-paste a MapProxy configuration for OpenStreetMap, save the resulting file as (for example) C:\OSGeo4W\mapproxy.yaml, and launch mapproxy-util.

Et voilà! Our MapProxy is up and answering requests on localhost:8080, caching OSM tiles and turning them into a WMS service.
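A quick way to poke at the running service is to build a WMS GetMap request by hand. The Python sketch below only assembles the URL; it assumes MapProxy’s default WMS path (`/service`) and a layer called `osm`, which has to match whatever name the mapproxy.yaml file actually declares. The bounding box is an arbitrary example.

```python
from urllib.parse import urlencode

# Assumptions: MapProxy's default WMS path and the port mentioned in this post.
BASE_URL = "http://localhost:8080/service"


def getmap_url(layer, bbox, width=512, height=512, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,          # hypothetical layer name, see mapproxy.yaml
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return BASE_URL + "?" + urlencode(params)


# Arbitrary example bounding box in longitude/latitude.
print(getmap_url("osm", (-3.9, 40.3, -3.5, 40.6)))
```

Pasting the printed URL into a browser should return a PNG once the cache has fetched the tiles it needs.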

The rest of the options can be looked up in the MapProxy manual, but there are a few things to keep in mind:

  • MapProxy must always be run inside the OSGeo4W environment.
  • … which means that if we want it to start automatically, we can make a .bat by copy-pasting C:\OSGeo4W\osgeo4w.bat and modifying the command launched in the last line of that script.
  • The utility for seeding or refreshing the cache, mapproxy-seed.exe, must also be run inside the OSGeo4W environment.
  • Cached data is stored in the directory specified in the configuration file, and that path is relative to the directory mapproxy is launched from.