Spring JDBC and Ehcache

A couple of months ago I started refactoring a client’s code to bring it up to current standards. The client had three projects talking to the same database three different ways: MyBatis, Hibernate, and Spring JDBC. Also, two of the projects were still using ANT. I set out to switch all the projects to Maven, Spring, and a single, shared DAO layer using Spring JDBC.

One of the client’s projects exports millions of rows of data. Small changes can greatly affect export time. There was one heavily called block of code that used a HashMap in an instance variable to cache some objects. I decided to move caching into the DAO instead. That change caused the export to run three times longer than before. Why was Ehcache so slow? Had I done something wrong?

I set-up a test project to measure how long it took to randomly read 10,000 rows from a 340 row table. Here are the results.

Cache Performance

Cache Performance

It took 0.043 seconds to perform 10,000 random reads from a HashMap of 340 objects. By contrast, it took almost 27 seconds to fetch the same set of 10,000 random keys directly from MySQL. The H2 database only took 0.58 seconds. The elapsed times were about the same using Ehcache backed by either H2 or MySQL, taking 2.36 seconds and 2.42 seconds, respectively.

Example Project on GITHub

To experiment with my configuration I created a simple test project that has H2 Database and MySQL. You can find the code on GITHub if you want to run the tests yourself.

https://github.com/smitchell/SpringJdbcEcache

Setting up Spring JDBC and Ehcache

I used annotations to set-up Spring JDBC and Ehcache.

Here is how I set-up the cache in the Spring context.

    <bean id="ehcache" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean"
          p:configLocation='ehcache.xml'
          p:shared="true" />
    <bean id="cacheManager" class="org.springframework.cache.ehcache.EhCacheCacheManager" p:cacheManager-ref="ehcache" />
    <cache:annotation-driven />

Here is an example from the Country DAO in the sample project.

@Repository
public class CountryDaoImpl implements CountryDao {

    @Cacheable("countries")
    public Country findByPrimaryKey(final Long id) {
        final MapSqlParameterSource params = new MapSqlParameterSource("value", id);
        final String sql = selectSQL.concat(" where ").concat(CountryColumn.COUNTRY_ID.name()).concat(" = :value");
        final List results = jdbcTemplate.query(sql, params, countryRowMapper);
        if (results.size() > 0) {
            return results.get(0);
        }
        return null;
    }

    @CacheEvict(value = "countries", allEntries=true)
    public void save(Country instance) {
        final Date now = new Date();
        instance.setLastUpdateDate(now);
        if (instance.getId() ==  null) {
            instance.setCreateDate(now);
            insert(instance);
        } else {
            update(instance);
        }
    }
...
}

The Ehcache configuration was equally straightforward.

<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ehcache.org/ehcache.xsd"
         updateCheck="true" monitoring="autodetect" dynamicConfig="true">
    <diskStore path="java.io.tmpdir"/>
    <defaultCache maxEntriesLocalHeap="10000" eternal="false" timeToIdleSeconds="120" timeToLiveSeconds="120"
                  diskSpoolBufferSizeMB="30" maxEntriesLocalDisk="10000000" diskExpiryThreadIntervalSeconds="120"
                  memoryStoreEvictionPolicy="LRU">
        <persistence strategy="localTempSwap"/>
    </defaultCache>
    <cache name='countries' maxElementsInMemory='400' eternal='true' overflowToDisk='false' />

</ehcache>

Conclusions

I’m still figuring out how best to leverage Ehcache, but it is clear that for small collections of simple objects, like categories, types, countries, states, roles, etc, Ehcache is overkill. For those types of objects I will add a HashMap to my service object to act as a simple cache.

The place for Ehcache will probably turn out to be those compound object graphs assembled from calls to multiple tables. I’m going selectively leverage Ehcache in my service (@Component) implementations to hang on to those complex, composite objects.

If I have success I’ll add a follow-up to this post.

Help Wanted: QGIS Mentor

quantum

HELP WANTED: QGIS Mentor

My goal was simple, generate a heat map in QGIS that looked similar to the heat map that I generated using Google Fusion Tables with just a couple of clicks, or similar to what was eventually implemented on Garmin Connect.

This is what my test data set of 350,000 GPS points looks like in QGIS 1.9.0 on a PostGIS layer:

PostGIS Layer in Quantum GIS 1.9.0

PostGIS Layer in Quantum GIS 1.9.0

Several attempts resulted in either no visible heat map (above) or a large blobbish, weather map looking effect, not what I wanted.

I talked to the cartographer who took my Google Fusion Table prototype from concept to production. His process bears little resemblances to my prototype. Instead of a loose collection of GPS points, he extracted running and cycling activity polylines. Instead of using a heat map plugin he wrote custom C++ code to convert the vector data into raster data.

So, I’ll switch from loose points to polylines too. Oracle Spatial and Graph has a SDO_UTIL.FROM_GML311GEOMETRY function to export the polylines in GML format. Likewise, PostGIS has a ST_GeomFromGML to import GML data into a geometry. I’ll work on creating a data set of 20,000 – 30,000 polylines.

I sure could use a QGIS mentor to answer my basic questions and to help me avoid making wrong turns. It might be time to reach out some cartographers in other departments at work.  I also plan on reading “Learning QGIS 2.0,” by Anita Grase, when that comes out later this month. I guess Anita can be my virtual mentor.

Loading GPS Data into PostGIS

postgis-logo

The next area I will explore is heat maps using 8 Mb of GPS data collected from fitness devices near Olathe, KS.  My data source is a CSV file with three columns: latitude, longitude, and intensity. Latitude and longitude were rounded to four decimal places and I kept track of how many rounded GPS locations fell into each rounded latitude/longitude cell. Intensity is a decimal number between zero and one reflecting the number of hits for each latitude/longitude cell.

The first step was to create a table with a column for intensity and a spatially enabled column to hold the rounded GPS location:

CREATE TABLE gps_points (INTENSITY Real);
SELECT AddGeometryColumn('','gps_points','LOCATION', 4326, 'POINT', 2);
CREATE INDEX gps_points_location_idx ON gps_points USING GIST (location);
ALTER TABLE gps_points ADD CONSTRAINT gps_points_location_uniq UNIQUE (location);

The GPS_POINTS Table with Index and Constraints

The GPS_POINTS Table with Index and Constraints

Next, I set-up a Spring Batch project to import the data. The reason that I took the time to create a Spring Batch project is so that I can add more item writers to load the data set into other spatial repositories like Oracle Spatial and Graph or MySQL.

Here is the item reader and item writer definition from my Spring context:

Spring Batch Item Reader and Item Writer Definitions

Spring Batch Item Reader and Item Writer Definitions

I was not able to format the SQL correctly in the item writer bean in-line, so I implemented an ItemWriter class in Java.

Custom Item Writer to Insert PostGIS Points

Custom Item Writer to Insert PostGIS Points

The data and the code is available from github: https://github.com/smitchell/SpatialLoader

In my next post I’ll take a look at the data with a PostGIS layer in QGIS 1.9.

MapBox vs ArcGIS Online

mapboxLogoesri_banner

Today I finished the MapBox TileMill crash course. Just as the programming world has the “Hello World!” example for learning new programming languages, the GIS world appears to use the USGS Earthquake data feed for the same purpose. I say that only because the Maps and the Geospatial Revolution course that I completed last week also used that data feed to introduce ArcGIS Online.

I’ve embedded an example of both sites, MapBox and ArcGIS Online, mapping the earthquake data on my web site. These are very basic examples as directed by the respective lessons:

Both tools/sites are easy to use. I liked the CartoCSS used by TillMill better than ArcGIS Online because I felt as if I had more control over the map style. It’s possible that ArcGIS Online has similar control but I just didn’t dig deep enough to find it.

The other thing that I liked about TileMill is that I am not tied to a single map tile server like I am with ArcGIS Online. Maps can be exported in an XML format compatible with Mapnik, should I ever choose to host map tiles on my own server.

I’ll continue exploring TillMill this Labor Day weekend to see what other capabilities I discover. I’m looking forward to seeing what it can do!

Let there be PostGIS

postgis-logo

To prepare for the TileMill Crash Course I needed to install a GIS, (in addition to Oracle Spatial and Graph which I previously installed). PostGIS was an easy choice for Ubuntu 12.04 since it is available in the Ubuntu Software Center.

First, I installed PostgreSQL and pgAdmin III.

pgadmin

Initially, I install PostGIS from the Ubuntu Software Center, but I was unable to create the postgis or postgis_topology extension.  In the end, I removed it and then followed the instructions found in How to install PostGIS 2.0 on Ubuntu 12.04 LTS (precise) from source. Following those instructions you will do the following:

  1. Install prerequisites, including gdal, if you haven’t already installed it.
  2. Build GEOS 3.3.x from source.
  3. Build PostGIS from source.
  4. Spatially enable your database.

I created a database and added the postgis extensions to it, and then used the function postGIS_full_version() to verify the extension.


$ createdb exploringspatial
$ psql -U postgres exploringspatial
Password for user postgres:
psql (9.1.9)
Type "help" for help.
exploringspatial=# CREATE EXTENSION postgis;
CREATE EXTENSION
exploringspatial=# CREATE EXTENSION postgis_topology;
CREATE EXTENSION
exploringspatial=# SELECT postGIS_full_version();

————————————————————————————————————————————————————————–
POSTGIS=”2.0.3 r11128″ GEOS=”3.3.8-CAPI-1.7.8″ PROJ=”Rel. 4.8.0, 6 March 2012″ GDAL=”GDAL 1.11dev, released 2013/04/13″ LIBXML=”2.8.0″ LIBJSON=”UNKNOWN” TOPOLOGY RASTER
(1 row)
Finally, launch the Shape File to PostGIS Importer GUI to verify it is installed:
$ sudo xhost +localhost
localhost being added to access control list
$ shp2pgsql-gui &
The Shape File to PostGIS Importer GUI launched as expected, so now I can get back to the TileMill crash course:

shp2pgsql

Next, I’ll follow the Tilemill crash course and play with heat maps.

Exploring Leaflet Raster Classes

leaflet

Below is the class diagram for the Leaflet raster classes. As I prepared this diagram I realized it did not go far enough for me since I don’t have experience working the raster images. I’m going to put my exploration of Leaflet classes on hold and take a detour into the world of raster graphics.

Leaflet Raster Classes

Leaflet Raster Classes

I found several posts on Google map tilesBing map tiles, and OSM map tiles. The article on OSM map tiles referenced the Slippy Map Tile Application, and and Mapnik, an open source tool for rendering maps.

If I start with Mapnik I need some raster tiles to render, don’t I? This would be the perfect time to continue my post from June about working with satellite images with TileMill. In that post I installed the following tools on my computer:

  1. GDAL, a low-level GIS toolkit
  2. libgeotiff, to work with geotags
  3. ImageMagick, an image processing package
  4. TileMill, an open-source mapmaking app from Mapbox

For my next post, I’ll pick up where I left off and follow the article Processing Landsat 8 Using Open-Source Tools by Charlie Loyd. Then perhaps I’ll look at GMap Creator and MapTiler. I’m open to suggestions for good raster learning resources.

When I’m done I’ll revisit this Leaflet raster class diagram again.

Exploring Leaflet Vector Classes

leaflet

This post, third in my series on the Leaflet API, explores the vector classes. Vector graphics are primitive geometric shapes that easily scale across pixel resolutions, such as points, lines, circles, and polygons. The point class, L.LatLon was covered in my previous post, Exploring Leaflet Geographic Classes. The remaining vector classes are shown below.

Leaflet Vector Classes

Leaflet Vector Classes

As was the case with the geographic classes, there are similar vector classes available in all online map APIs.

Not pictured above are L.MultiPolyline and L.MultiPolygon, as those are contained in L.FeatureGroup, which extends L.LayerGroup, which I will covered later in my series. I included L.PolyUtil and L.LineUtil because they have some helpful functions, such as Douglas-Peuker line simplification, distance and proximity calculations, and clipping.

A picture is worth a 1,000 words, so I’m not going to expand on my diagram. You can read more in the Leaflet API documentation.

Next stop, raster classes.