Parsing Dates: When Performance Matters

A while back, an app I was working on ran into some performance issues. The app needed to process thousands of JSON objects, and store them in a Core Data database - not such a trivial task, so quite understandable that there was going to be some sort of performance hit. But processing times of ~30 seconds were outside the realm of acceptable.

After firing up Instruments and running the Time Profiler tool on an optimized build, I was a little surprised to find that about half of the processing time was being devoted to parsing dates. Computers these days are fast - at least, way faster than that - so I was on a mission to improve date parsing performance. Today, thanks to performant date parsing and a few other changes, JSON processing time is a small fraction of what it once was.

My initial, naive approach

Each JSON object had perhaps a couple of dates, each formatted as an ISO 8601 string. For each object, a DateFormatter was used to turn the date strings into Date objects, which were then stored using Core Data. I was creating a new date formatter for each object, something akin to the following:

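for dateString in lotsOfDateStrings {

  // A brand new formatter on every iteration - this is the expensive part
  let formatter = DateFormatter()
  formatter.dateFormat = "yyyy-MM-dd'T'HH:mm:ssZZZZZ"
  // date(from:) returns an optional; nil means the string did not match
  let date = formatter.date(from: dateString)
  doStuff(with: date)

}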

The code above creates a new date formatter for each iteration of the loop. You might not think creating a date formatting object would be very expensive, but boy would you be wrong: creating the formatter once and reusing it results in a big performance boost. In my case, parsing was not just a simple ‘for’ loop, but we can solve the issue for more complex cases by creating a caching mechanism.
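For the simple loop case, the fix is just to hoist the formatter out of the loop. Here is a minimal sketch; the sample strings and the en_US_POSIX locale are assumptions for illustration (that locale is what Apple recommends for fixed-format parsing, so user settings can't change the result):

```swift
import Foundation

// Sample input; stands in for lotsOfDateStrings.
let lotsOfDateStrings = ["2016-11-01T21:14:33+00:00", "2016-11-02T08:00:00-05:00"]

// Create and configure the formatter once, outside the loop.
let formatter = DateFormatter()
formatter.locale = Locale(identifier: "en_US_POSIX") // assumption: fixed-format input
formatter.dateFormat = "yyyy-MM-dd'T'HH:mm:ssZZZZZ"

var parsedDates = [Date]()
for dateString in lotsOfDateStrings {
  // Reuse the same formatter; skip strings that fail to parse.
  if let date = formatter.date(from: dateString) {
    parsedDates.append(date)
  }
}
```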

First, how much faster is it exactly? A simple test parsing 100,000 date strings found the following:

Formatter creation    Time
Naive                 10.88 seconds
Once                  4.15 seconds

Wow - that’s a big improvement. Not significant enough to completely fix the performance issue, but big enough to make it clear that date formatters are expensive to create.
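A rough sketch of how a comparison like this could be timed follows. It is not necessarily the exact harness behind the numbers above: the measure helper is hypothetical, and the string count is kept at 1,000 so it runs quickly (raise it to 100,000 to match the test).

```swift
import Foundation

// Hypothetical helper: runs a closure and returns elapsed wall-clock seconds.
func measure(_ work: () -> Void) -> TimeInterval {
  let start = Date()
  work()
  return Date().timeIntervalSince(start)
}

let format = "yyyy-MM-dd'T'HH:mm:ssZZZZZ"
let strings = Array(repeating: "2016-11-01T21:14:33+00:00", count: 1_000)

// Naive: a new formatter for every string.
let naiveTime = measure {
  for string in strings {
    let formatter = DateFormatter()
    formatter.dateFormat = format
    _ = formatter.date(from: string)
  }
}

// Once: a single formatter reused for every string.
let onceTime = measure {
  let formatter = DateFormatter()
  formatter.dateFormat = format
  for string in strings {
    _ = formatter.date(from: string)
  }
}

print("Naive: \(naiveTime)s, once: \(onceTime)s")
```

Make sure to run this kind of test on an optimized build, ideally on a real device, since timings vary a lot between machines.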

Caching date formatters

I would suggest using something like the following as the default way to create any date formatter. Every formatter created using this method will automatically be cached for later, even for more complex situations like UITableView cell reuse. To use it, just call DateFormatter.cached(withFormat: "<date format>") - it doesn’t get much easier than that.


private var cachedFormatters = [String : DateFormatter]()

extension DateFormatter {

  static func cached(withFormat format: String) -> DateFormatter {
    if let cachedFormatter = cachedFormatters[format] { return cachedFormatter }
    let formatter = DateFormatter()
    formatter.dateFormat = format
    cachedFormatters[format] = formatter
    return formatter
  }

}
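One caveat: the dictionary above is a plain global, so it is not safe to touch from several threads at once. If parsing happens on background queues, a lock around the cache is one option. The following is only a sketch, and the FormatterCache name and its API are hypothetical:

```swift
import Foundation

// Hypothetical thread-safe variant of the formatter cache.
final class FormatterCache {
  static let shared = FormatterCache()

  private var formatters = [String: DateFormatter]()
  private let lock = NSLock()

  func formatter(withFormat format: String) -> DateFormatter {
    // Serialize all access to the dictionary.
    lock.lock()
    defer { lock.unlock() }

    if let cached = formatters[format] { return cached }
    let formatter = DateFormatter()
    formatter.dateFormat = format
    formatters[format] = formatter
    return formatter
  }
}

// Both calls return the very same formatter instance.
let first = FormatterCache.shared.formatter(withFormat: "yyyy-MM-dd")
let second = FormatterCache.shared.formatter(withFormat: "yyyy-MM-dd")
```

Either way, treat a cached formatter as read-only once created; changing its properties later would affect every caller sharing it.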

Faster, but not fast enough

Despite the large improvement, a roughly 2.6x speed boost wasn’t enough. After a little research, I found that iOS 10 has a new date formatting class, ISO8601DateFormatter… nice! Unfortunately iOS 9 support was a must, but let’s find out how it compares to a plain old DateFormatter anyway.

Running the same test with 100,000 date strings results in a 4.19 second parse time, which is a little slower than a DateFormatter, but only just. If you’re supporting iOS 10+ and performance isn’t a concern, you should probably still use this new class, despite the minor speed degradation - it probably does a more thorough job of handling all possible variants of the ISO 8601 standard.
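For reference, basic usage might look like the sketch below. The parseAll helper is hypothetical, and the formatter is left with its default options, which expect the full internet date-time form (date, time and a timezone such as Z or +01:00):

```swift
import Foundation

// Hypothetical helper: parses a batch of strings with one shared formatter.
func parseAll(_ strings: [String]) -> [Date?] {
  guard #available(iOS 10.0, macOS 10.12, tvOS 10.0, watchOS 3.0, *) else {
    // Older systems don't have ISO8601DateFormatter; they need another parser.
    return strings.map { _ in nil }
  }
  let formatter = ISO8601DateFormatter()
  return strings.map { formatter.date(from: $0) }
}

let results = parseAll([
  "2001-01-01T00:00:00Z",      // UTC
  "2001-01-01T01:00:00+01:00", // same instant, written with an offset
  "not a date"                 // fails to parse, so the result is nil
])
```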

Before moving on, I’d like to point out that if you’re in the same situation, and you have any say over the format of the data you’re dealing with, replacing the ISO 8601 string with a simple Unix timestamp would be a smart move. A quick test shows that 100,000 timestamps parse in 0.001 seconds… now that’s more like it.
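The timestamp route needs no formatter at all, which is why it is so quick. A tiny sketch, assuming the payload carries seconds (not milliseconds) since 1970-01-01 UTC; the sample values are made up:

```swift
import Foundation

// Assumption: timestamps are seconds since the Unix epoch.
let timestamps: [TimeInterval] = [0, 1_478_034_873]

// No parsing involved; each Date is built directly from the number.
let dates = timestamps.map { Date(timeIntervalSince1970: $0) }
```

If the API sends milliseconds instead, divide by 1,000 first.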

strptime() - don’t be fooled

A little more research into alternate date parsing solutions turned up an interesting function: strptime(). It’s an old C function, meant for low level date parsing, complete with what seems like all the formatting specifiers we need. It’s available directly in Swift, and you can use it as follows.

import Foundation

func parse(dateString: String) -> Date? {

  // Zero-initialized broken-down time for strptime() to fill in
  var timeComponents = tm()
  guard let cDateString = dateString.cString(using: .utf8) else { return nil }
  // strptime() returns nil when the string does not match the format
  guard strptime(cDateString, "%Y-%m-%dT%H:%M:%S%z", &timeComponents) != nil else { return nil }
  // mktime() interprets the components in the local timezone
  return Date(timeIntervalSince1970: Double(mktime(&timeComponents)))

}

Looks perfect, right? Well, I thought so at first too… long story short: don’t use it. The Mac/iOS implementation of strptime() doesn’t properly support the %z formatting specifier needed for ISO 8601 date offsets, and it has issues with daylight saving time. It is fast, although the call to mktime() slows it down a little: the code above ends up around twice as fast as any of the previous approaches. This code actually made it to the App Store after I manually corrected the timezone offset, until daylight saving issues started to appear. You might be able to make it work by manually correcting for daylight saving differences between the current and given timezone, but there is a better, faster way, so there is no need.

vsscanf()

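The final solution uses another C function, vsscanf(), the va_list variant of sscanf(). Swift can’t call C variadic functions like sscanf() directly, so the arguments are packed up with withVaList and passed to vsscanf() instead.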

vsscanf() is fast, but I spent some time figuring out how to convert this to a Date without hindering the performance. Let’s get straight to it:

class ISO8601DateParser {

  private static var calendarCache = [Int : Calendar]()
  private static var components = DateComponents()

  private static let year = UnsafeMutablePointer<Int>.allocate(capacity: 1)
  private static let month = UnsafeMutablePointer<Int>.allocate(capacity: 1)
  private static let day = UnsafeMutablePointer<Int>.allocate(capacity: 1)
  private static let hour = UnsafeMutablePointer<Int>.allocate(capacity: 1)
  private static let minute = UnsafeMutablePointer<Int>.allocate(capacity: 1)
  private static let second = UnsafeMutablePointer<Float>.allocate(capacity: 1)
  private static let hourOffset = UnsafeMutablePointer<Int>.allocate(capacity: 1)
  private static let minuteOffset = UnsafeMutablePointer<Int>.allocate(capacity: 1)

  static func parse(_ dateString: String) -> Date? {

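    // %ld matches Swift's Int (a C long on Apple platforms); %d would
    // write only 32 bits into each 64-bit Int.
    let parseCount = withVaList([year, month, day, hour, minute,
      second, hourOffset, minuteOffset], { pointer in
        vsscanf(dateString, "%ld-%ld-%ldT%ld:%ld:%f%ld:%ld", pointer)
    })

    // Fewer than six fields means the date and time did not parse.
    guard parseCount >= 6 else { return nil }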

    components.year = year.pointee
    components.minute = minute.pointee
    components.day = day.pointee
    components.hour = hour.pointee
    components.month = month.pointee
    components.second = Int(second.pointee)

    // Work out the timezone offset

    if hourOffset.pointee < 0 {
      minuteOffset.pointee = -minuteOffset.pointee
    }

    let offset = parseCount <= 6 ? 0 :
      hourOffset.pointee * 3600 + minuteOffset.pointee * 60

    // Cache calendars per timezone
    // (setting it each date conversion is not performant)

    if let calendar = calendarCache[offset] {
      return calendar.date(from: components)
    }

    var calendar = Calendar(identifier: .gregorian)
    guard let timeZone = TimeZone(secondsFromGMT: offset) else { return nil }
    calendar.timeZone = timeZone
    calendarCache[offset] = calendar
    return calendar.date(from: components)

  }

}

And there you have it. This parses 100,000 date strings in 0.67 seconds - about 16 times faster than the original method, and about six times faster than possible with a regular DateFormatter.
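To see the scanning step on its own, separate from the calendar work, here is a stripped-down sketch that only pulls out the year, month and day. The scanDate helper is hypothetical, and it uses Int32 locals because that is the type %d expects to write into:

```swift
import Foundation

// Hypothetical helper: extracts year, month and day from the start of a string.
func scanDate(_ string: String) -> (year: Int, month: Int, day: Int)? {
  // Int32 matches the C int that %d writes into.
  var year: Int32 = 0, month: Int32 = 0, day: Int32 = 0

  let matched = withUnsafeMutablePointer(to: &year) { y in
    withUnsafeMutablePointer(to: &month) { m in
      withUnsafeMutablePointer(to: &day) { d in
        // Pack the pointers into a va_list and hand them to vsscanf().
        withVaList([y, m, d]) { args in
          vsscanf(string, "%d-%d-%d", args)
        }
      }
    }
  }

  // vsscanf() returns how many fields it managed to fill in.
  guard matched == 3 else { return nil }
  return (Int(year), Int(month), Int(day))
}

let scanned = scanDate("2016-11-01T21:14:33Z")
```

The return value of vsscanf() is the thing to check: anything short of the expected field count means the string didn't match the pattern.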