Balancing the Desire for Location Analytics with User Privacy

The Impact that MAC Randomization has on Location Analytics

Introduction
Bluetooth is a wireless personal area networking standard for exchanging data over short distances. Bluetooth Low Energy (BLE), also known as version 4.0+ of the Bluetooth specification or Bluetooth Smart, is the power- and application-friendly version of Bluetooth that was built for the Internet of Things (IoT). Its power efficiency makes the protocol well suited to battery-operated devices. Since BLE now comes native on every modern phone, tablet and computer, it is also a natural starting point for connecting with the vast multitude of devices that IoT promises to bring to the world. A Bluetooth device, like any wireless device, announces itself to the world by sending out advertisement packets.

Advertising
BLE advertisements are periodic unidirectional broadcasts from the peripheral to all devices around it. A listener can use the information in these packets either to read the data being advertised or to connect to the advertiser. Not every device can be connected to; whether it can depends on the advertisement type announced in the advertisement header. The four types of advertisements are:

  1. Connectable undirected advertising
  2. Connectable directed advertising
  3. Non-connectable undirected advertising
  4. Scannable undirected advertising

Devices that only transmit, such as beacons, use the third advertising type. Devices that need to connect quickly to a specific, previously known device use the second type. Most other devices use the first type. While advertising, a device also indicates whether it is using a random MAC address or its own MAC address. This becomes important when doing passive analytics.
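To make the header fields concrete, here is a minimal sketch of how a listener might read the advertisement type and the random-address flag from the first header byte of an advertising packet. It assumes the Bluetooth 4.x link-layer layout (PDU type in the low four bits, the TxAdd flag in bit 6); the names parse_adv_header and AdvHeader are hypothetical and not taken from any library.

```python
from typing import NamedTuple

# PDU type codes for the four advertising types listed above
# (assumed Bluetooth 4.x link-layer values).
PDU_TYPES = {
    0x0: "ADV_IND (connectable undirected)",
    0x1: "ADV_DIRECT_IND (connectable directed)",
    0x2: "ADV_NONCONN_IND (non-connectable undirected)",
    0x6: "ADV_SCAN_IND (scannable undirected)",
}
CONNECTABLE_CODES = (0x0, 0x1)


class AdvHeader(NamedTuple):
    pdu_type: str
    connectable: bool
    random_address: bool


def parse_adv_header(first_byte: int) -> AdvHeader:
    """Decode the first header byte of an advertising-channel PDU."""
    code = first_byte & 0x0F            # low nibble: PDU type
    tx_add = bool(first_byte & 0x40)    # bit 6: TxAdd, set for a random address
    name = PDU_TYPES.get(code, f"other (0x{code:X})")
    return AdvHeader(name, code in CONNECTABLE_CODES, tx_add)


header = parse_adv_header(0x42)
print(header.pdu_type, header.connectable, header.random_address)
```

A beacon-style packet with header byte 0x42 would be reported as non-connectable with a random address, which is the combination a passive listener can do the least with.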

The advertisement packet has up to 31 bytes that can be used to advertise additional information about the device. The most common payloads are:

  • Local Name
  • Manufacturer Specific Data
  • Power Level
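The payload fields listed above are carried as a sequence of length/type/value structures. The following sketch walks such a payload and extracts the three common fields. The AD type codes (0x09 for the complete local name, 0x0A for the transmit power level, 0xFF for manufacturer specific data) are the commonly assigned values and are an assumption here, as are the function names and the sample payload.

```python
AD_LOCAL_NAME = 0x09      # Complete Local Name (assumed assigned number)
AD_TX_POWER = 0x0A        # Tx Power Level, signed dBm
AD_MANUFACTURER = 0xFF    # Manufacturer Specific Data


def parse_ad_structures(payload: bytes) -> list[tuple[int, bytes]]:
    """Split an advertisement payload into (ad_type, data) pairs."""
    fields = []
    i = 0
    while i < len(payload):
        length = payload[i]
        if length == 0:            # zero length marks padding
            break
        end = i + 1 + length
        if end > len(payload):     # truncated structure, stop parsing
            break
        fields.append((payload[i + 1], payload[i + 2:end]))
        i = end
    return fields


def summarize(payload: bytes) -> dict:
    """Pull out the local name, power level and manufacturer data, if present."""
    out = {}
    for ad_type, data in parse_ad_structures(payload):
        if ad_type == AD_LOCAL_NAME:
            out["local_name"] = data.decode("utf-8", errors="replace")
        elif ad_type == AD_TX_POWER and len(data) == 1:
            out["tx_power_dbm"] = int.from_bytes(data, "big", signed=True)
        elif ad_type == AD_MANUFACTURER and len(data) >= 2:
            # The first two bytes are the company identifier, little-endian.
            out["company_id"] = int.from_bytes(data[:2], "little")
            out["manufacturer_data"] = data[2:]
    return out


# Hypothetical payload: flags, a local name of "Demo", and a power level.
sample = bytes.fromhex("020106") + b"\x05\x09Demo" + bytes([0x02, 0x0A, 0xFC])
print(summarize(sample))
```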

The manufacturer specific data, as the name indicates, is where a device manufacturer can slot in its own information while also identifying the make of the device. Every company that advertises over BLE is supposed to obtain a company identifier from the Bluetooth SIG, and these identifiers can then be used to distinguish devices that are heard over the air. The manufacturer specific data is also where the payloads for beacons such as iBeacon, AltBeacon and Eddystone are present. For standard BLE devices, this is where Apple, for example, places information that can be used for services such as Handoff, AirDrop and AirPlay.
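As an illustration of how a beacon payload sits inside the manufacturer specific data, the sketch below decodes an iBeacon frame. It assumes the widely documented layout: Apple's company identifier 0x004C (little-endian), a type byte 0x02, a length byte 0x15, a 16-byte UUID, big-endian major and minor values, and a signed calibrated power byte. The function name and the sample bytes are hypothetical.

```python
import struct
import uuid
from typing import Optional

APPLE_COMPANY_ID = 0x004C   # assumed Bluetooth SIG identifier for Apple
IBEACON_TYPE = 0x02
IBEACON_LENGTH = 0x15       # 21 bytes follow: UUID + major + minor + power


def decode_ibeacon(mfg: bytes) -> Optional[dict]:
    """Decode manufacturer specific data as an iBeacon frame, or return None."""
    if len(mfg) != 25:
        return None
    company_id = int.from_bytes(mfg[:2], "little")
    if (company_id, mfg[2], mfg[3]) != (APPLE_COMPANY_ID, IBEACON_TYPE, IBEACON_LENGTH):
        return None
    # Major and minor are unsigned 16-bit big-endian; power is a signed byte.
    major, minor, power = struct.unpack(">HHb", mfg[20:25])
    return {
        "uuid": str(uuid.UUID(bytes=mfg[4:20])),
        "major": major,
        "minor": minor,
        "measured_power_dbm": power,
    }


# Hypothetical frame: UUID bytes 00..0f, major 1, minor 10, power -59 dBm.
frame = bytes.fromhex("4c000215") + bytes(range(16)) + bytes.fromhex("0001000ac5")
print(decode_ibeacon(frame))
```

Frames from other manufacturers, or Apple frames that are not iBeacons (such as the Handoff-style packets mentioned above), return None rather than being misread.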

Analytics and Privacy Implications
With this background, we can hypothesize about good mechanisms for protecting a user’s privacy and then run some real-world analysis to see whether they hold up.

For starters, any device that does not require a connection should use non-connectable advertisements. If connections are required, and only to specific previously known devices, then the connectable directed advertisement would be a suitable advertisement type to use.

In either case, and as we have seen in the world of WiFi, randomizing the MAC address used to transmit is almost always useful.

If one were to take iOS and macOS as an example, some interesting patterns emerge (see Table 1). Both do a fairly good job of keeping things random and ensuring that the device is not easily trackable. In my experiments, every time an iOS or macOS device woke up, it used a new random MAC address. The device also advertises only in some scenarios. When the screen was unlocked, I was able to connect via BLE to a device and read out basic information such as the hardware model number, firmware version and current battery status. Provided BLE is enabled, Apple devices also send out packets that support common features such as Handoff, AirPlay and AirDrop.

Table 1. Advertising behavior by action

Action               Advertising   Notes
Phone Ringing        no
Phone Answered       yes           Connectable
Text Message Alert   no
“Hey Siri”           yes           No device info, so not identifiable, but connectable
Screen Unlocked      yes           Connectable

From what I’ve seen, only the Apple TV does not randomize its MAC address. From an analytics standpoint, this constant randomization does a great job of maintaining user privacy, while also making it seem that there are many more devices around than there actually are. So far I have been unable to determine a pattern in the randomization, but this continues to be a work in progress.
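One way to see how randomization inflates device counts is to classify each observed address before counting. The sketch below assumes the Bluetooth convention that, for an address flagged as random, the two most significant bits select the subtype (11 for static, 01 for resolvable private, 00 for non-resolvable private), and that addresses are written most significant octet first. The function names and sample addresses are hypothetical.

```python
from collections import Counter

# Subtype encoded in the two most significant bits of a random address.
RANDOM_SUBTYPES = {
    0b11: "random static",
    0b01: "resolvable private",
    0b00: "non-resolvable private",
}


def classify_address(address: str, is_random: bool) -> str:
    """Classify an address given as 'AA:BB:CC:DD:EE:FF' plus its random flag."""
    if not is_random:
        return "public"
    top_bits = int(address.split(":")[0], 16) >> 6
    return RANDOM_SUBTYPES.get(top_bits, "reserved")


def count_unique_by_kind(sightings) -> Counter:
    """Count distinct addresses per kind from (address, is_random) sightings."""
    unique = set(sightings)
    return Counter(classify_address(addr, rnd) for addr, rnd in unique)


# Hypothetical capture: one phone rotating through two private addresses,
# plus one accessory with a fixed public address.
capture = [
    ("4A:10:22:33:44:55", True),
    ("4A:10:22:33:44:55", True),
    ("5B:99:88:77:66:55", True),
    ("00:1A:7D:DA:71:13", False),
]
print(count_unique_by_kind(capture))
```

A naive unique-address count for this capture would report three devices, while only the public address is a reliable one-to-one match with a physical device.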

A more detailed capture via Ubertooth sheds some light on the BLE behavior of other devices. A few of the mobile devices never seemed to randomize the MAC address in their advertisement packets, which implies that once such a device is traced to a user, it can be monitored anywhere in the world.
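Devices that never rotate their address stand out in a capture because the same address keeps appearing over a long period. The following is a minimal sketch of that check, assuming the capture has already been reduced to (timestamp in seconds, address) pairs; the one-hour threshold and the function name are arbitrary choices for illustration, not values taken from the experiments described here.

```python
def persistent_addresses(sightings, min_span_s: float = 3600.0) -> dict:
    """Return {address: seconds between first and last sighting} for
    addresses seen over at least min_span_s seconds."""
    first_seen = {}
    last_seen = {}
    for timestamp, address in sightings:
        first_seen[address] = min(timestamp, first_seen.get(address, timestamp))
        last_seen[address] = max(timestamp, last_seen.get(address, timestamp))
    spans = {addr: last_seen[addr] - first_seen[addr] for addr in first_seen}
    return {addr: span for addr, span in spans.items() if span >= min_span_s}


# Hypothetical capture: one address seen for over an hour, one for ten minutes.
capture = [
    (0.0, "00:1A:7D:DA:71:13"),
    (10.0, "4A:10:22:33:44:55"),
    (600.0, "4A:10:22:33:44:55"),
    (4000.0, "00:1A:7D:DA:71:13"),
]
print(persistent_addresses(capture))
```

An address that survives well beyond the chosen threshold is a candidate for a device that does not randomize, and is therefore trackable once it has been tied to a user.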

Mobile accessories don’t seem to follow a consistent behavior. At home, for example, my smart TV and headphones both advertise over BLE without any randomization, while also being connectable. Some connectable devices share details such as the Device Information Service once a BLE connection is established, but also seem to have timeouts in place to drop random connections. Using a mixture of listening for advertisements and sending scan requests to devices that use connectable advertisements, one can also derive the user-specific name of a device.

In general, accessories that need connections tend to avoid randomized MAC addresses. I believe this is to make it easy for an app to connect. Examples of such devices are wireless headphones, headsets and connectable lamps.

Conclusion
While many devices do use methods to obscure themselves from prying eyes, there are still some ways in which one can run passive analytics for BLE devices. This has limited scope, however, and can get into murky waters when it comes to user privacy.

Active analytics, however, shows more promise. By getting people to install apps, one can drive user engagement and make people more aware of the system overall. This also helps in de-anonymizing the data coming from devices and opens up the possibility of relying on a mixture of WiFi (connected as well as unconnected) in conjunction with BLE.