Firebase Authentication: Migrating User Data

This article is part of a series of articles that explores building a real-world application using SwiftUI, Firebase, and a couple of other technologies.

Here is an overview of the series and what we're building:

  • In part 1 of the series, we focussed on building the UI with SwiftUI, using a simple data model.
  • In part 2, we connected the application to Firebase, and synchronized the user's tasks with Cloud Firestore.
  • In part 3 we implemented Sign in with Apple, allowing users to sign in from multiple devices to access their data.

At the end of part 3, we briefly touched on an issue that users who sign in to the app on a secondary device might face:

You might think that using an app on two different devices should be an entirely reasonable thing to do. Still, when trying to sign in on the second device, Firebase will return an error message saying

"This credential is already associated with a different user account".

Does this mean you cannot sign in to more than one device when using Firebase?

Read on to learn more about what this actually means and how we can gracefully recover from this situation.

Why does this happen?

To better understand the issue, let’s first set the scene and look at how a user might end up in a situation in which Firebase refuses to perform account linking:

  1. Alice launches our app, MakeItSo, on her phone.
  2. She starts using the app without signing in and adds a couple of tasks.
  3. As she hasn’t signed in yet, all her data is connected to the anonymous user account the app created when Alice started using the app. Whenever she adds a new task, the app sets the userId property of that task to the uid of Alice’s anonymous user account.
  4. After using the app for a while, Alice is pleased with the app. She decides to sign up to take advantage of the extra features of the app, such as sharing data between multiple devices.
  5. She discovers the Sign in with Apple button and signs in using her Apple ID.
  6. The app receives a credential object from Sign in with Apple. The app then calls into Firebase Authentication to link Alice’s anonymous user with these credentials.
  7. Firebase Authentication upgrades the anonymous account into a permanent account, connecting it with Alice's Apple ID.
  8. A while later, Alice decides to use the app on her iPad as well.
  9. She installs the app on her iPad.
  10. Upon launching the app, she sees an empty screen.
  11. Assuming that it might take a while for the data to sync from her iPhone to her iPad, she starts adding new tasks on her iPad.
  12. What Alice doesn't know: as she is now using a different device, she is represented by a new anonymous user ID on her iPad.
  13. As the new tasks she enters on her iPad don't show up on her iPhone, she gets suspicious and realises she needs to sign in to the app on her iPad as well.
  14. She finds the Sign in with Apple button on her iPad and signs in.
  15. The app receives Sign in with Apple credentials representing Alice.
  16. The app tries to link the anonymous account with these credentials.
  17. At this moment, Firebase Authentication returns an error, stating that the credential has already been linked to another Firebase user account (the one in step 7).

It helps to keep in mind that, at its core, a Firebase user is just a thin wrapper around an ID token. Any authentication provider you might use in your app essentially just helps you to exchange specific credentials for a Firebase-specific ID token.

In their Firebase Summit 2020 session “Firebase Authentication: From fully managed to fully customizable”, my colleagues Sam and Malcolm dive into how the token exchange flow works in detail - I highly recommend watching this talk.

You can link a Firebase user to one or more authentication providers. Each of these providers can then provide a credential that uniquely identifies the user in the context of this authentication provider. To uphold this identifying relationship, Firebase needs to ensure that the same credential cannot be used to identify a different Firebase user.

This is why you will receive an error message when you take a credential that has already been linked to a Firebase user and try to link it to another Firebase user.

How can we resolve this situation?

To implement a suitable solution, let’s consider the user’s situation:

  • They have signed in on their primary device and started to add data to the app. The app has assigned the user’s ID to this data and has synced it with Cloud Firestore.
  • When the user signed in, the app linked their user account with the user’s Apple ID credentials.
  • The user then launched the app on a secondary device and might have added some data to the app on this device as well. They then realised they need to sign in on this secondary device to ensure the app can sync their data between all of their devices.

We can assume that the user wants to merge all data they entered on the second device with any data they already entered on the first device.

For our implementation, this means the following:

  • We will first try to link the user’s Apple ID credentials to the current anonymous account.
  • If this fails because the Apple ID credentials have already been linked to another Firebase user account, we will sign in to that user instead (keep in mind, this is the Firebase user we created on the user’s first device).
  • We will then update any data the user has created on the second device, and make sure it gets assigned to the first user account.
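The three steps above can be sketched independently of any SDK. The following is a minimal model, not the Firebase API: AuthClient, LinkResult, linkOrSignIn, and the in-memory fake are all hypothetical names introduced for illustration, and a real client would react to the "credential already in use" error reported by the SDK instead of a result object.

```typescript
// Hypothetical stand-ins for the auth types; none of these are Firebase APIs.
interface AuthUser {
  uid: string
  isAnonymous: boolean
}

type LinkResult =
  | { kind: 'linked'; user: AuthUser }
  | { kind: 'credential-already-in-use' }

interface AuthClient {
  currentUser(): AuthUser
  link(credential: string): LinkResult
  signIn(credential: string): AuthUser
}

interface SignInOutcome {
  user: AuthUser
  // uid of the anonymous account whose data still needs migrating, if any
  migrateFromUid: string | null
}

function linkOrSignIn(auth: AuthClient, credential: string): SignInOutcome {
  const anonymousUid = auth.currentUser().uid

  // Step 1: try to upgrade the current anonymous account.
  const result = auth.link(credential)
  if (result.kind === 'linked') {
    return { user: result.user, migrateFromUid: null }
  }

  // Step 2: the credential belongs to an existing account, so sign in to it.
  const existingUser = auth.signIn(credential)

  // Step 3: report which anonymous account's data has to be reassigned.
  return { user: existingUser, migrateFromUid: anonymousUid }
}

// In-memory fake, used only to exercise the flow.
function makeFakeAuth(currentUid: string, linkedCredentials: Record<string, string>): AuthClient {
  let current: AuthUser = { uid: currentUid, isAnonymous: true }
  return {
    currentUser: (): AuthUser => current,
    link: (credential: string): LinkResult => {
      const owner = linkedCredentials[credential]
      if (owner !== undefined && owner !== current.uid) {
        return { kind: 'credential-already-in-use' }
      }
      linkedCredentials[credential] = current.uid
      current = { uid: current.uid, isAnonymous: false }
      return { kind: 'linked', user: current }
    },
    signIn: (credential: string): AuthUser => {
      current = { uid: linkedCredentials[credential], isAnonymous: false }
      return current
    },
  }
}

// Alice signs in on her phone first, then on her iPad.
const sharedCredentials: Record<string, string> = {}
const phoneOutcome = linkOrSignIn(makeFakeAuth('anon-phone', sharedCredentials), 'apple:alice')
const tabletOutcome = linkOrSignIn(makeFakeAuth('anon-ipad', sharedCredentials), 'apple:alice')
```

On the phone, linking succeeds and nothing needs to be migrated. On the iPad, linking is refused, the app signs in to the account created on the phone, and migrateFromUid names the anonymous iPad account whose tasks still have to be reassigned.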

To perform this data migration, we’ve got two main options:

  1. Perform the migration on-device. This requires that the user is signed in to both the anonymous user account and the new, permanent account (by instantiating a separate FirebaseApp instance for each of them). We can then iterate over all tasks that are owned by the anonymous user and set their userId attribute to the ID of the permanent account. Firebase will then synchronise the updates to the backend, and all tasks will appear on all devices on which the user is signed in with the permanent account.
  2. Perform the migration in the backend / in the cloud. This approach requires a Cloud Function that queries all tasks that belong to the anonymous user and updates their userId attribute to the ID of the permanent account.

Both options are viable, but for this article, I decided to implement the second option to demonstrate how to make use of Cloud Functions.

Migrating data in the cloud

To perform the data migration, our Cloud Function needs to update all documents that belong to the first (anonymous) user, and assign them to the second (permanent) user.

Since it runs in a trusted environment, the function can make use of Firebase's Admin SDK and access all documents in our project's Cloud Firestore instance, independent of any Security Rules we've set up. If this sounds surprising to you, keep in mind that Security Rules are a mechanism that Firebase uses to protect data from being accessed by an untrusted client. Code running on Cloud Functions is considered secure; hence Security Rules don’t apply.

The client app knows both the anonymous user and the permanent user account at the time of calling the Cloud Function. So can we just send the user IDs to the Cloud Function?

No - that would be rather unsafe and open an attack vector: a malicious actor might be able to guess the user ID of a real user and call the Cloud Function with this user ID. This would put them in a position to either steal that user's data (by migrating it to an account the malicious actor has control over) or inject bogus data into the user's account.

There are several measures we can put in place to eliminate this attack vector. The key facts we need to establish are:

  1. Both ID tokens are indeed valid, and represent user accounts of our application.
  2. The first account (the one we're migrating from) is an anonymous user.
  3. The second account is not an anonymous user account.
  4. In addition, we can put a check in place to ensure the second account was signed into recently (i.e. within the past few minutes).

Let's start by looking at how we can establish that the ID tokens the function receives are valid. Instead of just sending the plain user ID (which can easily be spoofed), it is much safer to use JWTs (JSON Web Tokens) as a tamper-proof way to communicate user IDs. JWTs are cryptographically signed, so it is easy to verify their integrity.

Here is a JWT for an anonymous user in both encoded and decoded form:

eyJhbGciOiJSUzI1NiIsImtpZCI6ImQxOTI5ZmY0NWM2MDllYzRjNDhlYmVmMGZiMTM5MmMzOTEzMmQ5YTEiLCJ0eXAiOiJKV1QifQ.eyJwcm92aWRlcl9pZCI6ImFub255bW91cyIsImlzcyI6Imh0dHBzOi8vc2VjdXJldG9rZW4uZ29vZ2xlLmNvbS9wZXRlcmZyaWVzZS1tYWtlaXRzby1zYW5kYm94IiwiYXVkIjoicGV0ZXJmcmllc2UtbWFrZWl0c28tc2FuZGJveCIsImF1dGhfdGltZSI6MTYwNDMzNDgzNiwidXNlcl9pZCI6Impobk1wRWNsaExTSFNGdW9SUnN4WUtLa3AwdDIiLCJzdWIiOiJqaG5NcEVjbGhMU0hTRnVvUlJzeFlLS2twMHQyIiwiaWF0IjoxNjA0MzM0ODM2LCJleHAiOjE2MDQzMzg0MzYsImZpcmViYXNlIjp7ImlkZW50aXRpZXMiOnt9LCJzaWduX2luX3Byb3ZpZGVyIjoiYW5vbnltb3VzIn19.iMSETKOeJO5vAui35fAz6h7izVJNNXLh3Q3lpx_ACw24OwFWFXsBcJAzhnjQB4D699_Nn6hoI1lupYERNzBL2VUzmdvNeqFEE16VRj8IFGih857nponVOWKSa4OpGwSnklDLHfzHhZ7wKuoozh5cAEp-oz10cHjztJiMuXMrqUPTTboGf7V7E6csAVgaaEoA990GNNBZuuRnihohKYu8-bV3Lt8DhtRaMhA4C-YXdImSha1WVtuZR9_quqAuULyFp4V8rWnJkUz9jOwv3jKVk3sf3Svv3jU5_RnLcILN12DqGHGKg1J5DxjrgWH3podZ2tOQb3j4cvzXAW9ruXQ3Jw
{
  "provider_id": "anonymous",
  "iss": "https://securetoken.google.com/<your-project-id>",
  "aud": "<your-project-id>",
  "auth_time": 1604334836,
  "user_id": "jhnMpEclhLSHSFuoRRsxYKKkp0t2",
  "sub": "jhnMpEclhLSHSFuoRRsxYKKkp0t2",
  "iat": 1604334836,
  "exp": 1604338436,
  "firebase": {
    "identities": {},
    "sign_in_provider": "anonymous"
  }
}

As you can see, Firebase-issued ID tokens are specific to your Firebase project: the project ID is part of the issuer (in the iss attribute), and the audience (in the aud attribute).
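To see how the encoded and decoded forms relate, here is a small sketch that decodes the payload (the middle segment) of a JWT by hand. It is for inspection only: it does not verify the signature, so it must never replace proper verification on the server. The helper names are made up for this example, the decoder assumes ASCII-only claims, and the sample token is a commonly used documentation JWT, not a Firebase token.

```typescript
const BASE64URL_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'

// Decodes a base64url string into text. Assumes the content is plain ASCII.
function base64UrlToString(input: string): string {
  const unpadded = input.replace(/=+$/, '')
  let buffer = 0
  let bitCount = 0
  let output = ''
  for (let i = 0; i < unpadded.length; i++) {
    const value = BASE64URL_ALPHABET.indexOf(unpadded.charAt(i))
    if (value < 0) {
      throw new Error(`Invalid base64url character at position ${i}`)
    }
    // Each character contributes 6 bits; emit a byte whenever 8 are available.
    buffer = (buffer << 6) | value
    bitCount += 6
    if (bitCount >= 8) {
      bitCount -= 8
      output += String.fromCharCode((buffer >> bitCount) & 0xff)
      buffer &= (1 << bitCount) - 1
    }
  }
  return output
}

// Returns the claims of a JWT WITHOUT checking its signature.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const segments = token.split('.')
  if (segments.length !== 3) {
    throw new Error('A JWT consists of three dot-separated segments')
  }
  return JSON.parse(base64UrlToString(segments[1]))
}

const sampleToken =
  'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.' +
  'eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.' +
  'SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c'

const claims = decodeJwtPayload(sampleToken)
// claims.sub is "1234567890" and claims.iat is 1516239022
```

Because anyone can decode (and also construct) such a payload, the claims can only be trusted once the signature has been checked.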

You can retrieve the ID token for a Firebase user by calling user.getIDToken(). Just like most Firebase API calls, this is an asynchronous call which returns an optional result (the token string) and an optional error:

currentUser.getIDToken { (token, error) in
  if let idToken = token {
    // use the ID token
  }
}

The ID token is signed by Firebase Authentication, so we can verify its integrity in our Cloud Function using the Admin SDK, like this:

try {
  const decodedToken = await admin.auth().verifyIdToken(idToken)
}
catch (error) {
  logger.error(`Error when trying to verify ID token. Error: ${error}`)  
  return { error }
}

So we'll retrieve the ID token of the anonymous user on the client, ready to be sent to the Cloud Function that will migrate this anonymous user's data.

Now, we could technically do the same for the ID token of the Sign in with Apple account - however, there is no need to do so, as we're going to use an HTTPS Callable Cloud Function, which has the following advantages:

  • We can easily call the function from our client app.
  • Requests to HTTPS Callable Cloud Functions automatically include Firebase Authentication tokens for the current user. These are verified for us by Firebase, so we don't have to perform any additional verification.
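To make the second point more tangible, here is a rough sketch of what a callable request looks like on the wire: a POST with a JSON body that wraps the parameters in a data field, plus an Authorization header carrying the caller's ID token. The client SDK assembles all of this for you, so treat the code as an illustration only. buildCallableRequest is a hypothetical helper, and the URL layout, region, and project ID are assumptions made for this example.

```typescript
interface CallableRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Hypothetical helper that mirrors what the client SDK does on our behalf.
function buildCallableRequest(
  region: string,
  projectId: string,
  functionName: string,
  payload: unknown,
  callerIdToken: string | null
): CallableRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' }
  if (callerIdToken !== null) {
    // The ID token of the signed-in user travels in the Authorization header.
    headers['Authorization'] = `Bearer ${callerIdToken}`
  }
  return {
    // Assumed URL layout; the real endpoint depends on how the function is deployed.
    url: `https://${region}-${projectId}.cloudfunctions.net/${functionName}`,
    method: 'POST',
    headers: headers,
    // The parameters we pass to the function are wrapped in a "data" field.
    body: JSON.stringify({ data: payload }),
  }
}

const exampleRequest = buildCallableRequest(
  'us-central1',
  'demo-project',
  'migrateTasks',
  { idToken: '<anonymous user ID token>' },
  '<permanent user ID token>'
)
```

This is why the function only needs the anonymous user's ID token as an explicit parameter: the permanent user's token arrives with the request itself and shows up on the server as context.auth.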

Now that we've established that both the anonymous user's and the permanent user's ID tokens are valid, we should check that the first account is an anonymous account, and the second account is not an anonymous account.

Firebase's client SDKs make it easy to detect whether a given user object represents an anonymous user: we can just check its isAnonymous property. The Firebase Admin SDK doesn't have such a convenience API. Instead, we need to inspect the ID token and check if the authentication provider is anonymous:

function isAnonymous(idToken: admin.auth.DecodedIdToken) {
  return idToken.firebase.sign_in_provider === "anonymous"
}

As a final security measure, we'll also want to make sure the user has signed in to the permanent account very recently (let's say within the past five minutes). Fortunately, the ID token of a signed-in user contains an auth_time field that tells us when the user signed in, expressed in seconds since the Unix epoch. Here is how we can check if this took place within the past five minutes:

const gracePeriod = 5 * 60 * 1000
const authTime = permanentAccountIdToken.auth_time * 1000
const timeSinceSignIn = Date.now() - authTime

if (timeSinceSignIn > gracePeriod) {
  throw new functions.https.HttpsError(
    'invalid-argument', 
    `Sign in must be within the past ${gracePeriod} milliseconds`, 
    permanentAccountIdToken)
}
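The arithmetic is easy to get wrong because auth_time is given in seconds while Date.now() returns milliseconds. As a sketch, the same check can be written as a pure function that receives the current time as a parameter, which makes it easy to try out with concrete numbers. signedInRecently and GRACE_PERIOD_MS are names chosen for this example; the auth_time value is the one from the sample token shown earlier.

```typescript
const GRACE_PERIOD_MS = 5 * 60 * 1000

// authTimeSeconds: the token's auth_time claim (seconds since the Unix epoch)
// nowMs: the current time in milliseconds, e.g. Date.now()
function signedInRecently(
  authTimeSeconds: number,
  nowMs: number,
  gracePeriodMs: number = GRACE_PERIOD_MS
): boolean {
  const timeSinceSignInMs = nowMs - authTimeSeconds * 1000
  return timeSinceSignInMs <= gracePeriodMs
}

const sampleAuthTime = 1604334836
const fourMinutesLater = (sampleAuthTime + 4 * 60) * 1000
const sixMinutesLater = (sampleAuthTime + 6 * 60) * 1000

const acceptedAfterFourMinutes = signedInRecently(sampleAuthTime, fourMinutesLater) // true
const acceptedAfterSixMinutes = signedInRecently(sampleAuthTime, sixMinutesLater) // false
```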

With all of those bits and pieces in place, we can now go ahead and implement the skeleton for the Cloud Function and call it from our client app:

import * as functions from 'firebase-functions'
import * as admin from 'firebase-admin'

let db: FirebaseFirestore.Firestore
const logger = functions.logger
const gracePeriod = 5 * 60 * 1000
let initialized = false

function initialize() {
  if (initialized === true) return
  initialized = true

  logger.log(`Starting up MakeItSo Cloud Functions`)
  admin.initializeApp()
  db = admin.firestore()  
}

function isAnonymous(idToken: admin.auth.DecodedIdToken) {
  return idToken.firebase.sign_in_provider === "anonymous"
}

async function verifyAnonymousUserIdToken(anonymousIdToken: string) {
  // Don't log the raw token itself: anyone with access to the logs could reuse it until it expires
  logger.log('Verifying anonymous ID token')
  const verifiedAnonymousIdToken = await admin.auth().verifyIdToken(anonymousIdToken)


  if (!isAnonymous(verifiedAnonymousIdToken)) {
    throw new functions.https.HttpsError('invalid-argument', 'ID token must be anonymous', verifiedAnonymousIdToken)
  }
  return verifiedAnonymousIdToken
}

async function verifyPermanentUserIdToken(permanentAccountIdToken: admin.auth.DecodedIdToken) {
  logger.log(`Verifying permanent ID token for user ${permanentAccountIdToken.uid}`)

  if (isAnonymous(permanentAccountIdToken)) { // (4)
    throw new functions.https.HttpsError('invalid-argument', 'ID token must be non-anonymous', permanentAccountIdToken)
  }

  const authTime = permanentAccountIdToken.auth_time * 1000
  const timeSinceSignIn = Date.now() - authTime

  if (timeSinceSignIn > gracePeriod) { // (5)
    throw new functions.https.HttpsError(
      'invalid-argument', 
      `This operation requires a recent sign-in.`,
      permanentAccountIdToken)
  }

  return permanentAccountIdToken
}

export const migrateTasks = functions.https.onCall(async (data, context) => {
  initialize()

  if (!context.auth) { // (1)
    throw new functions.https.HttpsError('failed-precondition', 'The function must be called while authenticated.')
  }
  else {
    logger.log('Received data: %j', data)

    const verifiedAnonymousIdToken = await verifyAnonymousUserIdToken(data.idToken) // (2)
    const permanentAccountIdToken = await verifyPermanentUserIdToken(context.auth?.token) // (3)
    return performMigration(verifiedAnonymousIdToken, permanentAccountIdToken)
  }
})

First, there is some setup and initialisation code. migrateTasks is the function that will be exported and callable by our client. As you can see, we first verify that the call actually contains an auth token (1). If so, we extract the idToken attribute from the data parameter (2) and verify the ID token it contains. We specifically check (in verifyAnonymousUserIdToken) that the token is valid and that it represents an anonymous user. This is to prevent malicious actors from trying to steal data from non-anonymous accounts.

In the next step (3), we verify the ID token of the permanent user as well: specifically, we check if the token represents a non-anonymous account (4), and whether the user signed in within the past five minutes (5).

If all of this was successful, we call performMigration to perform the actual data migration.

Once this function is deployed to Cloud Functions, we can call it from the client app using the following snippet:

func migrateTasks(from idToken: String) {
  let parameters = ["idToken": idToken] // (1)
  functions.httpsCallable("migrateTasks").call(parameters) { (result, error) in // (2)
    if let error = error as NSError? {
      print("Error: \(error.localizedDescription)")
      return // there is no result to report when the call failed
    }
    print("Function result: \(result?.data ?? "(empty)")")
  }
}

To pass parameters, we need to construct a dictionary (1), and then invoke httpsCallable with the name of the function. By now, you should be familiar with the fact that most Firebase API calls are asynchronous. HTTPS Callable Cloud Functions are no different: once the function completes, the trailing closure will be called (2), and we can check the result of the operation.

Updating multiple Firestore documents at once

The data migration itself is a two-step process:

  1. Fetch all documents that are owned by the anonymous user
  2. Update their userId attribute with the user ID of the permanent account

Database operations like this should be executed atomically, to ensure that either all operations succeed, or none of them are applied. It would be a rather nasty surprise for the user to find out only some of their tasks were migrated!

Cloud Firestore provides two mechanisms to achieve this: transactions and batched writes. A transaction is a set of read and write operations on one or more documents. A batched write is a set of write operations on one or more documents.

In our use case, we want to first read a bunch of documents (i.e. fetch all documents that are owned by the anonymous user), and then update all of them in one fell swoop, so a transaction is the right choice.

When using transactions in Cloud Firestore, keep in mind that:

  • All read operations must precede any write operations.
  • A function calling a transaction might be run more than once in case a concurrent edit affects one or more documents that are part of the transaction.
  • Transactions will fail if the client is offline (as we're calling from a Cloud Function, this doesn't apply).
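The first two rules are easier to appreciate with a toy model. The sketch below is not how Cloud Firestore is implemented; it is a deliberately simplified in-memory store (ToyStore and ToyTransaction are invented names) that buffers writes, commits them together only if nothing that was read has changed in the meantime, and otherwise runs the update function again.

```typescript
interface ToyTask {
  userId: string
  version: number
}

interface ToyTransaction {
  readOwnedBy(userId: string): string[]
  update(taskId: string, userId: string): void
}

class ToyStore {
  tasks: Record<string, ToyTask> = {}

  put(taskId: string, userId: string): void {
    const previous = this.tasks[taskId]
    this.tasks[taskId] = { userId: userId, version: previous ? previous.version + 1 : 1 }
  }

  runTransaction<T>(updateFunction: (tx: ToyTransaction) => T, maxAttempts: number = 5): T {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      const versionsRead: Record<string, number> = {}
      const pendingWrites: Record<string, string> = {}
      let hasWritten = false

      const tx: ToyTransaction = {
        readOwnedBy: (userId: string): string[] => {
          if (hasWritten) throw new Error('All reads must precede any writes')
          const ids = Object.keys(this.tasks).filter(id => this.tasks[id].userId === userId)
          ids.forEach(id => { versionsRead[id] = this.tasks[id].version })
          return ids
        },
        update: (taskId: string, userId: string): void => {
          hasWritten = true
          pendingWrites[taskId] = userId // buffered until commit
        },
      }

      const result = updateFunction(tx)

      // Commit all writes together, but only if nothing we read has changed.
      const conflict = Object.keys(versionsRead)
        .some(id => this.tasks[id].version !== versionsRead[id])
      if (!conflict) {
        Object.keys(pendingWrites).forEach(id => this.put(id, pendingWrites[id]))
        return result
      }
      // Otherwise discard the buffered writes and run the update function again.
    }
    throw new Error('Transaction failed after too many attempts')
  }
}

const toyStore = new ToyStore()
toyStore.put('task-1', 'anon-ipad')
toyStore.put('task-2', 'anon-ipad')

let attempts = 0
const migratedCount = toyStore.runTransaction(tx => {
  attempts++
  const ids = tx.readOwnedBy('anon-ipad')
  if (attempts === 1) {
    // Simulate a concurrent edit arriving between our read and our commit.
    toyStore.put('task-1', 'anon-ipad')
  }
  ids.forEach(id => tx.update(id, 'alice'))
  return ids.length
})
```

In this run, the update function is executed twice: the first attempt is discarded because task-1 changed after it was read, and the second attempt commits both updates together. This is also why the update function should not have side effects beyond the transaction itself.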

Here is the code that performs the data migration:

async function performMigration(
  anonymousIdToken: admin.auth.DecodedIdToken,
  permanentAccountIdToken: admin.auth.DecodedIdToken
  ) {
  const anonymousUserId = anonymousIdToken.uid
  const permanentUserId = permanentAccountIdToken.uid

  logger.log(`Migrating tasks from previous userID [${anonymousUserId}] to new userID [${permanentUserId}].`)

  return db.runTransaction( async transaction => { // (1)
    const tasksToMigrateQuery = db.collection('tasks').where('userId', '==', anonymousUserId)
    const tasksToMigrate = await transaction.get(tasksToMigrateQuery) // (2)

    if (tasksToMigrate.empty) { // (3)
      logger.log(`Previous user [${anonymousUserId}] didn't have any documents, nothing to do.`)
    }
    else {
      logger.log(`Migrating ${tasksToMigrate.size} tasks from userID [${anonymousUserId}] to new userId [${permanentUserId}]`)
      tasksToMigrate.forEach(snapshot => { // (4)
        transaction.update(snapshot.ref, { 'userId': permanentUserId }) // (5)
      })
    }
    return { // (6)
      'updatedDocCount': tasksToMigrate.size,
      'anonymousUserId': anonymousUserId,
      'permanentUserId': permanentUserId
    }
  })
}

As discussed, we use a transaction (1) to wrap all data access code. Inside the transaction, the first step is to fetch all documents that belong to the anonymous user (2). The result set might be empty (3), in which case we’ll just log a message.

If the result set is not empty, we will iterate over all documents (4), and update their userId attribute to the ID of the user’s permanent account (5).

Finally, we return a dictionary with some details about the data migration (6). The client app can use this information to let the user know how many tasks were migrated.

Conclusion

Firebase Authentication can be really simple for straightforward use cases, but that simplicity does not prevent you from building more complex solutions when needed. By introducing some server-side code in a Cloud Function, you can implement very flexible and powerful authentication systems.

Thanks for reading!

The header image is based on Cloud by Gajah Mada Studio from the Noun Project and Forklift by Victoruler from the Noun Project