Technical solutions

Based on TLS hop-by-hop

Introduction

As has been previously established, within the roaming ecosystem mobile operators often outsource various services to third parties, such as an IPX carrier or VAS provider.

In the interest of business continuity, an architecture is required that recognizes the relationship between service providers, which cater to a collection of (international) roaming relations.

The subsequent chapters describe this architecture in technical detail: the scenario where only one operator has outsourced services, the case where both operators have outsourced services, and the common aspects of both options. This is then followed by an exploration of the use of N32-c between SEPPs.

Architecture description for hosted SEPP

Imagine two mobile network operators, O1 and O2. They have a roaming agreement which allows their users to connect to each other's networks.

O1 has decided to outsource some of its services to a company that specializes in providing these services, known as a hosted SEPP provider. This helps O1 manage services such as network steering, fraud detection, and data roaming controls more effectively.

O2 has no relation with the hosted SEPP service provider and interacts with O1 for roaming services using direct TLS.

Figure 1: illustration of the basic process for a bilateral relation between O1 and O2 based on direct TLS

The IP address of O1 DNS is published in O1’s IR.21. At this time, it is still unclear how the DNS query itself is going to be protected; DNS over TLS is just one option.

On receiving the NAPTR query from O2 SEPP, O1 DNS returns a service record for the SEPP service. O2 SEPP then launches an SRV request and receives a list of SEPP servers to choose from. Finally, O2 SEPP launches an A/AAAA request to obtain the IP of the selected O1 SEPP.
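The three lookups above can be sketched as a small simulation. This is only an illustration under stated assumptions: the zone contents, the names under `o1.example` and the addresses are invented, the `query` function stands in for real DNS traffic, and the selection rule (lowest SRV priority value first) is a simplification that ignores SRV weights.

```python
# Hypothetical authoritative records for O1; all names and addresses are invented.
O1_ZONE = {
    ("NAPTR", "sepp.o1.example"): ["_sepp._tcp.o1.example"],
    ("SRV", "_sepp._tcp.o1.example"): [
        (20, "sepp-b.o1.example"),
        (10, "sepp-a.o1.example"),
    ],
    ("A", "sepp-a.o1.example"): ["192.0.2.10"],
    ("A", "sepp-b.o1.example"): ["192.0.2.20"],
}


def query(rtype, name):
    """Stand-in for a DNS lookup against O1 DNS (no network access)."""
    return O1_ZONE[(rtype, name)]


def discover_sepp(fqdn):
    """Follow the NAPTR -> SRV -> A/AAAA chain and return (server, ip)."""
    service = query("NAPTR", fqdn)[0]   # service record for the SEPP service
    candidates = query("SRV", service)  # list of (priority, SEPP server)
    _, server = min(candidates)         # prefer the lowest priority value
    ip = query("A", server)[0]          # address of the selected SEPP
    return server, ip


print(discover_sepp("sepp.o1.example"))
```

In the hosted SEPP case described next, the same chain applies, except that the SRV and A/AAAA records would be served from the host domain rather than from O1 DNS.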

Figure 2: SEPP discovery flow for hosted SEPP. In this case, the SEPP service is provided by a host and “hosted” in the security domain of the host

The NAPTR query is still sent to O1 DNS, as authoritative DNS for the domain. However, the response contains a service record pointing to the domain of the host. The SRV and A/AAAA are subsequently handled by the host DNS, as authoritative for the host domain. By using <O1mnc>.<O1mcc> as subdomain, different SEPP server instances and corresponding IP can be provided per hosted customer.

O1 SEPP in turn only needs to discover the hosted SEPP, as it is managing all (or most) of the roaming relations. An FQDN unique to O1 should be used, from the domain of the hosted SEPP service provider, e.g. by using O1 PLMN ID as subdomain. A <client> label is used in the example call flow.
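As a minimal sketch of the naming idea, a per-customer FQDN can be derived from the customer PLMN ID and the provider domain. The function name, the `sepp` label and the domain `host.example` are assumptions made for illustration; the actual label layout is a matter of agreement between the operator and the provider.

```python
def hosted_sepp_fqdn(mcc: str, mnc: str, provider_domain: str, label: str = "sepp") -> str:
    """Return an FQDN unique to one customer inside the provider domain.

    Hypothetical layout: <label>.mnc<MNC>.mcc<MCC>.<provider_domain>
    """
    if not (mcc.isdigit() and len(mcc) == 3):
        raise ValueError("MCC must be three digits")
    if not (mnc.isdigit() and len(mnc) in (2, 3)):
        raise ValueError("MNC must be two or three digits")
    # Two-digit MNCs are left-padded with a zero so the label has a fixed width.
    return f"{label}.mnc{mnc.zfill(3)}.mcc{mcc}.{provider_domain}"


print(hosted_sepp_fqdn("001", "01", "host.example"))
```

Because the PLMN ID is part of the name, the provider DNS can return different SEPP instances and IP addresses for each hosted customer.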

Figure 3: SEPP discovery flow for hosted SEPP, with use of the label

For any remaining (e.g. domestic) relations, the roaming partners can agree to use a different FQDN specific to the relation. Vice versa, the provider SEPP can discover O1 SEPP using a similar FQDN technique.

Figure 4: Specific relation SEPP discovery

The next picture shows TLS handshake and N32c exchanges of O1 and O2 with the service provider. SEPP+ indicates a SEPP with additional functionalities to handle the customer-provider relationship (n32s). The difference with a 3GPP SEPP is in the validation of a provider certificate instead of a PLMN certificate and verifying that the relation with the roaming partner (O2), as identified in the n32f traffic, is indeed managed via the provider.

Figure 5: TLS setup with provider

Note that:

O1 SEPP IP addresses are not exposed to the general IPX community, only to the limited set of in-house managed relations (not depicted here) and to the hosted SEPP provider. O1 SEPP will not accept TLS handshakes or N32c requests from the general IPX community, outside of the hosted SEPP provider.
As for the n32c exchange, it provides the option to negotiate custom headers, e.g. 3gpp-Sbi-Originating-Network-Id, as per service contract.

As for O2, there is no notable difference compared to the direct TLS model, other than receiving the provider’s server list and SEPP IP as the result of the SEPP discovery process. O2 must open its IP firewall for the provider IP addresses; this should be covered in the bilateral roaming agreement between O1 and O2. From a technical perspective, it is much more convenient to use provider IP addresses than to “loan” IP addresses from each individual customer. Still, the provider should offer different IP addresses per customer, which can strain IPv4 resources.

Once n32c is set up between all parties, n32f traffic can flow in both directions.

Figure 6: N32-f flow via Provider

The provider SEPP forwards n32f based on the 3gpp-Sbi-Target-ApiRoot header, which indicates the target NF at the roaming partners. O1 SEPP simply forwards the n32f traffic to the provider managing the roaming relation.
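The forwarding decision can be sketched as follows. This is an assumption-laden illustration: it supposes that the host part of the 3gpp-Sbi-Target-ApiRoot value carries `mnc<MNC>.mcc<MCC>` labels, and the `NEXT_HOP` table and peer SEPP names are invented. A real SEPP would also treat HTTP header names case-insensitively.

```python
import re
from urllib.parse import urlsplit

# Hypothetical provider configuration: (MCC, MNC) of the target -> peer SEPP.
NEXT_HOP = {
    ("001", "001"): "sepp.o1.example",
    ("001", "002"): "sepp.o2.example",
}

PLMN_LABELS = re.compile(r"\.mnc(\d{3})\.mcc(\d{3})\.")


def next_hop(headers):
    """Pick the peer SEPP for an n32f message from its target apiRoot."""
    api_root = headers["3gpp-Sbi-Target-ApiRoot"]
    host = urlsplit(api_root).hostname or ""
    match = PLMN_LABELS.search(host)
    if match is None:
        raise LookupError(f"no PLMN labels in target host {host!r}")
    mnc, mcc = match.groups()
    peer = NEXT_HOP.get((mcc, mnc))
    if peer is None:
        raise LookupError(f"no roaming relation configured for {mcc}-{mnc}")
    return peer


msg = {"3gpp-Sbi-Target-ApiRoot": "https://udm1.5gc.mnc001.mcc001.3gppnetwork.org:8443"}
print(next_hop(msg))
```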

The provider may offer services that require more than forwarding of messages or mediation of message content. A stand-alone SEPP+ (or PRINS) deployment is then no longer sufficient.

The following picture illustrates the NF discovery process in case a provider NF must be included in the control flow. Situation: O1 UE roaming in O2’s network, so O1 has the HPMN role and O2 has the VPMN role.

Figure 7: NF discovery of hosted NF

Steps 1-3: O2 follows the normal 3GPP procedures, so the discovery request arrives at the provider SEPP.

Steps 4-7: provider SEPP can forward the discovery request to O1 SEPP (step 4), which forwards to O1 NRF (step 5). As part of the service setup agreed between O1 and the provider, O1 NRF directs the discovery request from O2 (or roaming partners in general) to the provider NRF (step 6). O1 SEPP forwards the request to provider SEPP (step 7). Alternatively, if so agreed, the provider SEPP can skip ahead to step 8 when provider NRF is enabled to respond to discovery requests from O1’s roaming partners on O1’s behalf.

Step 8: provider SEPP checks the validity of the request, i.e. O1 is a valid customer and roaming relation with O2 is open. If the request is valid, provider SEPP forwards the request to provider NRF, as per service contract.

Step 9: provider NRF answers the discovery request, with a provider NF as result. NF naming needs to be agreed with O1 and be consistent with O1’s naming conventions and domain name.

Steps 10-12: Finally, this answer is forwarded to O2 AMF via O2 SEPP.

Further N32f and N9 flows are discussed in a separate section below, as they are similar for both the hosted SEPP and service hub SEPP architectures. Discovery of O1 NF by provider NF is also handled there.

Architecture description for service hub SEPP

In this situation, both operators O1 and O2 have outsourced services to a third-party provider that involves the use of a third-party SEPP.

In case of a roaming hub scenario, O2 SEPP does not contact O1 DNS as there is no direct agreement or contract between them anymore. Instead, the agreement is brokered by the roaming hub provider. O2 SEPP therefore needs to discover the hub SEPP. This is shown in the next picture.

Figure 8: Hub SEPP discovery

Note that from O1’s perspective, not much has changed compared to the hosted SEPP model described in the previous section. In other words, if O1 uses a service provider for all its relations, bilateral and hub agreements can technically be handled in the same way.

Notable differences with the hosted SEPP model:

O2 no longer uses the well-known FQDN of O1 to discover O1’s SEPP, but instead the FQDN of the provider managing the relation with O1. Provider DNS is queried instead of O1 DNS (NAPTR, SRV and A/AAAA).
The number of TLS handshakes and N32 connections between parties is much reduced, as bilateral connections are replaced by a hub-and-spoke architecture. Separate connections per roaming relation are replaced by a single relation with the service provider, which manages all those relations. This heavily reduces the number of persistent connections to manage and allows roaming relations to be scaled up more quickly.
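The scale of the reduction can be illustrated with a simple count. The sketch assumes the extreme case in which every operator has a roaming relation with every other operator, which overstates real deployments, and the figure of 800 operators is only an example value.

```python
def bilateral_relations(operators: int) -> int:
    """Relations needed when every pair of operators connects directly."""
    return operators * (operators - 1) // 2


def hub_relations(operators: int) -> int:
    """Relations needed when every operator connects only to the hub."""
    return operators


n = 800  # example value, not taken from the source
print(bilateral_relations(n), hub_relations(n))
```

Seen from a single operator, the same assumption means one relation with the hub instead of up to `n - 1` bilateral ones.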

The provider SEPP forwards n32f based on the 3gpp-Sbi-Target-ApiRoot header, which indicates the target NF at the roaming partners. O1 and O2 SEPP simply forward the n32f traffic to the hub provider managing the roaming relation.

Once again, the provider may offer services that require more than a forwarding of messages or a simple mediation of message content. A stand-alone SEPP+ (or PRINS) deployment is then no longer sufficient. The following picture illustrates the NF discovery process in case a provider NF must be included in the control flow. Situation: O1 UE roaming in O2’s network, so O1 has the HPMN role and O2 has the VPMN role.

Figure 9: NF discovery of hub NF

Notable differences with the hosted SEPP model:

Provider NF/NRF naming can be done independently of O1 or O2 naming conventions, using the provider’s own domain.
O2 intentionally performs the NF discovery procedure with the provider NRF instead of O1’s NRF.

SEPP verification is quite similar to what’s described for the hosted SEPP model:

Verification of client operator/provider TLS certificates.
Verification whether roaming relation is open, and the correct provider is chosen for the relation.
Rejection of traffic if either of these verification steps fails.
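The three verification steps above can be sketched as a single check. The tables, operator names and provider name are hypothetical, and the sketch assumes that the TLS stack has already validated the certificate chain, so only the mapping from certificate subject to a known party is shown.

```python
# Hypothetical local configuration of the provider SEPP.
KNOWN_CERT_SUBJECTS = {
    "sepp.o1.example": "O1",
    "sepp.o2.example": "O2",
}
# Open roaming relations and the provider that manages each of them.
OPEN_RELATIONS = {
    frozenset({"O1", "O2"}): "provider-a",
}
THIS_PROVIDER = "provider-a"


def verify(cert_subject, target_operator):
    """Return (accepted, reason) for traffic arriving over TLS."""
    sender = KNOWN_CERT_SUBJECTS.get(cert_subject)
    if sender is None:
        return False, "unknown certificate"
    managed_by = OPEN_RELATIONS.get(frozenset({sender, target_operator}))
    if managed_by is None:
        return False, "relation not open"
    if managed_by != THIS_PROVIDER:
        return False, "relation managed by another provider"
    return True, "accepted"


print(verify("sepp.o1.example", "O2"))
print(verify("sepp.o1.example", "O3"))
```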

Further N32f and N9 flows are discussed in the next section as they are similar for both the hosted SEPP and service hub SEPP architectures. Discovery of O1 NF by provider NF is also handled there.

N32f and N9 flows

The following pictures illustrate the call flows for control and user plane, under the assumption that provider NF and UPF are to be involved as part of the service contract. The same roaming situation is taken as before, namely O1 UE roams in O2’s network.

The first picture shows a generic control plane flow, where the hosted NF either directly provides a result (a) or reissues the initial request (b), potentially with changes to the content, to the target network. The provider NF takes on different roles in the latter case.

Figure 10: Control plane flow with provider NF

Control plane flow:

Step 1: O2 NF targets the provider NF for the request, as per result of the NF discovery process shown in the hosted SEPP and service hub SEPP sections.

Steps 2-3: Request verification and forwarding by O2 and provider SEPP.

Step 4: The provider may trigger additional services, exposed via NEF API, to other systems or application functions, e.g. roaming business intelligence or welcome SMS applications.

Step 4a: As part of the service contract with O1, NF requests can be rejected by the provider NF using a proper NAS error code. Examples: O1 UE tries to register on a forbidden network, or the request is deemed to be fraudulent. This flow is then concluded in steps 5a and 6a.

Step 4b: When O2’s NF request is allowed, the provider NF acts as the visited network AMF and issues a discovery request for O1’s target NF. The NF type is determined from the request type received from O2. The discovery request is targeted at O1’s NRF. Roaming partner transparency can be kept by means of the 3gpp-Sbi-Originating-Network-Id header or appropriate NF naming (implementation decision). The originating network id header can be populated by O2 or inserted by the provider.

Steps 5b, 6b, 7b: Request verification and forwarding by provider and O1 SEPP.

Step 8b: O1 NRF responds with NF uri.

Steps 9b, 10b, 11b: NF uri finally arrives at provider NF.

Step 12b: the provider NF (acting as visited AMF) reissues the initial NF request from O2, possibly with altered content (NFReq*), to O1 NF.

Steps 13b, 14b: the reissued NF request is forwarded to O1 NF via provider SEPP and O1 SEPP.

Steps 15b, 16b: O1 NF returns a result to provider NF, which is then reissued by provider NF (acting as AUSF, UDM or home SMF, as the case may be) to O2 NF. In these steps, the path via the respective SEPPs is no longer shown for simplicity.
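The choice between branch (a) and branch (b) in the flow above can be sketched as follows. Everything here is hypothetical: the request layout, the forbidden-network table, the fraud callback and the "MCC-MNC" value used for the originating network id header are assumptions chosen for illustration, and no actual NAS error code is modelled.

```python
ORIGIN_HEADER = "3gpp-Sbi-Originating-Network-Id"


def provider_nf_handle(request, forbidden_pairs, is_fraudulent):
    """Decide whether the provider NF rejects (a) or reissues (b) a request."""
    home, visited = request["home"], request["visited"]
    if (home, visited) in forbidden_pairs or is_fraudulent(request):
        # Branch (a): the provider NF answers directly with a rejection.
        return {"branch": "a", "action": "reject"}
    # Branch (b): reissue towards the home network, keeping the roaming
    # partner visible. The header is inserted only if O2 did not set it.
    headers = dict(request.get("headers", {}))
    headers.setdefault(ORIGIN_HEADER, visited)
    return {"branch": "b", "action": "reissue", "target": home, "headers": headers}


forbidden = {("001-01", "001-03")}
allowed = provider_nf_handle(
    {"home": "001-01", "visited": "001-02"}, forbidden, lambda r: False
)
print(allowed["action"], allowed["headers"][ORIGIN_HEADER])
```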

Use of n32-c

This chapter looks at the use of N32-c between SEPPs when the TLS security mechanism is used. It does not apply to the PRINS security method.

Analyzing the information exchanged in the N32 handshake, the following observations can be made.

Security capability: This information is already known for any connection. For bilateral and hop-by-hop it is TLS by default.
Supported headers: This is already known from local configuration; important headers are mandatory.
Sender IE: This is only useful to define the identity of the sender which then must be crosschecked with the certificate. Using the certificate to identify the source removes the need for Sender IE.

PLMN ID list: Once the sender is identified, local config will determine which PLMN IDs are served by that peer. From a security point of view a SEPP should not rely on self-declaration from the peer SEPP to define filtering policies for incoming messages. It introduces the need for crosschecks with certificates and local config, without providing more information. It also introduces the need to update and restart N32 connections if the PLMN ID list changes (e.g. acquisitions, mergers, etc.).
TargetPLMNId: This is already part of the FQDN (where relevant) and only needed if separate N32 contexts are needed per PLMN ID and separate FQDNs per PLMN ID are not supported. For the bilateral model it is difficult to see why separate N32 contexts between the same pair of SEPPs would be useful.

IntendedUsagePurpose: This can also be part of FQDN. For the bilateral model it is not clear how this would be useful if it is targeting the same SEPP. Note that separation based on purpose depends on the capabilities of the NF in the core to indicate the purpose. For example, a SEPP might request separate N32 connections for different purposes, but this only makes sense if the NF in the roaming partner can indicate the purpose of messages and/or the peer SEPP can distinguish the traffic.
Separate instances for N32-c and N32-f are only needed because of n32-c. If there is no N32-c then there is no need for a separate N32-c instance and only the N32-f instance is left.

To conclude: n32-c is providing very little value and only for very specific use cases where traffic separation is involved. All the relevant information for filtering, identification and routing will be available in the local configuration.
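A minimal sketch of filtering from local configuration alone, without any N32-c exchange, is given below. The peer table, certificate subjects and PLMN IDs are invented; the sketch assumes the peer identity has already been taken from a validated client certificate.

```python
# Hypothetical local configuration: certificate subject -> PLMN IDs served.
PEER_PLMNS = {
    "sepp.provider.example": {"00101", "00102"},
    "sepp.o3.example": {"00103"},
}


def accept_message(cert_subject: str, asserted_plmn: str) -> bool:
    """Accept a message only if local config says this peer serves the PLMN.

    The peer never declares its own PLMN ID list; the list comes from
    configuration keyed by the certificate identity.
    """
    served = PEER_PLMNS.get(cert_subject)
    return served is not None and asserted_plmn in served


print(accept_message("sepp.provider.example", "00102"))
print(accept_message("sepp.o3.example", "00101"))
```

A change in the PLMN ID list, for instance after a merger, then becomes a configuration update and does not require restarting any connection context.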

It may be enough for the provider SEPP to be in the loop with regards to the control plane, e.g. when the service provider just provides passive services, or a basic hub function without financial liability.

However, in most cases the provider is expected to step in and initiate certain processes on NF level, such as rejecting requests, changing request/response parameters or modifying/deleting ongoing data sessions. The latter is illustrated in the next call flow.

Figure 11: Control and user plane flows with provider SMF and UPF

Verification and forwarding by O1, O2, and provider SEPP is no longer shown explicitly.

Control and user plane flows with provider SMF and UPF

Step 1: O2 SMF targets the provider NF, as per result of the NF discovery process shown in the hosted SEPP and service hub SEPP sections.

Step 2: the provider NF, in the role of vSMF, in turn issues a data session request to O1 SMF, with agreed upon parameters as per service contract. In addition, it manages provider UPF resources over N4, generates PDRs for wholesale settlement etc.

Step 3: O1 SMF accepts the data session request and returns GTP-U and other parameters to provider SMF. This concludes the setup of GTP-U tunnel 1 between O1 and the provider UPF.

Step 4: the provider NF, in the role of hSMF, accepts the data session request from O2 SMF and returns GTP-U and other parameters to O2 SMF. This concludes the setup of GTP-U tunnel 2 between provider and O2 UPF.

Under conditions specified by the service contract, a provider may omit steps 2 and 3 and break out the session regionally (SGi) – e.g. when the provider offers direct access to nearby edge computing centers, that may host applications used by the UE, to reduce latency.

Steps 5-6: when the provider’s budget control system detects a situation of insufficient customer (O1) funds, it may issue instructions to (heavily) reduce data consumption, interrupt data sessions with a forced breakout to a landing page, or simply to delete data sessions. Note that traffic shaping is intended to level out traffic peaks but is not suitable to handle sizeable lasting bandwidth reductions, which inevitably cause buffer overflow. Control of GTP-U parameters on NF level is essential.

The last picture illustrates the authentication flow for sponsored roaming, whereby the service provider is backed by a sponsor mobile operator to provide international roaming services to operators who don’t have roaming agreements of their own. In such a situation, the visited network operator (O2) is only aware of sponsored identities (provider SUPI) and is unaware of any operators making use of the service (O1).

Figure 12: Sponsored roamer authentication flow

For simplicity’s sake, it is assumed that the visited network operator (O2) has a bilateral roaming agreement with the provider (acting as sponsor mobile operator), based on direct TLS.

Sponsored roamer authentication flow

Step 1: O1 UE contains a SIM card with a dual SUPI (IMSI) profile. There is a single secret key (Ki) used for the 5G authentication process, managed exclusively by O1. However, there are 2 SUPIs, one from O1 and one from the provider. Each has their own set of public/private network key pairs used for SUPI concealment. The provider manages the SUPI mapping and public/private network key pair for provider SUPI concealment.

When O1 has no roaming relation with O2, the UE will use the provider SUPI to get roaming service. Therefore, the authentication request is sent to the provider NF, acting as AUSF/UDM. If it is an initial request, it contains a provider SUCI, which must first be de-concealed. Once the provider SUPI is known, it can be mapped to the correct client SUPI (O1).

Steps 2-3: provider NF, acting as visited AMF, discovers O1 AUSF.

Step 4: provider NF reissues the authentication request to O1 AUSF, using O1 SUPI.

Step 5: O1 AUSF answers with a challenge (RAND) and a hash of the expected result (HXRES).

Step 6: provider NF, acting as AUSF, reissues the answer to O2 AMF.

Step 7: once the actual result (RES) is obtained from the UE, O2 AMF computes hash HRES and verifies if it matches the received HXRES. If successful, it issues another request containing RES to provider NF using the callback uri.

Step 8: provider NF reissues this request to O1 AUSF.

Step 9: O1 AUSF verifies RES and finally returns the required cryptographic materials (session keys) to provider NF.

Step 10: provider NF reissues the answer to O2 AMF, completing the authentication/registration process.
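Two elements of the flow above lend themselves to a short sketch: the SUPI mapping held by the provider and the hash comparison performed by O2 AMF in step 7. The SUPI values are invented. The hash construction (SHA-256 over the challenge concatenated with the result, keeping the 128 least significant bits) is an assumption modelled on the 5G AKA derivation and should be checked against the applicable 3GPP specification before being relied upon.

```python
import hashlib
import hmac

# Hypothetical mapping kept by the provider: provider SUPI -> client SUPI (O1).
SUPI_MAP = {"imsi-001020000000001": "imsi-001010000000001"}


def client_supi(provider_supi: str) -> str:
    """Provider NF: map the sponsored identity to the O1 identity."""
    return SUPI_MAP[provider_supi]


def hashed_result(rand: bytes, res: bytes) -> bytes:
    """Assumed derivation: last 16 bytes of SHA-256(RAND || RES)."""
    return hashlib.sha256(rand + res).digest()[-16:]


def amf_accepts(rand: bytes, res: bytes, hxres: bytes) -> bool:
    """O2 AMF: compare the hash of the UE result with the received HXRES."""
    return hmac.compare_digest(hashed_result(rand, res), hxres)


rand = bytes(16)                      # dummy challenge
expected_res = b"\x01" * 16           # dummy expected result held by O1 AUSF
hxres = hashed_result(rand, expected_res)
print(client_supi("imsi-001020000000001"), amf_accepts(rand, expected_res, hxres))
```

The visited network only ever compares hashes in this step; the result itself is verified by O1 AUSF in step 9.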

On the other hand, n32-c introduces issues which must be mitigated through additional procedures. N32-c basically negates the advantage of working with a stateless protocol by introducing statefulness on the level of connections.

Using a separate TLS connection for n32-c introduces the need to correlate the N32 context with incoming n32-f TLS connections. The 3GPP specifications however don’t explain how to perform this correlation.

Technically speaking, correlation of TLS connections can only be done based on the client certificate, but how exactly is not specified. In any case, it means the client is already identified and there is no need for any client identification in n32-c.
Additional procedures must be introduced to solve N32-c race conditions and recovery situations.

Additional crosschecks must be introduced between N32-f and N32-c and between N32-c and certificates.
There are implications on architecture and scalability. By imposing the setup of an n32 context, SEPP architecture becomes more complex because all active N32 contexts need to be synchronised between every instance in a cluster.

Alternatively, N32-c may only be used for PRINS, but not TLS:

Negotiation of security policy is only needed for PRINS, which would anyway not involve the intermediate hops.
Identification of the peer relies on the client certificate. The certificate can still be checked against a specific trust anchor.
Traffic separation can be achieved by using different target FQDNs.

Header support can easily be exchanged in IR.21.
Correlation is no longer necessary and crosschecks between n32-c and n32-f are no longer needed.
No additional procedures to solve race conditions and recovery.
No overhead due to N32-c procedures: exchange of useless information, explicit tear down of contexts…
N32-f stability issues don’t cascade back to N32-c re-establishment.
SEPP can be made fully stateless, meaning that individual instances representing the same FQDN can be easily introduced without the need to synchronise active contexts.