From Stereo/Surround to Immersive

An increasing number of audio recordings are produced not only in stereo and surround sound but also in “immersive audio” which positions sounds over the half- or full-sphere. The production of these recordings requires some special skills so immersive mixes are often created by specialist sound engineers and studio producers.

The granularity rules associated with the ISRC standard [link to ISRC handbook] mean an immersive recording and its stereo counterpart will have different ISRCs unless one has been derived from the other by a purely mechanical process (simple up-mix or down-mix for instance). Where immersive versions of a mix using different technologies are created separately, they usually have distinct creative input, have distinct identities and hence distinct ISRCs. Where the technology versions are derived from a single creative mix (for instance by transcoding or creation in a common authoring format), they share the ISRC of the original mix. This ensures that the respective contributors to the stereo and the immersive audio can be associated with the correct output.

The expression “an immersive recording and its stereo and surround sound counterpart” is meant to express the fact that the artist is working from recorded elements created together for the same project (and thus using the same underlying musical works) to create a directly comparable artistic experience. This is not therefore about creating “re-mixes” by subsequently adding or removing elements of the recording.

DDEX is using the term “edition” to express the relationship between these counterparts.

Metadata differences between these editions

The metadata differences between an immersive audio edition and its stereo and surround sound counterparts are, however, not limited to just the ResourceId. In addition, these (and only these) tags might change:

  • Duration (the fade-out might, for example, be a few seconds longer);
  • PLine (the immersive audio recording might have been created a few weeks or months earlier or later than the stereo and/or surround editions);
  • Contributor (there may be a small difference in the contributor list because different sound engineers or producers may have helped to create the different editions; the majority of contributors, however, would be the same for all editions); and
  • TechnicalDetails (if only to point to a different File).

Details on how this can be communicated using ERN are described further below.

New and back catalogue albums

More and more albums are created in stereo and surround sound as well as immersive audio from the outset. However, there are also many releases that are provided, initially, in stereo and/or surround sound only. Some of these are subsequently extended to be made available in an immersive audio edition. (The reverse is of course also possible.)

The same applies to back-catalogue albums. What remains fixed in circumstances where the immersive audio is created at some point after the original stereo or surround sound is the track listing, i.e. the order of the titles of the individual sound recordings that make up the album remains the same.

In this context it is important to note that the data shown to the consumer may be structured in a different way than how the data is sent – as long as all necessary data is in the ERN message that allows the DSP to make the necessary adjustments.

Two Approaches for Communicating Editions

As highlighted in the introduction, two different practices for communicating such stereo, surround sound and immersive audio albums have emerged, and users of the ERN standard may need to be able to handle both. These practices are discussed below using the case of a two-track album that is initially available in stereo only and to which two immersive audio tracks are subsequently added.

Approach 1 – Extend the existing release (ERN 4.3 and later)

Let’s assume an original release with a ReleaseId (UPC) of 1234567890 containing two stereo sound recordings called “One” and “Two” with ISRCs DE-000-20-11111 and DE-000-20-22222 respectively. These two recordings will be communicated in a SoundRecording composite each.

ERN 4.3’s SoundRecording composite has been adapted, however, to support the distribution of immersive audio tracks: A new composite SoundRecordingEdition has been added into the SoundRecording composite. This composite is mandatory and contains, basically, the tags listed above as being potentially different between different editions of a recording. Consequently, each of the stereo-only recordings of the above example will be described by one SoundRecording composite containing one SoundRecordingEdition composite each.

This is depicted on the left in the diagram below. The diagram also shows how to communicate a display artist (Björn) and contributors (Björn and Benny as the recording artists, Agnetha as the engineer/mixer for the stereo variants and Frida as the engineer/mixer for the immersive mixes):

 

When the new immersive audio editions are created, the metadata in which they differ from their stereo counterparts needs to be placed into a second SoundRecordingEdition composite for each of the sound recordings. This includes the two new ISRCs (DE-000-21-00001 and DE-000-21-00002). The Release composite itself is not changed at all and would always point to the ResourceReference of the SoundRecordings. This is depicted above on the right.

(In ERN 4.2 only the new Resource identifiers can be communicated in the TechnicalDetails/EncodingId composite.)

Resource files would be linked from the SoundRecording/SoundRecordingEdition/TechnicalDetails composite for the two editions of the recordings respectively. Their communication would not be any different from “standard releases”.
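
The structure of Approach 1 can be sketched in a few lines of Python. This is only an illustrative model, not the ERN schema: the class and field names (edition_type, p_line, file_name), the resource references A1 and A2, the P-line text and the file names are assumptions made for the example. It shows that adding an immersive edition touches only the SoundRecording, while the Release keeps pointing at the same resource references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoundRecordingEdition:
    # A subset of the fields that may differ between editions.
    isrc: str
    edition_type: str        # hypothetical label: "Stereo" or "Immersive"
    p_line: str              # illustrative value, not real data
    contributors: List[str]
    file_name: str           # stands in for TechnicalDetails/File


@dataclass
class SoundRecording:
    resource_reference: str  # hypothetical local anchor, e.g. "A1"
    title: str
    editions: List[SoundRecordingEdition] = field(default_factory=list)


@dataclass
class Release:
    release_id: str
    resource_references: List[str]


# Original stereo-only state: one edition per recording.
one = SoundRecording("A1", "One", [
    SoundRecordingEdition("DE-000-20-11111", "Stereo", "(P) 2020 Example Label",
                          ["Björn", "Benny", "Agnetha (mixer)"], "one_stereo.flac"),
])
release = Release("1234567890", ["A1", "A2"])
snapshot = (release.release_id, list(release.resource_references))

# Later: the immersive edition is added as a second edition of the same recording.
one.editions.append(
    SoundRecordingEdition("DE-000-21-00001", "Immersive", "(P) 2021 Example Label",
                          ["Björn", "Benny", "Frida (mixer)"], "one_immersive.wav")
)

# The release itself is unchanged by the addition.
unchanged = snapshot == (release.release_id, release.resource_references)
```

The second recording ("Two") would be handled the same way, with DE-000-21-00002 added as its second edition.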

 

Sample

A valid XML example is provided here.

Approach 2 – Create separate releases (all versions of ERN)

Let’s again assume an original release with a ReleaseId/UPC of 1234567890 containing two stereo sound recordings called “One” and “Two” with ISRCs DE-000-20-11111 and DE-000-20-22222 respectively. The new immersive audio recordings, DE-000-21-00001 and DE-000-21-00002, were created to allow consumers to experience the songs through immersive audio. These four recordings would be communicated in four SoundRecording composites with one SoundRecordingEdition composite each. (This is no different from how the stereo-only release in Approach 1 is communicated.)

To release these new tracks, a new release with the same title as the stereo release (albeit potentially a different subtitle) is created. This new release will have to have a different ReleaseId/UPC (e.g. 0987654321) and reference the two new immersive audio recordings. This approach is in line with the GRid rules.

The two pairs of recordings and releases will be linked to each other with RelatedResource and RelatedRelease composites. The appropriate RelationshipTypes are IsImmersiveEditionOf and IsNonImmersiveEditionOf.
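
The linking described above can be sketched as follows. The two RelationshipType values come from the text; everything else is an assumption for illustration: the triple representation, the helper names link_editions and find_unreciprocated, and the premise that each side of a pair carries a link pointing to the other.

```python
# Each relationship type and the type expected on the other side of the pair.
INVERSE = {
    "IsImmersiveEditionOf": "IsNonImmersiveEditionOf",
    "IsNonImmersiveEditionOf": "IsImmersiveEditionOf",
}


def link_editions(immersive_id, non_immersive_id):
    """Return reciprocal (source, relationship, target) triples for one pair."""
    return [
        (immersive_id, "IsImmersiveEditionOf", non_immersive_id),
        (non_immersive_id, "IsNonImmersiveEditionOf", immersive_id),
    ]


def find_unreciprocated(links):
    """Return the links whose counterpart in the opposite direction is missing."""
    present = set(links)
    return [(s, r, t) for (s, r, t) in links if (t, INVERSE[r], s) not in present]


# RelatedRelease links between the two releases of the example.
links = link_editions("0987654321", "1234567890")
# RelatedResource links between the two pairs of recordings.
links += link_editions("DE-000-21-00001", "DE-000-20-11111")
links += link_editions("DE-000-21-00002", "DE-000-20-22222")

missing = find_unreciprocated(links)  # empty when every pair is linked both ways
```

A check of this kind could be run by a sender before delivery or by a recipient on ingestion to spot a pair that is linked in one direction only.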

This is depicted on the right in the diagram below. The diagram also shows how to communicate a display artist (Björn) and contributors (Björn and Benny as the recording artists, Agnetha as the engineer/mixer for the stereo variants and Frida as the engineer/mixer for the immersive mixes):

 

Despite the ERN message providing two releases with two sets of recordings, the receiving DSP would be able to merge these two releases into one view for the consumer – as long as all the release metadata between the stereo and the immersive audio release, as well as between the two pairs of resources, are the same – except for the following seven fields where the releases and/or resources may differ:

  • Release/ReleaseId;
  • Release/Title/SubTitle;
  • Release/CLine;
  • SoundRecording/Duration;
  • SoundRecording/SoundRecordingEdition/PLine;
  • SoundRecording/SoundRecordingEdition/EditionContributor; and
  • SoundRecording/SoundRecordingEdition/TechnicalDetails.

Two additional fields, Release/ReleaseReference and Resource/ResourceReference, must of course differ between the editions as they act as local anchors for Deals.

Resource files for all editions are then linked from the SoundRecording/TechnicalDetails composite for the various sound recordings.
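
A receiving DSP could test whether two releases qualify for a merged view along the following lines. This is a minimal sketch under stated assumptions: each release (with one of its recordings) is flattened into a dictionary keyed by a path string, the function name blocking_differences is hypothetical, and the keys Release/Title/TitleText and Release/DisplayArtist as well as all values are illustrative rather than taken from the ERN schema. Only the seven "may differ" paths and the two reference fields come from the text above.

```python
# The seven fields in which the releases and/or resources may differ.
MAY_DIFFER = {
    "Release/ReleaseId",
    "Release/Title/SubTitle",
    "Release/CLine",
    "SoundRecording/Duration",
    "SoundRecording/SoundRecordingEdition/PLine",
    "SoundRecording/SoundRecordingEdition/EditionContributor",
    "SoundRecording/SoundRecordingEdition/TechnicalDetails",
}
# Local anchors that must differ between the editions.
MUST_DIFFER = {"Release/ReleaseReference", "Resource/ResourceReference"}


def blocking_differences(stereo, immersive):
    """Return the paths that prevent merging the two releases into one view."""
    blocking = []
    for path in sorted(set(stereo) | set(immersive)):
        same = stereo.get(path) == immersive.get(path)
        if path in MUST_DIFFER:
            if same:                  # anchors must not be shared
                blocking.append(path)
        elif path not in MAY_DIFFER and not same:
            blocking.append(path)     # a difference outside the allowed fields
    return blocking


stereo = {
    "Release/ReleaseId": "1234567890",
    "Release/ReleaseReference": "R1",
    "Release/Title/TitleText": "Example Album",   # illustrative key and value
    "Release/DisplayArtist": "Björn",             # illustrative key
    "Resource/ResourceReference": "A1",
    "SoundRecording/Duration": "PT3M30S",
}
immersive = dict(stereo, **{
    "Release/ReleaseId": "0987654321",
    "Release/ReleaseReference": "R2",
    "Release/Title/SubTitle": "Immersive Edition",
    "Resource/ResourceReference": "A3",
    "SoundRecording/Duration": "PT3M34S",
})

mergeable = blocking_differences(stereo, immersive) == []
retitled = dict(immersive, **{"Release/Title/TitleText": "Another Album"})
problems = blocking_differences(stereo, retitled)
```

In this sketch the first pair is mergeable, while the retitled release is reported with Release/Title/TitleText as the blocking path.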

 

Sample

A valid XML example is provided here.

Pros and cons of the two approaches

The first approach avoids the need for any data duplication, thus avoiding all the possible errors that such duplication can lead to. It is therefore seen as the simpler approach.

The simplicity of the first approach means, however, that releases can change over time, if only in the very limited circumstances described above. This breaks a tenet of how DDEX has approached the communication of data about releases, and of some other industry standards such as GRid. It also means that providing a release’s identifier alone is no longer sufficient to identify the content of the release. The addition of immersive audio recordings to a release means that a specific release may contain ten sound recordings on day one, twelve recordings on day two and nineteen recordings on day three, all with potentially different credits for royalty payments.

This may have an impact on various processes in the music industry value chain. Sales/usage reporting to music licensing companies, for instance, may be impacted. Reporting to musical works licensors is unlikely to be affected in the circumstances discussed above as the change in recordings does not entail a change of the underlying musical works.

Linking immersive audio and stereo/multichannel audio

When separate releases are created for immersive and for stereo/multichannel audio, it is essential to link the resources to each other and the releases to each other using the RelatedResource and RelatedRelease composites. The appropriate RelationshipTypes are IsImmersiveEditionOf and IsNonImmersiveEditionOf.

However, it may also be beneficial for record companies to include such relationships if, for example, a stereo edition of a specific recording is used on a sampler A and an immersive edition of the same recording is used on a sampler B.
