Embedded Video Metadata

On Parrot drones, both streamed and recorded video embed publicly accessible metadata, enabling advanced processing of aerial videos.

Two types of metadata are available: frame metadata (timed) and session metadata (untimed).

Frame metadata

Frame metadata, also called timed metadata, are data that vary over time and are synchronized with the video frames. The purpose is to give users easy access to image and flight data by multiplexing the timed data with the recorded video in MP4 files and RTP streams.

The available data include flight telemetry (drone location, speed, attitude, etc.), picture information (frame orientation, field-of-view, exposure time, etc.) and general drone information (radio signal strength, battery level, etc.).

Possible use cases include overlaying flight data on videos (HUD), computer vision algorithms or augmented reality applications. Be creative! Below is an example of a HUD displayed by the pdraw program using the --hud 0 command-line option:

[Image: _images/hud.jpg]

Note

The format v1 has been used since Bebop / Bebop 2 firmware 3.2 and is not used in firmware 4.0 and later.

The format v2 has been used since Bebop / Bebop 2 firmware 4.0 and Disco firmware 1.1, and on Bebop-Pro Thermal and Bluegrass.

The format v3 is used on Anafi and Anafi Thermal.

Integration in the MP4 file

The MP4 specification (MPEG-4 part 12: ISO base media file format, ISO/IEC 14496-12) allows timed metadata to be embedded in a dedicated track. The embedded metadata track is specified as follows:

  • track header (tkhd): volume = 0, width = 0, height = 0
  • track reference (tref): reference_type = cdsc, track_IDs[0] = video track ID
  • handler reference (hdlr): handler_type = meta
  • null media header (nmhd)
  • sample description (stsd): type TextMetaDataSampleEntry, mime_format = application/octet-stream;type=<mime_type>, where <mime_type> is:
    • com.parrot.videometadata1 for format v1
    • com.parrot.videometadata2 for format v2
    • com.parrot.videometadata3 for format v3

The MIME type in the sample description box is enough to describe the metadata format. Further modification to the data format will require an update of the MIME type.
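As an illustration of how a reader might use the mime_format string to select a parser, here is a minimal sketch. The function name parrot_vmeta_version_from_mime is hypothetical (it is not part of libvideo-metadata), and the sketch assumes the string follows exactly the application/octet-stream;type=<mime_type> pattern listed above.

```c
#include <string.h>

/* Hypothetical helper, not part of libvideo-metadata.
 * Returns 1, 2 or 3 for a recognized Parrot metadata mime_format string,
 * or 0 if the string does not match the documented pattern. */
static int parrot_vmeta_version_from_mime(const char *mime_format)
{
    static const char prefix[] =
        "application/octet-stream;type=com.parrot.videometadata";
    size_t prefix_len = sizeof(prefix) - 1;

    if (mime_format == NULL ||
        strncmp(mime_format, prefix, prefix_len) != 0)
        return 0;

    /* Exactly one version digit is expected after the prefix */
    char v = mime_format[prefix_len];
    if (v < '1' || v > '3' || mime_format[prefix_len + 1] != '\0')
        return 0;
    return v - '0';
}
```

An unknown or future MIME type yields 0, so the caller can skip the track rather than misinterpret its samples.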

The frame capture timestamp must be included in the timed data to allow offline synchronization even if the metadata has been extracted from the MP4 file. In version 1 format, the timestamp is available in the metadata structure; in versions 2 and 3, the timestamp is available in a timestamp extension structure. Similarly, the H.264 bitstream must contain frame capture timestamps; for this purpose, the timestamps are included in H.264 picture timing SEI.

Integration in the RTP stream

The RTP protocol (RFC 3550) allows RTP packet header extensions of a custom format when the X bit is set (bit 3).

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Consequently, the frame metadata are embedded in the RTP stream as packet header extensions, identified by a known value in the 16-bit defined by profile field.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      defined by profile       |           length              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        header extension                       |
|                             ....                              |

The defined by profile field is used to recognize the format and version of the data and can have the following values:

  • 0x5031, i.e. P1 in ASCII, for format v1
  • 0x5032, i.e. P2 in ASCII, for format v2
  • 0x5033, i.e. P3 in ASCII, for format v3

Any further modification to the data format will require an update of the defined by profile field value.
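The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the function name rtp_find_parrot_metadata is hypothetical, the input is assumed to be a complete raw RTP packet, and the returned span includes the 4-byte extension header so that it lines up with the structures listed later in this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libvideo-metadata.
 * Looks for a Parrot metadata header extension in a raw RTP packet.
 * Returns the format version (1, 2 or 3) and sets *ext / *ext_len to the
 * extension, including its 4-byte "defined by profile" + length header.
 * Returns 0 if no recognized extension is present. */
static int rtp_find_parrot_metadata(const uint8_t *pkt, size_t len,
                                    const uint8_t **ext, size_t *ext_len)
{
    if (pkt == NULL || ext == NULL || ext_len == NULL || len < 12)
        return 0;
    if ((pkt[0] >> 6) != 2)      /* V field: RTP version 2 */
        return 0;
    if ((pkt[0] & 0x10) == 0)    /* X bit not set: no header extension */
        return 0;

    /* The extension follows the fixed header and the CC CSRC identifiers */
    size_t off = 12 + 4 * (size_t)(pkt[0] & 0x0F);
    if (len < off + 4)
        return 0;

    unsigned profile = ((unsigned)pkt[off] << 8) | pkt[off + 1];
    size_t words = ((size_t)pkt[off + 2] << 8) | pkt[off + 3];
    if (len < off + 4 + 4 * words)
        return 0;                /* truncated packet */
    if (profile < 0x5031 || profile > 0x5033)
        return 0;                /* not P1, P2 or P3 */

    *ext = pkt + off;
    *ext_len = 4 + 4 * words;
    return (int)(profile - 0x5030);
}
```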

To minimize the network overhead, the timed metadata is not sent in the header extension of every packet of a frame, but in a limited number of packets: at least one, and optionally two for packet loss resilience.

It is not necessary to embed the frame capture timestamps in the metadata as the RTP headers are inseparable from the H.264 payload, and the capture timestamps are included in H.264 picture timing SEI.

Accessing the frame metadata with libvideo-metadata

The recommended way to access the frame metadata is by using Parrot’s libvideo-metadata (vmeta for short). This C library handles serializing and deserializing the metadata, presenting friendly C structures containing the metadata in the API.

One way to get vmeta_frame structures from a record or a stream is to use a video sink in PDrAW. Each frame output through the sink (either YUV or H.264) has a vmeta_frame structure if frame metadata were found. Still using PDrAW, vmeta_frame structures are also output in the external texture loading and overlay rendering callback functions.

Another way is to use libvideo-metadata’s executable tool named vmeta-extract. This tool takes as input an MP4 file or a *.pcap capture and outputs CSV, JSON and/or KML files.

If none of these options are suitable, libvideo-metadata’s API can be used to deserialize data from a record or stream by using the vmeta_frame_read() function. In the case of a record, the input to this function is an MP4 sample from the metadata track. In the case of a stream, the input to this function is an RTP packet header extension.

Version 1 binary format

The format below is the binary format as found in MP4 metadata track samples and RTP header extensions. This format is listed for informative purposes. The recommended way of accessing the metadata is to use libvideo-metadata.

Note

Version 1 of the metadata format is now deprecated and should not be used.

The version 1 format for recording has been used since Bebop / Bebop 2 firmware 3.2 and is not used in firmware 4.0 and later.

The version 1 format for streaming has been used since Bebop / Bebop 2 firmware 3.3 and is not used in firmware 4.0 and later.

Floating-point values are converted to fixed-point (for example, Q4.12 means 16 bits in total, with 4 bits for the integer part and 12 bits for the fractional part) to keep the data compact and to ease serialization. All data are serialized in network byte order.
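To make the Q notation concrete, the following sketch converts raw fixed-point values back to floating point. The helper names are hypothetical and not part of libvideo-metadata; the only assumptions are the big-endian byte order and the Qm.n convention described above.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of libvideo-metadata. */

/* Reads a signed 16-bit value stored in network (big-endian) order. */
static int16_t read_be_s16(const uint8_t *p)
{
    int32_t v = (p[0] << 8) | p[1];
    if (v & 0x8000)
        v -= 0x10000;            /* sign-extend without relying on casts */
    return (int16_t)v;
}

/* Converts a raw fixed-point value with frac_bits fractional bits,
 * for example frac_bits = 12 for Q4.12 or 8 for Q8.8. */
static double fixed_to_double(int32_t raw, unsigned frac_bits)
{
    return (double)raw / (double)(UINT32_C(1) << frac_bits);
}
```

For instance, the two bytes 0x10 0x00 read as Q4.12 give 4096 / 4096 = 1.0, and 0xF8 0x00 gives -2048 / 4096 = -0.5.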

In the case of streaming, the value 0x5031 (P1 in ASCII) is used as identifier (defined by profile field). Not all values are sent with each frame (GPS coordinates, for example). Two data structures are defined, a basic structure and an extended structure, which are differentiated by the header extension size (length field). The choice between the basic and the extended structure is made according to the following rule:

  • extended structure (56 bytes): sent at 5 Hz
  • basic structure (28 bytes): the rest of the time

In the case of recording, no ASCII identifier exists. All values are available with each frame and the data always has the same size (60 bytes).
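A receiver could tell the two streaming structures apart from the length field alone, as in the sketch below. The enum and function names are hypothetical; the word counts 6 and 13 are taken from the length comments of the version 1 streaming structures listed in this section, and the byte sizes 28 and 56 from the rule above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a v1 streaming header extension */
enum v1_stream_kind {
    V1_STREAM_INVALID = 0,
    V1_STREAM_BASIC,     /* 28 bytes, length field = 6 */
    V1_STREAM_EXTENDED,  /* 56 bytes, length field = 13 */
};

/* ext points at the "defined by profile" field, len is the number of
 * bytes available from that point. */
static enum v1_stream_kind v1_stream_classify(const uint8_t *ext, size_t len)
{
    if (ext == NULL || len < 4)
        return V1_STREAM_INVALID;

    unsigned id = ((unsigned)ext[0] << 8) | ext[1];
    unsigned words = ((unsigned)ext[2] << 8) | ext[3];

    if (id != 0x5031)                /* "P1" in ASCII */
        return V1_STREAM_INVALID;
    if (words == 6 && len >= 28)
        return V1_STREAM_BASIC;
    if (words == 13 && len >= 56)
        return V1_STREAM_EXTENDED;
    return V1_STREAM_INVALID;
}
```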

#define GPS_ALTITUDE_MASK   (0xFFFFFF00)  /* GPS altitude mask */
#define GPS_ALTITUDE_SHIFT  (8)           /* GPS altitude shift */
#define GPS_SV_COUNT_MASK   (0x000000FF)  /* GPS SV count mask */
#define GPS_SV_COUNT_SHIFT  (0)           /* GPS SV count shift */
#define FLYING_STATE_MASK   (0x7F)        /* Flying state mask */
#define FLYING_STATE_SHIFT  (0)           /* Flying state shift */
#define BINNING_MASK        (0x80)        /* Binning mask */
#define BINNING_SHIFT       (7)           /* Binning shift */
#define PILOTING_MODE_MASK  (0x7F)        /* Piloting mode mask */
#define PILOTING_MODE_SHIFT (0)           /* Piloting mode shift */
#define ANIMATION_MASK      (0x80)        /* Animation mask */
#define ANIMATION_SHIFT     (7)           /* Animation shift */

enum flying_state
{
    FLYING_STATE_LANDED = 0,       /* Landed state */
    FLYING_STATE_TAKINGOFF,        /* Taking off state */
    FLYING_STATE_HOVERING,         /* Hovering state */
    FLYING_STATE_FLYING,           /* Flying state */
    FLYING_STATE_LANDING,          /* Landing state */
    FLYING_STATE_EMERGENCY,        /* Emergency state */
};

enum piloting_mode
{
    PILOTING_MODE_MANUAL = 0,      /* Manual piloting by the user */
    PILOTING_MODE_RETURN_HOME,     /* Automatic return home in progress */
    PILOTING_MODE_FLIGHT_PLAN,     /* Automatic flight plan in progress */
    PILOTING_MODE_FOLLOW_ME,       /* Automatic "follow-me" in progress */
};

struct metadata_v1_recording
{
    uint32_t frame_timestamp_h;    /* Frame timestamp (µs, monotonic), high 32 bits */
    uint32_t frame_timestamp_l;    /* Frame timestamp (µs, monotonic), low 32 bits */
    int16_t  drone_yaw;            /* Drone yaw/psi (rad), Q4.12 */
    int16_t  drone_pitch;          /* Drone pitch/theta (rad), Q4.12 */
    int16_t  drone_roll;           /* Drone roll/phi (rad), Q4.12 */
    int16_t  camera_pan;           /* Camera pan (rad), Q4.12 */
    int16_t  camera_tilt;          /* Camera tilt (rad), Q4.12 */
    int16_t  frame_w;              /* Frame view quaternion W, Q4.12 */
    int16_t  frame_x;              /* Frame view quaternion X, Q4.12 */
    int16_t  frame_y;              /* Frame view quaternion Y, Q4.12 */
    int16_t  frame_z;              /* Frame view quaternion Z, Q4.12 */
    int16_t  exposure_time;        /* Frame exposure time (ms), Q8.8 */
    int16_t  gain;                 /* Frame ISO gain */
    int8_t   wifi_rssi;            /* Wifi RSSI (dBm) */
    uint8_t  battery_percentage;   /* Battery charge percentage */
    int32_t  gps_latitude;         /* GPS latitude (deg), Q12.20 */
    int32_t  gps_longitude;        /* GPS longitude (deg), Q12.20 */
    int32_t  gps_altitude_and_sv;  /* Bits 31..8 = GPS altitude (m) Q16.8, bits 7..0 = SV count */
    int32_t  altitude;             /* Altitude relative to take-off (m), Q16.16 */
    uint32_t distance_from_home;   /* Distance from home (m), Q16.16 */
    int16_t  x_speed;              /* X speed (m/s), Q8.8 */
    int16_t  y_speed;              /* Y speed (m/s), Q8.8 */
    int16_t  z_speed;              /* Z speed (m/s), Q8.8 */
    uint8_t  state;                /* Bit 7 = binning, bits 6..0 = flyingState */
    uint8_t  mode;                 /* Bit 7 = animation, bits 6..0 = pilotingMode */
};

struct metadata_v1_streaming_basic
{
    uint16_t specific;             /* Identifier = 0x5031 */
    uint16_t length;               /* Size in 32 bits words = 6 */
    int16_t  drone_yaw;            /* Drone yaw/psi (rad), Q4.12 */
    int16_t  drone_pitch;          /* Drone pitch/theta (rad), Q4.12 */
    int16_t  drone_roll;           /* Drone roll/phi (rad), Q4.12 */
    int16_t  camera_pan;           /* Camera pan (rad), Q4.12 */
    int16_t  camera_tilt;          /* Camera tilt (rad), Q4.12 */
    int16_t  frame_w;              /* Frame view quaternion W, Q4.12 */
    int16_t  frame_x;              /* Frame view quaternion X, Q4.12 */
    int16_t  frame_y;              /* Frame view quaternion Y, Q4.12 */
    int16_t  frame_z;              /* Frame view quaternion Z, Q4.12 */
    int16_t  exposure_time;        /* Frame exposure time (ms), Q8.8 */
    int16_t  gain;                 /* Frame ISO gain */
    int8_t   wifi_rssi;            /* Wifi RSSI (dBm) */
    uint8_t  battery_percentage;   /* Battery charge percentage */
};

struct metadata_v1_streaming_extended
{
    uint16_t specific;             /* Identifier = 0x5031 */
    uint16_t length;               /* Size in 32 bits words = 13 */
    int16_t  drone_yaw;            /* Drone yaw/psi (rad), Q4.12 */
    int16_t  drone_pitch;          /* Drone pitch/theta (rad), Q4.12 */
    int16_t  drone_roll;           /* Drone roll/phi (rad), Q4.12 */
    int16_t  camera_pan;           /* Camera pan (rad), Q4.12 */
    int16_t  camera_tilt;          /* Camera tilt (rad), Q4.12 */
    int16_t  frame_w;              /* Frame view quaternion W, Q4.12 */
    int16_t  frame_x;              /* Frame view quaternion X, Q4.12 */
    int16_t  frame_y;              /* Frame view quaternion Y, Q4.12 */
    int16_t  frame_z;              /* Frame view quaternion Z, Q4.12 */
    int16_t  exposure_time;        /* Frame exposure time (ms), Q8.8 */
    int16_t  gain;                 /* Frame ISO gain */
    int8_t   wifi_rssi;            /* Wifi RSSI (dBm) */
    uint8_t  battery_percentage;   /* Battery charge percentage */
    int32_t  gps_latitude;         /* GPS latitude (deg), Q12.20 */
    int32_t  gps_longitude;        /* GPS longitude (deg), Q12.20 */
    int32_t  gps_altitude_and_sv;  /* Bits 31..8 = GPS altitude (m) Q16.8, bits 7..0 = SV count */
    int32_t  altitude;             /* Altitude relative to take-off (m), Q16.16 */
    uint32_t distance_from_home;   /* Distance from home (m), Q16.16 */
    int16_t  x_speed;              /* X speed (m/s), Q8.8 */
    int16_t  y_speed;              /* Y speed (m/s), Q8.8 */
    int16_t  z_speed;              /* Z speed (m/s), Q8.8 */
    uint8_t  state;                /* Bit 7 = binning, bits 6..0 = flyingState */
    uint8_t  mode;                 /* Bit 7 = animation, bits 6..0 = pilotingMode */
};
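The masks and shifts above can be applied as in the following sketch, which unpacks the packed fields of a version 1 structure once the integers have been read in network byte order. The struct v1_decoded type and the v1_unpack function are hypothetical and exist only for illustration; the packed altitude is taken here as a raw 32-bit pattern so that the sign of the 24-bit altitude can be restored explicitly.

```c
#include <stdint.h>

/* Masks and shifts copied from the version 1 listing */
#define GPS_ALTITUDE_MASK   (0xFFFFFF00)
#define GPS_ALTITUDE_SHIFT  (8)
#define GPS_SV_COUNT_MASK   (0x000000FF)
#define GPS_SV_COUNT_SHIFT  (0)
#define FLYING_STATE_MASK   (0x7F)
#define FLYING_STATE_SHIFT  (0)
#define BINNING_MASK        (0x80)
#define BINNING_SHIFT       (7)

/* Hypothetical decoded form, for illustration only */
struct v1_decoded {
    uint64_t timestamp_us;     /* 64-bit frame timestamp (µs) */
    double   gps_altitude_m;   /* GPS altitude (m) */
    unsigned gps_sv_count;     /* GPS SV count */
    unsigned flying_state;     /* enum flying_state value */
    unsigned binning;          /* 0 or 1 */
};

static struct v1_decoded v1_unpack(uint32_t ts_h, uint32_t ts_l,
                                   uint32_t gps_altitude_and_sv,
                                   uint8_t state)
{
    struct v1_decoded d;

    /* High and low 32-bit halves form one 64-bit timestamp */
    d.timestamp_us = ((uint64_t)ts_h << 32) | ts_l;

    /* Bits 31..8 hold a signed 24-bit Q16.8 altitude */
    int32_t alt = (int32_t)((gps_altitude_and_sv & GPS_ALTITUDE_MASK)
                            >> GPS_ALTITUDE_SHIFT);
    if (alt & 0x800000)
        alt -= 0x1000000;      /* restore the sign of the 24-bit value */
    d.gps_altitude_m = alt / 256.0;

    d.gps_sv_count = (gps_altitude_and_sv & GPS_SV_COUNT_MASK)
                     >> GPS_SV_COUNT_SHIFT;
    d.flying_state = (state & FLYING_STATE_MASK) >> FLYING_STATE_SHIFT;
    d.binning = (state & BINNING_MASK) >> BINNING_SHIFT;
    return d;
}
```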

Version 2 and version 3 binary formats

The formats below are the binary formats as found in MP4 metadata track samples and RTP header extensions. These formats are listed for informative purposes. The recommended way of accessing the metadata is to use libvideo-metadata.

Note

The version 2 format for both recording and streaming has been used since Bebop / Bebop 2 firmware 4.0 and Disco firmware 1.1, and on Bebop-Pro Thermal and Bluegrass.

The version 3 format for both recording and streaming is used on Anafi and Anafi Thermal.

Floating-point values are converted to fixed-point (for example, Q4.12 means 16 bits in total, with 4 bits for the integer part and 12 bits for the fractional part) to keep the data compact and to ease serialization. All data are serialized in network byte order.

The v2 and v3 structures are used for both recording and streaming metadata. The identifier id is 0x5032 (P2 in ASCII) for v2 format and 0x5033 (P3 in ASCII) for v3 format.

The version 2 and 3 base structures are compatible with future extensions using extension structures. The base structure length field gives the global size of the structure (base + extensions, in 32-bit words, excluding the id and length fields). If the base structure is followed by extensions, each extension starts with an extension identifier ext_id (to know the extension format) and an extension size ext_length (in 32-bit words, excluding the ext_id and ext_length fields).

An application not compatible with an extension type shall skip the extension (using the ext_length field) and read the next extension if other extensions are present. Extensions are required to be 4-byte aligned; therefore, extension structure sizes are multiples of 4.

Only one extension of each type may be present in a metadata structure. A frame timestamp extension is defined with ext_id = 0x4531 (E1 in ASCII). For recording metadata, this extension should always be present. A follow-me extension is defined with ext_id = 0x4532 (E2 in ASCII) and can be available only with a version 2 base format, in both streams and records. An automation extension is defined with ext_id = 0x4533 (E3 in ASCII) and can be available only with a version 3 base format, in both streams and records.
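The skipping rule can be sketched as a small extension walker. Everything named here is hypothetical and not part of libvideo-metadata. In particular, the base sizes of 56 and 72 bytes are an assumption obtained by summing the field sizes of the metadata_v2_base and metadata_v3_base structures listed in this section (id and length fields included).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed base sizes in bytes, id and length fields included */
#define V2_BASE_SIZE 56
#define V3_BASE_SIZE 72

static unsigned be16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Hypothetical helper: returns a pointer to the extension whose ext_id is
 * wanted_id (pointing at its ext_id field) and stores its total size in
 * bytes in *ext_size, or returns NULL if it is absent or malformed. */
static const uint8_t *vmeta_find_ext(const uint8_t *buf, size_t len,
                                     unsigned wanted_id, size_t *ext_size)
{
    if (buf == NULL || ext_size == NULL || len < 4)
        return NULL;

    size_t base;
    unsigned id = be16(buf);
    if (id == 0x5032)
        base = V2_BASE_SIZE;
    else if (id == 0x5033)
        base = V3_BASE_SIZE;
    else
        return NULL;

    /* length counts 32-bit words after the id and length fields */
    size_t total = 4 + 4 * (size_t)be16(buf + 2);
    if (total > len || total < base)
        return NULL;

    size_t off = base;
    while (off + 4 <= total) {
        unsigned ext_id = be16(buf + off);
        size_t size = 4 + 4 * (size_t)be16(buf + off + 2);
        if (off + size > total)
            return NULL;         /* truncated extension */
        if (ext_id == wanted_id) {
            *ext_size = size;
            return buf + off;
        }
        off += size;             /* unknown or unwanted: skip it */
    }
    return NULL;
}
```

Because unknown extensions are skipped by size only, a reader written this way keeps working when new extension types are added after the ones it knows.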

#define ALTITUDE_MASK       (0xFFFFFF00)  /* Altitude mask */
#define ALTITUDE_SHIFT      (8)           /* Altitude shift */
#define GPS_SV_COUNT_MASK   (0x000000FF)  /* GPS SV count mask */
#define GPS_SV_COUNT_SHIFT  (0)           /* GPS SV count shift */
#define FLYING_STATE_MASK   (0x7F)        /* Flying state mask */
#define FLYING_STATE_SHIFT  (0)           /* Flying state shift */
#define BINNING_MASK        (0x80)        /* Binning mask */
#define BINNING_SHIFT       (7)           /* Binning shift */
#define PILOTING_MODE_MASK  (0x7F)        /* Piloting mode mask */
#define PILOTING_MODE_SHIFT (0)           /* Piloting mode shift */
#define ANIMATION_MASK      (0x80)        /* Animation mask */
#define ANIMATION_SHIFT     (7)           /* Animation shift */

enum flying_state
{
    FLYING_STATE_LANDED = 0,         /* Landed state */
    FLYING_STATE_TAKINGOFF,          /* Taking off state */
    FLYING_STATE_HOVERING,           /* Hovering state */
    FLYING_STATE_FLYING,             /* Flying state */
    FLYING_STATE_LANDING,            /* Landing state */
    FLYING_STATE_EMERGENCY,          /* Emergency state */
    FLYING_STATE_USER_TAKEOFF,       /* User take off state */
    FLYING_STATE_MOTOR_RAMPING,      /* Motor ramping state */
    FLYING_STATE_EMERGENCY_LANDING,  /* Emergency landing state */
};

enum piloting_mode
{
    PILOTING_MODE_MANUAL = 0,      /* Manual piloting by the user */
    PILOTING_MODE_RETURN_HOME,     /* Automatic return home in progress */
    PILOTING_MODE_FLIGHT_PLAN,     /* Automatic flight plan in progress */
    PILOTING_MODE_TRACKING,        /* Automatic tracking in progress */
    PILOTING_MODE_FOLLOW_ME = PILOTING_MODE_TRACKING,
    PILOTING_MODE_MAGIC_CARPET,    /* Automatic "magic carpet" test in progress */
    PILOTING_MODE_MOVE_TO,         /* Automatic "move to" in progress */
};

enum followme_anim
{
    FOLLOW_ME_ANIMATION_NONE = 0,  /* No animation in progress */
    FOLLOW_ME_ANIMATION_ORBIT,     /* Follow-me orbit animation in progress */
    FOLLOW_ME_ANIMATION_BOOMERANG, /* Follow-me boomerang animation in progress */
    FOLLOW_ME_ANIMATION_PARABOLA,  /* Follow-me parabola animation in progress */
    FOLLOW_ME_ANIMATION_ZENITH,    /* Follow-me zenith animation in progress */
};

enum automation_anim
{
    AUTOMATION_ANIMATION_NONE = 0,         /* No animation in progress */
    AUTOMATION_ANIMATION_ORBIT,            /* Orbit animation in progress */
    AUTOMATION_ANIMATION_BOOMERANG,        /* Boomerang animation in progress */
    AUTOMATION_ANIMATION_PARABOLA,         /* Parabola animation in progress */
    AUTOMATION_ANIMATION_DOLLY_SLIDE,      /* Dolly slide animation in progress */
    AUTOMATION_ANIMATION_DOLLY_ZOOM,       /* Dolly zoom animation in progress */
    AUTOMATION_ANIMATION_REVEAL_VERT,      /* Vertical reveal animation in progress */
    AUTOMATION_ANIMATION_REVEAL_HORZ,      /* Horizontal reveal animation in progress */
    AUTOMATION_ANIMATION_PANORAMA_HORZ,    /* Horizontal panorama animation in progress */
    AUTOMATION_ANIMATION_CANDLE,           /* Candle animation in progress */
    AUTOMATION_ANIMATION_FLIP_FRONT,       /* Front flip animation in progress */
    AUTOMATION_ANIMATION_FLIP_BACK,        /* Back flip animation in progress */
    AUTOMATION_ANIMATION_FLIP_LEFT,        /* Left flip animation in progress */
    AUTOMATION_ANIMATION_FLIP_RIGHT,       /* Right flip animation in progress */
    AUTOMATION_ANIMATION_TWISTUP,          /* Twist up animation in progress */
    AUTOMATION_ANIMATION_POSITION_TWISTUP, /* Position twist up animation in progress */
};

struct metadata_v2_base
{
    uint16_t id;                   /* Identifier = 0x5032 */
    uint16_t length;               /* Structure size in 32 bits words excluding the id and length
                                    * fields and including extensions */
    int32_t  ground_distance;      /* Best ground distance estimation (m), Q16.16 */
    int32_t  latitude;             /* Absolute latitude (deg), Q10.22 */
    int32_t  longitude;            /* Absolute longitude (deg), Q10.22 */
    int32_t  altitude_and_sv;      /* Bits 31..8 = altitude (m) Q16.8, bits 7..0 = GPS SV count */
    int16_t  north_speed;          /* North speed (m/s), Q8.8 */
    int16_t  east_speed;           /* East speed (m/s), Q8.8 */
    int16_t  down_speed;           /* Down speed (m/s), Q8.8 */
    int16_t  air_speed;            /* Speed relative to air (m/s), negative means no data, Q8.8 */
    int16_t  drone_w;              /* Drone quaternion W, Q2.14 */
    int16_t  drone_x;              /* Drone quaternion X, Q2.14 */
    int16_t  drone_y;              /* Drone quaternion Y, Q2.14 */
    int16_t  drone_z;              /* Drone quaternion Z, Q2.14 */
    int16_t  frame_w;              /* Frame view quaternion W, Q2.14 */
    int16_t  frame_x;              /* Frame view quaternion X, Q2.14 */
    int16_t  frame_y;              /* Frame view quaternion Y, Q2.14 */
    int16_t  frame_z;              /* Frame view quaternion Z, Q2.14 */
    int16_t  camera_pan;           /* Camera pan (rad), Q4.12 */
    int16_t  camera_tilt;          /* Camera tilt (rad), Q4.12 */
    uint16_t exposure_time;        /* Frame exposure time (ms), Q8.8 */
    uint16_t gain;                 /* Frame ISO gain */
    uint8_t  state;                /* Bit 7 = binning, bits 6..0 = flyingState */
    uint8_t  mode;                 /* Bit 7 = animation, bits 6..0 = pilotingMode */
    int8_t   wifi_rssi;            /* Wifi RSSI (dBm) */
    uint8_t  battery_percentage;   /* Battery charge percentage */
};

struct metadata_v3_base
{
    uint16_t id;                   /* Identifier = 0x5033 */
    uint16_t length;               /* Structure size in 32 bits words excluding the id and length
                                    * fields and including extensions */
    int32_t  ground_distance;      /* Best ground distance estimation (m), Q16.16 */
    int32_t  latitude;             /* Absolute latitude (deg), Q10.22 */
    int32_t  longitude;            /* Absolute longitude (deg), Q10.22 */
    int32_t  altitude_and_sv;      /* Bits 31..8 = altitude (m) Q16.8, bits 7..0 = GPS SV count */
    int16_t  north_speed;          /* North speed (m/s), Q8.8 */
    int16_t  east_speed;           /* East speed (m/s), Q8.8 */
    int16_t  down_speed;           /* Down speed (m/s), Q8.8 */
    int16_t  air_speed;            /* Speed relative to air (m/s), negative means no data, Q8.8 */
    int16_t  drone_w;              /* Drone quaternion W, Q2.14 */
    int16_t  drone_x;              /* Drone quaternion X, Q2.14 */
    int16_t  drone_y;              /* Drone quaternion Y, Q2.14 */
    int16_t  drone_z;              /* Drone quaternion Z, Q2.14 */
    int16_t  frame_base_w;         /* Frame base view quaternion W (without pan/tilt), Q2.14 */
    int16_t  frame_base_x;         /* Frame base view quaternion X (without pan/tilt), Q2.14 */
    int16_t  frame_base_y;         /* Frame base view quaternion Y (without pan/tilt), Q2.14 */
    int16_t  frame_base_z;         /* Frame base view quaternion Z (without pan/tilt), Q2.14 */
    int16_t  frame_w;              /* Frame view quaternion W, Q2.14 */
    int16_t  frame_x;              /* Frame view quaternion X, Q2.14 */
    int16_t  frame_y;              /* Frame view quaternion Y, Q2.14 */
    int16_t  frame_z;              /* Frame view quaternion Z, Q2.14 */
    uint16_t exposure_time;        /* Frame exposure time (ms), Q8.8 */
    uint16_t gain;                 /* Frame ISO gain */
    uint16_t awb_r_gain;           /* White balance R/G gain, Q2.14 */
    uint16_t awb_b_gain;           /* White balance B/G gain, Q2.14 */
    uint16_t picture_hfov;         /* Picture horizontal FOV (deg), Q8.8 */
    uint16_t picture_vfov;         /* Picture vertical FOV (deg), Q8.8 */
    uint32_t link_quality;         /* Bits 31..8 = link goodput (kbit/s),
                                    * bits 7..0 = link quality (0-5) */
    int8_t   wifi_rssi;            /* Wifi RSSI (dBm) */
    uint8_t  battery_percentage;   /* Battery charge percentage */
    uint8_t  state;                /* Flying state */
    uint8_t  mode;                 /* Bit 7 = animation, bits 6..0 = pilotingMode */
};

struct metadata_ext
{
    uint16_t ext_id;               /* Extension structure id */
    uint16_t ext_length;           /* Extension structure size in 32-bit words excluding the
                                    * ext_id and ext_length fields */
    [...]                          /* Extension fields */
};

struct metadata_timestamp_ext
{
    uint16_t ext_id;               /* Extension structure id = 0x4531 */
    uint16_t ext_length;           /* Extension structure size in 32-bit words excluding the
                                    * ext_id and ext_length fields */
    uint32_t frame_timestamp_h;    /* Frame timestamp (µs, monotonic), high 32 bits */
    uint32_t frame_timestamp_l;    /* Frame timestamp (µs, monotonic), low 32 bits */
};

struct metadata_followme_ext
{
    uint16_t ext_id;               /* Extension structure id = 0x4532 */
    uint16_t ext_length;           /* Extension structure size in 32-bit words excluding the
                                    * ext_id and ext_length fields */
    int32_t  target_latitude;      /* Target latitude (deg), Q10.22 */
    int32_t  target_longitude;     /* Target longitude (deg), Q10.22 */
    int32_t  target_altitude;      /* Target altitude ASL (m) Q16.16 */
    uint8_t  followme_mode;        /* Follow-me feature bit field
                                    *  - bit 0: follow-me enabled (0 = disabled, 1 = enabled)
                                    *  - bit 1: mode (0 = look-at-me, 1 = follow-me)
                                    *  - bit 2: angle mode (0 = unlocked, 1 = locked)
                                    *  - bit 3-7: reserved for future use */
    uint8_t  followme_animation;   /* Follow-me animation (0 means no animation in progress) */
    uint8_t  reserved1;            /* Reserved for future use */
    uint8_t  reserved2;            /* Reserved for future use */
    uint32_t reserved3;            /* Reserved for future use */
    uint32_t reserved4;            /* Reserved for future use */
};

struct metadata_automation_ext
{
    uint16_t ext_id;                   /* Extension structure id = 0x4533 */
    uint16_t ext_length;               /* Extension structure size in 32-bit words excluding the
                                        * ext_id and ext_length fields */
    int32_t  framing_target_latitude;  /* Framing target latitude (deg), Q10.22 */
    int32_t  framing_target_longitude; /* Framing target longitude (deg), Q10.22 */
    int32_t  framing_target_altitude;  /* Framing target altitude ASL (m) Q16.16 */
    int32_t  flight_destination_latitude;   /* Flight destination latitude (deg), Q10.22 */
    int32_t  flight_destination_longitude;  /* Flight destination longitude (deg), Q10.22 */
    int32_t  flight_destination_altitude;   /* Flight destination altitude ASL (m) Q16.16 */
    uint8_t  automation_animation;     /* Automation animation (0 means no animation in progress) */
    uint8_t  automation_flags;         /* Automation features bit field
                                        *  - bit 0: follow-me enabled (0 = disabled, 1 = enabled)
                                        *  - bit 1: look-at-me enabled (0 = disabled, 1 = enabled)
                                        *  - bit 2: angle locked (0 = unlocked, 1 = locked)
                                        *  - bit 3-7: reserved for future use */
    uint16_t reserved;                 /* Reserved for future use */
};
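As a short illustration of the listings above, the sketch below decodes a frame timestamp extension from its serialized bytes and unpacks the version 3 link_quality field. The function and type names are hypothetical and not part of libvideo-metadata; the sketch assumes the timestamp extension carries exactly two 32-bit words, as in the metadata_timestamp_ext structure.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Hypothetical helper: decodes a serialized timestamp extension.
 * ext points at the ext_id field. Returns 0 on success, -1 otherwise. */
static int timestamp_ext_decode(const uint8_t *ext, size_t len,
                                uint64_t *timestamp_us)
{
    if (ext == NULL || timestamp_us == NULL || len < 12)
        return -1;
    if (ext[0] != 0x45 || ext[1] != 0x31)   /* ext_id = "E1" */
        return -1;
    if (ext[2] != 0x00 || ext[3] != 0x02)   /* ext_length = 2 words */
        return -1;
    *timestamp_us = ((uint64_t)be32(ext + 4) << 32) | be32(ext + 8);
    return 0;
}

/* Hypothetical decoded form of the v3 link_quality field */
struct link_info {
    uint32_t goodput_kbps;   /* bits 31..8 */
    unsigned quality;        /* bits 7..0, documented range 0-5 */
};

static struct link_info link_quality_unpack(uint32_t raw)
{
    struct link_info info;
    info.goodput_kbps = raw >> 8;
    info.quality = raw & 0xFF;
    return info;
}
```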

Session metadata

Session metadata, also called untimed metadata, are data that do not vary over time, i.e. data that are constant during the lifetime of the video. The purpose is to give information about the device that produced the video.

The available data include general drone information (model name, serial number, software version, friendly name, etc.), flight context (media date, geotag, etc.) and time-invariant picture information (e.g. picture field-of-view for drone models without a zoom feature).

Integration in the MP4 file

Session metadata are included in MP4 files as tags. Two methods of inclusion exist:

  • in a moov/udta/meta/ilst box with 4-character keys (udta method, originally an iTunes feature)
  • in a moov/meta/ilst box with full keys (meta method, see Apple’s QuickTime File Format Specification)

Note

The availability of session metadata in records depends on the products and firmware versions:

  • Bebop / Bebop 2 < 4.1 or Disco < 1.4: no session metadata available
  • Bebop / Bebop 2 >= 4.1, Disco >= 1.4, Bebop-Pro Thermal, Bluegrass, Anafi and Anafi Thermal: session metadata with both the udta and the meta methods

Integration in the RTP stream

Session metadata are included in streams according to two methods of inclusion:

  • in SDES items in RTCP compound packets with the sender reports sent periodically (see RFC 3550)
  • in SDP items (RFC 4566) in replies to RTSP DESCRIBE methods at the stream initialization when RTSP (RFC 2326) is supported (on Anafi and Anafi Thermal)

Note

The availability of session metadata in streams depends on the products and firmware versions:

  • Bebop / Bebop 2 < 4.1 or Disco < 1.4: no session metadata available
  • Bebop / Bebop 2 >= 4.1, Disco >= 1.4, Bebop-Pro Thermal, Bluegrass and Anafi < 1.5: session metadata in RTCP SDES items
  • Anafi >= 1.5 and Anafi Thermal: session metadata with both RTCP SDES items and SDP items methods

In RTCP SDES items the CNAME item is mandatory and is sent in every compound RTCP packet along with a sender report. Other items are not sent in every compound RTCP packet but periodically, for example every 10 seconds.

Some of these elements may not be present in a stream; for example, the take-off position may be unknown. Some values may be sent late or may change within a streaming session. For example, a streaming session will likely start before take-off, so the take-off location only becomes known, and is sent in RTCP SDES items, later on. Likewise, several take-offs can occur in the same streaming session, in which case a new take-off location value appears in the RTCP SDES items. SDP items, however, are only sent in the reply to an RTSP DESCRIBE method at stream initialization, so their values are not updated later in the streaming session.
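Several session metadata values are carried in private SDES items (PRIV, id=8). As a rough sketch of how such an item could be split into its prefix and value, the code below follows the PRIV item layout of RFC 3550 (item type, item length, prefix length, prefix string, value string). The function name sdes_priv_parse is hypothetical and not part of libvideo-metadata, and the input is assumed to point at the start of a single SDES item.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: splits one SDES PRIV item into NUL-terminated
 * prefix and value strings. Returns 0 on success, -1 if the item is not
 * a well-formed PRIV item or an output buffer is too small. */
static int sdes_priv_parse(const uint8_t *item, size_t len,
                           char *prefix, size_t prefix_cap,
                           char *value, size_t value_cap)
{
    if (item == NULL || prefix == NULL || value == NULL)
        return -1;
    if (len < 3 || item[0] != 8)         /* 8 = PRIV item type */
        return -1;

    size_t item_len = item[1];           /* bytes following the length */
    if (item_len < 1 || len < 2 + item_len)
        return -1;

    size_t prefix_len = item[2];
    if (prefix_len > item_len - 1)
        return -1;
    size_t value_len = item_len - 1 - prefix_len;

    if (prefix_len + 1 > prefix_cap || value_len + 1 > value_cap)
        return -1;

    memcpy(prefix, item + 3, prefix_len);
    prefix[prefix_len] = '\0';
    memcpy(value, item + 3 + prefix_len, value_len);
    value[value_len] = '\0';
    return 0;
}
```

Since items other than CNAME are only sent periodically and their values may change during a session, a receiver would typically run such a parser on every RTCP compound packet and keep the latest value for each prefix.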

Accessing the session metadata with libvideo-metadata

The recommended way to access the session metadata is by using Parrot’s libvideo-metadata (vmeta for short). This C library handles serializing and deserializing the metadata, presenting a friendly C structure containing the metadata in the API.

One way to get the vmeta_session structure from a record or a stream is to use the pdraw_get_peer_session_metadata() / getPeerSessionMetadata() functions in PDrAW. Still using PDrAW, the vmeta_session structure is also output in the external texture loading and overlay rendering callback functions.

Another way is to use libvideo-metadata’s executable tool named vmeta-extract. This tool takes as input an MP4 file or a *.pcap capture and writes the session metadata found to the standard output.

If none of these options are suitable, libvideo-metadata’s API can be used to deserialize data from a record or stream by using the vmeta_session_recording_read(), vmeta_session_streaming_sdes_read() and vmeta_session_streaming_sdp_read() functions. In the case of a record, the input to the function is key/value pairs from the metadata tags in the MP4 file. In the case of a stream, the input to this function is either SDP items from the reply to an RTSP DESCRIBE, or SDES items from RTCP packets.

Session metadata definition

The following table lists the available session metadata for various drone models.

Any additional metadata present and not supported by the application must be ignored. Some of these elements may not be present in a file. For example, the take-off position may be unknown.

BB: Bebop
BB2: Bebop 2
D: Disco
BB-T: Bebop-Pro Thermal
BG: Bluegrass
ANA: Anafi
ANA-T: Anafi Thermal
| Metadata | Method | Example | BB / BB2 | D | BB-T | BG | ANA | ANA-T |
|---|---|---|---|---|---|---|---|---|
| Friendly name | Record: ‘udta’ with key ‘©ART’ (artist) | “ANAFI-G059745” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.artist” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: RTCP SDES item ‘NAME’ (id=2) | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session information (‘i=’) | | - | - | - | - | >= 1.5.0 | y |
| Product maker | Record: ‘udta’ with key ‘©mak’ | “Parrot” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.make” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “maker” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-maker” | | - | - | - | - | >= 1.5.0 | y |
| Product model | Record: ‘udta’ with key ‘©mod’ | “Anafi” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.model” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “model” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-model” | | - | - | - | - | >= 1.5.0 | y |
| Product model ID | Record: ‘meta’ with key “com.parrot.model.id” | “0914” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “model_id” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-model-id” | | - | - | - | - | >= 1.5.0 | y |
| Serial number | Record: ‘udta’ with key ‘©too’ | “PI040416BA8G059745” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.parrot.serial” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: RTCP SDES item ‘CNAME’ (id=1) | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-serial” | | - | - | - | - | >= 1.5.0 | y |
| Software version | Record: ‘udta’ with key ‘©swr’ | “1.3.0” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.software” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: RTCP SDES item ‘TOOL’ (id=6) | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “tool” | | - | - | - | - | >= 1.5.0 | y |
| Software build ID | Record: ‘meta’ with key “com.parrot.build.id” | “anafi-4k-1.3.0” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “build_id” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-build-id” | | - | - | - | - | >= 1.5.0 | y |
| Video title | Record: ‘udta’ with key ‘©nam’ | “Sat, 06 Oct 2018 18:12:52 +0200” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.title” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “title” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session name (‘s=’) | | - | - | - | - | >= 1.5.0 | y |
| Comment | Record: ‘udta’ with key ‘©cmt’ | (unused) | - | - | - | - | - | - |
| | Record: ‘meta’ with key “com.apple.quicktime.comment” | | - | - | - | - | - | - |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “comment” | | - | - | - | - | - | - |
| | Stream: SDP session-level attribute “X-com-parrot-comment” | | - | - | - | - | - | - |
| Copyright | Record: ‘udta’ with key ‘©cpy’ | (unused) | - | - | - | - | - | - |
| | Record: ‘meta’ with key “com.apple.quicktime.copyright” | | - | - | - | - | - | - |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “copyright” | | - | - | - | - | - | - |
| | Stream: SDP session-level attribute “X-com-parrot-copyright” | | - | - | - | - | - | - |
| Run date | Record: ‘meta’ with key “com.parrot.run.date” | “2018-10-06T18:11:04+02:00” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “run_date” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-run-date” | | - | - | - | - | >= 1.5.0 | y |
| Run ID | Record: ‘meta’ with key “com.parrot.run.id” | “B3891A8DE0A7FD320D4297E5386D9BF5” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “run_id” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-run-id” | | - | - | - | - | >= 1.5.0 | y |
| Boot ID | Record: ‘meta’ with key “com.parrot.boot.id” | “504F90CCA44287362F2E1C5EDA5860AB” | - | - | - | - | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “boot_id” | | - | - | - | - | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-boot-id” | | - | - | - | - | >= 1.5.0 | y |
| Media date | Record: ‘udta’ with key ‘©day’ | “2018-10-06T18:12:52+02:00” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.creationdate” | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “media_date” | | - | - | - | - | y (replay) | y (replay) |
| | Stream: SDP session-level attribute “X-com-parrot-media-date” | | - | - | - | - | >= 1.5.0 (replay) | y (replay) |
| Geotag / take-off location | Record: ‘udta’ with key ‘©xyz’ in the ‘moov/udta’ box (not the ‘moov/udta/meta/ilst’ box, for Android compatibility); ISO 6709 Annex H string or latitude and longitude only (deprecated) | “+16.42850589-061.53569552+6.80/” or “+16.4285-061.5357/” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Record: ‘meta’ with key “com.apple.quicktime.location.ISO6709”; ISO 6709 Annex H string | “+16.42850589-061.53569552+6.80/” | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: RTCP SDES item ‘LOC’ (id=5); ISO 6709 Annex H string | | >= 4.1.0 | >= 1.4.0 | y | y | y | y |
| | Stream: SDP session-level attribute “X-com-parrot-takeoff-loc”; ISO 6709 Annex H string | | - | - | - | - | >= 1.5.0 | y |
| Video mode | Record: ‘meta’ with key “com.parrot.video.mode” | “Standard”, “Hyperlapse” or “SlowMotion” | - | - | - | - | >= 1.6.0 | >= 1.6.0 |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “video_mode” | | - | - | - | - | >= 1.6.0 (replay) | >= 1.6.0 (replay) |
| | Stream: SDP session or media-level attribute “X-com-parrot-video-mode” | | - | - | - | - | >= 1.6.0 (replay) | >= 1.6.0 (replay) |
| Picture HFOV / VFOV | Record: ‘meta’ with key “com.parrot.picture.fov” or “com.parrot.picture.hfov” (deprecated) and “com.parrot.picture.vfov” (deprecated) | “78.00,49.00” or “78.00” and “49.00” (deprecated) | >= 4.1.0 | >= 1.4.0 | y | y | - | - |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “picture_fov” or “picture_hfov” (deprecated) and “picture_vfov” (deprecated) | | >= 4.1.0 | >= 1.4.0 | y | y | - | - |
| | Stream: SDP session or media-level attribute “X-com-parrot-picture-fov” | | - | - | - | - | - | - |
| Thermal camera metadata version | Record: ‘meta’ with key “com.parrot.thermal.metaversion” | “1” | - | - | y | - | - | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “thermal_metaversion” | | - | - | y | - | - | y |
| | Stream: SDP session or media-level attribute “X-com-parrot-thermal-metaversion” | | - | - | - | - | - | y |
| Thermal camera serial number | Record: ‘meta’ with key “com.parrot.thermal.camserial” | “F07H7H00242” | - | - | y | - | - | - |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “thermal_camserial” | | - | - | y | - | - | - |
| | Stream: SDP session or media-level attribute “X-com-parrot-thermal-camserial” | | - | - | - | - | - | - |
| Thermal camera alignment data | Record: ‘meta’ with key “com.parrot.thermal.alignment” | “-0.870,0.318,88.848” | - | - | y | - | - | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “thermal_alignment” | | - | - | y | - | - | y |
| | Stream: SDP session or media-level attribute “X-com-parrot-thermal-alignment” | | - | - | - | - | - | y |
| Thermal camera temperature conversion parameters | Record: ‘meta’ with key “com.parrot.thermal.conv.low” for low-gain and “com.parrot.thermal.conv.high” for high-gain | “1390082.947851,1449.5,1.0,1476.356,0.8,25.0,22.0,0.98” | - | - | y | - | - | - |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “thermal_conv_low” for low-gain and “thermal_conv_high” for high-gain | | - | - | y | - | - | - |
| | Stream: SDP session or media-level attribute “X-com-parrot-thermal-conv-low” for low-gain and “X-com-parrot-thermal-conv-high” for high-gain | | - | - | - | - | - | - |
| Thermal camera scale factor | Record: ‘meta’ with key “com.parrot.thermal.scalefactor” | “1.035156” | - | - | y | - | - | y |
| | Stream: private SDES item (‘PRIV’, id=8) with prefix “thermal_scalefactor” | | - | - | y | - | - | y |
| | Stream: SDP session or media-level attribute “X-com-parrot-thermal-scalefactor” | | - | - | - | - | - | y |
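Several of the values listed above are structured strings. As an illustration, the sketch below parses the take-off location and the combined picture field-of-view value. It is a simplified example under stated assumptions, not the parser used by libvideo-metadata: the function names are hypothetical, and the location parser only accepts the decimal-degrees form shown in the examples (latitude, longitude and optional altitude, terminated by “/”), not the other notations allowed by ISO 6709.

```python
import re

# Decimal-degrees form only: +DD.DDDD, then +DDD.DDDD, optional altitude, '/'.
_ISO6709_DECIMAL = re.compile(
    r"^(?P<lat>[+-]\d{2}(?:\.\d+)?)"
    r"(?P<lon>[+-]\d{3}(?:\.\d+)?)"
    r"(?P<alt>[+-]\d+(?:\.\d+)?)?"
    r"/$"
)


def parse_takeoff_location(value):
    """Return (latitude, longitude, altitude or None), or None if unparsable."""
    match = _ISO6709_DECIMAL.match(value.strip())
    if match is None:
        return None
    altitude = match.group("alt")
    return (
        float(match.group("lat")),
        float(match.group("lon")),
        float(altitude) if altitude is not None else None,
    )


def parse_picture_fov(value):
    """Split a combined "hfov,vfov" string into two floats."""
    hfov, vfov = value.split(",")
    return float(hfov), float(vfov)


full = parse_takeoff_location("+16.42850589-061.53569552+6.80/")
short = parse_takeoff_location("+16.4285-061.5357/")  # deprecated form
fov = parse_picture_fov("78.00,49.00")
```

Returning `None` for a string that does not match lets the caller treat a malformed location the same way as a missing one.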