Building an FLV container parser (splitter) for WinCE

욱_스 2008. 10. 17. 19:13

I thought it would be fun to play YouTube FLV videos directly on our company's product,
so I spent about a day and a half finishing an FLV parser (splitter).

I wrote a parser that extracts the video, audio, and metadata,
and connected the extracted frames to the decoders of the target's multimedia player.

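* The parsed audio frames currently play back correctly.
(The sound breaks up slightly in places, but adding buffering on the HDD side should fix that.)
The problem is that the video frames are not decoded correctly, so nothing appears on screen.
In other words, there is sound but no picture.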

The reason is that
the standard H.263 and Sorenson H.263 codecs differ considerably in their frame headers and frame structure.

Features of standard H.263 that were removed and are not in Sorenson H.263:
- GOB (group of blocks) layer
- Split-screen indicator
- Document camera indicator
- Picture freeze release
- Syntax-based arithmetic coding
- PB-Frames
- Continuous presence multipoint
- Overlapped block motion compensation

Features added to Sorenson H.263 that are not in standard H.263:
- Disposable frames (difference frames with no future dependencies)
- Arbitrary picture width and height up to 65535 pixels
- Unrestricted motion vector support is always on
- A deblocking flag is available to suggest the use of a deblocking filter
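
To make the header differences concrete, here is a rough Python sketch that reads the start of a Sorenson H.263 picture header, following the H263VIDEOPACKET field layout as I understand it from the Adobe FLV spec (17-bit start code, 5-bit version, 8-bit temporal reference, 3-bit picture size code, optional custom size, 2-bit picture type, 1-bit deblocking flag, 5-bit quantizer). `BitReader` and `parse_sorenson_header` are names invented for this example, and the input is assumed to be a video tag's data with the leading frame type / codec ID byte already removed.

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


# Picture size codes 2..6 select a predefined resolution.
FIXED_SIZES = {2: (352, 288), 3: (176, 144), 4: (128, 96),
               5: (320, 240), 6: (160, 120)}


def parse_sorenson_header(frame: bytes) -> dict:
    bits = BitReader(frame)
    if bits.read(17) != 1:
        raise ValueError("missing picture start code")
    version = bits.read(5)
    temporal_reference = bits.read(8)
    size_code = bits.read(3)
    if size_code == 0:      # custom size, 8 bits per dimension
        width, height = bits.read(8), bits.read(8)
    elif size_code == 1:    # custom size, 16 bits per dimension (up to 65535)
        width, height = bits.read(16), bits.read(16)
    elif size_code in FIXED_SIZES:
        width, height = FIXED_SIZES[size_code]
    else:
        raise ValueError("reserved picture size code")
    picture_type = bits.read(2)  # 0 = intra, 1 = inter, 2 = disposable inter
    deblocking = bits.read(1) == 1
    quantizer = bits.read(5)
    return {
        "version": version,
        "temporal_reference": temporal_reference,
        "width": width,
        "height": height,
        "picture_type": picture_type,
        "disposable": picture_type == 2,
        "deblocking": deblocking,
        "quantizer": quantizer,
    }
```

The custom size fields, the disposable picture type, and the deblocking flag correspond to the additions listed above; a standard H.263 decoder expects a different header at this point, which is consistent with the DSP decoder rejecting these frames.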

* Attached: a document on the ITU H.263 video coding standard.

ITUH.263VideoCodingStandart.pdf

* The Flash 9 FLV file format spec document provided by Adobe.

video_file_format_spec_v9.pdf

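The target's DSP currently supports standard H.263 decoding, so I tried tweaking the frame header
and frame data at parse time so that the standard H.263 decoder could be used, but that turned out to be difficult.
It looks like I will have to try running a Sorenson H.263 codec on the ARM instead.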

If I implement the main decoding routines in ARM assembly, and given that most videos are around 320x240,
I wonder whether the performance might just be good enough to keep A/V sync from slipping.

>>>>>>>>>>>>>>> Below is a summary of what I learned about FLV.

The FLV container was developed by Macromedia for the Flash Player plugin.
Its video codecs started with Sorenson H.263 and Screen video, On2 VP6 was added later,
and H.263 still accounts for most files today.
Flash Player 10 streams HD movies over the web using the H.264 codec:
480p (848x480), 720p (1280x720), 1080p (1920x1080)
(Macromedia has since been acquired by Adobe.)
http://www.adobe.com/products/hdvideo/hdgallery/ <- HD movie samples can be viewed here.

The A/V codecs supported by the FLV container are as follows.

Video.
1: JPEG (currently unused)
2: Sorenson H.263
3: Screen video
4: On2 VP6
5: On2 VP6 with alpha channel
6: Screen video version 2
7: AVC
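
In the Adobe spec, this codec ID sits in the low 4 bits of the first byte of each video tag's data, with the frame type in the high 4 bits. A small Python sketch of that lookup follows; the function name `parse_video_tag_header` and the dictionary names are made up for this example.

```python
# Codec IDs as listed above; frame types as given in the FLV spec.
VIDEO_CODECS = {
    1: "JPEG (currently unused)",
    2: "Sorenson H.263",
    3: "Screen video",
    4: "On2 VP6",
    5: "On2 VP6 with alpha channel",
    6: "Screen video version 2",
    7: "AVC",
}
FRAME_TYPES = {
    1: "keyframe",
    2: "inter frame",
    3: "disposable inter frame (H.263 only)",
    4: "generated keyframe",
    5: "video info/command frame",
}


def parse_video_tag_header(payload: bytes) -> dict:
    """Decode the first byte of a video tag's data."""
    first = payload[0]
    frame_type = first >> 4   # high nibble
    codec_id = first & 0x0F   # low nibble
    return {
        "frame_type": frame_type,
        "frame_type_name": FRAME_TYPES.get(frame_type, "unknown"),
        "codec_id": codec_id,
        "codec_name": VIDEO_CODECS.get(codec_id, "unknown"),
    }
```

For example, a first byte of 0x12 means a keyframe (1) coded with Sorenson H.263 (2).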

Audio.
0 = Linear PCM, platform endian
1 = ADPCM
2 = MP3
3 = Linear PCM, little endian
4 = Nellymoser 16-kHz mono
5 = Nellymoser 8-kHz mono
6 = Nellymoser
7 = G.711 A-law logarithmic PCM
8 = G.711 mu-law logarithmic PCM
9 = reserved
10 = AAC
14 = MP3 8-kHz
15 = Device-specific sound
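
The sound format ID above is the high 4 bits of the first byte of each audio tag's data. According to the Adobe spec, the remaining bits carry the sample rate (2 bits), sample size (1 bit), and mono/stereo flag (1 bit). A minimal Python sketch, with `parse_audio_tag_header` being a name made up for this example:

```python
AUDIO_FORMATS = {
    0: "Linear PCM, platform endian",
    1: "ADPCM",
    2: "MP3",
    3: "Linear PCM, little endian",
    4: "Nellymoser 16-kHz mono",
    5: "Nellymoser 8-kHz mono",
    6: "Nellymoser",
    7: "G.711 A-law logarithmic PCM",
    8: "G.711 mu-law logarithmic PCM",
    9: "reserved",
    10: "AAC",
    14: "MP3 8-kHz",
    15: "Device-specific sound",
}
# The spec names these 5.5, 11, 22 and 44 kHz; nominal rates in Hz are used here.
SAMPLE_RATES_HZ = (5512, 11025, 22050, 44100)


def parse_audio_tag_header(payload: bytes) -> dict:
    """Decode the first byte of an audio tag's data."""
    first = payload[0]
    sound_format = first >> 4
    return {
        "format": sound_format,
        "format_name": AUDIO_FORMATS.get(sound_format, "unknown"),
        "sample_rate_hz": SAMPLE_RATES_HZ[(first >> 2) & 0x03],
        "bits_per_sample": 16 if first & 0x02 else 8,
        "channels": 2 if first & 0x01 else 1,
    }
```

For example, 0x2F decodes as MP3, 44 kHz, 16-bit, stereo.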

* Most FLV files on the web today combine VP6 or H.263 video with MP3 audio.
* Adobe is gradually expanding its use of H.264.
* The IDs are not consecutive because that is how the Adobe spec defines them.

As for parsing,
the metadata tag in FLV files from Naver or Daum differs slightly from the one in YouTube FLV files.
The metadata tag keys are as follows.

duration: (Number) Length of the FLV in seconds. FLVMDI computes this value.
lasttimestamp: (Number) TimeStamp of the last tag in the FLV file.

(You can think of it as 'duration = lasttimestamp + estimated or calculated duration of the last tag'. This is not exactly true though; FLVMDI calculates/estimates the duration of the last audio and video tags separately and uses the appropriate value for the duration. With version 2.0, FLVMDI is more conservative, and duration equals lasttimestamp more often than in 1.x versions.)

NetStream.Time will not return the duration, or the actual time as one might expect. It returns the position of the PlayHead, which can only be a timestamp in the FLV file (see Position of the PlayHead). So the lasttimestamp value is more useful than the duration value for detecting the end of the FLV.
lastkeyframetimestamp: (Number) TimeStamp of the last video tag which is a key frame. This info might be needed because seeking a frame after this time usually does not work.
width: (Number) Width of the video in pixels. (Flash exporter 1.1 sets this to 0).
height: (Number) Height of the video in pixels. (Flash exporter 1.1 sets this to 0).
videodatarate: (Number) FLVMDI does not compute this value and imports it if present. (Defaults to 0).
audiodatarate: (Number) FLVMDI does not compute this value and imports it if present. (Defaults to 0).
framerate: (Number) FLVMDI computes this value, but uses imported value if not 0.
creationdate: (String) FLVMDI cannot compute this value and imports it if present. (Defaults to 'unknown').
filesize: (Number) Filesize in bytes (including the injected data).
videosize: (Number) Total size of video tags in the file in bytes.
audiosize: (Number) Total size of audio tags in the file in bytes.
datasize: (Number) Total size of data tags in the file in bytes.
metadatacreator: (String) Will be set to 'Manitu Group FLV MetaData Injector 2'.
metadatadate: (Date) Date and time metadata added. (Note that this is not of type string like 'creationdate').
xtradata: (string) Additional string data if specified.
videocodecid: (Number) Video codec ID number used in the FLV. (Sorenson H.263 =2, Screen Video =3, On2 VP6 = 4 and 5, Screen Video V2 = 6).
audiocodecid: (Number) Audio codec ID number used in the FLV. (Uncompressed = 0, ADPCM = 1, MP3 = 2, NellyMoser = 5 and 6).
audiodelay: (Number) Audio delay in seconds. The Flash 8 encoder delays the video for better sync with the audio (audio and video do not both start at time 0; video starts a bit later). This value is also important for cue points injected by the Flash 8 Video Encoder, because the logical time of the cue points does not correspond to the physical time at which they are inserted in the FLV. (Cue points are injected before encoding; when the video is shifted by 'audio delay' seconds, the cue points are also shifted and their physical time in the FLV changes.)
canSeekToEnd: (Boolean) True if the last video tag is a key frame and hence can be 'seek'ed.
keyframes: (Object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
The 'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in the times array are in 'seconds'. Each corresponds to the timestamp of the nth key frame. Values in the filepositions array are in 'bytes'. Each corresponds to the file position of the nth key frame video tag (which starts with the tag type byte 9).
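
These keys are stored in the script data tag (tag type 18) as AMF0 values: the string "onMetaData" followed by an array of key/value pairs. The sketch below is a minimal Python reader covering only the AMF0 types that appear in the list above (Number, Boolean, String, Object, ECMA array, strict array, Date), plus a lookup that uses the 'keyframes' arrays for seeking. The function names are made up for this example, and real files may contain types or quirks that this sketch does not handle.

```python
import struct
from bisect import bisect_right


def _read_string(buf: bytes, pos: int):
    (length,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8", "replace"), pos + length


def _read_properties(buf: bytes, pos: int):
    """Read key/value pairs until the 00 00 09 object-end marker."""
    props = {}
    while pos + 3 <= len(buf):
        if buf[pos:pos + 3] == b"\x00\x00\x09":
            return props, pos + 3
        key, pos = _read_string(buf, pos)
        props[key], pos = _read_value(buf, pos)
    return props, pos  # end marker missing; return what was read


def _read_value(buf: bytes, pos: int):
    marker = buf[pos]
    pos += 1
    if marker == 0:   # Number: big-endian double
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if marker == 1:   # Boolean
        return buf[pos] != 0, pos + 1
    if marker == 2:   # String
        return _read_string(buf, pos)
    if marker == 3:   # Object
        return _read_properties(buf, pos)
    if marker == 8:   # ECMA array: 32-bit count, then properties
        return _read_properties(buf, pos + 4)
    if marker == 10:  # Strict array: 32-bit count, then values
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = _read_value(buf, pos)
            items.append(item)
        return items, pos
    if marker == 11:  # Date: double (ms since epoch) + 16-bit time zone
        return struct.unpack_from(">d", buf, pos)[0], pos + 10
    raise ValueError("unsupported AMF0 type %d" % marker)


def parse_on_metadata(payload: bytes) -> dict:
    """Decode the data of a script tag that carries onMetaData."""
    name, pos = _read_value(payload, 0)
    if name != "onMetaData":
        raise ValueError("not an onMetaData tag")
    meta, _ = _read_value(payload, pos)
    return meta


def keyframe_offset_for(meta: dict, seconds: float) -> int:
    """File position of the last key frame at or before `seconds`."""
    times = meta["keyframes"]["times"]
    positions = meta["keyframes"]["filepositions"]
    index = max(bisect_right(times, seconds) - 1, 0)
    return int(positions[index])
```

Since sites differ in which keys they write, looking keys up with `meta.get(...)` and falling back to values computed while parsing (for example the last tag timestamp when 'duration' is missing) seems safer than assuming every key is present.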
