Capturing a webcam stream using v4l2

A few months ago, I came across a blog post from 2013 describing the basics of v4l2 and how to capture a video frame from a camera in Linux. However, that article was missing a few pieces, and since practical (and simple) examples of v4l2 code are really rare online, I thought I’d publish an article about it.

What’s v4l2?

v4l2 stands for Video4Linux 2, the second version of the V4L API and framework. Unlike many driver implementations, the v4l2 framework is an integral part of the Linux kernel code. This static integration has been criticised by BSD advocates, and several analogous projects were started for BSD (such as Video4BSD), but none has come to fruition (yet). The V4L2 API lets you manipulate various video devices, for capture as well as for output. The API can also handle other kinds of devices, such as TV tuners, but we’ll stick to webcams here.

This API is mostly implemented as a set of IOCTL calls for you to make to your video devices. Once you’ve understood the general mechanism, and know a few IOCTLs, you’ll be able to manipulate your camera with a certain ease.

Common implementation of a v4l2 application

Aside from the parts strictly related to device communication, v4l2 expects you to rely on a few other system calls. In this article, we’ll go through the following steps:

  1. Open a descriptor to the device. This is done UNIX-style, basic I/O.
  2. Retrieve and analyse the device’s capabilities. V4L2 allows you to query a device for its capabilities, that is, the set of operations (roughly, IOCTL calls) it supports. I’ll give a bit more detail about that later.
  3. Set the capture format. This is where you choose your frame size, your format (MJPEG, RGB, YUV, …), and so on. Again, the device must be able to handle your format. There is an IOCTL call which allows you to retrieve the list of available formats (which is independent of the device’s capabilities); I’ll give you a little example.
  4. Prepare the device for buffer handling. When capturing a frame, you have to submit a buffer to the device (queue), and retrieve it once it’s been filled with data (dequeue). However, before you can do this, you must inform the device about your buffers (buffer request).
  5. For each buffer you wish to use, you must negotiate characteristics with the device (buffer size, frame start offset in memory), and create a new memory mapping for it.
  6. Put the device into streaming mode.
  7. Once your buffers are ready, all you have to do is keep queueing and dequeuing them repeatedly; every call brings you a new frame. The delay you set between frames by putting your program to sleep is what determines your FPS (frames per second) rate.
  8. Turn off streaming mode.
  9. Close your descriptor to the device.

Note (see comments for more information): depending on your device, this routine might not work for you. In some cases, devices cannot be put into streaming mode if no buffer is queued. In this case, you’ll have to queue a buffer, switch streaming on, dequeue/queue in a loop, and switch streaming off. More information about this will be given further down.

Each of these steps is covered by a system call or a set of IOCTL calls. But first things first: you need to know how to make an IOCTL call to a device. Given a descriptor stored in fd, you may use the ioctl system call as follows:

  •  MY_REQUEST is your IOCTL call. It’s an integer, and V4L2 provides you with constants which map these numbers to readable names. For example, VIDIOC_QUERYCAP is used to retrieve the device’s capabilities.
  • Depending on the request you’re submitting, you may need to pass additional parameters along. In most cases, you have to submit the address of a data structure through which you’ll be able to read the result of your query. The above VIDIOC_QUERYCAP requires one parameter: a pointer to a v4l2_capability structure.

The IOCTL calls we’ll be using in this article return 0 on success and -1 on error (with errno set accordingly).

Open and close a descriptor to the device

Those are easy, so let’s get them over with quickly. Your file descriptor can be obtained just like any other using open, and disposed of using close, two basic UNIX I/O system calls:
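A minimal sketch (the open_device wrapper is just a convenience of mine):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open a descriptor to the device, plain UNIX I/O. We ask for both
 * read and write access, which v4l2 requires for streaming. */
static int open_device(const char *path)   /* e.g. "/dev/video0" */
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        perror("open");
    return fd;
}

/* ... and once you're done with the device: close(fd); */
```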

Note that we need both read and write access to the device.

Retrieve the device’s capabilities

While v4l2 offers a generic set of calls for every device it supports, it is important to remember that not all devices can provide the same features. For this reason, the first step here will be to query the device about its capabilities and details. This is done through the VIDIOC_QUERYCAP request. Note that every v4l2-compatible device is expected to handle at least this request.

When this request succeeds, the v4l2_capability structure is filled with information about the device:

  • driver: The name of the driver in use while communicating with the device.
  • card: The name of the device in use.
  • bus_info: The location of the device on the system bus (for a USB webcam, something like usb-0000:00:1d.7-1).
  • version: Your driver’s version number.
  • capabilities: A 32-bit integer holding your device’s capabilities (one bit per capability). You may find the list of all possible capabilities here. You can use a bitwise & to check for a particular one:

There are a few other fields, but I’ll stop here. If you’re interested, you’ll find more details in the links above. Now, when it comes to capabilities, it’d be nice to check for the following:

  • V4L2_CAP_VIDEO_CAPTURE : we need single-planar video capture, because… we’re capturing video frames.
  • V4L2_CAP_STREAMING : we need the device to handle frame streaming so that our queue/dequeue routine can go fluently.

If your application has more specific needs, don’t hesitate to use the table linked above to check for more capabilities. You may also use the card and bus_info fields if you have several devices available and want the user to choose by name and path.

Set our video format

Once we’ve made sure that our device knows the basics, we need to set our frame format. Note that this format must be supported by your device. If you don’t want to list the formats programmatically, I suggest you use v4l2-ctl, which will do that for you just fine:
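For instance, to list the formats (and their frame sizes), assuming your camera is /dev/video0:

```shell
v4l2-ctl --list-formats-ext -d /dev/video0
```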

This will give you a list of all available formats. Once you’ve chosen yours, you’ll need to use VIDIOC_S_FMT (set format) to tell your device. This is done using a v4l2_format structure:

  •  type: remember that V4L2 can handle all kinds of devices. It’d be nice to tell it we’re doing video capture.
  • fmt.pix.pixelformat: this is your frame format (RGB, YUV, MJPEG, …). v4l2-ctl told you which ones you had available, at which resolutions.
  • fmt.pix.width and fmt.pix.height: your frame dimensions. Again, these must be handled by your device, for the format you chose.

In my case, I chose MJPEG because it is extremely easy to display using the SDL. Plus, it takes less memory than RGB and YUV. As far as I know, MJPEG is supported by many cameras. Also remember that these parameters have a direct influence on the amount of memory you’ll have to request for the buffers later on. For instance, an 800×600 RGB24 frame holds 800×600 = 480,000 pixels, each one requiring 3 bytes (R, G, B). All in all: 1,440,000 bytes (about 1.4 MB) per buffer.

Retrieving all available formats programmatically: v4l2-ctl uses the VIDIOC_ENUM_FMT call to list your formats. You will find more information about this call (and its companion v4l2_fmtdesc structure) on this page. To browse all formats, declare your first structure with .index = 0 and keep incrementing until your ioctl returns EINVAL. Additionally, you might want to have a look at VIDIOC_ENUM_FRAMESIZES to retrieve the resolutions supported by each format.

Inform the device about your future buffers

This step is quite simple, but it’s still necessary: you need to inform the device about your buffers. How are you going to allocate them? How many of them are there? This allows the device to handle buffer data correctly. In our case, we’ll use a single buffer, and map our memory using mmap. All this information is sent using the VIDIOC_REQBUFS call and a v4l2_requestbuffers structure:

  • type: again, the kind of capture we’re dealing with.
  • memory: how we’re going to allocate the buffers, and how the device should handle them. Here, we’ll be using memory mapping, but you’ll find that there are a few other options available.
  • count: our buffer count, one here (no need to make it trickier by adding buffers for now).

Allocate your buffers

Now that the device knows how to provide its data, we need to ask it about the amount of memory it needs, and allocate it. Basically, the device is making the calculation I made above and telling you how many bytes it needs for your format and your frame dimensions. This information is retrieved using the VIDIOC_QUERYBUF call, and its v4l2_buffer structure.

A little difference here: I’m clearing the structure’s memory space before using it. This time, we won’t be the only ones writing into this structure; the device will too. Since not all fields are initialised by the programmer, it’s best to clean up the garbage first. Just like before, we tell the device about our video capture and memory mapping. The index field is the index of our buffer: indices start at 0, and each buffer has its own. Since we’ve only got one buffer, there is no need to put that code into a loop. Usually, you’d iterate from 0 to bufrequest.count (which may have changed after the IOCTL if the device didn’t like it!) and allocate each buffer, one after the other.

Now, once this call has been made, the structure’s length and m.offset fields are ready. We can therefore map our memory:

Since memory mapping is a large topic, here is a link to mmap's man page, and more information about memory mapping under Linux (Linux Device Drivers, J. Corbet, A. Rubini, G. Kroah-Hartman). You don’t have to know everything about it to go further, but it’s another fascinating subject if you’re curious.

Here again, think about cleaning up the area. Your frame is going to be stored in there, you don’t want garbage messing around.

Get a frame

This is the part of the code you might want to put in a timed loop. For this article, I’ll just retrieve one frame from the device and terminate. This is done in four steps:

  1. Prepare information about the buffer you’re queueing. This requires another v4l2_buffer structure, which we saw above; nothing new. This helps the device locate your buffer.
  2. Activate the device’s streaming capability (which we checked earlier through v4l2_capability).
  3. Queue the buffer. You’re basically handing your buffer over to the device (putting it into the incoming queue), and wait for it to write stuff in it. This is done using the VIDIOC_QBUF call.
  4. Dequeue the buffer. The device’s done, you may read your buffer. This step is handled using the VIDIOC_DQBUF call: you’re retrieving the buffer from the outgoing queue. Note that this call may hang a little: your device needs time to write its frame into your buffer, as said in the documentation:

By default VIDIOC_DQBUF blocks when no buffer is in the outgoing queue. When the O_NONBLOCK flag was given to the open() function, VIDIOC_DQBUF returns immediately with an EAGAIN error code when no buffer is available.

Again, this part of your code should be in a loop if you’re using several buffers (increment bufferinfo.index):

Once the VIDIOC_DQBUF ioctl call has successfully returned, your buffer(s) are filled with your data. In my case, I have a beautiful MJPEG frame ready to be processed. If you’re using RGB or YUV, you can now get colour information about every single pixel of your frame: we did it!

Note (see comments for more information): as I said earlier, this routine (stream on, queue/dequeue, stream off) might not work for you. Some devices will refuse to get into streaming mode if there isn’t already a buffer queued. In this case, your program should look more like this:

(of course, you might need another loop in the first one if you decided to use several buffers)

In the example I gave throughout this article, you would now close your descriptor to the device. However, here are two little complements if you’re using MJPEG as I was: saving your frame to a JPEG file, and displaying your frame in an SDL window. I’ll assume you know about basic UNIX I/O routines and SDL mechanisms, since they aren’t the topic of this article.

Bonus: save your frame as a JPEG file

MJPEG is nothing but an animated extension of the JPEG format: a sequence of JPEG images. Since we captured a single frame here, there is no real MJPEG involved: all we have is a JPEG image's data. This means that if you want to turn your buffer into a file… all you have to do is write it:
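Something along these lines (the save_frame helper is mine; pass it buffer_start and the bytesused field VIDIOC_DQBUF filled in):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Dump the raw JPEG data to a file. For the length, prefer the
 * bytesused field filled in by VIDIOC_DQBUF over the full buffer
 * length. */
static int save_frame(const char *path, const void *buffer_start,
                      size_t length)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    ssize_t written = write(fd, buffer_start, length);
    close(fd);
    return (written == (ssize_t)length) ? 0 : -1;
}
```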

Now, since we’re only writing the JPEG data, and not the associated metadata, it is very likely that your image reader will refuse to display anything, claiming it was unable to determine the frame’s dimensions. I’m giving you a little piece of the puzzle, but I didn’t try to write the JPEG’s metadata myself, since this wasn’t part of my program’s needs. Note that some readers will allow you to specify your image’s width.

Bonus: displaying a MJPEG frame with the SDL

The SDL (1.2) has a very interesting feature: it can display an MJPEG frame directly! No need to convert your image, or to run it through never-ending processing: all you have to do is provide the SDL with your buffer and your dimensions, and it’ll do the rest. For that, we’ll need both the SDL and the SDL image library (SDL_image). The basic setup is as follows:

  1. Initialise the SDL, the screen surface and SDL_image.
  2. Create an I/O stream (RWops) associated with your buffer.
  3. Create an SDL surface using the previous stream as a data source.
  4. Blit the surface wherever you want your frame to be.
  5. Flip the screen!
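Here is a sketch of those steps (the display_frame helper is mine, and the 3-second delay at the end is arbitrary):

```c
#include <SDL/SDL.h>
#include <SDL/SDL_image.h>

/* Display one MJPEG frame: wrap the JPEG buffer in an RWops
 * stream, decode it into a surface with SDL_image, blit and flip. */
static int display_frame(void *buffer_start, int length,
                         int width, int height)
{
    /* 1. Initialise the SDL and the screen surface. */
    if (SDL_Init(SDL_INIT_VIDEO) < 0)
        return -1;
    SDL_Surface *screen = SDL_SetVideoMode(width, height, 32, SDL_HWSURFACE);
    if (screen == NULL) {
        SDL_Quit();
        return -1;
    }

    /* 2. An I/O stream backed by our buffer; 3. the frame surface.
     * The second argument asks IMG_Load_RW to free the stream. */
    SDL_RWops *rw = SDL_RWFromMem(buffer_start, length);
    SDL_Surface *frame = IMG_Load_RW(rw, 1);
    if (frame == NULL) {
        SDL_Quit();
        return -1;
    }

    /* 4. Blit the frame onto the screen; 5. flip! */
    SDL_BlitSurface(frame, NULL, screen, NULL);
    SDL_Flip(screen);
    SDL_Delay(3000);   /* keep the window up for a moment */

    SDL_FreeSurface(frame);
    SDL_Quit();
    return 0;
}
```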

And there you go! Don’t forget your -lSDL and -lSDL_image switches so that the linker succeeds. You should now be able to see an SDL window with your frame in it. Add some loops to your code, and you’ll have built yourself a simple camera streamer! If you need more information about this API/framework, here is a link to the documentation I used to write this article. Don’t hesitate to go through it if you have time!

Anyway, that’s pretty much all I wanted to cover today. See you later!