How to (can I) ask a PIPE how many bytes it has available for reading?

I've implemented a non-blocking reader in Python, and I need to make it more efficient.

The background: I have massive amounts of output that I need to read from one subprocess (started with Popen()) and pass to another thread. Reading the output from that subprocess must not block for more than a few ms (preferably for as little time as is necessary to read available bytes).

Currently, I have a utility class which takes a file descriptor (stdout) and a timeout. I select() and readline(1) until one of three things happens:

Then I return the buffered text to the calling method, which does stuff with it.
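For reference, the byte-at-a-time loop described above might look roughly like the sketch below. It is not the poster's actual class. The helper name `read_available_bytewise` is hypothetical, it uses `os.read(fd, 1)` on the raw descriptor instead of `readline(1)`, and it assumes the stop conditions are a timeout or EOF, since the original list of three conditions is missing. It makes one `select()` call and one `read()` call per byte, which is why it is slow for large outputs.

```python
import os
import select
import time


def read_available_bytewise(fd, timeout):
    """Hypothetical helper: select() on fd and read one byte at a time
    until the deadline (now + timeout seconds) passes or EOF is reached."""
    chunks = []
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline reached
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            break  # nothing became readable before the deadline
        byte = os.read(fd, 1)  # one system call per byte
        if not byte:
            break  # EOF: the writer closed its end
        chunks.append(byte)
    return b"".join(chunks)
```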

Now, for the real question: because I'm reading so much output, I need to make this more efficient. I'd like to do that by asking the file descriptor how many bytes are pending and then readline([that many bytes]). It's supposed to just pass stuff through, so I don't actually care where the newlines are, or even if there are any. Can I ask the file descriptor how many bytes it has available for reading, and if so, how?

I've done some searching, but I'm having a really hard time figuring out what to search for, let alone if it's possible.

Even just a point in the right direction would be helpful.

Note: I'm developing on Linux, but that shouldn't matter for a "Pythonic" solution.

Here is a utility that you should know about: pipe viewer
– wim
Nov 19 '13 at 17:52

3 Answers

On Linux, os.pipe() is just a wrapper around pipe(2). Both return a pair of file descriptors. Normally one would use lseek(2) (os.lseek() in Python) to reposition the offset of a file descriptor as a way to get the amount of available data. However, not all file descriptors are capable of seeking.

On Linux, trying lseek(2) on a pipe will return an error (ESPIPE); see the manual page. That's because a pipe is more or less a buffer between a producer and a consumer of data. The size of that buffer is system dependent.
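The error is easy to reproduce: asking a pipe for its end offset with `os.lseek()` raises `OSError` with `errno.ESPIPE` ("Illegal seek"). A minimal sketch, assuming a Unix platform:

```python
import errno
import os

reader_fd, writer_fd = os.pipe()
os.write(writer_fd, b"some data")

try:
    # Try to seek to the end of the data, as one might with a regular file
    os.lseek(reader_fd, 0, os.SEEK_END)
except OSError as exc:
    # A pipe is not seekable, so the call fails instead of returning an offset
    lseek_error = exc.errno
else:
    lseek_error = None

os.close(reader_fd)
os.close(writer_fd)
print(lseek_error == errno.ESPIPE)
```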

On Linux, a pipe has a 64 kB buffer by default, so that is the most data you can have available.
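One way to check the buffer size empirically is to make the write end non-blocking and keep writing until the kernel refuses more data. This is an illustrative sketch; the function name `measure_pipe_capacity` is hypothetical. On a default Linux configuration with 4 KiB pages it should report 65536, but the figure depends on kernel settings and page size.

```python
import os


def measure_pipe_capacity():
    """Hypothetical helper: fill a fresh pipe with non-blocking writes until
    the kernel refuses more data, and return how many bytes were accepted."""
    reader_fd, writer_fd = os.pipe()
    os.set_blocking(writer_fd, False)  # a full pipe now raises instead of blocking
    chunk = b"x" * 4096
    total = 0
    try:
        while True:
            try:
                total += os.write(writer_fd, chunk)
            except BlockingIOError:
                break  # the pipe buffer is full
    finally:
        os.close(reader_fd)
        os.close(writer_fd)
    return total


print(measure_pipe_capacity())
```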

Edit: If you can change the way your subprocess works, you might consider using a memory mapped file, or a nice big piece of shared memory.

Edit2: Using polling objects is probably faster than select.
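A sketch of the wait-then-read idea built on a polling object (`select.poll()`, available on Linux and most Unix systems but not on Windows). The class name `PipePoller` is hypothetical. The descriptor is registered once and the object is reused, so the descriptor set is not rebuilt on every call. This is where `poll()` can gain over `select()`.

```python
import os
import select


class PipePoller:
    """Hypothetical helper: wait for a pipe to become readable with a reusable
    select.poll() object, then read whatever is available in one os.read()."""

    def __init__(self, fd, bufsize=65536):
        self.fd = fd
        self.bufsize = bufsize
        self.poller = select.poll()
        # POLLIN: data to read; POLLHUP (writer closed) is reported regardless
        self.poller.register(fd, select.POLLIN)

    def read(self, timeout_ms):
        """Return None on timeout, b'' at EOF, otherwise up to bufsize bytes."""
        if not self.poller.poll(timeout_ms):
            return None  # nothing became readable within timeout_ms
        return os.read(self.fd, self.bufsize)
```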

So even though my class might generically be able to use lseek() to count the number of available chars in a file descriptor, I won't be able to do that if I'm passing it a pipe of stdout (from Popen()), right? Am I understanding you correctly?
– Matt
Nov 19 '13 at 17:58

@mHurley: Yes, lseek doesn't work on a pipe.
– Roland Smith
Nov 19 '13 at 17:59

Are there any other ways that I might ask a pipe how many bytes it has waiting, or is lseek() the only way that kind of thing is done?
– Matt
Nov 19 '13 at 18:01

AFAIK, lseek is basically the only way.
– Roland Smith
Nov 19 '13 at 18:07
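For comparison, Linux (and most other Unix systems) offers a way to ask a pipe how much data is queued without seeking: the `FIONREAD` ioctl, which Python exposes through `fcntl.ioctl()` and `termios.FIONREAD`. A minimal sketch, assuming a Unix platform. The helper name `bytes_available` is hypothetical.

```python
import array
import fcntl
import os
import termios


def bytes_available(fd):
    """Hypothetical helper: return the number of bytes currently queued in a
    pipe, as reported by the FIONREAD ioctl."""
    buf = array.array("i", [0])
    fcntl.ioctl(fd, termios.FIONREAD, buf, True)  # kernel writes the count into buf
    return buf[0]


reader_fd, writer_fd = os.pipe()
os.write(writer_fd, b"I am some data")
pending = bytes_available(reader_fd)
data = os.read(reader_fd, pending)  # read exactly what was queued
os.close(reader_fd)
os.close(writer_fd)
```

A count of 0 can mean either that nothing has arrived yet or that the writer has closed its end. You still need `select()` or `poll()` to wait without blocking and to detect EOF. The count can also grow between the ioctl and the read.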

This question seems to offer a possible solution, though it may require retooling.

Non-blocking read on a subprocess.PIPE in python

Otherwise, I assume you know about reading data N bytes at a time:

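```python
# pipe is a binary file object, e.g. proc.stdout from subprocess.Popen
all_data = b''
while True:
    data = pipe.read(1024)  # Reads 1024 bytes, or fewer only at end of pipe (EOF)
    if not data:
        break
    all_data += data
    # Add your timeout break here
```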
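To try the chunked loop against a real `Popen()` pipe, the sketch below starts a child Python process that writes 5000 bytes and exits. The child is an illustrative stand-in for the real subprocess. The loop collects chunks in a list and joins them once at the end, which can avoid the repeated copying of `all_data += data` for large outputs. Without a timeout check, it only returns once the child closes its stdout.

```python
import subprocess
import sys

# Illustrative child process: writes 5000 bytes to stdout and exits
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write('x' * 5000)"],
    stdout=subprocess.PIPE,
)

chunks = []
while True:
    data = proc.stdout.read(1024)  # blocks until 1024 bytes arrive or EOF
    if not data:
        break  # EOF: the child closed its stdout
    chunks.append(data)
proc.stdout.close()
proc.wait()
all_data = b"".join(chunks)
```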

Won't pipe.read(1024) block until it gets 1024 bytes or throws an exception (like finding EOF)?
– Matt
Nov 19 '13 at 18:07

...the method in that other question seems interesting, but doesn't solve my particular problem. I have a non-blocking reader that works; I need to know if I can do a non-blocking read THIS way ;-)
– Matt
Nov 19 '13 at 18:11

Yes, it will block until the 1024 bytes are read. You can make 1024 as small as you like to make it highly unlikely (but not guaranteed) to exceed your timeout limit. But your readline(1) is also blocking, right? Albeit on a smaller scale.
– supergra
Nov 19 '13 at 22:42

Yes, that's exactly the problem I'm trying to solve. Because I have an unpredictable number of bytes to be read, there's no way to choose a perfect byte limit. I could choose a byte limit that on average neither blocks very many times, nor requires very many reiterations, but both of those conditions are undesirable. I'd rather read exactly the number of bytes that are waiting, and neither block, nor have to go back and get the rest. But it appears that's not possible...
– Matt
Nov 20 '13 at 20:34

@mHurley: fyi, os.read(fd, 8096) may return fewer than 8096 bytes, i.e., a simple select() (to avoid blocking if there are zero bytes available in timeout seconds) + os.read() (to get the data available after the select()) might be enough. You could test whether epoll() produces better results in your case and try different buffer sizes with os.read() (larger is not necessarily better).
– jfs
Jun 4 '16 at 20:30
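Below is a minimal sketch of the `select()` + `os.read()` combination suggested in the comment above. The function name `read_available` is hypothetical. `bufsize` defaults to the comment's 8096, but any size works. The key point is that `os.read()` on a pipe returns as soon as some data is available, up to `bufsize` bytes. A buffered `pipe.read(1024)`, by contrast, waits for the full amount.

```python
import os
import select


def read_available(fd, timeout, bufsize=8096):
    """Hypothetical helper: wait at most `timeout` seconds for fd to become
    readable, then return what one os.read() gives back (up to bufsize bytes).
    Returns None on timeout and b'' at EOF."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None  # no data within the timeout
    return os.read(fd, bufsize)  # returns the bytes currently queued, not a full bufsize
```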

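You can find this out by calling os.fstat(file_descriptor) and checking the st_size property, which is the number of bytes written. Note that this is platform dependent: on Linux, fstat() reports an st_size of 0 for a pipe, so it does not reveal the pending byte count there.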

```python
import os

reader_file_descriptor, writer_file_descriptor = os.pipe()
os.write(writer_file_descriptor, b'I am some data')
readable_bytes = os.fstat(writer_file_descriptor).st_size
```
