Bypassing the 64K pipe buffer limit

pipe-limit.txt

Contents

Python imports and setup:

>>> import sfst
>>> from sfst import Transducer, MakeCompactTransducer
>>> from sfst import compile

The 64K pipe buffer limit

The deadlock

SFST itself uses files to return result lists for methods like analyze_string, generate_string or generate. To map them to string lists, pysfst uses pipes for SFST to write to. On large or even infinite result lists like e.g. from xmor_transducer.generate(), this technique causes python to freeze when the buffer size is exceeded.

The writing end of the pipe blocks until the reading end begins to read, but the reading end will not begin to read because it waits for the blocked SFST method to accomplish its task.

The 64K limit

On standard 2.6 Linux kernels, the buffer size is 64 kilobytes. Although $ ulimit -a reports a pipe size of 8 blocks, the buffer size is not 4K, because the kernel dynamically allocates maximal 16 "buffer entries" which multiply out to 64K. These limits are hardcoded in

/usr/src/linux/include/linux/pipe_fs_i.h:6 #define PIPE_BUFFERS (16)

The bad news

There seems to be no way to directly circumvent that limitation without patching the kernel. For 2.4 kernels, there once was - as part of the "Linux Scalability Effort Homepage" - a patch for "Large Pipe Support", see http://lse.sourceforge.net/pipe/

There also seems to be no direct way for python or SFST to detect a pipe buffer deadlock and to raise an overflow exception.

The threading workaround

Under normal circumstances, e.g. when pysfst/SFST is used to analyse or generate natural language strings, a limit of 64K should be enough. But when experimenting with transducers, these deadlocks are very annoying because they require to kill Python from another terminal.

The Python way of things to do is not to optimize for speed, but for usability. For that reason, the default behaviour of the pysfst module is to use threading. For the unit test environment, the default behaviour is not to use threading and to check the threading machinery explicitly.

Using Transducer.threaded

For that reason, there is a threaded property for every Transducer class. The default value during tests is False, which means that no threading is used, and the kernel limitation applies:

>>> transducer_non_threaded = Transducer()
>>> transducer_non_threaded.threaded
False

The threading boolean can be set via the corresponding keyword constructor argument:

>>> transducer_keyword_threaded = Transducer(threaded=True)
>>> transducer_keyword_threaded.threaded
True

You can change the default by changing the sfst module attribute. Each newly created transducer will then be threaded, too. Already instantiated transducers remain as they were.

>>> sfst.default_threaded = True
>>> transducer_threaded = Transducer()
>>> transducer_threaded.threaded
True
>>> transducer_non_threaded.threaded
False
>>> Transducer.default_threaded = False   # cleanup

A CompactTransducer generated with MakeCompactTransducer inherits the threaded flag from its base class.

>>> compact_transducer_threaded = MakeCompactTransducer(transducer_threaded)
>>> compact_transducer_threaded.threaded
True

Adjusting the buffer_size

When using a threaded transducer, the transducer's buffer size limits the maximum number of bytes a SFST method can return. The default is 100 MB:

>>> transducer_threaded.buffer_size
104857600

A negative number sets an unlimited buffer size. As for threaded, the default size can be set by the default_buffer module attribute.

>>> sfst.default_buffer_size = -1
>>> transducer_unlimited = Transducer()
>>> transducer_unlimited.buffer_size
-1
>>> sfst.default_buffer_size = (1024 ** 2) * 100  # cleanup

If the buffer size is exceeded by the transducer, an exception is raised. For return lists, one more byte is required for each member.

>>> transducer_small = Transducer('12345')
>>> transducer_small.threaded = True
>>> transducer_small.buffer_size = 6
>>> transducer_small.analyze('12345')
['12345']
>>> try:
...     transducer_small.buffer_size = 5
...     transducer_small.analyze('12345')
... except sfst.BufferOverflowError, error:
...     print error
buffer_size: 5

Operators

For operators to respect the threaded/buffer_size attributes, SFST had to be patched to transfer the values to the newly created transducer.

For unary operators, the values just are copied:

>>> a1 = Transducer('a')
>>> a1.threaded = True
>>> a1.buffer_size = 7
>>> a1_complement = ~a1
>>> a1_complement.threaded
True
>>> a1_complement.buffer_size
7

For binary operators, the values are copied from the "larger" transducer. If both are threaded, the larger buffer_size is taken.

>>> a2 = Transducer('a')
>>> a2.threaded = True
>>> a2.buffer_size = 13
>>> a1_u_a2 = a1 | a2
>>> a1_u_a2.threaded
True
>>> a1_u_a2.buffer_size
13

If only one is threaded, this always counts as the "larger" transducer, even if the other one has the bigger buffer size.

>>> a1.threaded = True
>>> a2.threaded = False
>>> a1_u_left_a2 = a1 | a2
>>> a1_u_left_a2.threaded
True
>>> a1_u_left_a2.buffer_size
7

The other way round:

>>> a1.threaded = False
>>> a2.threaded = True
>>> a1_u_right_a2 = a1 | a2
>>> a1_u_right_a2.threaded
True
>>> a1_u_right_a2.buffer_size
13

If both transducers are not threaded, the result transducer is of course not threaded, too, and the buffer_size is taken from the left one.

>>> a1.threaded = False
>>> a2.threaded = False
>>> a1_u_left2_a2 = a1 | a2
>>> a1_u_left2_a2.threaded
False
>>> a1_u_left2_a2.buffer_size
7

The unsolved infinity problem

It is not possible to stop a child thread from the main process in python, especially not a C++ one. Therefore, if a transducer produces infinite output (when it is_infinitely_ambiguous), a BufferOverflowError is raised and the calling thread stops, but python will not terminate no more because of the still running child process.

The performance downside

When using no pipe_limit, the results from an analyse/generate call are immediately processed. But when using threading, python needs to wait at least 1 tick for the thread to finish. Thus the minimal method call time is 1 tick, and this is usually 0.001 seconds on a modern desktop machine (see the CONFIG_HZ section in the kernel .config).

For threading to work at all, the interface has been generated with the -threads swig option. This somewhat slows down the wrapper code, see the CHANGES.current for details.

Threaded doctest

Because threading should not affect the behaviour of the transducer, any doctest except this one are executed twice, once with threading globally disabled and once with threading globally enabled.

For that reason, any of these doctests include the setup_sfst() function at the beginning which sets the threaded module attribute for that test.

Hanging threads

There is one doctest in transducer.txt (at the end) which leaves a running Producer thread behind. Exiting Python the normal way does not work no more. To enforce exit, use this idiom:

os.kill(os.getpid(), signal.SIGTERM)