Discussion:
[pyusb-users] problems with write timeouts.
Cameron Davidson
2016-11-21 07:48:13 UTC
Permalink
Hello,
This is a plea for help, but also a possible bug in error reporting by
pyusb (when using libusb0).

Sorry it is so long, but I did not know which bits of information you
would want. The symptoms of my problems are at the end.

=================== background

I am having trouble getting an Oregon Scientific weather station WMR300
to run reliably under Linux using PyUSB under the weewx package.
There are several users with this device, and reliability seems to vary
a bit depending on the versions of OS, libraries and so on.
The problems are manifesting on more than one system.
I have also tried a standard CentOS 6.8 desktop system as a vmware guest
vm under Windows-7, with similar results.

The background is that the device has no vendor documentation, and so
has been reverse engineered from communications with their MS-Windows
software.

My system:
Intel Xeon E-3 mainboard running CentOS 6.8, 64-bit
The kernel is from the Xen repo, at version 3.18.41-20.el6.x86_64 but
it is not running under Xen.
Python-2.6.6-66.el6_8.x86_64
libusb1-1.0.9-0.7.rc1.el6.x86_64
libusb-0.1.12-23.el6.x86_64
Pyusb is from easy_install - __init__.py says it is v1.0.0 ( I have
diffed against the current version and, apart from spelling changes,
there are only a couple of lines in legacy.py that are different.

I have used wireshark and usbpcap under windows 7 to look at the system
when it works with the vendor software, and to establish the packet
contents.
Everything seems to be interrupt mode input and output.
Under Linux I used tcpdump usbmon2 to get captures to compare. I then
look at the capture files in Wireshark for Windows.

I have done zero USB programming before, so I am not sure what is expected.
The captures look quite different, where the Linux dumps show twice as
many packets as the windows captures. I assume this is an
artifact/limitation of the usbpcap process, where the handshaking and
data transfer is blurred into a single packet. Wireshark refers to this
as a usbpcap pseudoheader, so I cannot even tell if I am communicating
under the same conditions.

Original code in the weewx package was written using pyusb in legacy
mode, but I ran into problems debugging it, so I reworked the driver to
use pyusb1. I have it set to be able to select libusb0 or libusb1, which
give similar but slightly different results.

I can send the initial packets to get status and start the device
sending back weather data packets. These packets need to be ACKed with
specific packet details and after a very short while the write operation
on these packets fails, with pyusb reporting timeouts.
I have set the code up to carry on after the timeout exceptions and the
device meanwhile happily keeps sending data, which we can read.
Eventually, after about 100 seconds, the device decides it is missing
too many ack packets and stops sending.

My initialisation code looks a bit like this, to cope with the fact that
it appears as a HID ...

dev = usb.core.find( backend=libusb.get_backend(),
idVendor=self.vendor_id, idProduct=self.product_id)
...
if this is libusb0:
try:
dev.detach_kernel_driver( self.interface )
except (usb.core.USBError):
pass

else:
kernelactive = dev.is_kernel_driver_active( self.interface )
if kernelactive :
dev.detach_kernel_driver( self.interface )


dev.set_configuration()

dev.reset()

========================== the problems

Start doing some writing and reading...

The linux captures initially look like:
frame 1: host sends URB_SUBMIT, direction OUT, status "in
progress(-115)", and includes the command in a 64-byte null-padded packet
frame 2: device sends URB_COMPLETE, dirn OUT, status success (0), data
length zero
frame 3: host sends URB_SUBMIT, direction IN, status "in
progress(-115)", zero data length
frame 4: device sends URB_COMPLETE, dirn IN, status success (0),with the
results in a 64-byte null-padded payload
with frames a millisecond or so apart.

The host reads remain the same, but then writes become
time1: host sends URB_SUBMIT, direction OUT, status "in progress(-115)",
and includes the command in a 64-byte null-padded packet
time1+timeout interval: device "returns" URB_COMPLETE, dirn OUT, URB
status "No such file or directory" (-2), but URB length was zero.

The timeout interval was changed between 200 and 2000 ms, with no
difference in result.

It does not make sense that the device itself is actually returning a
packet after the precise timeout duration (which I don't think it
knows), so is this a fake packet injected by the OS, or by libusb?
Why does the error message say ENOENT (-2) when is seems to be a timeout?

When I look at the USBError exception under libusb1, then e.strerror is
"Operation timed out", e.errno is 110 and e.backend_error_code is -7
When I do this under libusb0, the results vary.
If I have just plugged the USB in, then the HID kernel driver needs to
be detached, and
e.strerror is "Connection timed out", the e.errno is None, and the
e.backend_error_code is -110.
However, if I rerun the test, then the kernel does not have the HID
loaded, and it returns (repeatedly)
e.strerror is " could not detach kernel driver from interface 0",
the e.errno is None, and the e.backend_error_code is -110.

So, what do I do now?

Thanks
Cameron.

------------------------------------------------------------------------------
Cameron Davidson
2016-11-27 03:32:31 UTC
Permalink
I have largely sorted out my problems. I suspect it related to using
blocking calls at both ends, while the weather station device is prone
to doing i/o in a sequence that has a few random variations in it.
This led to deadlock when my code assumed there was some logical
sequence to the comms.

This became clearer using the trial version of USBlyzer, rather than
usbpcap, while running the vendor software under windows. This way, the
usb handshaking was revealed. Of course, I have no way to know if the
eccentric timing was due to the hardware or a limitation of usblyzer.

I am still living with the libusb0 error reporting bug, but can work
around it by referrring to the e.backend_error_code. Hardly elegant, but
the code is there now.

Cameron.

------------------------------------------------------------------------------
Continue reading on narkive:
Loading...