Hi All,
This has been an issue for some time, but I finally got around to capturing some logs. It looks like a serial issue with the board when this happens.
Here are the logs:
[ WARN] [1704837015.462860721]: REJECT f8
[ WARN] [1704837015.467358822]: REJECT e0
[ WARN] [1704837015.467451284]: REJECT 1c
[ WARN] [1704837015.467499302]: REJECT f0
[ WARN] [1704837015.467545468]: REJECT e0
[ERROR] [1704837015.469791102]: DESERIALIZATION ERROR! - 2
The system never recovers from this and must be restarted. It happens on roughly 1 in 5 starts.
Any updates or fixes for this? It would be nice if it could auto-recover somehow.
Thanks,
Kris
This issue was a real bear to solve, but it has been solved. The problem is that after we set the serial rate to 38400, Linux comes in and sets the UART port rate to 9600 or some other rate.
I tracked this down and we fixed it, but only in very recent images. I will have to try to find my notes on this, as it was ‘obscure’ and a race condition, so you may see it every 5 bootups, every 50 bootups, or never.
Here is one note that does not give specifics for hacking in a fix, but explains that this was only fixed recently, on images from 2023.
There was an elusive problem where both the SIN and SOUT LEDs may blink, but the baud rate of the signal from the MCB was NOT the proper 38400 rate and was instead what is called the default Ubuntu Linux console rate. If this happens, the serial data sent to the MCB would be at the wrong baud rate, so the MCB would just get bad data, and the Raspberry Pi log would show a great many checksum errors. This issue is only fixed on recent (2023) versions of our image for the Raspberry Pi. It was very rare, but it did happen, and if it happened on your Magni it may happen frequently, and a reset may or may not fix it.
So I have to do some digging for details so you can hack your image, in case you need to stick with your own image because of changes you have made for your usage.
OK, this is the best note I have found. The fix I found, and then had the image team use on the 2023 image, is roughly this, although the exact specifics are missing here. Maybe I can find the actual Linux file, which will be some sort of config file for tty or something. This may not be ‘exact’, but it sounds like what I recall and it seems logical.
I tracked down and came up with a fix where we set a Linux startup file to default all serial ports to 38400, because the problem was due to Linux resetting the baud rate to the system default AFTER we set up the baud rate. We never really found the proper way to hold things off, so this is an image fix.
/boot/config.txt has an option, commented out, of init_uart_baud=115200. Perhaps we should set init_uart_baud=38400
So I ‘think’ what I did was uncomment that line but set the baud rate to 38400. That sounds right to me, so see if that fixes your image. Again, this is ‘supposed’ to have been done on the 2023 images, and I think I first figured this out in April 2023.
Yup! That is it. On my board with a recent image, here is the segment in /boot/config.txt:
# init_uart_baud
# Initial uart baud rate.
# Default 115200
init_uart_baud=38400
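For anyone who wants to check their own image programmatically, here is a rough C++ sketch that scans a config.txt stream for an uncommented init_uart_baud line. The function name findInitUartBaud is made up for illustration, and the sketch does not model how the firmware handles [section] filters or duplicate entries; it simply reports the last uncommented value it sees.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the value of the last uncommented
// "init_uart_baud=<n>" line in the stream, or -1 if there is none.
long findInitUartBaud(std::istream& in) {
    const std::string key = "init_uart_baud=";
    long result = -1;
    std::string line;
    while (std::getline(in, line)) {
        // Skip blank lines and leading whitespace.
        const std::string::size_type start = line.find_first_not_of(" \t");
        if (start == std::string::npos) continue;
        // Lines starting with '#' are comments in config.txt.
        if (line[start] == '#') continue;
        // Only lines that begin with the key are of interest.
        if (line.compare(start, key.size(), key) != 0) continue;
        try {
            result = std::stol(line.substr(start + key.size()));
        } catch (const std::exception&) {
            // Value was not a number; leave the previous result untouched.
        }
    }
    return result;
}
```

To use it, open /boot/config.txt with std::ifstream and pass the stream in. A result of 38400 means the setting described above is active, and -1 means the line is missing or still commented out.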
Thanks Mark, this info really helped!
In addition to updating init_uart_baud, in case Linux ever switches baud rates on this port for some reason, I added the following to motor_serial.cc, right after the “DESERIALIZATION ERROR” message, as shown below. Note that I had to set the port to another value first (anything other than 38400), then back to 38400, for the “reset” to work as desired. Hope this helps. I’m sure there is a cleaner way, but this works quite well.
ROS_ERROR("Magni: DESERIALIZATION ERROR! - %d", error_code);
//When this happens, there is often a problem with the serial port
const char* commandDefaultPort = "stty -F /dev/ttyAMA0 9600";
const char* commandDesiredPort = "stty -F /dev/ttyAMA0 38400";
ROS_INFO("Magni: Resetting Serial Port.");
system(commandDefaultPort);
ros::Duration sleep_duration(0.1); // 100 milliseconds
sleep_duration.sleep();
system(commandDesiredPort);
ROS_INFO("Magni: Serial Port Reset Complete.");
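On the "cleaner way" point, one possible alternative to shelling out to stty is to set the rate through the POSIX termios calls directly. This is only a sketch and has not been tried on a Magni. The function names are made up for illustration, and it keeps the same two-step sequence (9600 first, then 38400) on the assumption that the toggle is still needed when going through termios.

```cpp
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Apply one baud rate to an already-open descriptor.
// Returns false if fd is not a terminal or any call fails.
static bool applyBaud(int fd, speed_t speed) {
    struct termios tty{};
    if (tcgetattr(fd, &tty) != 0) return false;
    if (cfsetispeed(&tty, speed) != 0) return false;
    if (cfsetospeed(&tty, speed) != 0) return false;
    return tcsetattr(fd, TCSANOW, &tty) == 0;
}

// Mirrors the stty workaround: switch to 9600, pause, then back to 38400.
bool resetBaudTo38400(int fd) {
    if (!applyBaud(fd, B9600)) return false;
    usleep(100 * 1000);  // 100 ms pause, same as the sleep in the post above
    return applyBaud(fd, B38400);
}

// Hypothetical convenience wrapper that opens the device by path,
// much like stty -F does, resets the rate, and closes it again.
bool resetPortByPath(const char* path) {
    const int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0) return false;
    const bool ok = resetBaudTo38400(fd);
    close(fd);
    return ok;
}
```

In the error handler this would be called as resetPortByPath("/dev/ttyAMA0"), logging a warning if it returns false. If the node already holds an open descriptor for the port, passing that to resetBaudTo38400 avoids the extra open and close.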