Very Intermittent Base Station lock-up

Jimbo20 Apr 11, 2017

  1. Jimbo20

    Jimbo20 TrainBoard Member

    I have a very frustrating problem with my DCC++ layout:

    Has anyone had any problems with intermittent locking up of the base station? Is it possible for a command (perhaps corrupted) sent to the base station to cause a lock-up?
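
    On the corrupted-command question, one general way a receiver can protect itself is to cap the length of anything it collects between `<` and `>`. The following is a minimal sketch of that idea in plain C++, not the DCC++ code itself. The names `CommandFramer`, `parseStream` and the 30-character limit are hypothetical, and whether the base station sketch already guards its buffer like this is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical limit, chosen for illustration only.
constexpr std::size_t kMaxCommandLength = 30;

// Collects the text between '<' and '>' one byte at a time and
// discards any frame that grows past kMaxCommandLength.
class CommandFramer {
public:
    // Feed one received byte. Returns true when a complete
    // command is available from command().
    bool feed(char c) {
        if (c == '<') {                  // start (or restart) of a frame
            buffer_.clear();
            inFrame_ = true;
            return false;
        }
        if (!inFrame_) return false;     // ignore noise outside a frame
        if (c == '>') {                  // end of frame: publish it
            inFrame_ = false;
            command_ = buffer_;
            return true;
        }
        if (buffer_.size() >= kMaxCommandLength) {
            // Overlong frame (e.g. a lost '>'): drop it instead of
            // writing past the end of a fixed-size buffer.
            inFrame_ = false;
            buffer_.clear();
            ++dropped_;
            return false;
        }
        buffer_.push_back(c);
        assert(buffer_.size() <= kMaxCommandLength);
        return false;
    }

    const std::string& command() const { return command_; }
    unsigned dropped() const { return dropped_; }

private:
    std::string buffer_;
    std::string command_;
    bool inFrame_ = false;
    unsigned dropped_ = 0;
};

// Convenience helper: feed a whole string, return the completed commands.
std::vector<std::string> parseStream(CommandFramer& framer,
                                     const std::string& bytes) {
    std::vector<std::string> out;
    for (char c : bytes) {
        if (framer.feed(c)) out.push_back(framer.command());
    }
    return out;
}
```

    Feeding this a frame with a missing `>` followed by a valid `<s>` drops the bad frame and still returns the `s` command, so a single corrupted byte costs one command rather than the whole receiver.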

    I've built a small N-scale layout controlled by a DCC++ base station made from an Arduino Nano and a homebrew L298 motor board. It has 12 Hall effect magnetic sensors, which are connected via a 4067 mux board to pins D4, D6, D7, D8 and D9 on the Nano. 4 sets of points (turnouts) are controlled by another Arduino Nano running DCC decoder software.

    There is an ESP8266-01 connected via the serial lines, which runs an automated end-to-end railbus. It can also receive commands from a handheld wifi throttle based on David Bodnar's design and pass these commands to the base controller.

    The DCC++ base station S/W has been slightly tweaked to allow it to compile on the Nano, and has a simple 4067 mux library added.
    There is apparently plenty of program memory and SRAM remaining.

    It all works very well - with the railbus automatically running from one end to the other - and I can control a diesel shunter and the points manually. Current sensing works well and I can read and program CVs without any problems.

    The problem I have is that, very intermittently, the Nano will hang in a state where it stops responding to any command but the DCC signal continues on the track, though it is then nonsense (because locos run away). Presumably this means that interrupts are continuing even though loop() has crashed? This can happen either while locos are on the track or if the system is just left powered up with the ESP sending a Cab command every 5 seconds. It can take anything from about 1 hour to 36 hours (!) before the hang occurs.

    I've tried 4 different Nanos (admittedly Chinese clones, but from different suppliers), completely rewired the base station, even with different ground cable routes, and replaced the power supplies, so each module now has its own regulated supply. I have also tried 2 different 4067 mux libraries. The symptom has not changed at all.

    Any ideas?
    Thanks in advance for any suggestions....

    Jim
     
  2. sboyer2

    sboyer2 TrainBoard Member

    one possibility, you're running out of memory

    the Nano is an Uno in a different form factor. you can burn the Uno bootloader (512 bytes) into it, and it will run Uno code with no changes needed. the Nano bootloader wastes 2 KB of flash space

    Steph!
     
  3. Jimbo20

    Jimbo20 TrainBoard Member

    Hi Steph,
    Many thanks for that suggestion. I hadn't appreciated that there was that subtle difference between the Nano and the Uno. The symptoms do indeed look like some sort of memory issue. I will give it a try and report back.

    Jim
     
  4. Jimbo20

    Jimbo20 TrainBoard Member

    Well, I really thought that would be the answer;

    I burned the Uno Optiboot bootloader into a Nano board and reloaded the DCC++ software (the Nano board now thinks it is an Uno, as expected). All seemed good.

    I turned it on at about 8:30 last night. I put the railbus on the track (so it travels to one end, pauses, then returns, pauses, and so forth) for an hour or so and then lifted it off again, both last night and again this morning, all okay. Then, after about 20 hours of uptime, while the system was just idling, the Nano stopped. I have an LED on one of the 4067 mux select lines so I can tell when the Nano has died...


    Jim
     
  5. Jimbo20

    Jimbo20 TrainBoard Member

    Well, I think I've found a cure - even if I don't know the cause!

    I had the ESP8266 set up so that it would send a repeating Cab command every 3 seconds, and this seems to be the cause of the lock-ups. I have reprogrammed the ESP so that it now only sends a Cab command if a loco parameter (i.e. speed or direction) has changed, and otherwise repeats an <s> command every 20 seconds to act as a heartbeat.
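
    The send-on-change plus heartbeat scheme described above could look roughly like this. It is a host-side C++ sketch rather than the actual ESP8266 code, which is not shown in this thread. `CommandScheduler`, `LocoState` and register number 1 are hypothetical, the `<t REGISTER CAB SPEED DIRECTION>` layout is my understanding of the DCC++ throttle syntax, and measuring the 20 seconds from the last transmission of any kind is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical snapshot of what the throttle wants the loco to do.
struct LocoState {
    int cab;        // DCC address
    int speed;      // speed step
    int direction;  // 1 = forward, 0 = reverse
};

// Decides what (if anything) to transmit on each pass of the main loop.
class CommandScheduler {
public:
    explicit CommandScheduler(std::uint32_t heartbeatMs)
        : heartbeatMs_(heartbeatMs) {
        assert(heartbeatMs > 0);
    }

    // Returns the command to send now, or an empty string for "stay quiet".
    // nowMs is a millisecond tick such as Arduino's millis(); unsigned
    // subtraction keeps the comparison valid across counter wrap-around.
    std::string poll(const LocoState& s, std::uint32_t nowMs) {
        const bool changed = !hasSent_ || s.cab != last_.cab ||
                             s.speed != last_.speed ||
                             s.direction != last_.direction;
        if (changed) {
            last_ = s;
            hasSent_ = true;
            lastSendMs_ = nowMs;
            // Register 1 is an arbitrary choice for this example.
            return "<t 1 " + std::to_string(s.cab) + " " +
                   std::to_string(s.speed) + " " +
                   std::to_string(s.direction) + ">";
        }
        if (nowMs - lastSendMs_ >= heartbeatMs_) {
            lastSendMs_ = nowMs;
            return "<s>";   // status request used as a heartbeat
        }
        return "";
    }

private:
    std::uint32_t heartbeatMs_;
    std::uint32_t lastSendMs_ = 0;
    LocoState last_{0, 0, 0};
    bool hasSent_ = false;
};
```

    With a 20000 ms heartbeat, an unchanged loco produces one throttle command at start-up and then only `<s>` every 20 seconds, while any speed or direction change is sent immediately and restarts the 20 second interval.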

    It has now been running non-stop for over 50 hours without any lock-ups.

    I'm happy now - though I don't know why repeating Cab commands would cause a hang after a few hours?

    Jim
     
  6. sboyer2

    sboyer2 TrainBoard Member

    one thing i hadn't noticed when first reading your original post: are you using the ESP8266-01 and the USB connection (as data, not just power) at the SAME time? if so, both are connected to the same RX/TX pins (pin 0 and pin 1) and will fight with each other. you will have to use a software serial connection on 2 other pins for the ESP if this is so.

    Steph!
     
  7. Jimbo20

    Jimbo20 TrainBoard Member

    Thanks again for the reply Steph;

    Alas, I do not have any connection to the Nano apart from the ESP8266 on serial lines A0 and A1. There is nothing connected to the USB port. I am aware of the potential conflict. Indeed I advised another forum member of that issue on here a few weeks ago.

    I have tried monitoring the data lines (only one at a time) by connecting the receive line of an FTDI cable to either pin A1 or A0 and running PuTTY with logging turned on until the Nano hung, to see if there was any sign of corrupted or incorrect data that might identify the cause. It only confirmed the repeating commands being sent, e.g. <T3 03 28 1> every few seconds, and a reply back, e.g. <T3 28 1>. When the hang occurs there is no apparent change in the data transmitted by the ESP, but the replies from the Nano just stop.

    Jim
     
  8. ThomasP

    ThomasP New Member

    Hey Jim, although I have a totally different configuration, I have experienced that exact problem. The base station does not respond via USB anymore and the locos continue to run on - the only option is to pull the plug. In my case, the solution came when I ran across a table on the internet that compared the transmission error rates of various USB baud rates. The rate of 115200 baud, the standard setting for the base station, had by far the highest error rate. A rate of 57600 baud was amongst the best. My problem is now almost cured using the 57600 baud transmission rate from computer to Arduino.
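
    One known source of baud-rate-dependent error on AVR boards is that the UART can only divide its clock by whole numbers, so the real bit rate is slightly off the requested one. This may or may not be what the table mentioned above measured. The arithmetic can be sketched as follows, assuming a 16 MHz clock as on a standard Nano or Uno. `baudError` and `BaudResult` are hypothetical names, and which mode the Arduino core actually selects for a given rate is not covered here.

```cpp
#include <cassert>
#include <cmath>

// Result of picking the nearest integer divider for a target baud rate.
struct BaudResult {
    long ubrr;            // value for the UART baud rate register
    double actualBaud;    // bit rate the hardware really produces
    double errorPercent;  // (actual / target - 1) * 100
};

// AVR USART in asynchronous mode:
//   normal mode:       baud = clock / (16 * (UBRR + 1))
//   double-speed mode: baud = clock / ( 8 * (UBRR + 1))
BaudResult baudError(double clockHz, double targetBaud, bool doubleSpeed) {
    assert(clockHz > 0 && targetBaud > 0);
    const double divisor = doubleSpeed ? 8.0 : 16.0;
    // Nearest integer register value to the ideal (fractional) one.
    const long ubrr = std::lround(clockHz / (divisor * targetBaud) - 1.0);
    const double actual = clockHz / (divisor * (ubrr + 1));
    const double err = (actual / targetBaud - 1.0) * 100.0;
    return {ubrr, actual, err};
}
```

    For a 16 MHz clock this gives roughly -3.5% (normal) or +2.1% (double speed) at 115200 baud, against +2.1% (normal) or -0.8% (double speed) at 57600 baud, so the lower rate can be set up with a noticeably smaller timing error.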

    Take care,
    Thomas.
     
  9. Jimbo20

    Jimbo20 TrainBoard Member

    Thanks for that Thomas!
    I will reduce the baud rate to 57600 and give that a try. I must admit I have been spending time on the landscaping of my layout recently, so I have not powered it up for a few weeks. And now summer projects are pushing my model railway activities to the back of the priority list, so it could be a few months before I have a chance to try it.
    Thanks again for your input,
    Jim
     
  10. RCMan

    RCMan TrainBoard Member

    Using high quality USB cables is a must at the higher baud rates.
     
  11. Jimbo20

    Jimbo20 TrainBoard Member

    I'm not using a USB cable though; it is an ESP8266 linked directly to the Nano's A0 and A1 pins via (short) links on the protoboard.
     
