Liquidsoap
  1. Liquidsoap
  2. LS-532

Freeze after exception in streaming thread (other outputs never shutdown)

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Liquidsoap
    • Labels:
      None
    • Environment:
      Liquidsoap 1.0.0-beta+svn (@:20110515:103042).
      Linux wbox-turm 2.6.32-31-generic #61-Ubuntu SMP Fri Apr 8 18:24:35 UTC 2011 i686 GNU/Linux

      Description

      if input is alsa or pulseaudio and output is connected to a local icecast2,
      liquidsoap dies in case of shutting down icecast2.

      test steps:
      start liquidsoap
      sudo start icecast2
      play icecast stream with player
      sudo stop icecast2

      configuration:

      channel = mksafe(input.alsa())
      #channel = mksafe(input.pulseaudio())
      #channel = mksafe(input.alsa(device="pcm.liquidsoap"))

      radio=channel
      radio=limit(radio)
      output.icecast(
      %vorbis.cbr(samplerate=44100, channels=2, bitrate=160),
      host="localhost", port=4565, user="liquidsoap", password="hack", mount="/radio", description="text"
      restart=true,
      radio
      )
      output.file(
      %mp3(bitrate=160,stereo=true, samplerate=44100),
      "/mnt/archive/potsdam/%Y-%m-%d/%Y-%m-%d-%H_%M_%S.mp3",
      reopen_when={0m0s},
      radio
      )

      with alsa:

      /var/log/icecast2/error.log [2011-05-18 22:04:02] INFO source/source_main listener count on /radio now 0
      /var/log/icecast2/error.log [2011-05-18 22:04:19] INFO source/source_main listener count on /radio now 1
      /var/log/icecast2/error.log [2011-05-18 22:04:29] INFO sighandler/_sig_die Caught signal 15, shutting down...
      /var/log/liquidsoap/play.log 2011/05/18 22:04:30 [/radio:2] Cry socket error: could not write data to host: Broken pipe in write()!
      /var/log/liquidsoap/play.log 2011/05/18 22:04:30 [/radio:3] Closing connection...
      /var/log/liquidsoap/play.log 2011/05/18 22:04:30 [threads:1] Thread "wallclock_alsa" aborts with exception Cry error: could not close connection: Transport endpoint is not connected in shutdown()!
      /var/log/liquidsoap/play.log 2011/05/18 22:04:30 [threads:3] Raised by primitive operation at file "", line 0, characters 0-0
      /var/log/liquidsoap/play.log 2011/05/18 22:04:30 [main:3] Shutdown started!
      /var/log/liquidsoap/play.log 2011/05/18 22:04:30 [main:3] Waiting for threads to terminate...
      /var/log/icecast2/error.log [2011-05-18 22:04:29] INFO source/source_shutdown Source "/radio" exiting
      /var/log/icecast2/error.log [2011-05-18 22:04:29] INFO main/main Shutting down
      /var/log/icecast2/error.log [2011-05-18 22:04:29] INFO fserve/fserve_shutdown file serving thread stopped
      /var/log/icecast2/error.log [2011-05-18 22:04:30] INFO slave/_slave_thread shutting down current relays
      /var/log/icecast2/error.log [2011-05-18 22:04:30] INFO slave/_slave_thread Slave thread shutdown complete
      /var/log/icecast2/error.log [2011-05-18 22:04:30] INFO auth/auth_shutdown Auth shutdown
      /var/log/icecast2/error.log [2011-05-18 22:04:30] INFO yp/yp_shutdown YP thread down
      /var/log/icecast2/error.log [2011-05-18 22:04:30] INFO stats/stats_shutdown stats thread finished
      /var/log/syslog May 18 22:04:31 wbox-turm init: icecast2 main process (12201) terminated with status 255

      and for pulseaudio:

      /var/log/icecast2/error.log [2011-05-18 22:11:56] INFO source/source_move_clients passing 0 listeners to "/radio"
      /var/log/icecast2/error.log [2011-05-18 22:11:56] INFO format-vorbis/initial_vorbis_page seen initial vorbis header
      /var/log/icecast2/error.log [2011-05-18 22:11:57] INFO source/source_main listener count on /radio now 0
      /var/log/icecast2/error.log [2011-05-18 22:12:04] INFO source/source_main listener count on /radio now 1
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [/radio:2] Cry socket error: could not write data to host: Broken pipe in write()!
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [/radio:3] Closing connection...
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [threads:1] Thread "wallclock_pulse" aborts with exception Cry error: could not close connection: Transport endpoint is not connected in shutdown()!
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [threads:3] Raised by primitive operation at file "", line 0, characters 0-0
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [main:3] Shutdown started!
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [main:3] Waiting for threads to terminate...
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 [main:3] Cleaning downloaded files...
      /var/log/liquidsoap/play.log 2011/05/18 22:12:16 >>> LOG END
      /var/log/icecast2/error.log [2011-05-18 22:12:16] INFO sighandler/_sig_die Caught signal 15, shutting down...
      /var/log/icecast2/error.log [2011-05-18 22:12:16] INFO source/source_shutdown Source "/radio" exiting
      /var/log/icecast2/error.log [2011-05-18 22:12:16] INFO main/main Shutting down
      /var/log/icecast2/error.log [2011-05-18 22:12:16] INFO fserve/fserve_shutdown file serving thread stopped

      BR, Peter

        Activity

        Hide
        Romain Beauxis
        added a comment -
        Hi Peter!

        This one should be fixed in the latest hg...
        Show
        Romain Beauxis
        added a comment - Hi Peter! This one should be fixed in the latest hg...
        Hide
        David Baelde
        added a comment -
        I confirm the issue. I just ran
          ./src/liquidsoap 'output.icecast(%mp3, mount="alsa", input.alsa())'
        on my liquidsoap (some hg version, details should be irrelevant but it is Liquidsoap 1.0.0-beta+svn (LS-469@5b385e629167:20110501:104157)).

        I have no mic connected so it should be mostly encoding silence. After a few seconds I shutdown icecast, and liquidsoap did hang:
        2011/05/20 07:54:53 [alsa:3] Closing connection...
        2011/05/20 07:54:53 [alsa:3] Will try to reconnect in 3 seconds.
        2011/05/20 07:54:57 [alsa:3] Connecting mount alsa for source@localhost...
        2011/05/20 07:54:57 [alsa:2] Connection failed: could not connect to host: Connection refused in connect()!
        2011/05/20 07:54:57 [threads:3] Thread "wallclock_alsa" exited.
        2011/05/20 07:54:57 [main:3] Shutdown started!
        2011/05/20 07:54:57 [main:3] Waiting for threads to terminate...
        ^C^C^CKilled
        Show
        David Baelde
        added a comment - I confirm the issue. I just ran   ./src/liquidsoap 'output.icecast(%mp3, mount="alsa", input.alsa())' on my liquidsoap (some hg version, details should be irrelevant but it is Liquidsoap 1.0.0-beta+svn ( LS-469 @5b385e629167'> LS-469 @5b385e629167 :20110501:104157)). I have no mic connected so it should be mostly encoding silence. After a few seconds I shutdown icecast, and liquidsoap did hang: 2011/05/20 07:54:53 [alsa:3] Closing connection... 2011/05/20 07:54:53 [alsa:3] Will try to reconnect in 3 seconds. 2011/05/20 07:54:57 [alsa:3] Connecting mount alsa for source@localhost ... 2011/05/20 07:54:57 [alsa:2] Connection failed: could not connect to host: Connection refused in connect()! 2011/05/20 07:54:57 [threads:3] Thread "wallclock_alsa" exited. 2011/05/20 07:54:57 [main:3] Shutdown started! 2011/05/20 07:54:57 [main:3] Waiting for threads to terminate... ^C^C^CKilled
        Hide
        David Baelde
        added a comment -
        Me again, with latest hg in default branch (7a59a2e26b1e). Still using:
          ./src/liquidsoap 'output.icecast(%mp3, mount="alsa", input.alsa())'

        It freezes if icecast is initially stopped. If icecast is started but stopped during streaming I get the following exception:
        2011/05/20 08:17:04 [threads:1] Thread "wallclock_alsa" aborts with exception Cry error: could not close connection: Transport endpoint is not connected in shutdown()!
        2011/05/20 08:17:04 [threads:3] Raised by primitive operation at file "", line 0, characters 0-0
        Thread 4 killed on uncaught exception Cry.Error(_)
        Raised at file "tools/tutils.ml", line 167, characters 39-40
        Called from file "thread.ml", line 38, characters 8-14
        Show
        David Baelde
        added a comment - Me again, with latest hg in default branch (7a59a2e26b1e). Still using:   ./src/liquidsoap 'output.icecast(%mp3, mount="alsa", input.alsa())' It freezes if icecast is initially stopped. If icecast is started but stopped during streaming I get the following exception: 2011/05/20 08:17:04 [threads:1] Thread "wallclock_alsa" aborts with exception Cry error: could not close connection: Transport endpoint is not connected in shutdown()! 2011/05/20 08:17:04 [threads:3] Raised by primitive operation at file "", line 0, characters 0-0 Thread 4 killed on uncaught exception Cry.Error(_) Raised at file "tools/tutils.ml", line 167, characters 39-40 Called from file "thread.ml", line 38, characters 8-14
        Hide
        Romain Beauxis
        added a comment -
        two issues here too :)

        First, there was a missing set of changes in the current default branch which triggered the "shutdown" exception issues (raised by ocaml-cry improperly using Unix.shutdown, fixed in one of the missing commits). It should not be present in default now.

        Second is the fact that liquidsoap hangs instead of shutting down properly. I think we have tried to fix this previsouly but apparently, there are still cases where the threads do not terminate..

        As for this issue, I do not know what to do. I can confirm that the exception does not appear in current default, but liquidsoap will still hang if icecast is not present.
        The freeze seem to be caused by input.alsa.

        I would propose to create a new issue for the freeze, _after_ testing it against the branch that has fixes related to io/ operators and to close this one..
        Show
        Romain Beauxis
        added a comment - two issues here too :) First, there was a missing set of changes in the current default branch which triggered the "shutdown" exception issues (raised by ocaml-cry improperly using Unix.shutdown, fixed in one of the missing commits). It should not be present in default now. Second is the fact that liquidsoap hangs instead of shutting down properly. I think we have tried to fix this previsouly but apparently, there are still cases where the threads do not terminate.. As for this issue, I do not know what to do. I can confirm that the exception does not appear in current default, but liquidsoap will still hang if icecast is not present. The freeze seem to be caused by input.alsa. I would propose to create a new issue for the freeze, _after_ testing it against the branch that has fixes related to io/ operators and to close this one..
        Hide
        peter
        added a comment -
        I am not sure, if I understood everthing (my moderate english understanding).
        "hg update default" shows me no changes, so i should have been on the last default hg branch when writing the ticket.
        "the freeze seem to be caused by input.alsa" -> but the issue also occurred using pulseaudio (see log below, but maybe it is only wrapping alsa)
        What branch should I use for retesting?
        Show
        peter
        added a comment - I am not sure, if I understood everthing (my moderate english understanding). "hg update default" shows me no changes, so i should have been on the last default hg branch when writing the ticket. "the freeze seem to be caused by input.alsa" -> but the issue also occurred using pulseaudio (see log below, but maybe it is only wrapping alsa) What branch should I use for retesting?
        Hide
        Romain Beauxis
        added a comment -
        Hi!
        You need to call hg pull before hg update to grab the latest changes..
        Show
        Romain Beauxis
        added a comment - Hi! You need to call hg pull before hg update to grab the latest changes..
        Hide
        David Baelde
        added a comment -
        I ran the same tests again with the latest hg (a4e4305e5e6c). The crashes disappeared but I still get a freeze, only with buffered ALSA input (not with input.pulseaudio() or input.alsa(bufferize=false)). It seems that the ALSA buffering thread does not comply with shutdown.
        Show
        David Baelde
        added a comment - I ran the same tests again with the latest hg (a4e4305e5e6c). The crashes disappeared but I still get a freeze, only with buffered ALSA input (not with input.pulseaudio() or input.alsa(bufferize=false)). It seems that the ALSA buffering thread does not comply with shutdown.
        Hide
        David Baelde
        added a comment -
        This bug is from clock.ml but I don't think I can fix it quickly in a reliable way, it's not easy to get back to this piece of code...
        Show
        David Baelde
        added a comment - This bug is from clock.ml but I don't think I can fix it quickly in a reliable way, it's not easy to get back to this piece of code...
        Hide
        David Baelde
        added a comment -
        I was listening to some radio using output.pulseaudio(fallible=true,input.http(...)) and after a suspend on my laptop I got a "connection terminated" error (could be enhanced, but not a bug in itself) followed by a freeze, so it does seem like not only buffered ALSA input is affected. In any case, the problem is in the wallclock code: detached sources are not shutdown in case of a crash.
        Show
        David Baelde
        added a comment - I was listening to some radio using output.pulseaudio(fallible=true,input.http(...)) and after a suspend on my laptop I got a "connection terminated" error (could be enhanced, but not a bug in itself) followed by a freeze, so it does seem like not only buffered ALSA input is affected. In any case, the problem is in the wallclock code: detached sources are not shutdown in case of a crash.
        Hide
        David Baelde
        added a comment -
        I have understood the problem (cf. new description of the bug) and committed a fix in branch LS-532. Please let me know if you see anything wrong with it.
        Show
        David Baelde
        added a comment - I have understood the problem (cf. new description of the bug) and committed a fix in branch LS-532 . Please let me know if you see anything wrong with it.
        Hide
        David Baelde
        added a comment -
        I have added two options to allow users to tweak the behaviour of the new code. I consider the issue solved now. I have also added a new feature, source.init(). It can be really tricky to use, but it doesn't change anything if it isn't used... I'll wait for a little feedback before finalizing all this.
        Show
        David Baelde
        added a comment - I have added two options to allow users to tweak the behaviour of the new code. I consider the issue solved now. I have also added a new feature, source.init(). It can be really tricky to use, but it doesn't change anything if it isn't used... I'll wait for a little feedback before finalizing all this.
        Hide
        David Baelde
        added a comment -
        A quick memo about the "tricky" bug: running init can call #sleep of a playlist (if the output which woke up the playlist failed to initialize), and the playlist's #sleep needs the duppy scheduler to let its feeding task die. So if init is ran in the scheduler, and there's only one queue (the default) there's a deadlock. The important point: this is not specific to source.init() but can occur with any call of Clock.collect after code execution in the scheduler. A precaution would be to increase the default number of queues, but it may not be enough in all cases... I'm afraid that the only other option would be to move to a another scheduling model :\
        Show
        David Baelde
        added a comment - A quick memo about the "tricky" bug: running init can call #sleep of a playlist (if the output which woke up the playlist failed to initialize), and the playlist's #sleep needs the duppy scheduler to let its feeding task die. So if init is ran in the scheduler, and there's only one queue (the default) there's a deadlock. The important point: this is not specific to source.init() but can occur with any call of Clock.collect after code execution in the scheduler. A precaution would be to increase the default number of queues, but it may not be enough in all cases... I'm afraid that the only other option would be to move to a another scheduling model :\
        Hide
        David Baelde
        added a comment -
        I'll review the code once more before merging. People, this is your last chance to test before it moves to default!

        Feedback has been good and I'm satisfied with the code. The only thing is that source.init() hasn't been tested much, but it should matter if people don't use it (the change is not very invasive). I'll consider it as experimental for now.
        Show
        David Baelde
        added a comment - I'll review the code once more before merging. People, this is your last chance to test before it moves to default! Feedback has been good and I'm satisfied with the code. The only thing is that source.init() hasn't been tested much, but it should matter if people don't use it (the change is not very invasive). I'll consider it as experimental for now.
        Hide
        David Baelde
        added a comment -
        While reviewing the code I found another problem, I'll fix it but here's an explanation for the record...

        We had to avoid raising exceptions when #output_get_ready fails, but catch them, finish the #start_outputs, so that shutdown cleanly works. If #start_output is interrupted, some sources that started will be ignored from the shutdown (#sleep is never called for them) and shutdown hangs while waiting for their thread.

        For #get_ready there was no such protection. I managed to create a freeze by putting an assert false in output.dummy()'s #wake_up and with the following example:
        ./src/liquidsoap --debug 'x=blank() output.pulseaudio(fallible=true,add([x,input.http("http://dbaelde:dbaelde@mrpingouin.is-a-geek.ogg:8000/friends.ogg")])) output.dummy(x)'
        (It is important that the dummy output is in the same clock as the pulse output, and to have an http input that can freeze the shutdown.)

        Therefore I'm changing that part of the initialization code too. I was reluctant without an example because we don't get as much backtraces after the change.
        Show
        David Baelde
        added a comment - While reviewing the code I found another problem, I'll fix it but here's an explanation for the record... We had to avoid raising exceptions when #output_get_ready fails, but catch them, finish the #start_outputs, so that shutdown cleanly works. If #start_output is interrupted, some sources that started will be ignored from the shutdown (#sleep is never called for them) and shutdown hangs while waiting for their thread. For #get_ready there was no such protection. I managed to create a freeze by putting an assert false in output.dummy()'s #wake_up and with the following example: ./src/liquidsoap --debug 'x=blank() output.pulseaudio(fallible=true,add([x,input.http(" http://dbaelde: dbaelde@mrpingouin.is-a-geek.ogg :8000/friends.ogg")])) output.dummy(x)' (It is important that the dummy output is in the same clock as the pulse output, and to have an http input that can freeze the shutdown.) Therefore I'm changing that part of the initialization code too. I was reluctant without an example because we don't get as much backtraces after the change.
        Hide
        David Baelde
        added a comment -
        Merged in default at b63faf981ac2.
        Show
        David Baelde
        added a comment - Merged in default at b63faf981ac2.

          People

          • Assignee:
            David Baelde
            Reporter:
            peter
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: