This is the intended behaviour of the app.
Sipdroid sets up media early to avoid losing the first bit of audio when a call is answered. On an internal call, audio therefore already passes through before the other side picks up. Only on public networks will your audio not be heard until the other side picks up.