Go's SSH client and PTY on AIX
Question
I doubt that I'll get an answer here, as AIX is a very rare thing, but I should at least try.
The background
We have a program that uses the golang.org/x/crypto/ssh library to connect to remote services and do some things. The program is part of a large service and is widely tested by end users. It works without issues (at least none related to the connection) not only with all Linux-based clients (including quite old ones like Ubuntu 12.02) but also with clients on FreeBSD, OpenBSD, NetBSD, MacOSX, Solaris SPARC, HP-UX and other *nixes. So it looks like the only thing it wasn't tested on is Samsung refrigerators. And yesterday I was sure that it would be able to connect to a refrigerator and do what is needed without any issues. But that was yesterday...
The problem
Today we decided to add AIX support to our program. And we partly failed.
The problem description is simple: after a pty request the program stops working. I mean, the session.RequestPty call executes without any issues, but when I try to execute a command afterwards, the app just hangs. No errors, no output. It just hangs.
When does it work?
- It works in PuTTY/KiTTY, so I'm able to connect to the remote host.
- If I remove RequestPty, everything works. But we need a pty for sudo.
- It works without issues if I request session.Shell, even with a pty requested. So if I write a kind of interactive shell, it works perfectly.
What have I tried so far
I debugged as far as I could. The last call that executes is ch.sendMessage(msg) from ssh/channel.go. I mean it writes the packet and that's all. No data is returned from the remote host.
For the tests, I used 3 versions of AIX - 5.3, 6.1 and 7.1. No difference.
OpenSSH versions are different:
- 5.3 - OpenSSH_5.2p1, OpenSSL 0.9.8k 25 Mar 2009
- 6.1 & 7.1 - OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013
All machines are running in LPARs but I doubt this is related to the issue.
I have no idea what is wrong, and I can't even say whether this is a common AIX issue or something specific to our test machines. Here is a sample program that should print IT WORKS if it works:
package main

import (
	"golang.org/x/crypto/ssh"
)

func main() {
	server := "127.0.0.1:22"
	user := "root"
	p := "password"
	config := &ssh.ClientConfig{
		User: user,
		Auth: []ssh.AuthMethod{ssh.Password(p)},
	}
	conn, err := ssh.Dial("tcp", server, config)
	if err != nil {
		panic(err.Error())
	}
	defer conn.Close()
	session, err := conn.NewSession()
	if err != nil {
		panic(err.Error())
	}
	defer session.Close()
	// Comment below and everything works
	modes := ssh.TerminalModes{
		ssh.ECHO:          0,
		ssh.TTY_OP_ISPEED: 14400,
		ssh.TTY_OP_OSPEED: 14400,
	}
	if err := session.RequestPty("xterm", 80, 40, modes); err != nil {
		panic(err.Error())
	}
	// Comment above and everything works
	session.Run("echo 1")
	println("IT WORKS")
}
If you have AIX somewhere around and can run this code against it, I'd appreciate your feedback.
If you have any ideas (even crazy ones) about why it may fail or where else I can look, don't be shy.
Update (2017-03-02):
At the suggestion of @LorinczyZsigmond I launched sshd in debug mode. The results are a bit strange.
Here is part of the Debian 9.0 (OpenSSH_6.0p1 Debian-4+deb7u3, OpenSSL 1.0.1t 3 May 2016) log after running the sample program:
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/1
debug1: SELinux support disabled
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug2: fd 3 setting TCP_NODELAY
debug3: packet_set_tos: set IP_TOS 0x10
debug1: Setting controlling tty using TIOCSCTTY.
debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
It works as expected.
Now the same block from the AIX 7.1 (OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013) log:
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/42
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Values: options.num_allow_users: 0
debug1: RLOGIN VALUE :1
debug1: audit run command euid 0 user root command 'whoami'
setsid: Operation not permitted.
After "setsid: Operation not permitted." it does nothing until I kill it with Ctrl+C. When I kill it, it prints:
debug2: fd 4 setting TCP_NODELAY
debug3: packet_set_tos: set IP_TOS 0x10
debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
debug2: notify_done: reading
Exiting on signal 2
debug1: do_cleanup
debug1: session_pty_cleanup: session 0 release /dev/pts/42
debug1: audit session close euid 0 user root tty name /dev/pts/42
debug1: audit event euid 0 user root event 12 (SSH_connabndn)
debug1: Return Val-1 for auditproc:0
And sends the result of whoami back to the client. This looks like a bug in the SSH server, but is that possible across 2 different versions?
Another interesting fact: when I run sshd under truss (a kind of strace for AIX), the output looks like this:
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/42
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Values: options.num_allow_users: 0
debug1: RLOGIN VALUE :1
debug1: audit run command euid 0 user root command 'whoami'
debug2: fd 4 setting TCP_NODELAY
debug3: packet_set_tos: set IP_TOS 0x10
debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
setsid: Operation not permitted.
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
But truss output is a bit stranger than strace output (at least for someone who doesn't use *nix trace tools on a daily basis), so I don't understand what is going on in the logs. If someone here is more skilled with this stuff, here is the part of the trace data starting from debug1: RLOGIN VALUE :1 - http://pastebin.com/YdzQwbt2
Also, in the logs, I found that ssh.Shell() works because it doesn't request a pty. It starts an interactive session (or something like that). But in my case, an interactive session is not an option.
Answer 1
Score: 3
Better late than never.
IBM said it was a bug in OpenSSH: a race condition during PTY allocation.
https://www-01.ibm.com/support/docview.wss?uid=isg1IV82042
It is fixed in package openssh.base.server:7.5.102.1500.
It is strange that the bug only occurs on AIX and never on Linux. Nevertheless, the problem is solved in my case.
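To check whether a given host already has the fixed package, the installed level can be compared with 7.5.102.1500. The sketch below assumes that levels are dotted numeric strings that compare field by field, and that any level at or above the one named here contains the fix; neither assumption comes from IBM's note. compareLevels and field are hypothetical helpers, and the older level in main is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareLevels compares two dotted numeric levels such as "7.5.102.1500".
// It returns -1, 0 or 1 when a is lower than, equal to or higher than b.
// Missing trailing fields count as 0.
func compareLevels(a, b string) (int, error) {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	n := len(as)
	if len(bs) > n {
		n = len(bs)
	}
	for i := 0; i < n; i++ {
		x, err := field(as, i)
		if err != nil {
			return 0, err
		}
		y, err := field(bs, i)
		if err != nil {
			return 0, err
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

// field returns the i-th numeric field, or 0 when the level has fewer fields.
func field(parts []string, i int) (int, error) {
	if i >= len(parts) {
		return 0, nil
	}
	return strconv.Atoi(parts[i])
}

func main() {
	const fixed = "7.5.102.1500" // level named in this answer

	// The first level is a made-up example of an older install.
	for _, installed := range []string{"6.0.0.6103", "7.5.102.1500"} {
		c, err := compareLevels(installed, fixed)
		if err != nil {
			fmt.Println("cannot parse level:", err)
			continue
		}
		fmt.Printf("%s at or above fixed level: %v\n", installed, c >= 0)
	}
}
```

The installed level itself has to come from the host's package tooling; the sketch only covers the comparison.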
Answer 2
Score: 0
I had a similar problem: sshd logged "Allocating pty" and then the ssh session exited. sshd drops the connection with error 3004-010 Failed setting terminal ownership and mode. Here is my sshd debug log:
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/2
debug1: Ignoring unsupported tty mode opcode 13 (0xd)
debug1: Ignoring unsupported tty mode opcode 18 (0x12)
debug1: server_input_channel_req: channel 0 request env reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req env
debug2: Ignoring env request LANG: disallowed name
debug1: server_input_channel_req: channel 0 request shell reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req shell
debug1: Values: options.num_allow_users: 0
debug1: RLOGIN VALUE :1
setsid: Operation not permitted.
The OS is AIX 7.1 (7100-04-03-1642).
The goal of my environment is to authenticate users on AIX over ssh against a remote LDAP directory (the LDAP server is actually Novell eDirectory).
So I had a similar issue with user authentication.
I fixed login over ssh by adding the following object extensions from the eDirectory schema (RFC 2307) to the user:
posixAccount
posixGroup
shadowAccount
uamPosixUser (I am not sure whether this object is necessary)
I just want to note that on AIX this user isn't local: it does not exist in /etc/passwd or /etc/group.
V.Davidov