Discussion:
SFTPConnectionPool connections leakage
Mikhail Pryakhin
2018-08-21 21:39:55 UTC
Permalink
Hi,
I’ve come across a connection leakage while using SFTPFileSystem. Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of SFTPFileSystem are chained together resulting in establishing multiple connections to the SFTP server to accomplish one compound action, those methods are:

mkdirs method[1]. The public mkdirs method acquires a new ChannelSftp from the pool and then recursively creates directories, checking for the directory existence beforehand by calling the method exists[2] which delegates to the getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it ends up in returning the FilesStatus instance [4]. The resource leakage occurs in the method getWorkingDirectory which calls the getHomeDirectory method [5] which in turn establishes a new connection to the sftp server instead of using an already created connection. As the mkdirs method is recursive this results in creating a huge number of connections.

open method [6]. This method returns an instance of FSDataInputStream which consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp instance back to the pool but instead it closes it[7]. This leads to establishing another connection to an SFTP server when the next method is called on the FileSystem instance.

I’ve issued a Jira ticket https://issues.apache.org/jira/browse/HADOOP-15358 <https://issues.apache.org/jira/browse/HADOOP-15358>, and fixed the connection leakage issue.
Could I create a pull request to merge the fix?

[1] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
[2] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
[3] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
[4] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
[5] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
[6] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
[7] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123 <https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123>


Kind Regards,
Mike Pryakhin
Arpit Agarwal
2018-08-21 21:52:32 UTC
Permalink
Hi Mikhail,

There’s two ways to contribute a fix:

1. Attach a patch file to the Jira and click “Submit Patch”. I’ve made you a contributor and assigned it to you.
2. Submit a GitHub pull request with the Jira key in the title. Not tried this in a while and not sure it still works.


From: Mikhail Pryakhin <***@gmail.com>
Date: Tuesday, August 21, 2018 at 2:40 PM
To: <***@hadoop.apache.org>
Subject: SFTPConnectionPool connections leakage

Hi,
I’ve come across a connection leakage while using SFTPFileSystem. Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of SFTPFileSystem are chained together resulting in establishing multiple connections to the SFTP server to accomplish one compound action, those methods are:

mkdirs method[1]. The public mkdirs method acquires a new ChannelSftp from the pool and then recursively creates directories, checking for the directory existence beforehand by calling the method exists[2] which delegates to the getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it ends up in returning the FilesStatus instance [4]. The resource leakage occurs in the method getWorkingDirectory which calls the getHomeDirectory method [5] which in turn establishes a new connection to the sftp server instead of using an already created connection. As the mkdirs method is recursive this results in creating a huge number of connections.

open method [6]. This method returns an instance of FSDataInputStream which consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp instance back to the pool but instead it closes it[7]. This leads to establishing another connection to an SFTP server when the next method is called on the FileSystem instance.

I’ve issued a Jira ticket https://issues.apache.org/jira/browse/HADOOP-15358, and fixed the connection leakage issue.
Could I create a pull request to merge the fix?

[1] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
[2] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
[3] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
[4] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
[5] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
[6] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
[7] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123


Kind Regards,
Mike Pryakhin
Mikhail Pryakhin
2018-08-21 22:06:49 UTC
Permalink
Hi Arpit,
Thank you, will contribute the fix shortly.
Post by Arpit Agarwal
Hi Mikhail,
1. Attach a patch file to the Jira and click “Submit Patch”. I’ve made
you a contributor and assigned it to you.
2. Submit a GitHub pull request with the Jira key in the title. Not
tried this in a while and not sure it still works.
*Date: *Tuesday, August 21, 2018 at 2:40 PM
*Subject: *SFTPConnectionPool connections leakage
Hi,
I’ve come across a connection leakage while using SFTPFileSystem. Methods
of SFTPFileSystem operate on poolable ChannelSftp instances, thus some
methods of SFTPFileSystem are chained together resulting in establishing
multiple connections to the SFTP server to accomplish one compound action,
*mkdirs method*[1]*. *The public mkdirs method acquires a new ChannelSftp
from the pool and then recursively creates directories, checking for the
directory existence beforehand by calling the method exists[2] which
delegates to the getFileStatus(ChannelSftp channel, Path file) method [3]
and so on until it ends up in returning the FilesStatus instance [4]. The
resource leakage occurs in the method getWorkingDirectory which calls the
getHomeDirectory method [5] which in turn establishes a new connection to
the sftp server instead of using an already created connection. As the
mkdirs method is recursive this results in creating a huge number of
connections.
*open method* [6]. This method returns an instance of FSDataInputStream
which consumes SFTPInputStream instance which doesn't return an acquired
ChannelSftp instance back to the pool but instead it closes it[7]. This
leads to establishing another connection to an SFTP server when the next
method is called on the FileSystem instance.
I’ve issued a Jira ticket
https://issues.apache.org/jira/browse/HADOOP-15358, and fixed the
connection leakage issue.
Could I create a pull request to merge the fix?
[1]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
[2]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
[3]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
[4]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
[5]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
[6]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
[7]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123
Kind Regards,
Mike Pryakhin
--
Regards, Mikhail Pryakhin.
Mikhail Pryakhin
2018-11-21 06:11:36 UTC
Permalink
Hi Aprit,

I contributed the patch[1] long ago, but it hasn’t been merged yet. Could
you please tell when to expect it in the main branch?


[1]
https://issues.apache.org/jira/browse/HADOOP-15358

Cheers,
Mike
Post by Mikhail Pryakhin
Hi Arpit,
Thank you, will contribute the fix shortly.
Post by Arpit Agarwal
Hi Mikhail,
1. Attach a patch file to the Jira and click “Submit Patch”. I’ve
made you a contributor and assigned it to you.
2. Submit a GitHub pull request with the Jira key in the title. Not
tried this in a while and not sure it still works.
*Date: *Tuesday, August 21, 2018 at 2:40 PM
*Subject: *SFTPConnectionPool connections leakage
Hi,
I’ve come across a connection leakage while using SFTPFileSystem. Methods
of SFTPFileSystem operate on poolable ChannelSftp instances, thus some
methods of SFTPFileSystem are chained together resulting in establishing
multiple connections to the SFTP server to accomplish one compound action,
*mkdirs method*[1]*. *The public mkdirs method acquires a new
ChannelSftp from the pool and then recursively creates directories,
checking for the directory existence beforehand by calling the method
exists[2] which delegates to the getFileStatus(ChannelSftp channel, Path
file) method [3] and so on until it ends up in returning the FilesStatus
instance [4]. The resource leakage occurs in the method getWorkingDirectory
which calls the getHomeDirectory method [5] which in turn establishes a new
connection to the sftp server instead of using an already created
connection. As the mkdirs method is recursive this results in creating a
huge number of connections.
*open method* [6]. This method returns an instance of FSDataInputStream
which consumes SFTPInputStream instance which doesn't return an acquired
ChannelSftp instance back to the pool but instead it closes it[7]. This
leads to establishing another connection to an SFTP server when the next
method is called on the FileSystem instance.
I’ve issued a Jira ticket
https://issues.apache.org/jira/browse/HADOOP-15358, and fixed the
connection leakage issue.
Could I create a pull request to merge the fix?
[1]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
[2]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
[3]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
[4]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
[5]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
[6]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
[7]
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123
Kind Regards,
Mike Pryakhin
--
Regards, Mikhail Pryakhin.
--
Regards, Mikhail Pryakhin.
Loading...