2022-01-25
开发杂谈
0
请注意,本文编写于 1248 天前,最后修改于 516 天前,其中某些信息可能已经过时。

在使用Kotlin进行页面分析和爬取数据时,我们需要用到爬虫。但是如果是https协议,可能需要进行安全校验。

我们以某网站(内容保护,不指明)为例,使用Jsoup库进行爬取。

当我们运行时,会报错:

Exception in thread "main" javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1946) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:316) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1639) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) at sun.security.ssl.Handshaker.process_record(Handshaker.java:965) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1064) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1570) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268) at java.net.URL.openStream(URL.java:1068) at kotlin.io.TextStreamsKt.readBytes(ReadWrite.kt:150) Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:450) at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:259) at sun.security.validator.Validator.validate(Validator.java:262) at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:330) at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:237) at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:132) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1621) ... 16 more Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:445) ... 22 more

这就说明我们遇到了https的安全校验问题,下面是解决办法。

我们只需要创建一个方法:

kotlin
package com // 你的包名 import java.security.SecureRandom import java.security.cert.CertificateException import java.security.cert.X509Certificate import javax.net.ssl.HttpsURLConnection import javax.net.ssl.SSLContext import javax.net.ssl.X509TrustManager fun trustAny() { try { HttpsURLConnection.setDefaultHostnameVerifier { _, _ -> true } val context = SSLContext.getInstance("TLS") context.init(null, arrayOf<X509TrustManager>(object : X509TrustManager { @Throws(CertificateException::class) override fun checkClientTrusted(chain: Array<X509Certificate>, authType: String) { } @Throws(CertificateException::class) override fun checkServerTrusted(chain: Array<X509Certificate>, authType: String) { } override fun getAcceptedIssuers(): Array<X509Certificate?> { return arrayOfNulls(0) } }), SecureRandom()) HttpsURLConnection.setDefaultSSLSocketFactory(context.socketFactory) } catch (e: Exception) { // e.printStackTrace() } }

在抓取前使用trustAny()方法即可。

本文作者:Morales

本文链接:

版权声明:本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 License 许可协议。转载请注明出处!