How to build a Python crawler for websites using oauth2

Question

I'm new to web programming. I want to build a crawler in Python to crawl the social graph on Foursquare. I have a "manually" controlled crawler built with the apiv2 library. The main method looks like this:

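import apiv2  # the Foursquare API v2 wrapper library used in this question

def main():
    # Authorization code, copied by hand from the redirect URL
    CODE = "******"
    # URL that has to be opened in a browser to obtain CODE
    url = "https://foursquare.com/oauth2/authenticate?client_id=****&response_type=code&redirect_uri=****"
    key = "***"
    secret = "****"
    re_uri = "***"

    auth = apiv2.FSAuthenticator(key, secret, re_uri)
    auth.set_token(CODE)
    finder = apiv2.UserFinder(auth)

    # Make some requests using the finder
    finder.finde(ANY_USER_ID).mayorships()
    # ...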

The problem is that, at present, I have to type the URL into my browser, pick up the CODE from the redirect URL, update the CODE in my program, and run it again. I think there must be some way to build the step that obtains the CODE into my program and make it automatic.

Any instructions or sample code would be appreciated.

Recommended answer

You should check out the python-oauth2 module. It seems to be the most stable thing out there.

In particular, this blog post has a really good rundown on how to do OAuth easily with Python. The example code uses the Foursquare API, so I would check that out first.

I recently had to get OAuth working with Dropbox, and wrote this module containing the necessary steps to perform the OAuth exchange.
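As a rough illustration of how the authorization code could be picked up without copying it by hand, here is a minimal sketch using only the Python standard library. It is not the module mentioned above. The authorize endpoint is the one from the question; the helper names (`build_authorize_url`, `extract_code`, `wait_for_code`), the port, and the idea of registering `http://localhost:8000/` as the app's redirect URI are assumptions made for this example. The user still has to be logged in and approve the app in the browser.

```python
import urllib.parse
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer

# Authorize endpoint as it appears in the question.
AUTH_ENDPOINT = "https://foursquare.com/oauth2/authenticate"


def build_authorize_url(client_id, redirect_uri):
    """Build the URL the user must visit to approve the app."""
    query = urllib.parse.urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
    })
    return AUTH_ENDPOINT + "?" + query


def extract_code(redirect_url):
    """Return the `code` query parameter of a redirect URL, or None."""
    query = urllib.parse.urlsplit(redirect_url).query
    values = urllib.parse.parse_qs(query).get("code")
    return values[0] if values else None


class _CodeHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that remembers the code from the redirect."""

    def do_GET(self):
        code = extract_code(self.path)
        if code:
            self.server.auth_code = code
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Authorization received. You can close this tab.")

    def log_message(self, *args):
        pass  # keep the console quiet


def wait_for_code(client_id, port=8000):
    """Open the browser and block until the redirect delivers a code."""
    # Assumption: this exact URI is registered as the app's redirect URI.
    redirect_uri = "http://localhost:%d/" % port
    server = HTTPServer(("localhost", port), _CodeHandler)
    server.auth_code = None
    webbrowser.open(build_authorize_url(client_id, redirect_uri))
    try:
        while server.auth_code is None:
            server.handle_request()  # serve one request at a time
    finally:
        server.server_close()
    return server.auth_code
```

With something like this in place, `CODE = wait_for_code(key)` could replace the hard-coded value in `main()`, provided the redirect URI assumption holds for your app.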

For my system, the simplest thing I could think of was to pickle the OAuth client. My blog package just deserialized the pickled client and requested endpoints with the following function:

get = lambda x: client.request(x, 'GET')[1]

Just make sure your workers have this client object and you should be good to go :-)
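A variation on the pickling idea, sketched with the standard library only: instead of pickling a whole client object, this pickles just the access token and rebuilds a thin client around it in each worker. `ApiClient`, `save_token`, and `load_client` are hypothetical names, and passing the token as an `oauth_token` query parameter is an assumption about the API being called. Only unpickle files you wrote yourself, since loading an untrusted pickle can execute arbitrary code.

```python
import pickle
import urllib.parse
import urllib.request


class ApiClient:
    """Hypothetical stand-in for an authenticated OAuth client."""

    def __init__(self, access_token):
        self.access_token = access_token

    def request(self, url, method="GET"):
        # Returns a (status, body) pair so that [1] is the body,
        # matching how the lambda in the answer indexes the result.
        sep = "&" if "?" in url else "?"
        query = urllib.parse.urlencode({"oauth_token": self.access_token})
        req = urllib.request.Request(url + sep + query, method=method)
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()


def save_token(access_token, path):
    """Pickle the token so other processes can reuse it."""
    with open(path, "wb") as fh:
        pickle.dump({"access_token": access_token}, fh)


def load_client(path):
    """Rebuild a client from a pickled token file."""
    with open(path, "rb") as fh:
        state = pickle.load(fh)
    return ApiClient(state["access_token"])
```

A worker could then call `client = load_client("token.pickle")` and define `get = lambda x: client.request(x, 'GET')[1]` as in the answer.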

Other recommended answer

You don't have to do it every time. They'll give you a token that is good for X hours or days. Eventually you'll get a 403 HTTP status code, and then you'll need to re-authenticate.
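The "reuse the token until it stops working" idea can be sketched as a small wrapper. This is an illustration under assumptions, not any library's API: `get_with_reauth` is a hypothetical helper, `do_request(url, token)` is assumed to return a `(status, body)` pair, and `authenticate()` is assumed to run the full authorization flow and return a fresh token. Besides the 403 mentioned above, the sketch also treats 401 as "token no longer valid", which is an assumption about how the server reports it.

```python
def get_with_reauth(url, state, do_request, authenticate):
    """Call do_request with the cached token; re-authenticate once if rejected."""
    if state.get("token") is None:
        state["token"] = authenticate()
    status, body = do_request(url, state["token"])
    if status in (401, 403):
        # The token expired or was revoked: fetch a new one and retry once.
        state["token"] = authenticate()
        status, body = do_request(url, state["token"])
    return status, body
```

Keeping the token in a plain dict (`state`) makes it easy to persist between runs, so the browser step is only needed when the server actually rejects the cached token.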