= LabInstance twint!

== Quickstart

This is a quickstart guide on how to use this *LabInstance*.

=== Default Configuration

- Working Directory

> /home/docker/project

- Default user

> docker

- Default password

> docker

- Default password for root

> pass

== LabInstance Info

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.

Twint utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags & trends, or sort out sensitive information from Tweets like e-mail addresses and phone numbers. I find this very useful, and you can get really creative with it too.

Twint also makes special queries to Twitter, allowing you to scrape a Twitter user's followers, the Tweets a user has liked, and who they follow without any authentication, API, Selenium, or browser emulation.

> No authentication. No API. No limits.

=== Limits imposed by Twitter

Twitter limits scrolling while browsing a user timeline. This means that with `.Profile` or `.Favorites` you will be able to get only about 3200 Tweets.

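To make the consequence of this limit concrete, here is a minimal Python sketch that estimates how many Tweets a timeline-based scrape could reach. The names `TIMELINE_CAP`, `reachable_tweets` and `is_truncated` are hypothetical helpers for illustration and are not part of Twint. The value 3200 is only the approximate figure quoted above.

```python
# Approximate timeline cap quoted in the text above (an estimate, not an exact constant).
TIMELINE_CAP = 3200


def reachable_tweets(total_tweets: int, cap: int = TIMELINE_CAP) -> int:
    """Return the most Tweets a timeline-based scrape could reach."""
    if total_tweets < 0:
        raise ValueError("total_tweets must be non-negative")
    return min(total_tweets, cap)


def is_truncated(total_tweets: int, cap: int = TIMELINE_CAP) -> bool:
    """Return True when the account has more Tweets than the cap allows."""
    return total_tweets > cap
```

For an account with 10000 Tweets, `reachable_tweets(10000)` returns the cap, and `is_truncated(10000)` signals that part of the timeline would be missed.
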
=== Quickstart guide

This is a quickstart guide on how to use twint in this *LabInstance*.

==== CLI Basic Examples and Combos

A few simple examples to help you understand the basics:

- `twint -u username` - Scrape all the Tweets of a user (doesn't include retweets but includes replies).
- `twint -u username -s pineapple` - Scrape all Tweets from the user's timeline containing pineapple.
- `twint -s pineapple` - Collect every Tweet containing pineapple from everyone's Tweets.
- `twint -u username --year 2014` - Collect Tweets that were tweeted before 2014.
- `twint -u username --since "2015-12-20 20:30:15"` - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
- `twint -u username --since 2015-12-20` - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
- `twint -u username -o file.txt` - Scrape Tweets and save to file.txt.
- `twint -u username -o file.csv --csv` - Scrape Tweets and save as a csv file.
- `twint -u username --email --phone` - Show Tweets that might have phone numbers or email addresses.
- `twint -s "Donald Trump" --verified` - Display Tweets by verified users that Tweeted about Donald Trump.
- `twint -g="48.880048,2.385939,1km" -o file.csv --csv` - Scrape Tweets from a radius of 1km around a place in Paris and export them to a csv file.
- `twint -u username -es localhost:9200` - Output Tweets to Elasticsearch.
- `twint -u username -o file.json --json` - Scrape Tweets and save as a json file.
- `twint -u username --database tweets.db` - Save Tweets to a SQLite database.
- `twint -u username --followers` - Scrape a Twitter user's followers.
- `twint -u username --following` - Scrape who a Twitter user follows.
- `twint -u username --favorites` - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
- `twint -u username --following --user-full` - Collect full user information for the accounts a user follows.
- `twint -u username --timeline` - Use an effective method to gather Tweets from a user's profile (gathers ~3200 Tweets, including retweets & replies).
- `twint -u username --retweets` - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user's profile.
- `twint -u username --resume resume_file.txt` - Resume a search starting from the last saved scroll-id.

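When these commands are generated from a script, it can help to assemble them as an argument list instead of concatenating strings. The following is a minimal Python sketch under that assumption. `build_twint_command` is a hypothetical helper, not part of Twint. It covers only a few of the flags listed above (`-u`, `-s`, `--since`, `-o`, `--csv`, `--json`) and does not run anything.

```python
import shlex
from typing import List, Optional


def build_twint_command(
    username: Optional[str] = None,
    search: Optional[str] = None,
    since: Optional[str] = None,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Assemble a twint argument list from a subset of the flags listed above."""
    if username is None and search is None:
        raise ValueError("provide a username, a search term, or both")
    if fmt not in (None, "csv", "json"):
        raise ValueError("fmt must be 'csv', 'json', or None")
    # Assumption: mirror the examples, where --csv/--json always come with -o.
    if fmt is not None and output is None:
        raise ValueError("--csv/--json are used together with an output file (-o)")

    cmd = ["twint"]
    if username is not None:
        cmd += ["-u", username]
    if search is not None:
        cmd += ["-s", search]
    if since is not None:
        cmd += ["--since", since]
    if output is not None:
        cmd += ["-o", output]
    if fmt is not None:
        cmd.append("--" + fmt)
    return cmd


if __name__ == "__main__":
    cmd = build_twint_command(
        username="username",
        since="2015-12-20 20:30:15",
        output="file.csv",
        fmt="csv",
    )
    # shlex.join quotes the timestamp so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Because the timestamp stays a single list element, the result can be passed to `subprocess.run(cmd, check=True)` without manual quoting, assuming `twint` is available on the `PATH` inside the LabInstance.
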
== More info

https://github.com/twintproject/twint/wiki[^]

https://github.com/twintproject/twint/wiki/Graph[^]

== RUN INSTANCE

Swarmlab services can be run in different ways.

- You can run them **through the swarmlab hybrid environment** (http://docs.swarmlab.io/SwarmLab-HowTos/swarmlab/docs/swarmlab/docs/hybrid/start-microservices.html)
- or use them individually at will on the **command line of your system**

=== CLI

> git clone ...

> cd [DIRECTORY]

=== help

> make help

=== create service

> make create

=== start service

> make start

=== stop service

> make stop

=== list service

> make list

=== clean service

> make clean

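If the service lifecycle needs to be driven from a script, the `make` targets above can be wrapped in a small helper. This is a minimal Python sketch, assuming it is run from the cloned directory and that `make` is installed. `MAKE_TARGETS`, `make_command` and `run_lifecycle` are hypothetical names for illustration and are not part of Swarmlab.

```python
import subprocess
from typing import Iterable, List

# Targets named in the headings above.
MAKE_TARGETS = ("help", "create", "start", "stop", "list", "clean")


def make_command(target: str) -> List[str]:
    """Return the argument list for one documented make target."""
    if target not in MAKE_TARGETS:
        raise ValueError(f"unknown make target: {target!r}")
    return ["make", target]


def run_lifecycle(
    targets: Iterable[str], directory: str = ".", dry_run: bool = True
) -> List[List[str]]:
    """Validate all targets first, then run them in order unless dry_run is set."""
    commands = [make_command(t) for t in targets]
    if not dry_run:
        for cmd in commands:
            # check=True stops the sequence at the first failing target.
            subprocess.run(cmd, cwd=directory, check=True)
    return commands
```

With the default `dry_run=True` the helper only returns the commands it would run, for example `run_lifecycle(["create", "start"])`, which makes it safe to try before touching a running service.
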