Friday 12 January 2018

Twitter Reporting

One of the things I was interested in doing was analysing some data from Twitter. A good place to start seemed to be data on Dupuytren's, and then on Ledderhose. Why these? Because I am a trustee of the British Dupuytren's Society.

In this post I am going to use a variety of tools and libraries. I probably don't do this in the most perfect way, as I was trying to achieve a few different things, and mostly I did it because I find Python and data analysis fun!



Twitter Scraping

The first Python library that I am going to use is twitterscraper. This is a great tool for scraping information from Twitter. I have had a few issues with the errors it throws, but that is probably my fault. Overall it works well; see the link above and my code snippet below.

I did try adding retweets and likes, but the scraper did not seem to like that: the success rate for extraction dropped to 10%, rather than the 70% I got without them. The biggest issue is that I didn't want to cater for lots of different languages, so anything in Chinese, for example, fails.
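To make the "success rate" idea a bit more concrete, here is a minimal sketch of how tweets could be split into ones that survive a given text encoding and ones that don't. This is only an assumption about why non-English tweets might fail (an encoding error on export); it is not taken from my actual script, and the function name `split_by_encoding` and the sample tweets are made up for illustration.

```python
def split_by_encoding(tweets, encoding="ascii"):
    """Split tweets into those that can be encoded and those that cannot.

    Hypothetical helper: assumes the failures are encoding errors.
    """
    kept, skipped = [], []
    for text in tweets:
        try:
            text.encode(encoding)  # raises UnicodeEncodeError if not representable
        except UnicodeEncodeError:
            skipped.append(text)
        else:
            kept.append(text)
    return kept, skipped


# Made-up sample data, not real tweets.
tweets = ["Dupuytren's awareness day", "你好,世界", "Ledderhose research update"]
kept, skipped = split_by_encoding(tweets)
success_rate = len(kept) / len(tweets)
print(f"{len(kept)} kept, {len(skipped)} skipped, success rate {success_rate:.0%}")
```

Writing the output file with `encoding="utf-8"` instead would probably let most of these tweets through, but I have not tested that against the scraper's output.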

Re
This is the Python regular expression library. I have used regexes before and they are super powerful and cool (if you like that sort of thing). For more information see the following link. I only use it for one very simple job below: removing return characters and tabs from the tweets so that exporting to a text document works.
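As a rough illustration of that clean-up step, the sketch below replaces any run of carriage returns, newlines and tabs with a single space. The names `WHITESPACE_BREAKS` and `clean_tweet` and the sample text are my own for this example, and the exact pattern in my script may differ.

```python
import re

# Matches one or more carriage returns, newlines, or tabs in a row.
WHITESPACE_BREAKS = re.compile(r"[\r\n\t]+")


def clean_tweet(text):
    """Replace runs of line breaks and tabs with a single space."""
    return WHITESPACE_BREAKS.sub(" ", text).strip()


# Made-up sample text containing the characters that break a text export.
sample = "Living with Dupuytren's\r\n\tand Ledderhose\n"
print(clean_tweet(sample))
```

Collapsing each run into one space, rather than deleting it, stops words on either side of a line break from being glued together.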

I also use other packages that I have discussed elsewhere: 
  • pandas
  • xlwings
  • pyodbc / SQLAlchemy
  • SQL scripting 
Screenshots of the Output: (Yes, I could have made this look nicer with some of the formatting options I have posted about before, or even just by auto-fitting the columns, but I felt those commands would have gotten lost in this post.)




The Code:




