If you’re like me, you’ve been coding in Python for a few years, but none of your work has involved team efforts where source/version control would have been key. So much of the code is cobbled together, reproduced, recycled, sometimes documented through comments, often not… the list goes on.
Well, I’m breaking with all these bad habits. I’ve heard and read so much about PyCharm that I’m giving it a try, and I’m also starting with version control. Getting JetBrains’ PyCharm up and running is easy. You get it here (https://www.jetbrains.com/pycharm/). It claims to be the “Best Python IDE”.
Then I installed TortoiseSVN, a client for Apache Subversion (SVN), available here (http://tortoisesvn.net/). I went with version 1.8.11.
Finally, once both are installed, in PyCharm go to File >> Settings >> Version Control, select the directory you’d like to put under version control, and pick your version control software (SVN). Done!
So I’ve added another book to my Python library: “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”.
So far, it promises to be a good introduction to data science using Python tools. The first thing you need to do to follow along with the exercises in the book is get pandas installed, which is easy. The link is right here.
But you don’t have to install pandas or NumPy separately. I’ve just discovered pip. Again, that’s likely due to the fact that I spend most of my Python time in the Standard Library. Once you have pip running with your Python installation of preference, getting new software is as easy as running “sudo apt-get install” in Linux.
You just type “pip install pandas” and let it download the necessary packages from the web and install them locally. If you’ve already downloaded a Python wheel (WHL), you can also point pip at that and install from the local file. For more about wheels, which are replacing eggs, go here.
Trying to install pandas, though, I was at first getting an error related to “Windows C++ 10.0” (link). That turned out to be due to my trying to install 32-bit NumPy. So I did end up downloading a 64-bit version for my 64-bit Python 3.3 and then used pip to install from that WHL. The whole installation took no more than 10 minutes. Now I’m ready to play with pandas.
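As a quick sanity check that the install worked, here is a minimal pandas sketch. The well names and depths are made-up sample data of mine, not anything from the book:

```python
import pandas as pd

# A small DataFrame of hypothetical well data
df = pd.DataFrame({
    "well": ["A-1", "A-2", "B-1"],
    "depth_ft": [5200, 4800, 6100],
})

# Basic operations: filtering rows and computing a summary statistic
deep = df[df["depth_ft"] > 5000]
print(len(deep))              # 2 wells are deeper than 5000 ft
print(df["depth_ft"].mean())  # average depth across all three wells
```

If that prints without errors, pandas and NumPy are wired up correctly.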
I am reading a lot about the use of Python in the data management arena, and while I am not currently working with Big Data, I thought this article – Using Python for Big Data Analytics – had some great information on things to avoid in Python.
Here is an article that lists Python among the best languages for crunching data (spelled “Big Data”). It also mentions a number of other technologies I am not familiar with at all – Kafka? I must be living on the dark side of the moon.
Finally, to complete the triad of links for sharing, there is a page with some pandas how-tos. pandas is another framework I need to take a look at. It seems to pop up in data analytics everywhere. In fact, O’Reilly has a number of titles on Python in that sphere, and this one touches on pandas: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. I should pick up a copy quick… Data science sounds like job security.
This won’t cover all aspects of matching dates based on similar records stored in different databases, but it could be the start of another Python solution. Picking up where I left off with my last post about fuzzy well name matching in oil & gas, matching dates for different events in the life of an oil/gas well (permitting, drilling, completing) is another challenge.
Matching “09/30/2014” to “30 Sept 2014” is one thing, but what if the dates are approximate and you’d consider Sept 30 and Oct 2 a match because they’re close?
from fuzzyparsers import parse_date
date1 = "September 30, 1985"
date2 = "10/02/1985"
transformed_date1 = parse_date(date1)
transformed_date2 = parse_date(date2)
timediff = transformed_date2 - transformed_date1
So parse_date from fuzzyparsers cleans up your dates, and the datetime.timedelta you get from subtracting them lets you set a threshold for what you consider a match.
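Here is a sketch of that threshold idea using only the standard library’s datetime; the two-day window is an arbitrary choice of mine, not anything prescribed:

```python
from datetime import date, timedelta

def dates_match(d1, d2, tolerance=timedelta(days=2)):
    """Consider two dates a match if they fall within the tolerance window."""
    return abs(d1 - d2) <= tolerance

# Sept 30 and Oct 2 are two days apart, so they match at the default tolerance
print(dates_match(date(1985, 9, 30), date(1985, 10, 2)))   # True
# Two weeks apart is no longer a match
print(dates_match(date(1985, 9, 30), date(1985, 10, 15)))  # False
```

You’d run the fuzzyparsers output through something like this after parsing, so the fuzziness of the format and the fuzziness of the date itself are handled in two separate steps.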
Several years ago, I was working as a Professional Geologist, sitting by drill rigs, logging borehole cuttings, preparing paper logs, and arguing with the drillers about ROP and TCR… These days I don’t leave my desk: I’m logging automated data loads, managing digital well logs, and wrestling with RDP access or TCP packets. Yet I’m still in the geosciences. It’s been an interesting journey.
Initially, I used this blog to log my baby steps toward becoming tech savvy. After making the official transition from technological geologist to IT professional, there must have been too much to learn to find the time to blog about it. That’s when my editorial output waned. So I’m making another attempt at reviving my page. I have no particular audience in mind. Instead, I like to think of it as merely a collection of useful tips and tricks for geo-technologists like myself who never went through any rigorous IT schooling but have learned to leverage technology to solve problems.
By now, I have six years of Python coding under my belt. I have moved from GIS to supporting a range of geoscience applications in the petroleum industry. I spent three years working with SQL Server and have spent the last year learning about Oracle. I once dabbled in Linux but will soon have to brush up on that. Also, after successfully completing a four-course Python certificate, I’m now working towards one for C#/.NET.
Along the way, I’ve collected my share of notes and made my share of rookie mistakes. That’s not to say I don’t still feel like a rookie at least once every day. But I have gotten better about finding solutions quickly, asking the right questions, and knowing when it’s time to call in the Marines. So welcome to my blog, and leave me a message if you found something useful.
If you’re like me and you’re tired of being a SQL hack, producing SQL code that is the product of trial and error mixed with Google search results, then you might like to spend $100 on these four volumes. I ordered the whole package on Amazon and have it decorating my office shelf. I’m just about to finish the first book, and while little in it was new, it’s been a good refresher. The tone is light and the chapters are bite-size, yet the table of contents shows this series is quite comprehensive. So if you’re just getting started with SQL, or feel like it’s time to get that first SQL book you’ve been avoiding, give these a chance.
Finally – to round off my trio of blog posts here (I’m getting caught up) – I have looked into consuming web services in Python with suds (https://fedorahosted.org/suds/). It looks straightforward. At this point, the only limitation is my knowledge of web services. Something as simple as:
from suds.client import Client
url = 'http://www.webservicex.net/usaddressverification.asmx?WSDL'
client = Client(url)
gets you a description of the service and its methods for this ZIP code verification service. This just happens to be a free service I could use for testing. One fine day, I hope to figure out how to use this for downloading data from a vendor who provides a WSDL web service portal. In fact, FME may offer a web services reader, too, and I need to take a look at that as well.