Tag Archives: oil gas

Fuzzy Matching for Oil/Gas MDM using Python

If you’ve worked with or read up on the common Master Data Management (MDM) tools on the market, you’ll have found that most of them provide functionality for matching records from various source systems and blending or merging them into a single, trusted data set, a practice also dubbed “Golden Record Management” (GRM). Many of these products are expensive, some prohibitively so.

So, as someone who deep down believes no problem should be too big for a Python solution, I thought there had to be a way to do something similar. In the oil/gas industry, the trouble with matching well data records from various databases starts with the name of a well, which may occur in various forms. A well could be called the “Peter Smith No.1”, “Smith, P.-1H” or “P Smith, #1”. Matching these across systems therefore requires a degree of fuzziness. Enter Python’s fuzzywuzzy module. It is available on GitHub and can be installed using pip.
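Before scoring anything, it may help to strip out the purely cosmetic differences between name variants like the three above. Here is a minimal sketch using only the standard library. The function normalize_well_name is a hypothetical helper, not part of fuzzywuzzy, and its rules (dropping “No.” and “#”, replacing punctuation with spaces) are assumptions that would need tuning against real well data.

```python
import re


def normalize_well_name(name):
    """Uppercase a well name and remove cosmetic noise (hypothetical helper)."""
    name = name.upper()
    # Assumption: "NO." or "NO" directly in front of a digit is just a number marker
    name = re.sub(r"\bNO\.?\s*(?=\d)", " ", name)
    # Assumption: "#" is also just a number marker
    name = name.replace("#", " ")
    # Replace any remaining punctuation with spaces
    name = re.sub(r"[^A-Z0-9 ]", " ", name)
    # Collapse runs of whitespace
    return " ".join(name.split())


for raw in ["Peter Smith No.1", "Smith, P.-1H", "P Smith, #1"]:
    print(normalize_well_name(raw))
# PETER SMITH 1
# SMITH P 1H
# P SMITH 1
```

The cleaned names are still not identical, which is exactly where the fuzzy scoring comes in.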

For a real-world example, go to the Texas Railroad Commission for some sample data. I searched for “Smith, P” and found:

Smith, Patricia Unit 1, Burleson County, API 4205130712

Smith, Pattie L, Shackelford County, API 4241731960

These are different wells, and they get pretty low scores when their names are compared using fuzzywuzzy:

>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("SMITH, PATRICIA UNIT", "Smith, Pattie L")
29
>>> fuzz.partial_ratio("SMITH, PATRICIA UNIT", "Smith, Pattie L")
26
>>> fuzz.token_set_ratio("SMITH, PATRICIA UNIT", "Smith, Pattie L")
61

Compare that with potential alternative spellings of the same well:


>>> fuzz.token_set_ratio("SMITH, pattie", "Smith, Pattie L")
100
>>> fuzz.token_set_ratio("SMITH, pattie", "Pattie Smith L")
100
>>> fuzz.token_set_ratio("SMITH, pat", "P Smith L")
78
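To get a feel for why a token-based score shrugs off case, punctuation, and word order, here is a rough sketch built on the standard library’s difflib. It is a stand-in for illustration only, not fuzzywuzzy’s implementation, so its scores will not match the numbers above. The names token_sort_score and best_match are hypothetical.

```python
import re
from difflib import SequenceMatcher


def token_sort_score(a, b):
    """Rough 0-100 similarity ignoring case, punctuation, and word order."""
    def key(s):
        # Lowercase, keep only alphanumeric tokens, and sort them
        tokens = re.findall(r"[a-z0-9]+", s.lower())
        return " ".join(sorted(tokens))
    return round(100 * SequenceMatcher(None, key(a), key(b)).ratio())


def best_match(query, candidates):
    """Return the candidate with the highest score for the query."""
    return max(candidates, key=lambda c: token_sort_score(query, c))


# Same tokens in a different order and case compare as identical
print(token_sort_score("SMITH, pattie", "Pattie Smith"))  # 100

# Picking the closest name from a (made-up) candidate list
print(best_match("SMITH, pattie", ["Smith, Patricia Unit", "Pattie Smith"]))
# Pattie Smith
```

In practice you would probably still want a score threshold below which nothing counts as a match, and that threshold has to come from testing against your own data.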

Anyway, I plan to play with it some more. It’s promising. But of course this only addresses fuzzy matching for strings. It doesn’t help me match dates, for example treating “Sept 30, 1955” and “October 1, 1955” as a near match. But maybe Python has another module for that, too!
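For dates, the standard library’s datetime module may already be enough for a first pass: parse both values and look at how many days apart they are. The sketch below makes some assumptions: English month names, a “Month day, year” layout, and an English or C locale for strptime. Both parse_date and days_apart are hypothetical helpers.

```python
from datetime import datetime


def parse_date(text):
    """Parse dates like 'Sept 30, 1955' or 'October 1, 1955' (hypothetical helper)."""
    month, rest = text.strip().split(" ", 1)
    # Keep only the first three letters of the month so that
    # 'Sept' and 'October' both fit the abbreviated-month format %b
    cleaned = f"{month[:3]} {rest}"
    return datetime.strptime(cleaned, "%b %d, %Y").date()


def days_apart(a, b):
    """Absolute number of days between two date strings."""
    return abs((parse_date(a) - parse_date(b)).days)


print(days_apart("Sept 30, 1955", "October 1, 1955"))  # 1
```

A near match could then be defined as anything within a tolerance of a few days, with the tolerance chosen per data source.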
Additional useful links:

http://chairnerd.seatgeek.com/

and

http://marcobonzanini.com/

Filed under Python

First 2 Months as GIS Analyst…

As mentioned back in June, I started a new job as GIS Analyst for an oil & gas (E/P) company, and I’ve been having a blast. After straddling the worlds of GIS and engineering geology for a few years, I’m glad to finally get to focus on GIS. And there is no lack of challenges in the new job. The learning curve has been a joy!

The GIS department I’m part of has been tasked with building an enterprise GIS using ArcGIS Server, ArcSDE, SQL Server 2008, and MS Silverlight. Moreover, we are working on a Master Data Management solution that will make data from multiple departments available through the Silverlight-powered web app.

For me, the challenge has therefore been, and continues to be, twofold:

1) Brush up on many aspects of ESRI technology:

ArcGIS Server, ArcSDE, Python, Silverlight

2) Learn about a whole range of oil/gas industry software applications:

IHS Petra, SMT Kingdom, NeuraSection, Landworks, Merrick/RIO, the list continues

and how they might become part of the MDMS using NeuraDB, Volant, FME

I guess you could say we’ve got our work cut out for us.

Filed under Uncategorized