Dream to Learn is shutting down...

We are very sorry to say that Dream to Learn will be shutting down as of December 28th, 2019. If you have content that you wish to keep, you should make a copy of it before that date.


0COMMENTS0RECOMMENDS

"Ultra Newb's" Guide to RPostgreSQL (Part 1)

20
POSTED IN: Data Analytics & Visualization Blog

RPostgreSQL / PostgreSQL -> Serving Data to R - An "Ultra Newb's" Guide (Part 1)

After spending time wandering through Random Forests, I had the opportunity to work with some tools used on data from actual Forests.  My friend Jay - that's him in dark shirt in the the picture below, circa 1985, probably explaining something to me - (yes, I'm in a sweater-vest)

 

 

Objective:   Take some of the R tools and methods that Jay has been using to analyze forestry data, and re-purpose them for other things (and get a basic level of proficiency)

Problem:      I'm utterly unfamiliar with PostgreSQL and RPostgreSQL, which are key ingredients for Tool to operate.   What I hoped would be a plug and play over a couple of hours, turned out to take me a couple of evenings banging away trying to the SQL connected locally, and connecting with R.

What's below:   Part 1 - The VERY basics on how all the pieces fit together, and how to hopefully safe you a little time getting your local system set up and running RPostgreSQL.

Remember: 

  • SETWD > if you're borrowing source code, ensure one of the first things you do is modify the file to use the new location of transplanted files - e.g. setwd("C:/Users/Home/Documents/SampleAnalysis")
  • INSTALL + LIBRARY > if not already done, install RPostgreSQL package packages UI > install packages from CRAN > type in RPostgreSQL and will autocomplete library(RPostgreSQL)   #then install library

Issues Worked Through

  • Issue: PNG Function did not run well locally on mine (e.g. png(paste('stats/attributes.png', sep="")) 
  • FIX: - install.packages("Cairo") - need this to replace PNG with cairo PNG library(Cairo)   > e.g. CairoPNG(paste('attributes.png'), width = pngwd, height = pnght, pointsize = 16)
  • Issue: PERMISSIONS > ERROR: could not open file CSV for reading: Permission denied SQL state: 42501
  • FIX:  (1) move file to C:\Users\Public  (so all users have full access) or (2) fiddle with security settings to add users like postgres to have permissions (I did it second way initially, but having trouble recreating, so opted for 1.   Basically, you need to ensure Windows (or system) is "OK" with the user you've defined, reading the file.  Need to explicit permission.
  • Issue: Rookie with PostgreSQL - not sure if it's even connected/listening
  • FIX:  # run netstat -an from command line check 5432 is connected  (established on 5432 is good, means DB is set up on right port, listening
  • Other Info: Background - RPostgreSQL provides a DBI-compliant database connection from GNU R to PostgreSQL. https://code.google.com/p/rpostgresql/ Synopis - RPostgreSQL provides a DBI-compliant database connection from GNU R to PostgreSQL. Summary of basic usage 1. dbDriver(drv, ...) instantiates the driver object. Eg.

 

--------

Interfacing to Local PostgreSQL deployment with R Studio to Run RPostgreSQL: (My Launch Sequence)

 
1) Launch PGAdmin - this will enable launch of Database and loadnig of table
 
2) Connect Database service - i.e. activate "PostgreSQL 9.1 (local host: 5432) 
 
3) SERVICE > Verify you are running, that service can be connected to (e.g. by R) - do this by running CMD and "netstat -an" - in the list that appears, you should see "5432 ESTABLISHED". If you dont see this, you need to focus on getting local host to be visible to other applications by R.
 
4) DATABASE > Refresh or Activate the DATABASE you're running on service.  (mine is called "Apple_Cider") - note who the Owner is.  Permissions can hang you up sometimes (e.g. sometimes I used postgres as owner, sometimes other name, but it matters in code to ensure permissions are happy.  You should not see a red X if this is active
 
5) Click on Schemas > Public > Tables
 
6) CREATE TABLE > This may be where you need to do a little work to tweak.  
- if you've run the SQL in the past, and table exists, you EITHER need to drop the old table, or modify table name and table references to a new name
- SQL Must contain "CREATE TABLE" 
- SQL Must contain some way to load fields (columns) and data.  Here is mine
COPY cup_of_cider2 FROM 'C:\Users\Ryan Anderson\Documents\__________RRR and Analytics\Samples\apple_cider_csv_fka_p2.csv' DELIMITER ',' CSV HEADER;
the sample SQL Jay provided had the example data nested in it(data inside SQL) - I had trouble with this - SYNTAX Errors - so after a couple attempts to solve, I just used the CSV import and it liked that better.  Never figured why the "COPY" from data inside of SQL did not work, but googling around, looked like it was a problematic approach for some.
 
If it works, you should see "Query returned successfully with no result in 421 ms." or similar.  
 
7) VERIFY TABLE > Go back to PGAdmin UI;  REFRESH the Database; IN SChemas > Public > Tables, you should see evidence of your new table.  Now let's run a simple SQL command to see if the data is there:  Click on SQL ICON, then "Select * from Tablename" - mine is "select * from cup_of_cider2" - below in the output pane you should see your data.  
 
8) Rejoice!  if you see the data in the table, and netstat is showing 5432 Established, then you have a service that is ready to 'feed' the RPostgreSQL.   
 

Now, lets' go to R Studio and engage from the Other side

 
9) Launch RSTudio;    Set working directory setwd("C:/Users/Ryan Anderson/Documents/__________RRR and Analytics/Samples")
 
10) Ensure you have packages installed and run libraries you'll need.  The second one Cairo was needed because the PNG functions he provided to me did not run with my config.   It's possible there is another package that I should install, but this was my easiest fix.  Then I just did a 'search and replace" of PNG( with CairoPNG(
- library(RPostgreSQL)   #then install library
- library(Cairo)   ##library("Cairo", lib.loc="C:/Program Files/R/R-2.15.1/library")
 
11) Connect to the DATABASE (!) - cross fingers. if this works, you're nearly there
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host='localhost', port='5432', dbname='apple_cider',user='postgres', password='YOURPASSWORD')
 
12) Query DB - rs <- dbSendQuery(con - then 
summary(rs)   ## did we get it?  (yes) - rejoice
 
---
 
So if you get here, you have the basics working to have RPOstgreSQL talk to PostgreSQLDB running locally and serving data.
Part 2 Blog will dive into the "Meat" of how tool can be used in the real world - leveraging Jay's methods he uses in FOrestry

 

 

 

 

Interested in more content by this author?

About the Author

Ryan Anderson

Ryan Anderson

Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

Description is...<br/>Data Analytics & Visualization Blog - Generating insights from Data since 2013

Created: July 25, 2014

Englishfrançais

Up Next