· 6 years ago · Dec 16, 2019, 11:54 PM
1## 1) The Executive Summary:
2The goal of the project is to produce a tool in python that allows users to find real estate property in NYC based on several search criteria. These criteria will include the user's estimated budget, the neighborhood of the property, and the purpose/tax bracket of the property (Residential/Small Office, Rental Building/Co-op unit, Utility, or Commercial/Industrial Property). The tool will then display a list of properties that fit into the user's search criteria, including information like the address, the neighborhood, and the type of property.
3
4Afterwards, the user is given the option to see additional properties that didn't fit exactly into the initial search criteria, but are still similar. Also included is a cell that displays a graphical google map with markers pointing at the relevant properties from the data output. In the end, we were able to create a tool that allows people to find information about various property that fits their search criteria and needs.
5
6## 2) Background:
7As current students, some of whom are new to the city, the problem of finding a place to live is very personal to us, so we wanted to try to create something that could be used as a resource to get information about property that we found would be relevant. Because of this, we decided to leave a certain degree of input parameters up to the user such as the price and the neighborhood.
8
9The data comes from NYC Open Data and we started by pulling it directly and assembling it into a local file (sales.csv). The data spans through Fall of 2019 and reflects the following data categories:
10
11**Location Information**
12BOROUGH
13NEIGHBORHOOD
14ADDRESS
15APARTMENT NUMBER
16ZIP CODE
17**Property Physical Details**
18BUILDING CLASS CATEGORY
19TAX CLASS AT PRESENT
20BLOCK
21LOT
22EASE-MENT
23BUILDING CLASS AT PRESENT
24RESIDENTIAL UNITS
25COMMERCIAL UNITS
26TOTAL UNITS
27LAND SQUARE FEET
28GROSS SQUARE FEET
29YEAR BUILT
30**Sales Information**
31SALE PRICE
32SALE DATE
33TAX CLASS AT TIME OF SALE
34BUILDING CLASS AT TIME OF SALE
35
36Since the scope of our grouo's goal was limited to properties in New York due to our use of NYC Open Data, as well as the fact that the end-goal of the project was to help people with finding properties to buy or live in, the most relevant location-based metrics were neighborhood and address. This is because aside from returning specific addresses and properties, the quantity of data returned by the output would show various availability of property with the right price range and room requirements across different neighborhoods. Combined with the use of the Google maps static API, this also allows users to use the tool to discover new neighborhoods to consider living in.
37
38The most important physical detail related data elements were the the number of units and the square footage. Number of units allows users to properly filter down the lists of properties by their needs. On the other hand, square footage allows users to compare different properties, or even neighborhoods by the amount of space they could get at each price point after their data was filtered. One potential use of the square footage with the maps API was to put the square footage in the markers as text, but this resulted in a cluttered map and we decided to just label the markers numerically instead.
39
40When it came to sales related information, the sales price and date were the important elements that we needed to consider here. The sale price is both an important search metric because of user budgets as well as an important part of the output that lets users compare properties and the overall expensiveness of neighborhoods on average. Conversely, the inclusion of sale date allows users to see how recent and accurate the sales price might be, and also allows them to more accurately price similar units in the same neighborhood that may not have been sold in a long time. For example, if a 2-bedroom unit in Alphabet City sold this past October for 3 million and a similar property on the same block has not been sold in 30 years, the first property's price may be a more accurate metric to users than the raw data held by NYC Open Data.
41
42## 3) Project Description: What were the steps involved, in acquiring the data, transforming it, analyzing it and presenting it.
43
44The google maps static API is not available on mashape/rapidAPI, so we had to set up a billing account with google to access the API and get output from it. Fortunately, they offer a sizeable amount of free pulls per month that were more than adequate for our needs. After registering properly and enabling the relevant API, we were able to obtain an API key that we used. After that, the next step was to convert the addresses from the data output to something that would work in the get method for the API. At first, we started by converting each of the addresses in the tool's output to geographic coordinates (latitude, longitude) but eventually realized that the Google maps static API can also use addresses in a string format to search for locations. With New York as the scope of our data, we fed a python list of the addresses into a code cell that created a string to be inserted into the get method that put markers on a google map of the first five locations so that users can get a geographic idea of where in New York the properties they were looking at were located. The center of the map can also be controlled but for the sake of consistency we chose to center it on midtown Manhattan.
45
46
47## 4) Conclusion and Further Steps
48
49In conclusion, we were able to leverage several of the tools and resources from this class, from using MySQL with Python to calling APIs to create a tool that takes in a set of parameters to help users find a dataset of properties that fit their search criteria. For futher steps, we might be able to incorporate other data sets and create more detailed search parameters or output. For example, if we could find a way to standardize data outputs from other cities, we could expand the tool's scope beyond that of NY. Alternatively, we could find other datasets about NY such as neighborhood crime statistics or nearby amenities to display additional information about different neighborhoods and properties. We might also be able to accomplish these things by looking into other APIs to use beyond or even in conjunction with google maps. For example, we could create a user input parameter with their place of employment and have the google maps directions API calculate the commute distance by public transit.
50
51## 5) Lessons Learned
52
53We learned how to use many of the different data tools from the class: acquiring data, putting it into a MySQL database, using SQL to search through it, using python to facilitate the entire process, and putting the data output into a format readable by a get method of an API to create extra visualizations. We learned some of the difficulties in clearly defining a project goal, as well as some of the difficulties involved in acquiring data using python. We also experimented with using different methods of input for different APIs and finding various ways they could interact with our data outputs as well as finding different ways a data set could be represented other than just having raw data displayed.