How to Get Open Source Project Ideas and Execute Them

A common question people ask me is how I come up with open source project ideas. It is a pretty simple process:

1. Have A Problem
If you are a developer, when you are building stuff you will have to solve problems. On top of those, as a human being you will have problems every day. How can I share responsive web design examples to educate my clients? How can I make a more flexible grid system for responsive web design? How can I help my kid learn his ABC’s? Are my passwords manly enough?

The first thing most of us do is see if that problem has been solved before, and how. It may be that the problem has been solved effectively, and there is no need for you to solve it again. Many times, there are different approaches to solving the problem, but none that work quite the way you’d like. On rare occasions, you have encountered a problem that doesn’t have any good solutions yet, which can be quite exciting.

1a. Come Up With a Cool Name
This is probably the most fun part of the process, haha. Cool names are cool!

2. Solve The Problem
Get on it. Solve that problem. It may be that a particular problem is too large and ambitious to pull off alone. Either find some friends to collaborate with, or break the problem into smaller components, then go after them one at a time.

3. Abstract Your Solution
When possible, craft your solution in such a way that it can be applied more generally, so that others who encounter the same problem can use it too.

4. Publish
Throw it on GitHub, of course, but if you want your project to stand out, go the extra mile: make a great demo page that showcases your project, with clear, easy-to-follow documentation.

5. Don’t Be Shy
If you release a GitHub project in the woods, does anyone notice it? To get your project noticed, you need to publicize it. Tweet it out and share it around. Send it off to newsletter publications (e.g. JavaScript Weekly, HTML5 Weekly, etc.). Even better, write a tutorial and try to get it published online somewhere like Codrops, Tuts+ or Smashing Magazine.

6. Repeat
The more projects you publish, the more ideas you will have. I recommend maintaining a list of your project ideas in Google Drive or Trello.

One Year of Open Source Traffic

I’ve put Google Analytics on some of my GitHub project pages, but hardly ever look at it. Well, today I decided to take a look at a year’s worth of traffic, and I am absolutely blown away.

In one year, there were 743,557 unique visitors to my various open source projects. SuperScrollorama was the most popular, with 487,475 pageviews. Unbelievable! And I noticed that I wasn’t even tracking Responsivator or BigVideo.js, so the true total was undoubtedly a lot higher, probably over a million. Amazing.

AWS Notes: Mastering NoSQL – Advanced Amazon DynamoDB Design Patterns for Ultra-High Performance Apps

DynamoDB engineer David Yanacek started things off with an overview of the tables and queries that exist in a typical social network app, and how they look in DynamoDB. A second example was image tagging, with queries like images by user, by date, by tag, and then combinations of those. He showed how to set up DynamoDB queries that are useful and inexpensive for this type of dataset. He introduced the Local Secondary Index, an index DynamoDB maintains alongside a table that lets you query its items by an alternate range key. For tags, in DynamoDB you would have another table tying the tags to image IDs.

Then he revealed a new feature for DynamoDB: the Global Secondary Index. Unlike a local index, a global one can be keyed on an entirely different hash key, which lets you run fast, inexpensive queries with much more flexibility.
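To make the idea concrete, here is a rough sketch of what such a table definition could look like as a CreateTable request. The table, index and attribute names are my own invention, not from the talk:

```javascript
// Sketch of an image-tagging table with a global secondary index.
// All names here are hypothetical, for illustration only.
const createImagesTable = {
  TableName: "Images",
  KeySchema: [
    { AttributeName: "ImageId", KeyType: "HASH" },    // partition key
    { AttributeName: "UploadDate", KeyType: "RANGE" } // sort key
  ],
  AttributeDefinitions: [
    { AttributeName: "ImageId", AttributeType: "S" },
    { AttributeName: "UploadDate", AttributeType: "S" },
    { AttributeName: "Tag", AttributeType: "S" }
  ],
  GlobalSecondaryIndexes: [{
    IndexName: "ByTag",
    // The index re-keys the same items on a different hash key, so
    // "all images with tag X, newest first" becomes a cheap Query
    // instead of a full-table Scan.
    KeySchema: [
      { AttributeName: "Tag", KeyType: "HASH" },
      { AttributeName: "UploadDate", KeyType: "RANGE" }
    ],
    Projection: { ProjectionType: "ALL" },
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
  }],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
};
```

You would pass an object shaped like this to the SDK's createTable call; the point is that the main key schema and the index key schema can use completely different hash keys.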

The next example was managing the state of a tic-tac-toe game, showing some problems that arise and how to avoid them. David demonstrated how a tic-tac-toe player could cheat by sending multiple update-item requests, perhaps from separate browser windows. You can solve that with conditional writes: add an Expected clause so that bad writes are rejected.
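As a sketch of the fix, here is roughly what such a conditional UpdateItem request looks like, using the ConditionExpression form of DynamoDB conditional writes (the newer syntax for the same idea as the Expected clause). Table, attribute and player names are hypothetical:

```javascript
// A move is only accepted if the square is empty and it is still that
// player's turn, so a duplicate request from a second browser window
// fails instead of overwriting state. Names here are hypothetical.
function buildMoveUpdate(gameId, square, player, nextPlayer) {
  return {
    TableName: "Games",
    Key: { GameId: { S: gameId } },
    UpdateExpression: "SET #sq = :player, #turn = :next",
    // The guard: the square must be unset AND it must be this player's turn.
    ConditionExpression: "attribute_not_exists(#sq) AND #turn = :player",
    ExpressionAttributeNames: { "#sq": square, "#turn": "CurrentTurn" },
    ExpressionAttributeValues: {
      ":player": { S: player },
      ":next": { S: nextPlayer }
    }
  };
}

const update = buildMoveUpdate("game-42", "TopLeft", "X", "O");
```

If the condition fails, DynamoDB rejects the write with a conditional check failure rather than silently applying the duplicate move.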

He moved on to finer-grained access control for user data. He proposed moving from a three-tiered architecture to a two-tiered one, where users write directly to DynamoDB rather than going through a service layer. Having just heard the ___ presentation, I knew where he was going with this: using Web Identity Federation to get temporary credentials and a session token. A new feature in DynamoDB lets you limit access to particular hash keys and attributes. For example, allow all authenticated Facebook users to query the images table, but only for items whose key matches their Facebook ID. This lets you expose the database directly to users, resulting in lower latency, cost and complexity. The downsides: less visibility into user behavior, changes are harder to make without a service layer, and it takes work to scope items to specific users.
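For illustration, the key-scoping idea is expressed in an IAM policy using the dynamodb:LeadingKeys condition key, which restricts a query to items whose hash key equals the caller's federated identity. The account ID and table name below are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:Query"],
    "Resource": ["arn:aws:dynamodb:us-east-1:123456789012:table/Images"],
    "Condition": {
      "ForAllValues:StringEquals": {
        "dynamodb:LeadingKeys": ["${graph.facebook.com:id}"]
      }
    }
  }]
}
```

Attached to the role that Web Identity Federation hands out, this means every Facebook-authenticated user can only ever touch their own rows.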

Next up was David Tuttle, Engineering Manager at Devicescape, covering how they have implemented DynamoDB in their service. Prior to adopting DynamoDB, they faced challenges with scaling, DB maintenance, slow queries on large tables and difficulty making schema changes. DynamoDB forces a different way of thinking about data.

He quickly gave a bunch of pro tips: organize data into items around a primary key, evenly distribute the workload across hash keys, and use parallel scans for high-speed jobs. DynamoDB will not give you the expected throughput unless you distribute across hash keys. By using conditional writes and an atomic increment on the first item, they avoid collisions. Plan for failure: use local files to resume data operations after a crash. Extract data to S3 for non-realtime operations like analytics. Use DynamoDB Local to speed up developer workflow.
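The atomic-increment pattern he mentioned can be sketched as an UpdateItem request against a counter item; the table and attribute names are hypothetical, not from the talk:

```javascript
// Sketch of the atomic-counter pattern: one counter item hands out unique
// sequence numbers, so concurrent writers never collide on an ID.
// Table and attribute names are hypothetical.
function buildCounterIncrement(counterName) {
  return {
    TableName: "Counters",
    Key: { Name: { S: counterName } },
    // ADD is applied atomically server-side, so two simultaneous
    // callers are guaranteed to receive two different values.
    UpdateExpression: "ADD SeqNo :one",
    ExpressionAttributeValues: { ":one": { N: "1" } },
    ReturnValues: "UPDATED_NEW" // the response contains the value you claimed
  };
}

const req = buildCounterIncrement("image-ids");
```

Combined with a conditional write on the item being created, this gives collision-free inserts without a central ID service.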

Last it was Greg Nelson from Dropcam, the Wi-Fi enabled camera company that does intelligent activity recognition and cloud recording. They moved away from managed hosting to AWS and eventually DynamoDB. He showed how they use it to manage all their camera data, as well as user sessions. He drilled into what their table updates, queries and deletes look like for cuepoints (to keep track of important footage within the video tracks).

He talked about some of the complexity within DynamoDB, such as the concept of “eventual consistency”. With NoSQL, choosing a hash key is the most important up-front design decision. You need to pay very close attention to I/O. Throttling behavior is opaque, so you need to actively watch for problems.

AWS Notes: Writing JavaScript Applications with the AWS SDK

AWS developer Loren Segal introduced the AWS SDK for Node.js. It is open source, Apache-licensed and on GitHub, with full service coverage across 30 services.

Install through npm, of course:
npm install aws-sdk

Configuration is easy and can be done programmatically (there’s a global configuration object). Loren ran through a bunch of code snippets for how to do various things, and it wasn’t long before he got into some live coding in the terminal. First up were some simple S3 requests and response handling. Next, he demoed response pagination for scanning large DynamoDB datasets. Then he showed request event handling, demonstrating the AWS global events object, which lets you do global event handling, for example on every successful response. Last, he went over programmatically configuring credentials, showing that you can set temporary credentials that automatically expire.
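The pagination pattern he demoed boils down to resubmitting the request with ExclusiveStartKey until no LastEvaluatedKey comes back. A minimal sketch, with a stub standing in for dynamodb.scan() so the control flow can be shown without the SDK or a network connection:

```javascript
// fakeScan simulates a two-page DynamoDB Scan: the first response carries
// a LastEvaluatedKey, the second (final) page does not.
function fakeScan(params, callback) {
  const pages = {
    undefined: { Items: [1, 2], LastEvaluatedKey: "page2" },
    page2: { Items: [3] } // no LastEvaluatedKey: this is the last page
  };
  callback(null, pages[params.ExclusiveStartKey]);
}

// Accumulate all items by following LastEvaluatedKey until it is absent.
function scanAll(scan, params, done, items = []) {
  scan(params, (err, data) => {
    if (err) return done(err);
    items.push(...data.Items);
    if (data.LastEvaluatedKey) {
      // More pages remain: resume from where the last page stopped.
      scanAll(scan, { ...params, ExclusiveStartKey: data.LastEvaluatedKey },
              done, items);
    } else {
      done(null, items);
    }
  });
}
```

Swap fakeScan for a real dynamodb.scan.bind(dynamodb) and the same loop drains an arbitrarily large table.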

Loren moved on to show off the AWS SDK for JavaScript in the Browser, which is now in developer preview. This literally allows you to build web applications out of nothing but static files. Some examples of things that could easily be created: forum software, blogging and commenting systems, browser extensions and mobile apps. This is freaking awesome.

He created a sample blogging application (check it out on GitHub). The app.js script is a mere 260 lines of code.

Loren live-deployed the app. It had permissions set up, so he did a social login, then created a blog post using Markdown and an open source WYSIWYG editor. Did I mention this is all with static HTML/CSS/JS files?! Nice. You can see the demo at

He went over the key differences between traditional three-tiered architecture and two-tier development with AWS in the browser. He did say that in order to use it with S3, you need to configure CORS settings. For other services, CORS is not necessary because their requests are already authenticated.
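For reference, an S3 CORS configuration is a small XML document attached to the bucket. This is a generic example, not from the talk; the origin is a placeholder you would replace with your site:

```xml
<CORSConfiguration>
  <CORSRule>
    <!-- Only pages served from this origin may call the bucket -->
    <AllowedOrigin>https://example.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
```

Without a rule like this, the browser blocks the SDK's cross-origin requests to the bucket.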

He stressed the importance of not hardcoding your credentials. They get around this by using Web Identity Federation which trades access tokens provided through other services like Facebook, Google or Amazon for AWS keys. He showed quickly how to set up a Facebook application to create IAM roles and set up permissions for users. He showed some code for working with Facebook Access Tokens.

He finished by talking about the open source community. They love to get feedback, pull requests, issue reports and third-party plugins.

AWS Notes: 2nd Annual Startup Launches

I always love hearing startup pitches and launches.

First was Koality, a build deployment service built, of course, on top of AWS. Koality automatically parallelizes your test suite across available virtual machines, making tests up to 64x faster. They not only do pre-push unit testing (blocking bad changes), they also make it easy to have private debug instances. They have a great client list, including Dropbox, CrunchBase and, most recently, Airbnb.

Next was CardFlight, the open platform for mobile payments. They announced two new features: custom manual entry and integration with Braintree. They also announced a free 12-month subscription with their readers.

Some former Twilio guys created Runscope, a traffic inspector that aims to make it easier and better to integrate with APIs (like the AWS APIs, for example). Their debugging tools have tracked over 25 million API calls to date. I liked their motto: “Everything is going to be 200 OK.” Their new product announcement was Runscope Radar, which adds automated API calls to your testing suite, with assertions that can evaluate the responses.

SportXast is the easiest way to create, view and share family sporting moments. You can easily get instant replays, connect with the community around an event, and share crowdsourced highlights of the players you care about. When a user uploads a video, it goes from S3 to SQS to Elastic Transcoder to CloudFront and back out to other users. This was a true launch, as it was their very first release to users.

Nitrous is a free cloud-based development environment platform with a web-based IDE and cloud VMs. Their big differentiator from competitors is reducing latency (via CloudFront) so that the environment is indistinguishable from localhost. They offer Google Docs-style collaboration. And of course, they’re hiring. (As is everybody!)

Hopefully one or more of these companies become super successful so I can say I was there when they launched!

AWS Notes: Scaling a Mobile Web App to 100 Million Clients and Beyond

For me, this was the best session of the conference so far. Joey Parsons, Head of Operations at Flipboard, gave a talk about how the company grew from its first user through to today.

He started by covering Flipboard’s “prototype phase” on the way to 100 million users. They started with a simple stack of Rails, EC2, S3, RDS, MongoDB and memcached. They submitted to the App Store, launched on the iPad and monitored their CloudWatch analytics. Then, after some initial celebration, they noticed their CPUs were spiking. They spun up new servers and quickly hit the limit on their AWS account. Then they got rate-limited by Twitter and Facebook. They made a call that night to rate-limit their users to keep things in check, slowly opening the service up to new users.

They soon made the decision to switch from Rails to Java and add CloudFront to their stack. They also broke up what was once a monolithic app into separate microservices, and shifted their primary data store to MySQL via Amazon RDS. They started focusing on instrumentation and monitoring: every instance they ran was tracked in a SimpleDB table with detailed information, allowing fast and powerful lookups across all the servers that power the company’s operations.

The next milestone for Flipboard was the launch of their iPhone app. Once again, after some initial celebration, they ran into unforeseen performance problems as it scaled. In one night, with RDS, they were able to build a sharding mechanism that they still use to this day. Funnily enough, the sharding didn’t even matter for that particular problem: it all came down to one bad query, which they fixed, and everything went back to working great.
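The core of any sharding mechanism is a stable mapping from a record's key to a shard. This is a toy sketch of the general idea, not Flipboard's actual scheme:

```javascript
// Toy hash-based sharding: a stable string hash of the user ID picks
// which database shard holds that user's rows. Illustrative only.
function shardFor(userId, shardCount) {
  // djb2-style string hash: deterministic across processes and restarts,
  // which is what makes it usable for routing (unlike Math.random()).
  let h = 5381;
  for (const ch of String(userId)) {
    h = ((h * 33) + ch.charCodeAt(0)) >>> 0; // keep it a 32-bit unsigned int
  }
  return h % shardCount;
}
```

The same user ID always lands on the same shard, so reads and writes for one user never need to fan out across databases.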

Their next launch was Android, and there were no bad stories to report there. Their stack continued to grow, adding HBase, Hadoop, Redis, Puppet and more.

They continued to focus heavily on instrumentation for all their services, setting up processing mechanisms with Hadoop, Storm and Kafka. They moved away from deploying with custom bash scripts, switching to Puppet. The most important thing, he said, was to stop just throwing hardware at problems and instead focus on using the appropriately sized EC2 instance, both for best performance and for cost savings.

Their focus on instrumentation was not confined to the server side. Flipboard monitors a number of client-side metrics by sending reporting data (such as how long it takes to open the app) from their apps to Graphite. They like the tool for metrics from hosts, apps, usage and logging. He gave props to the Cloudwatch2Graphite open source project, which brings CloudWatch metrics into Graphite. They divide their deployment into groups and use CloudWatch metrics to catch errors before they deploy. He showed a neat chart they generate from that data using d3.js and cubism.js, which lets them quickly see which parts of their stack may be causing performance problems.

What’s next for Flipboard technology? Better use of auto scaling groups by dialing into lots of signals for better predictive analysis, a continued heavy focus on picking the right instance types, and taking advantage of any new AWS products.

He concluded with a philosophy that I share: the unknown is not scary; rather, it is exciting.

AWS Notes: Amazon WorkSpaces

This session was about the new WorkSpaces product that Amazon launched in the first-day AWS re:Invent keynote. He started by covering the customer problems they hope to solve: delivering desktop virtualization to tablets, enabling workforces to be more flexible, and lowering the cost of remote-worker infrastructure.

End users can access their VM from a laptop, iPad, Kindle Fire or Android device. In his demo, AWS General Manager Gene Farrell grabbed an iPad running a Windows desktop, opened up Word and edited a document. It was interesting to see their UI for delivering a Windows 7 PC experience on a touchscreen tablet (via a radial touch menu). It integrates with Active Directory, so users can access their organization’s intranet and so forth.

Interestingly, there is no data on the client device; it only receives pixels, and everything is stored on S3.

AWS Notes: Zero to Sixty with AWS Elastic Beanstalk

I ran a little late to this one, which is unfortunate because it was a really good session. I got there as Ann Wallace, Solutions Architect at Nike, was in the middle of her slides.

It seems Nike has a similar setup to AuctionsByCellular, actually: a Java stack built on top of AWS, deploying with Elastic Beanstalk (EBS). She went over their EBS deployment process and how they configure their environment. They do zero-downtime deployments in much the same way we do, by swapping the CNAME of the old environment with the new one. They use .ebextensions to customize their EBS configuration, showing their template.json file and an example .ebextension. She also went over some of the problems with the EBS deploy process (I’m sure Amazon is taking notes).
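For anyone who hasn't seen one, an .ebextensions file is just a YAML config file dropped into your app bundle. This is a minimal generic example (the option values are placeholders, not Nike's actual configuration):

```yaml
# .ebextensions/01-options.config
# Minimal example of customizing an Elastic Beanstalk environment.
# The values below are hypothetical placeholders.
option_settings:
  - namespace: aws:elasticbeanstalk:application:environment
    option_name: APP_ENV
    value: production
packages:
  yum:
    git: []
```

Elastic Beanstalk picks up every .config file in the .ebextensions directory at deploy time and applies it to the environment.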

VTEX is a large SaaS e-commerce platform in Brazil serving Latin America. Geraldo Thomaz, co-founder and co-CEO, talked about how their use of EBS and AWS has evolved. They now have over 60 applications running on EBS. He gave a quick demo of how easy it is to do releases with git, which again matched the deploy process we employ at ABC. They have a philosophy of doing many smaller deployments, multiple times a day, and even created a command-line wrapper to further automate the process. They use a Splunk .ebextension to monitor performance, making sure new versions don’t introduce performance hits.

AWS re:Invent: Keynote Day Two


Amazon CTO Werner Vogels is quite a character. He talked about how there are so many products and announcements that it can be confusing. “Rapid delivery is in our DNA.” Werner repeated a theme I had been hearing over and over at the conference: Amazon puts the customer at the center of everything they do. He said that when they start to evaluate new products, the first thing they do is write a press release and an FAQ, before they write any code. They achieve rapid delivery via small, autonomous “two pizza” teams that own their own roadmaps.

He then announced Amazon RDS for PostgreSQL, to much applause.

It was no surprise to see Netflix on stage, as they may be AWS’s biggest and best-known customer. Chief Product Officer Neil Hunt talked about all their open source projects, and Chief Cloud Architect Adrian Cockcroft announced the Netflix Cloud Prize winners. My favorite was the project that added additional ways for Chaos Monkey to torture servers.

Werner said we often think of innovation as creating “new stuff”, but often the best innovations are improvements around things that don’t change, which help a customer forever. They focus on performance, security, reliability, cost savings and scale.

The next AWS announcement was I2 instances: SSD servers with ridiculous read/write speeds, which in turn will enhance DynamoDB’s already fantastic performance.

Next up on stage was Ilya Sukhar, co-founder and CEO of Parse, which offers an SDK that makes it easy to create apps across all devices. Parse powers 180K applications with push notifications and API requests. He called out provisioned IOPS (PIOPS) as being particularly important to their success, delivering consistent DB performance for their apps.

Werner came back to cover security (announcing finer-grained access controls plus encryption for DynamoDB and other AWS products) and cost savings (new bid-based pricing for the allocation and operation of AWS services).

Last, he spoke about scaling, citing WeTransfer, an AWS-powered platform for transferring large files via email that is also popular with artists for serving wallpapers.

Mike Curtis, VP of Engineering at Airbnb, came out to talk about his company and its growth. From day one it was built on AWS, and their policy is that whenever AWS has a product that can solve one of their problems, they use it. They have over 1,000 EC2 instances and 50TB of S3 storage for photos. And they do it all with a five-person operations team, which is only possible because they can lean on AWS.

Werner came back on and showed a neat AWS-powered product called Narrative, a lifelogging camera that takes a photo every 30 seconds and sends it to S3 for storage.

The next speaker was Dropcam CEO Greg Duffy. Dropcam is a Wi-Fi monitoring camera and cloud service for your home. They are now the largest inbound video service on the internet, with more video uploaded to it than to YouTube. Without the cost savings provided by AWS, their company could not exist.

Werner came back to talk more about companies using AWS like Moovit, DeConstruction, Netflix and Echo.

Then he announced yet another new service: Amazon Kinesis, for real-time processing of streaming data at massive scale. This enables things like realtime analytics. It integrates with other AWS products, like DynamoDB, S3 and RDS. It was demoed by Khawaja Shams, who showed an example of using it to explore Twitter data and trends via complex queries on large historical datasets. Undoubtedly this will be a very popular tool in the big data space.

AWS Notes – AWS Storage and Database Architecture Best Practices

AWS Enterprise Solutions Architect Siva Raghupathy started by stating that 2.7 zettabytes (ZB) of data exist in the digital universe today, and that there will be 450 billion transactions per day by 2020. Most data is unstructured text.

How should we be handling all this data? It is about finding the right tool for the job. He broke down the AWS services into different categories based on the types of problems being solved.

There are primitive compute and storage options, kind of like a raw hard disk, which add flexibility because you can host any major data storage technology, but which come with operational burdens.

Next there are managed AWS services, which he mapped along two axes: complex vs. simple queries, and structured vs. unstructured data. He included blob stores like S3 and Glacier, where you store unstructured data that isn’t attached to any query.

He often asks his customers, “What is the temperature of your data?” Hot data is smaller, with low latency and a very high request rate. Cold data is vast, mostly static and infrequently requested. Warm data is somewhere in between. He then mapped the various AWS storage services from hot to cold.

He spoke about cost-conscious design, then demonstrated the concept with an example. He fired up the AWS Simple Monthly Calculator to figure out the right AWS data storage service based on cost. In his example, one would first think S3 was the appropriate solution, but after running it through the calculator we saw that, because of all the small objects, DynamoDB was the better solution at less than 10% of the cost. You can use the calculator to validate your architecture design: the best design is the one that costs the least.
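The shape of that calculation can be sketched with made-up numbers. All prices below are placeholders, not real AWS rates; the point is only that with tiny objects, per-request charges can dwarf per-GB storage charges:

```javascript
// Toy cost model: monthly cost = (storage price per GB * GB stored)
//                              + (price per million requests * request volume).
// The rates below are hypothetical, for illustration only.
function monthlyCost(store, gb, millionsOfRequests) {
  return store.perGb * gb + store.perMillionRequests * millionsOfRequests;
}

const s3 = { perGb: 0.03, perMillionRequests: 5.0 };     // placeholder rates
const dynamo = { perGb: 0.25, perMillionRequests: 0.5 }; // placeholder rates

// Small-object workload: little data, huge request count.
const s3Cost = monthlyCost(s3, 10, 500);
const ddbCost = monthlyCost(dynamo, 10, 500);
```

With this workload the request term dominates, so the store with cheaper requests wins even though its storage rate is higher, which mirrors the surprise in his calculator demo.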

You can get further savings by moving data from one store to another as it cools down.

Next he moved on to the AWS database services, starting with RDS. He said to use it for transactions and complex queries, but not for massive numbers of reads/writes or for simple queries that are better handled by NoSQL. It is also necessary to pick the right RDS DB instance class.

When to use DynamoDB? Pretty much whenever you can, he said. The only times you wouldn’t are for complex queries and transactions, or for cold data. For DynamoDB best practices: keep item sizes small, store large blobs in S3 with their metadata in DynamoDB, and use a well-distributed hash key for extremely high scale.

Last, he spoke about ElastiCache, which speeds up reads and writes by caching frequent queries. Redis in particular is quite popular, though he noted it is not a good option when data persistence is important.

He quickly wrapped up by going over CloudSearch, the AWS unstructured-text search tool (don’t use it as a replacement for a database); Redshift, the data warehouse service for complex queries on large quantities of historical data (copy large datasets in from S3 or DynamoDB); and MapReduce, the “swiss army knife” for parallel scans of huge datasets.