Philometrics Big Data Case: A Working Algorithm Smart Enough to Distill Psychographics

Philometrics is a group of social scientists and computer engineers who are passionate about transforming survey research. Its founders began as academics at the University of Cambridge, working on scalable ways to understand how people think. The survey, a 120-year-old instrument of choice for most social scientists, became the focus of this work. The team found that a combination of big data, machine learning, and survey acumen could lead to remarkable results: hundreds of thousands of responses to virtually any survey, at a fraction of the typical cost.

Philometrics was founded to help bring this technology to the masses. Its founders were passionate about empowering researchers to do the most cutting-edge work they can, without adding big barriers to entry, and Philometrics was built on these ideas. They wanted it to be powerful, innovative, fast, and easy to use, and to optimize research through the power of online behavioural big data analysis.

Its innovative SurveyExtender method produces insight on a revolutionary scale, empowering clients to understand people, markets, and competitors more deeply and accurately than ever before. The Philometrics team is made up of experienced business analysts and leading academic researchers, all of whom are experienced in helping academics and companies use the power of detailed survey insight.

What is SurveyExtender?

The basic idea behind surveys is 120 years old: to get a response, a person has to answer your questions. This is both expensive and time-consuming. Not surprisingly, most surveys don't have many respondents, and those that do are astronomically expensive. Yet we know that surveys with few respondents usually do not accurately reflect the population, so survey researchers are typically left with poor-quality data and poor-quality answers. To date, this has been accepted because no better alternative existed.

SurveyExtender changes this. It is built on the idea that we can forecast how hundreds of thousands of people would respond to a survey without actually needing them to take it. This idea is provocative and a major shift in how we think about survey research. The Philometrics team has spent the last few years figuring out how to do it and building a rigorous science to validate the method, giving them confidence that this is the way to do survey research.

The result: Philometrics takes your survey of 1,000 respondents and gives you back responses for over 100,000 anonymous respondents across the United States.

SurveyExtender uses state-of-the-art machine learning to automatically build models that predict responses to your survey questions from people's answers to a special survey of demographic and psychological questions.

The rest of this page shares the scientific validation of SurveyExtender in detail, along with the basic principles of how it functions. The team are skeptical scientists at heart and have worked hard to convince themselves that survey forecasting works, and they hope you find the evidence convincing. But the best way to see whether it works is to test it yourself: run a survey, have it extended, and play with the results.

Basics Behind Survey Forecasting

SurveyExtender works by (a) taking a sample of participants (at least 1000) who have answered both your survey and a special "Rosetta Stone" survey, (b) using machine learning to build models of how the "Rosetta Stone" survey maps onto your responses, and then (c) feeding the "Rosetta Stone" survey responses of 100,000+ anonymous people through these models to generate forecasts of how they might answer your survey.

Step 1: Your Survey + the "Rosetta Stone" Survey from 1000 respondents

For SurveyExtender to work, it needs data to build its machine learning models. This data has two parts: (a) responses to your survey (used as the output), and (b) responses to a special survey that all participants take, called the "Rosetta Stone" survey (used as the input). Philometrics makes it easy to get both parts of the data. Simply build your survey using the Philometrics survey engine and then turn SurveyExtender on in the study settings (or when creating the study). The necessary questions are automatically added to the beginning of your study to collect the "Rosetta Stone" survey. Once your survey is ready, recruit at least 1000 people to take it. At least 1000 people need to answer each question to ensure the models have high enough accuracy rates. For best results, Philometrics recommends recruiting even more: 1500 to 2000 respondents.
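Before turning SurveyExtender on, it can help to check that every question has enough usable answers. The sketch below is a minimal Python illustration, assuming a hypothetical record layout in which each respondent carries a `rosetta` dict (model inputs) and a `survey` dict (model outputs, `None` for a skipped question); the names `answer_counts` and `questions_below_minimum` are illustrative, not part of any Philometrics API.

```python
from typing import Dict, List, Optional

# Hypothetical layout: {"rosetta": {...}, "survey": {...}} per respondent.
# Values are numeric answers, or None when a question was skipped.
Respondent = Dict[str, Dict[str, Optional[float]]]

MIN_RESPONSES = 1000  # minimum stated in the text (1500-2000 recommended)


def answer_counts(respondents: List[Respondent]) -> Dict[str, int]:
    """Count, per survey question, respondents usable for training:
    they answered the question AND completed every Rosetta Stone item."""
    counts: Dict[str, int] = {}
    for r in respondents:
        rosetta = r["rosetta"]
        if not rosetta or any(v is None for v in rosetta.values()):
            continue  # no complete input row, so this person cannot be used
        for question, answer in r["survey"].items():
            if answer is not None:
                counts[question] = counts.get(question, 0) + 1
    return counts


def questions_below_minimum(respondents: List[Respondent],
                            questions: List[str],
                            minimum: int = MIN_RESPONSES) -> List[str]:
    """Return the questions that still need more usable answers."""
    counts = answer_counts(respondents)
    return [q for q in questions if counts.get(q, 0) < minimum]


# Usage: 1200 complete respondents who all skipped q2.
sample: List[Respondent] = [
    {"rosetta": {"age": 30.0, "openness": 4.0}, "survey": {"q1": 5.0, "q2": None}}
    for _ in range(1200)
]
print(questions_below_minimum(sample, ["q1", "q2"]))  # ['q2']
```

Counting per question, rather than per respondent, matches the requirement that each question individually reaches the minimum.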

Step 2: Automatically Building Models

Once you have collected at least 1000 participants who have both (a) responses to your survey and (b) the "Rosetta Stone" survey, SurveyExtender applies machine learning algorithms to build a model for every question that is either continuous (Likert, slider) or categorical (multiple choice, drop-down, checkbox) in nature. The models take the "Rosetta Stone" survey responses as inputs (the Xs in a regression) and set each survey question as the output (the Y in a regression).
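For a continuous question, the "Xs in, Y out" setup can be sketched as an ordinary least-squares regression fitted separately for each question. This is a minimal Python sketch that assumes plain linear regression as a stand-in for SurveyExtender's actual (unspecified) machine learning algorithms; categorical questions would need a classifier such as logistic regression instead, which is not shown. The function names are hypothetical.

```python
from typing import Dict, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and non-singular (no error handling)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(xs: List[List[float]], y: List[float]) -> List[float]:
    """OLS for one question: Rosetta Stone answers (xs) -> answers (y).
    Returns [intercept, w1, w2, ...] from the normal equations X'X w = X'y."""
    rows = [[1.0] + x for x in xs]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_all(xs: List[List[float]],
            survey: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """One model per survey question; rows of xs align with each answer list."""
    return {question: fit_linear(xs, answers) for question, answers in survey.items()}


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
answers = {"q1": [1 + 2 * a - 3 * b for a, b in X]}  # exact linear relation
models = fit_all(X, answers)
print([round(w, 6) for w in models["q1"]])  # [1.0, 2.0, -3.0]
```

Because the toy answers are an exact linear function of the inputs, the fit recovers the generating coefficients; real survey answers would leave residual noise.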

Step 3: Forecast

Once the models are ready, Philometrics feeds the "Rosetta Stone" survey data of over 100,000 anonymous people through them. This generates, for each person, a forecast of how they might respond to each survey question, based on that person's psychology and demographics. To learn more about the logic behind why this works, and why it provides better results than surveys alone, keep reading!
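Step 3 amounts to applying each fitted model to every profile in the large pool. A minimal Python sketch, assuming each model is a hypothetical list of linear coefficients `[intercept, w1, w2, ...]` (as an OLS fit would produce) and each pool member is represented only by their Rosetta Stone answers:

```python
from typing import Dict, List

Model = List[float]  # hypothetical: [intercept, w1, w2, ...]


def predict(model: Model, rosetta: List[float]) -> float:
    """Linear forecast for one person from their Rosetta Stone answers."""
    intercept, weights = model[0], model[1:]
    return intercept + sum(w * x for w, x in zip(weights, rosetta))


def forecast_pool(models: Dict[str, Model],
                  pool: List[List[float]]) -> List[Dict[str, float]]:
    """Forecast every question for every person in the pool.
    Only Rosetta Stone inputs are needed; nobody in the pool took your survey."""
    return [{q: predict(m, person) for q, m in models.items()} for person in pool]


models = {"q1": [1.0, 2.0, -3.0]}
pool = [[1.0, 1.0], [2.0, 0.0]]  # two anonymous Rosetta Stone profiles
print(forecast_pool(models, pool))  # [{'q1': 0.0}, {'q1': 5.0}]
```

For a pool of 100,000+ people this is a single pass over the data; in practice a vectorized library would likely be used instead of a pure-Python loop.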

The Need for Large Samples

Research is about generalizing: researchers want to take something they have learned about their participants and generalize the insight to the broader population. This has historically been difficult to do without a very large number of participants. Let’s take an example: imagine you want to conduct a political poll. You recruit 1000 Americans and ask them their political preferences. In the best-case scenario, this sample is nationally representative.

What you can confidently learn from these 1000 people is how Americans, on average, lean in their political preferences. However, this is a national-level snapshot and misses much of the rich variation that exists between states. So now let’s imagine you want to go deeper and ask what people's political preferences are in each state. The 1000 participants provide poor answers to this question because you have very few people from each state; smaller states, such as Vermont, may not have even a single person represented among your 1000 participants. You quickly run into the small N problem: too few data points on the population (each state's voters) you are trying to generalize about.

To get a good estimate by state, you would need to go out and get 1000 participants in each state, quickly ballooning the sample size to 50,000. The problem only escalates further as you zoom in deeper (say to the district level) or start comparing different groups within states (say men versus women, or younger versus older voters).
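The small N problem can be put in numbers with the standard margin-of-error formula for a proportion, 1.96 × sqrt(p(1 − p) / n) at 95% confidence, assuming simple random sampling. The Python sketch below compares the national sample, the average per-state sample (1000 / 50 = 20), and a 1000-per-state design; the 0.2% population share used for a small state like Vermont is an illustrative assumption.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from
    a simple random sample of size n (p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)


def prob_group_missing(n_total: int, share: float) -> float:
    """Chance that a group making up `share` of the population gets
    zero respondents in a random sample of n_total people."""
    return (1 - share) ** n_total


national = 1000
per_state = national // 50  # 20 respondents per state on average
print(f"national n={national}: ±{margin_of_error(national):.1%}")
print(f"per state n={per_state}: ±{margin_of_error(per_state):.1%}")
print(f"1000 per state -> {50 * 1000} total respondents")
# Illustrative assumption: a small state holds about 0.2% of the population.
print(f"P(no one from it in 1000) ≈ {prob_group_missing(1000, 0.002):.1%}")
```

Under these assumptions the margin of error grows from about ±3.1% nationally to about ±21.9% for a typical state, and there is roughly a 13.5% chance that the small state gets no respondents at all.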

Researchers have never had a good solution to this problem other than recruiting massive samples, something that has been prohibitively expensive and thus rarely done. For the most part, we have simply been unable to explore the rich variability we know is there.

This problem stems from a simple rule researchers have been bound by: the only way to learn how people respond to surveys is to ask them, so the sample size equals the number of people recruited. Philometrics aims to change this through SurveyExtender.


How Survey Forecasting Works

Let's start with a relatively uncontroversial (we hope!) claim: All of human behavior is driven by what's going on in our heads and the social environments we live in. If we could understand perfectly how a person thinks and the environment they are in, then we should be able to predict how they would answer a survey (which is just a form of behavior).

Survey forecasting is based on this principle, but it is far more approximate than the ideal case. First, it sets the social environment aside and focuses on figuring out how a person thinks. This gets us quite far, as it removes some of the situational biases that often occur in surveys (though it potentially limits our ability to understand the role of situational forces in shaping behavior). Then, it approximates what is going on in people's heads using a questionnaire designed to estimate their personal psychology.

But people are extremely complex, and the "Rosetta Stone" survey is unlikely to capture this richness fully. So the information extracted is imperfect: it is only a rough approximation of a person. At the same time, the information does carry some insight about the person.

What we would expect, then, is for the "Rosetta Stone" survey data to provide rough approximations of how people think, and in turn, how they answer surveys. These approximations should be rough, with lots of noise.

Accuracy of Forecasted Data

How does Philometrics know its models and forecasts are any good? To answer this, it tests every model on people who were not part of the training process (so their data in no way shaped the model) but who have both actual survey responses and "Rosetta Stone" survey responses. Their "Rosetta Stone" responses are fed into the models to generate forecasted survey responses, which are then compared with their actual responses.

Two accuracy metrics are typical. For continuous variables (e.g., Likert responses, slider responses, age), the forecasted response is simply correlated with the actual response. For categorical variables, the metric is the Area Under the (ROC) Curve, or AUC. AUC is the probability that the model scores a randomly chosen member of a category higher than a randomly chosen non-member, so complete guessing yields a 50% success rate. AUC ranges from 0 to 1, with .5 being the chance guessing level and 1 meaning the model separates members from non-members perfectly.
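Both metrics are straightforward to compute. A minimal pure-Python sketch, assuming forecasts and actual answers are aligned lists; the AUC uses the pairwise (Mann-Whitney) definition, counting ties as half a win, which matches the "randomly chosen member versus non-member" reading. Function names are illustrative.

```python
import math
from typing import List


def pearson_r(forecast: List[float], actual: List[float]) -> float:
    """Correlation between forecasted and actual continuous answers."""
    n = len(forecast)
    mf, ma = sum(forecast) / n, sum(actual) / n
    sfa = sum((f - mf) * (a - ma) for f, a in zip(forecast, actual))
    sff = sum((f - mf) ** 2 for f in forecast)
    saa = sum((a - ma) ** 2 for a in actual)
    return sfa / math.sqrt(sff * saa)


def auc(scores: List[float], labels: List[int]) -> float:
    """Share of (member, non-member) pairs where the member gets the
    higher forecast score; ties count as half. 0.5 = chance."""
    members = [s for s, y in zip(scores, labels) if y == 1]
    others = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for m in members:
        for o in others:
            if m > o:
                wins += 1.0
            elif m == o:
                wins += 0.5
    return wins / (len(members) * len(others))


print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
print(auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of respondents, which is fine for survey-sized test groups; a rank-based formula gives the same value and scales better.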

In the Philometrics system, training and testing use the participants who actually complete your survey; their "Rosetta Stone" survey responses are collected behind the scenes. Since these sample sizes are relatively small, resampling techniques are used to generate the test group. The end result is a model for every question you asked that can be used to forecast. For each model, Philometrics reports its accuracy using either r (the correlation, for continuous variables) or AUC (for categorical variables).
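The text does not say which resampling technique is used. One common choice is k-fold cross-validation, sketched below in Python as an assumption: respondents are shuffled into k folds, and each fold takes a turn as the held-out test group while the model is trained on the rest, so every respondent gets exactly one out-of-sample forecast to compare against their actual answer.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle respondent indices and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n: int, k: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


for train, test in train_test_splits(10, k=5):
    print(len(train), len(test))  # 8 2 on each of the 5 rounds
```

In each round you would fit the per-question models on `train`, forecast for `test`, and pool the out-of-fold forecasts before computing r or AUC. Bootstrap resampling would be another reasonable choice.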

Building the models is only the first part. The really exciting bit happens next: Philometrics takes 100,000+ real people and feeds their "Rosetta Stone" survey responses through the models to forecast a best guess of how they would have answered your survey. These scores are noisy; in no way do they forecast exactly how a particular individual would respond. But they are better than chance and, far more importantly, they have several properties that make them extremely powerful for generalizing to the original population: in fact, far more powerful than using self-reports! Keep reading to find out how this happens and how Philometrics validates this claim.

How Do Biased Samples Affect Forecasted Scores?

In working with survey forecasting, the team discovered two very important properties of forecasted scores. First, even though the scores had a lot of noise, they were unbiased: the models are just as likely to over-predict as to under-predict a person's score.

The mean of the errors is nearly 0, and the errors are roughly normally distributed around 0. This is extremely important, as it indicates the models are not making systematically wrong forecasts. Now this is where the power of a large sample comes in. If these were actual survey responses from a large sample, averaging them would give a pretty good estimate of the true average of the population they came from.

This is because with a large sample, and assuming there is no systematic bias in people's scores, the average across all people is close to the true average. In the case of forecasted scores, if we take a large group of people with forecasted scores and average them together, our average error becomes nearly 0 and we get a population average that should be very close to the truth.
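The averaging argument can be checked with a small simulation. The Python sketch below assumes, as the text describes, that each forecast equals the person's true score plus independent, zero-mean noise; it does not model SurveyExtender itself. Individual errors stay large, while the error of the average shrinks roughly like noise_sd / √n (the standard error of a mean).

```python
import random


def simulate(n: int, noise_sd: float = 1.0, seed: int = 0):
    """Return (typical individual error, error of the population average)
    for n people whose forecasts are true score + unbiased noise."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(3.0, 1.0) for _ in range(n)]
    forecasts = [t + rng.gauss(0.0, noise_sd) for t in true_scores]
    errors = [f - t for f, t in zip(forecasts, true_scores)]
    mean_abs_error = sum(abs(e) for e in errors) / n
    error_of_average = sum(forecasts) / n - sum(true_scores) / n
    return mean_abs_error, error_of_average


for n in (100, 10_000, 100_000):
    individual, average = simulate(n)
    print(f"n={n:>7}: mean |error| per person {individual:.3f}, "
          f"error of the average {average:+.4f}")
```

With noise_sd = 1, the per-person error should hover around 0.8 at every n, while the error of the average heads toward 0 as n grows. If the noise were biased, no amount of averaging would remove that bias.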

What is remarkable here is that, since forecasts can be made for many more people than are in the original training sample, the forecasted data can give far better population averages than the actual survey responses collected! Think back to the political poll example. With 1000 surveyed people, there would be too few data points per state to say anything meaningful about the average in the smaller states. But through forecasting, there are estimates for 100,000+ anonymous people: plenty of data points per state. And since the forecasts are unbiased, averaging them should get pretty close to the true state average (i.e., what we would have found if we had surveyed everyone in each state).
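The per-state version of the argument is just a group-by average over the forecasted pool. The Python sketch below is a toy simulation under loudly stated assumptions: two hypothetical groups, a small state at a 0.2% population share with a true mean of 0.60 and everyone else at 0.45, and forecasts modeled as true score plus unbiased noise. These numbers are illustrative, not Philometrics results.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

TRUE_MEAN = {"small_state": 0.60, "rest": 0.45}  # hypothetical values
SMALL_SHARE = 0.002  # hypothetical population share of the small state


def group_means(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average a score within each group (here, each state)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, score in rows:
        totals[group] += score
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def simulated_rows(n: int, noise_sd: float,
                   seed: int = 0) -> Iterator[Tuple[str, float]]:
    """n people with a state label and a noisy but unbiased score."""
    rng = random.Random(seed)
    for _ in range(n):
        state = "small_state" if rng.random() < SMALL_SHARE else "rest"
        true_score = TRUE_MEAN[state] + rng.gauss(0.0, 0.2)
        yield state, true_score + rng.gauss(0.0, noise_sd)


# noise_sd=0.0 stands in for actual answers from a 1000-person survey.
survey = group_means(simulated_rows(1_000, noise_sd=0.0))
forecast = group_means(simulated_rows(100_000, noise_sd=0.5, seed=1))
print("survey of 1000:", survey.get("small_state", "no respondents"))
print("100,000 forecasts:", round(forecast["small_state"], 3))
```

The survey typically has only a handful of small-state respondents (or none), while roughly 200 noisy forecasts average out close to the assumed true value.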

The second important property of forecasted responses is that they tend to correlate with each other sensibly. If you compute a correlation matrix of actual survey responses and compare it to a correlation matrix of the same variables, but forecasted, the two matrices are often quite similar. The directions of the correlations are almost always the same; the main difference is magnitude.
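Comparing the two correlation matrices can be reduced to two summary numbers: how often the signs agree and how far the magnitudes drift. A minimal Python sketch, assuming actual and forecasted answers are stored as hypothetical dicts mapping each variable name to an aligned list of values; a correlation of exactly 0 is treated as non-positive here.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Data = Dict[str, List[float]]


def corr(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two aligned lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_pairs(data: Data) -> Dict[Tuple[str, str], float]:
    """Upper triangle of the correlation matrix, keyed by variable pair."""
    return {(a, b): corr(data[a], data[b]) for a, b in combinations(sorted(data), 2)}


def compare(actual: Data, forecast: Data) -> Tuple[float, float]:
    """Return (share of pairs with the same sign, largest |r| difference)."""
    ca, cf = corr_pairs(actual), corr_pairs(forecast)
    same_sign = sum((ca[p] > 0) == (cf[p] > 0) for p in ca) / len(ca)
    max_gap = max(abs(ca[p] - cf[p]) for p in ca)
    return same_sign, max_gap


actual = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 5.0, 9.0], "c": [4.0, 3.0, 2.0, 1.0]}
# Toy forecasts: same directions, weaker relationships.
forecast = {"a": [1.0, 3.0, 2.0, 4.0], "b": [3.0, 4.0, 4.0, 6.0], "c": [3.0, 3.0, 2.0, 2.0]}
print(compare(actual, forecast))
```

In this toy data every sign agrees (the first value is 1.0) while the forecasted correlations are weaker, which is the pattern the text describes.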

The team continues to work on improving the models and making the forecasted correlation matrices as similar to the self-report ones as possible, but the results are already quite good. This matters because researchers usually care about two things in surveys: population averages and how variables covary with one another. The high accuracy of population averages from forecasted responses has already been covered; the similarity of the correlation matrices shows that forecasted data is also highly useful for understanding relationships.


Set up your WordPress blog in 10 minutes

In the WordPress control panel, the options are all in the left sidebar. The important ones are:
  • Under “posts”, “add new”…
    That’s where you write a new post, then click on the “publish” button and it appears on your website. In the “posts” section, you can also edit or delete your old posts, divide them into custom categories, etc. It’s all pretty intuitive and self-explanatory, you’ll figure it out real easy. Just try all the buttons to see what they do, that’s probably quicker than me trying to explain everything.
  • The “links” section…
    This is where you build your blogroll or other lists of links you want to appear in your blog’s sidebar.
  • The “pages” section…
    “Pages” are pretty much like “posts”, except they show up in the header menu instead (on this site, scroll up and look at the menu under the picture at the top – those are “pages”). Use “pages” for things you want to always be easily accessible to your readers.
  • The “comments” section…
    Here you can see, approve, disapprove, and otherwise screw with comments your readers leave on your blog posts. If you want to disallow comments for a particular post, there’s a checkbox for that on the page where you write the post. If you want to disallow comments for the entire site, that’s under “settings -> discussion”. Allowing comments is generally a good idea if you want to discuss your posts with readers, but if you can’t find the time for that, you may just want to shut off the commenting function.
  • Under “appearance”, “header”…
    Here you can upload your own image to be shown at the top of the site, or choose from the default ones.
  • Under “appearance”, “background”…
    Choose a background color or upload a background image.
  • Under “appearance”, “menus”…
    Here you can create various menus to show in your sidebar, with links to your pages and/or elsewhere on the web.
  • Under “appearance”, “widgets”…
    Here’s more stuff for the sidebar, like a function to show a list of links to your most recent posts, a search bar, and stuff like that. You can drag these “widgets” into the various “widget areas” to have them appear in the sidebar or at the bottom of your site’s pages.
  • Under “users”, “your profile”…
    …pretty self-explanatory.
  • The “settings” section…
    All the pages in this section are filled with various options you can adjust to make your site work just the way you want it to. Do you want comments or not? Do you want emoticon graphics or not? Do you want to change the name or tagline of your blog? That kind of thing. One thing you should know about is…
  • Under “settings”, “reading”…
    Here you choose whether you want the front page of your blog to display your latest posts, or to be a static page with something else you choose to put there. If you want a static page, you first have to make two new pages in the “pages” section described above. Then, come back here and select one as the “front page” and another as the “posts page”. You can now edit the page that’s the “front page” to show whatever you want.
  • The “plugins” section…
    Here you can install extra functions to do things that WordPress normally doesn’t do. The “add new” option will allow you to search for plugins and install them easily with one click. Just type in the name of the plugin you want, and it’ll show up. Then click to install it; once it’s installed, go to the “installed plugins” page, click to activate it, and click on its “settings” if you want to adjust them.
A few plugins you should search for and install right away:
  • Akismet: Protects your blog from spam comments left by advertising robots.
  • W3 Total Cache: Reduces server load and makes your site faster. Your web host will get mad at you if you don’t use this. This plugin has a lot of settings you won’t understand, but you pretty much don’t need to worry about them. Just activate the plugin, go to your site and right-click to “view source code”, “view page source” or something like that depending on your browser, and look for text that says “performance optimized by W3 Total Cache”. Great, it’s working! If you run into a problem, HostGator’s Live Chat Support will help you; the link for that is in the top right corner of the HostGator front page. (If you’re not hosting with HostGator like I suggested, well, then it just sucks to be you.)
  • Clean-Contact: A form that allows readers to send you email without you having to publicly display your email address on your website for all the spam robots crawling around the internet.
You may want more plugins to do different things for you. Google “wordpress plugin” plus whatever words are relevant to what you want to do, and you’ll probably find something. And that’s probably all you need to know about the technical aspects of blogging! Now, there are a few more things you may be interested in…

How to get readers for your blog:

Tell people about it. The beginning tends to be the hardest; once you get some readers, they’ll generally tell their friends if your writing is any good. To get those first readers, you’ll need to somehow let people know that your blog exists.
If you’re regularly commenting on other blogs or participating in forum discussions, just add your web address to your signature and people will find your site from there. If you write guest posts for another site, you can often do the same. 
If you write about other bloggers’ writing, participate in back-and-forth discussions across blogs and network with people who share your interests, you should soon notice other bloggers starting to link to your site (provided your writing is any good). 
How many readers do you really need, though? Just a few people who share your interest in whatever you write about and are willing to discuss it with you will help sharpen your mind. If you want thousands of loyal followers to admire you as a prophet so you can feel special, you’ll need to write really good stuff that people will want to spread the word about.

How to make money from your blog:
  • This is a science in itself. You can spend all the time in the world fine-tuning and optimizing and working to squeeze every extra penny out of your blog, or you can just sign up with one of the major ad networks, slap up some ads on your site, and be content with whatever you get. If you want to make a living and/or get rich blogging, know that it will not be easy. You need to be pretty special, you’ll need a good business model, and you’ll need a lot of readers. Many bloggers dream of living off their blog profits, and yes, it can be done, but just like Game or anything else, it’ll take a lot of work. There’s no particular reason you can’t be the one to do it if you have the determination to learn what you need to learn and do the work you need to do. It’ll take a while, but the people who think “you just can’t do that”, as if getting rich from writing a blog were somehow against the rules of life, are just wrong. You know who got rich writing a blog? I bet you guessed it… Steve Pavlina. I think you should start a website of your own. For me, it's been one of the best things I ever did, and it's a lot easier than people think...