Archive for the 'Internet' Category

Behind Google search - Udi Manber interview

Today I came across an interesting interview with Google guru Udi Manber. Udi is the Vice President for search quality at Google. It is a challenging position considering the monetary value associated with search engine ranking. Two things from the interview caught my attention.

So let me first tell you about Google. At Google we do not manually change results. For example, if we find for a particular query that result No. 4 should be result No. 1, we do not have the capability to manually change it. We made that decision not to put that capability in the algorithm—we have to go and actually change the algorithm. That is, we have to find what weakness in the algorithm caused that result and find a general solution to that, evaluate whether a general solution really works and if it’s better, and then launch a general solution. That makes the process slower, but it puts a lot more discipline on us and makes it more unbiased.

What this means is that Google’s algorithm is continuously changed to tackle spammers, link exchanges, link sales and so on. If someone is clever enough to circumvent Google’s quality parameters, Google fixes the algorithm rather than penalizing that specific site. Of course, in extreme cases the site is removed from the search results!

Yes, I told you we launched our 450 improvements. When we decide to launch something, we have a weekly meeting where all those things come together and we look at all the evaluations and we make decisions—revenues and any effects on ads do not come into those meetings. We don’t even know what the effects are.

This is another interesting aspect. Even though the majority of Google’s revenue comes from advertising, the search algorithm is focused only on search quality. It is hard to believe that search engine enhancements are in no way influenced by Google’s interest in advertising revenue.

I have been experimenting with the Google search algorithm for some time now. One thing is certain: if you have quality articles which are not too short, they get ranked well even if you don’t have many links! There also seems to be some algorithm which ranks good new content very high initially. During this initial period the content automatically gets linked by a lot of people who read it through search results. If the content is not good, it won’t get linked and the ranking drops gradually.

Changing spamming tactics

I had submitted this blog to a couple of free web directories. The end result is that I don’t have much traffic, but I do get plenty of spam comments. Recently it reached a new milestone of 100 spam comments per day.

I think the study of spam would be a fascinating field. Most of the spam is auto-generated, but I do get occasional manual spam. If I have a post on “ruby”, the comment will be something like “your ruby knowledge is amazing!”. Flattering indeed!

Then there is a kind of spam message no one is going to approve. These are long comments and most of them contain over 10 links! These guys seem to survive on blogs which are not moderated.

I even had some comments pleading with me not to delete them.

Now you might be wondering why I am writing a post on blog spam. Well, today I got a reason: I found a gem among the comments that needed moderation. Check out the following screenshot.

Best spam comment I have received

This guy is telling me not to delete the message and he says “the money from spam will go to help hungry children in uganda”! Funny that he calls the message spam himself :-)

I don’t know whether this tactic will work. Ok, it might. After all, even in India, Nigerian spammers are able to cheat some innocent (read: greedy) guys.

Google search index now near realtime!

Matt Cutts has written a post showing how quickly fresh content is indexed by the Google crawler. It is unbelievable that it takes only 10 minutes for the content to appear in the search index!

I wonder what kind of infrastructure they have for crawling the “entire” internet! This also means that the Google index is almost as fresh as Google Blog Search or Technorati (which use ping services).

Planning to buy computer components in India?

If you are planning to buy a computer component or thinking of assembling your own PC, here is a Website which can be quite handy!

Delta Peripherals in Chennai has a cool Web page which lists computer component prices in India (Chennai). I have been tracking this for some time and it appears to be kept up to date. You can use this list to plan your purchases. It also ensures that your friendly neighbourhood computer dealer cannot fleece you!

Online reservation by Indian Railways

It was not long ago that Indian Railways started online reservation of railway tickets. Earlier, booking a ticket was a tedious chore. You had to stand in a queue for hours and it was not easy to know whether a ticket was available or not! All that changed with the arrival of online reservation.

Unfortunately, the online reservation service has its own evils. First, you end up paying a lot in “transaction charge”, “service charge” and “reservation charge” for your ticket. Here is an example:

Suppose you try to book a ticket from Trivandrum to Ernakulam. The ticket price is Rs. 110. On top of that there is something called “reservation charges”, which is Rs. 20. On top of that there is a Rs. 15 service charge! Then add Rs. 10 as “bank transfer fee” and an additional service charge of Rs. 1! So the total comes to Rs. 156. That means you end up paying about 42% extra for using the online reservation facility.
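The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the amounts are the ones quoted above, while the variable names and the dictionary layout are my own.

```python
# Charges from the Trivandrum to Ernakulam example, in rupees.
base_fare = 110
extra_charges = {
    "reservation charges": 20,
    "service charge": 15,
    "bank transfer fee": 10,
    "additional service charge": 1,
}

total_extra = sum(extra_charges.values())
total_price = base_fare + total_extra
overhead_percent = 100 * total_extra / base_fare

print(f"Total price: Rs. {total_price}")
print(f"Extra paid: Rs. {total_extra} ({overhead_percent:.1f}% of the fare)")
```

The extras add up to Rs. 46 on a Rs. 110 fare, which is a little over 40% of the ticket price.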

Remember, Railways is earning a lot through advertisements on its Web page. In fact, it is very difficult to find the login fields among all the ads on the home page as you can see below.  Interestingly irctc.co.in has an Alexa rank of about 2000! So it is an advertising gold mine!

IRCTC screenshot

Now the funniest part of the online reservation site is something called the Shubh Yathra. It is similar to a “frequent flyer” programme. But unfortunately it applies only to upper class travel (even 3 tier AC is excluded!!) and there is an annual fee of Rs. 500! Look at the following calculation to see what you get:

Assume you get a Shubh Yathra membership card for Rs. 500. You get points only for your own personal tickets. Assume that you spend Rs. 10,000 on tickets (excluding family members). That means you earn 4*10,000/100 = 400 points. This translates to a bonus of Rs. 400. Now remember you have already paid Rs. 500 as the subscription fee. So your net profit from Shubh Yathra is -100 rupees. Aaha, you are paying them money? Funny indeed!
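The same calculation, written as a small Python sketch, also shows how much you would have to spend before the membership pays for itself. The fee, the 4 points per Rs. 100 and the one-rupee value of a point come from the example above; the function and constant names are made up for illustration.

```python
ANNUAL_FEE = 500            # membership fee in rupees
POINTS_PER_100_RUPEES = 4   # points earned per Rs. 100 spent, as in the example
RUPEES_PER_POINT = 1        # the example treats 400 points as Rs. 400


def net_benefit(annual_spend):
    """Bonus earned in a year minus the membership fee, in rupees."""
    points = POINTS_PER_100_RUPEES * annual_spend / 100
    return points * RUPEES_PER_POINT - ANNUAL_FEE


def break_even_spend():
    """Annual spend at which the bonus exactly covers the fee."""
    return ANNUAL_FEE * 100 / (POINTS_PER_100_RUPEES * RUPEES_PER_POINT)


print(net_benefit(10_000))   # negative: the fee is larger than the bonus
print(break_even_spend())
```

Under these assumptions you need to spend Rs. 12,500 a year on your own upper class tickets just to get the fee back.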

Google to address paid link menace

I think text-link-ads will have to find a new business model soon. It appears that Google is taking the issue of paid links very seriously and according to this Matt Cutts article, you can now report paid links appearing on a Webpage!

In another post, Matt Cutts talks about the disclosures required if you are going to place a paid link. Basically, you provide a machine-readable disclosure (rel=nofollow) and a human-readable disclosure.

As you can guess from my previous articles, I am a strong advocate of rel=nofollow. As the leading search engine, Google has the responsibility to clean up the paid link menace at an early stage. This is the only way to ensure the integrity of the search index.

The explosion in online advertising and the huge interest generated by blogs have begun to pollute the search results of Google and other search engines. In fact “how to make money online” seems to be the most searched term these days. I am not against “making money online”; in fact I am trying to at least cover my hosting expenses via Adsense! But I think search engines should strive to live up to this simple statement: “Content is the King”.

It is a shame that other search engines don’t support rel=nofollow.

We need rel=negative tag!

There are many who oppose the “rel=nofollow” tag used by major search engines such as Google. But I am in favor of this tag; in fact I wish we had something called “rel=negative”. If I link to a site with “rel=negative”, then Google should reduce the Pagerank of the site being pointed to.

This will ensure that all the spam-based sites such as the MFA (Made for Adsense) ones will quickly get buried (borrowing the Digg terminology!). This opens up a lot of possibilities. For example, Wikipedia could use this tag and create a spam directory!

There are a couple of issues with this approach. A determined campaign against any site could potentially bring it down! Also, a domain which gets a lot of “negative rank” will eventually disappear from the internet, so bringing it back will not be easy!

Note that Google already does some kind of negative ranking for sites which link to spam Websites and for sites which have duplicate content.

But I think there cannot be any substitute for manual filtering. What Google needs is a dedicated team of 10 guys who will do a daily scan for the top 10 spam sites which get the most traffic. Once a site is identified as spam, these guys will simply remove it from the search index. I am not sure whether such a team exists at Google!

PS: After writing this post, I was going through my RSS feeds and came across two interesting news items. It appears that Google is actively hunting spam sites on Blogger, and my friend binny got tagged as spam! They are also going after spammers on Gmail (which seems to have backfired, since they also deleted some non-spam accounts!)

Google Trends Analysis - finding sex correlation

One of the lesser-known Google tools is Google Trends. A surprisingly rich amount of information can be derived using this tool.

Google Trends shows the relative volume of keyword searches in Google Search. You can provide a list of keywords separated by commas and then see the comparative search volume over a period of time.

I started my post saying that Google Trends can give you an amazing level of insight. Combined with regional information, it can give insight into social aspects of human life as well!

In my experiment, I used these keywords: peace, love, life, kindness, sex. Anyone can guess which is going to be the most searched term. There is no surprise there. Here is the Google Trends chart:

Google trends for sex, life, love and kindness

Next I compared the search volume across the United States and India. Here are the search volume graphs:

Google Trends - US

Google Trends - India

Noticed the huge difference? What is your conclusion? :-) I have no comments!

Lastly I did a search for sex across all the regions. Here are the top 10 regions.

Google Trends for sex

From this study, I derive the following formula,

Sex Interest = Sex Suppression * Population * Sex Co-efficient where Sex co-efficient is a universal constant. When Google gives out the actual volume figures, we will know this constant!

Is there a perfect online polling system?

I wanted to create a poll on this blog and then I realized that there is no such thing as a perfect poll! The main problem is, how do you prevent bogus or duplicate votes?

Many existing polling systems use the IP address to identify duplicate votes. But this eliminates a lot of valid votes! For example, 1000+ employees at my office use the same proxy server and hence appear to the polling system as a single voter!

Also different IP addresses can never guarantee perfect polls. Methods used for internet anonymity such as Tor can be misused for bogus voting.

Other methods used are the cookie method and the email verification method. Neither of these is fraud-proof. Cookies can be easily deleted and multiple email ids can be easily created.
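To see why IP-based duplicate detection fails in both directions, here is a minimal Python sketch of such a poll counter. It is not taken from any real polling system; the function name, the sample addresses (from the reserved documentation ranges) and the votes are all made up for illustration.

```python
def count_votes_by_ip(votes):
    """Count at most one vote per IP address.

    votes is a list of (ip_address, choice) pairs in the order received.
    """
    seen_ips = set()
    tally = {}
    for ip, choice in votes:
        if ip in seen_ips:
            continue  # treated as a duplicate, even if it is a different person
        seen_ips.add(ip)
        tally[choice] = tally.get(choice, 0) + 1
    return tally


votes = [
    ("203.0.113.10", "yes"),  # colleague 1, behind the office proxy
    ("203.0.113.10", "no"),   # colleague 2, same proxy: dropped
    ("203.0.113.10", "no"),   # colleague 3, same proxy: dropped
    ("198.51.100.7", "yes"),  # one person...
    ("192.0.2.44", "yes"),    # ...voting again from another address: counted
]
print(count_votes_by_ip(votes))
```

Two genuine "no" votes from behind the proxy are thrown away, while one person voting from two addresses is counted twice, so the poll ends up as three votes for "yes" and none for "no".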

What we need is an internet identity which probably links to something like a passport number!

During my search for a perfect polling system, I came across this link. It claims to build a fraud proof voting system. But in reality it is probably the worst voting system!

On why I need more idiots to read this blog!

First of all, I am sorry if the title was insulting. But I couldn’t find a better title for what I am going to say!

Some of you will have already noticed the presence of Google Adsense on this blog. The primary aim of this blog is not to make money, but at the same time I hoped that it would at least cover my hosting expenses ($100 a year).

Now here is the interesting part. This blog completes 2 months this week. So far I have had around 1800 page views, as you can see from the following Google Analytics report. I know, it is pretty low compared to A-list blogs, but still I was expecting some earnings from these impressions.

 jaysonjc.com page views

Next let us look at my Google Adsense earnings from this site. There is no surprise there! - It is a total of $0.00!

Now frankly this is very discouraging. But I know Adsense does work. As you can see, guys like John Chow (a nasty guy who would probably bring down the whole internet!) and Amit Agarwal do make sacks of dollar notes every month. So why am I not getting any Adsense revenue?

I did a bit of research and I think there are mainly two reasons for this:

1. I don’t have enough ad units on the page. Some think that having fewer ads gives more money. On my blog, I don’t think it works!

2. The most important reason I think is that most of my blog readers are intelligent. Even those who come to this site via Google search are pretty intelligent. I guessed it from what my readers are searching to reach this blog.

So in order to get more advertising money, I need to gather an audience which is less intelligent. If I could get an army of idiots coming to this site, I am sure my adsense income will skyrocket!

I think the easiest way to do that would be to write about Paris Hilton, Britney Spears and American Idol. Don’t worry I have no such plans, yet. :)

iREEADD THIS spam hits Orkut

These days, spammers are everywhere. The latest spam to hit Orkut is the “iREEADD THIS” spam. I got this message from 5 of my Orkut friends. This clearly shows how easy it is to mislead people! Here is what the mail contains:

HEY ITS DIANNA, FROM THE DIRECTOR OF ORKUT,EVERYBODY SORRY FOR THE INTERRUPTION BUT ORKUT IS CLOSING THE SYSTEM DOWN BECAUSE TOO MANY BOTTERS ARE TAKING UP ALL THE NAMES, WE ONLY HAVE 57 NAMES LEFT, IF YOU WOULD LIKE TO CLOSE YOUR ACCOUNT, DONT SEND THIS MESSAGE, IF YOU WANT TO KEEP YOUR ACCOUNT ,SEND THIS MESSAGE TO EVERYONE ON YOUR LIST. THIS IS NOT A JOKE, YOU’LL BE SORRY IF YOU DONT SEND IT. THANKS DIRECTOR OF ORKUT, TIM BUISKI. WHOEVER DOESNT SEND THIS MESSAGE, YOUR ACCOUNT WILL BE DEACTIVATED AND IT WILL COST YOU $ 10.00 A MONTH TO USE IT.

As you can see, it is incredibly obvious that this is spam (upper case, full of grammar mistakes, and they have only 57 names left!). But what surprised me was that 5 of my friends forwarded it to me! Surely they don’t want their accounts to be deactivated!

Another puzzling thing is the motive of this spammer. There is no link. Hence he/she must be doing this just for the kick of it :)

Buying electronics stuff from Ebay India

Ebay India offers a lot of good electronics stuff. You can find a lot of stuff that is difficult to find in the neighbourhood electronics shop. It is also easy to pay (PaisaPay) if you have an account in a “new generation” bank such as ICICI. But there are some things you need to be aware of before buying any stuff on Ebay India. You should not rely on the number of positive feedback ratings alone when making a decision. Here is why:

Final price may be different from item price
Since the competition is high, sellers tend to underprice the item and then increase delivery or other charges. So always check the total price you need to pay before making a decision.

Check the seller feedback
This is a no-brainer. Sometimes you will see that people have given positive feedback even though there are problems with the stuff. Indians generally are ready to swallow minor problems! So read the feedback carefully. An example is this positive feedback: “Checked everthing works fine, does warranty needs a card or the bill is enough?”. The real question he is trying to ask is: “Where is my warranty card?”.

Beware of the warranty trap
You will notice that a lot of items are advertised with a warranty. In reality, this is not manufacturer warranty. They don’t give any manufacturer warranty (I suspect some of the items are smuggled or refurbished items). What they mean is that if there is a problem, you need to send it to the seller and get it repaired! As you can imagine it is a big hassle.

Beware of pirated items
If you see something really cheap, chances are that it is a fake or an old item. This applies not only to games and software but even to memory cards (CF or SD cards)! There is no effective way to identify this beforehand. Leave it to your luck! :)

Never deal outside Ebay
If you started a deal on Ebay, close it on Ebay. Some tend to negotiate outside to minimize Ebay fees. But if something goes wrong, you are screwed. Also pay for your item using “PaisaPay”. This gives you some protection from rogue sellers.

Fun with Google Adwords

Last week I signed up to Google Adwords which allows me to advertise my site on Google partner sites using Adsense. I started a new campaign to see how the whole thing works. The whole setup process takes only 10 minutes and all you need is a credit card.

Initially, you start with a “basic” account which simplifies the ad creation process. Being a geek, I immediately upgraded it to “Standard Edition”. This offers a huge set of options such as content targeting, site targeting, geographic targeting etc. It will take some time before you are familiar with the set of available options.

When I tried to add my first ad, I came across a problem. My idea was to promote my blog as the “best blog from India”. But unfortunately Adwords doesn’t allow putting superlatives such as “Best”! Here is the error I got,

 Google Adwords Error

This seems funny since any ad served by Adsense is clearly marked as “Ads by Google”. So why can’t I call my blog the “best blog from India”?

After correcting the error, I released my first ad to the internet world. In a day, Google served over 10,000 impressions of my ad! I got around 80 visitors through the campaign. But all of this comes at a cost, and hence I paused my campaign for the time being :)

But this confirms one thing. If you have an e-commerce site, advertising using Adsense will get you a lot of new customers and business.

Fighting comment spam in a Wordpress blog

We are all used to email spam. I get around 100 spam mails daily in my Gmail account. Thankfully most of these are identified as spam by Gmail and get moved to the spam folder automatically.

When I started this blog, I never thought that spam would be a major issue. Initially I hadn’t enabled any comment moderation. Within a week, I started seeing spam comments, mostly related to pharmacy and drugs. I started manually deleting spam and soon realized it was not going to work.

In Wordpress, under options->discussion, there are a couple of spam fighting measures available. I enabled comment moderation, which automatically puts a comment in the moderation queue if it contains 2 or more links. I have also enabled common spam word protection. This means that any comment which contains words in this list will be automatically put into the moderation queue.

That solved the problem for a few more days. Then I noticed that I had over 100 comments to moderate. Now sorting through 100 comments to find a genuine comment is not something you would cherish!

Wordpress provides something called a comment blacklist. If any of the words in this list is part of the comment, the comment will be nuked. It will not appear in the moderation queue. So I analyzed a few spam comments and added the common words to the comment blacklist.

I had hoped that these measures would solve the spam problem. Soon I realized that I was too optimistic. I started getting a lot of comments which contained the blacklisted words with spelling mistakes! For example, the word viagra would appear as viegra or something similar.
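The three rules described so far (hold comments with too many links, hold comments with common spam words, silently drop comments with blacklisted words) can be sketched in a few lines of Python. This is not how Wordpress implements them; the word lists, the whole-word matching and the function name are assumptions made purely for illustration.

```python
import re

MAX_LINKS = 2                     # hold comments with 2 or more links
MODERATION_WORDS = {"pharmacy"}   # hypothetical "common spam word" list
BLACKLIST_WORDS = {"viagra"}      # hypothetical comment blacklist


def classify_comment(text):
    """Return 'discard', 'moderate' or 'publish' for a comment."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    if words & BLACKLIST_WORDS:
        return "discard"          # never reaches the moderation queue
    link_count = len(re.findall(r"https?://", lowered))
    if link_count >= MAX_LINKS or words & MODERATION_WORDS:
        return "moderate"
    return "publish"


print(classify_comment("Buy viagra now"))   # discard
print(classify_comment("Buy viegra now"))   # publish
```

The second call shows the weakness: a one-letter misspelling is enough to slip past an exact word list, which is why word-based filtering alone did not hold up.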

Looking at the spam comments I noticed that all of them are coming from a set of specific IP addresses. So what I needed was a way to blacklist IP addresses.

In Wordpress, under manage->files, you can see the .htaccess file. This can be used to block a specific set of IP addresses. So I added the following entries to this file (substitute the actual IP address for 127.0.0.1):

# Allow everyone, then deny the listed address (a matching deny wins)
order allow,deny
deny from 127.0.0.1
allow from all

So today, I have no comments to moderate. Thank god! :)

Notes

1. There are sophisticated spam fighting tools such as Akismet, which is distributed. I am yet to use it.
2. It is better to disable trackbacks. Tools such as trackback submitter are widely used by spammers.

The yesfollow project - Not a good campaign!

Recently I came across the yesfollow project. This is a campaign against the use of the “rel=nofollow” attribute in blog comments. I can identify with their sentiments, but I don’t agree with them completely. Anyone who adds value to a blog entry by adding meaningful comments does deserve the credit, but only if his comment is related to the content he has on his own site! (For example, if I have a blog on chess and I comment on a blog which deals with politics, should I get the link credit? I don’t think so!)

If you are not clear about this whole issue, I will explain it for you.

Whenever you search on a keyword in Google, the web pages returned are based on the Google page ranking. The pages with highest rank will appear first and hence will get more traffic.

Now how is this pagerank determined? One of the key parameters is how many Websites link to your blog or Website. If many people refer to your Website, you will have a better pagerank. This is a cool idea and in most cases will ensure that the most relevant pages are displayed for a search keyword.

As blogs started appearing on the internet scene, people realized that by commenting on blogs, they could increase their pagerank. So spammers started using automated tools to bombard blogs with comments. Soon it was apparent that some mechanism was required to fight the spammers. Then came the “rel=nofollow” attribute.

When “rel=nofollow” is added to a link, Google and other search engines ignore the link for pageranking. This means that there is no advantage in comment spam since your link is worthless. Soon all the blogging platforms (Wordpress, Movable Type etc.) started adding “rel=nofollow” to all the links in the comments automatically.
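The effect of “rel=nofollow” on link-based ranking can be illustrated with a toy version of the textbook PageRank iteration. This is a simplified sketch, not Google’s actual algorithm: the damping factor of 0.85, the three-page link graph and the rule that a nofollow link is simply dropped before rank is shared out are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. links maps page -> list of (target, nofollow) pairs."""
    pages = set(links)
    for outgoing in links.values():
        pages.update(target for target, _ in outgoing)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Links marked nofollow pass no rank at all.
            followed = [t for t, nofollow in links.get(page, []) if not nofollow]
            if followed:
                share = damping * rank[page] / len(followed)
                for target in followed:
                    new_rank[target] += share
            else:
                # No followed links: spread this page's rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def example(nofollow):
    """A blog with one real link and one comment link to a spammer."""
    links = {
        "blog": [("news", False), ("spammer", nofollow)],
        "news": [("blog", False)],
        "spammer": [],
    }
    return pagerank(links)


print(example(nofollow=False)["spammer"])
print(example(nofollow=True)["spammer"])
```

With the comment link followed, the spammer’s page picks up a share of the blog’s rank; with nofollow it is left with only the small baseline every page gets, which is the whole point of the attribute.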

But unfortunately, this didn’t help in reducing the spam. Spammers have kept their comment bombing on. Sometimes they do get traffic via clicks on the comments.

I do agree with yesfollow that “rel=nofollow” is yet to have any impact on blog spamming. As a blog owner you need to be watchful of spam and should delete it immediately.

But there is a reason why Google wants us to use “rel=nofollow”. Note that relevance of a Web page to a keyword is determined by pagerank. So if I provide a lot of “meaningful and useful” comments on a lot of blogs, I do get a lot of inbound links. But that doesn’t guarantee that my site is relevant to the comment text or search keyword!

Hence if yesfollow becomes widely used, the pageranking algorithm will have to be modified. Google will have to identify the blog comments and then discount them in the pagerank calculation.

Yesfollow guys, pageranking is not meant to reward someone, but to find the sites which are most relevant for a keyword!