Complications with Vision Processing

There have been a number of advancements in recent years with robotic vision.

As seen in this recently published article:

Improved Generalized Belief Propagation for Vision Processing

Well... hold on.

One of my goals with this blog, I feel, is to offer translation services.

I want to take scientific articles like this and summarize them in normal speak for people who may get lost in the mumbo-jumbo. That is not to say the average reader isn't intelligent enough to comprehend what is written there; rather, the authors who publish these articles write in a way that is basically illegible to people outside their field. After all, they aren't normally authors, they are scientists, so why should what they write be easy to read?

Ok, back to GBP, or Generalized Belief Propagation.

I could go into why robots are used, and the benefits of a robot over a person, and yadda yadda, but that's an entire post in and of itself, and I'm here now to talk about GBP. So... sorry, you'll have to get that one later.

All you need to be concerned with right now is that some robots need to see.

A machine doing certain types of jobs needs to be aware of what is in front of it, so it can act accordingly.

Simple, yes?

A robot in the engineering world 'looks' at what is in front of itself. It will try to form patterns and recognize what is there based on pre-designated conditions. It then has a GO/NO GO command option.

A NO GO is when the robot looks in front of itself and cannot recognize what is there. It will then follow the command for a NO GO, whether that is to send up an error report, to try from a different angle, or to just sit there and do nothing.

A GO means the robot can make sense of what is in front of it. It recognizes the thing there as matching a pattern in its memory and will then follow the next command associated with that pattern, which could be any number of things: demolish that, connect piece A with piece B, turn a part over.
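The GO/NO GO flow above can be sketched in a few lines of Python. This is a toy illustration only: the pattern library, the similarity score, and the threshold are all made up for this example and stand in for real (far more involved) recognition machinery.

```python
# Hypothetical pattern library: "recognized thing" -> "next command".
KNOWN_PATTERNS = {
    "bolt": "connect piece A with piece B",
    "scrap": "demolish that",
    "panel": "turn the part over",
}

def match_score(seen, known):
    # Toy similarity: fraction of shared characters between two labels.
    # A stand-in for real feature matching, which this is NOT.
    return len(set(seen) & set(known)) / len(set(seen) | set(known))

def decide(seen_label, threshold=0.8):
    # Find the best match in memory, then issue GO or NO GO.
    best = max(KNOWN_PATTERNS, key=lambda k: match_score(seen_label, k))
    if match_score(seen_label, best) >= threshold:
        return ("GO", KNOWN_PATTERNS[best])      # recognized: run its action
    return ("NO GO", "send up an error report")  # unrecognized: fallback

print(decide("bolt"))  # ("GO", "connect piece A with piece B")
print(decide("blob"))  # ("NO GO", "send up an error report")
```

The point is only the shape of the decision: match against memory, then branch.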

There are a number of things involved here. First, the camera. Is it electro-optical (normal) or infrared? Does the machine have vision like the Predator and check variances in levels of heat?

Who knows, and who cares. It is all the same thing from an engineering standpoint. The robot (a computer) is getting input, just like if you typed a command into your computer. The computer (robot) will either recognize the command and move along, or it will tell you you're an idiot for not typing in a command it knows.

That part where the robot looks through its database is a huge hiccup the robotic vision community has always had. It's not a showstopper at all, and there have been a number of amazing developments, but the time it takes for a system to try to match what it is 'seeing' against what it knows is far too long to be practical in many cases.

For instance, if you held your hand up and looked at it, you would recognize it as a hand. You could imagine taking a picture of that, putting it into a computer, and saying HEY, any time you see this, it is a HAND.

Now what if you turned your hand ninety degrees? You still know it's a hand, but will the computer? How about if you used your right hand instead of your left? How about the difference between the hands of a child and a senior citizen, or a light-skinned male and a dark-skinned female with red nail polish, or a thin person and a heavy person? You as a person can simply say that all of the above are hands. But would a computer? How about if you made a fist?
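You can see the rotation problem with an absurdly simple experiment. The sketch below uses a tiny made-up binary grid as a "picture of a hand" and the crudest possible recognizer, pixel-for-pixel equality, to show why the naive approach breaks the moment the hand turns:

```python
# A tiny made-up binary "image" of a hand (1 = hand pixel, 0 = background).
HAND = [
    (1, 0, 1, 0, 1),
    (1, 1, 1, 1, 1),
    (0, 1, 1, 1, 0),
]

def rotate_90(grid):
    # Rotate the grid 90 degrees clockwise.
    return [tuple(row[c] for row in reversed(grid)) for c in range(len(grid[0]))]

def naive_match(seen, template):
    # The crudest recognizer imaginable: exact pixel-for-pixel equality.
    return seen == template

print(naive_match(HAND, HAND))             # True  -> "GO"
print(naive_match(rotate_90(HAND), HAND))  # False -> "NO GO", same hand!
```

Same hand, same pixels, one quarter turn, and the exact-match recognizer fails. Real systems use rotation-tolerant features for exactly this reason.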

These comparisons are crucial in robotic vision and one of the reasons the industry still has much work to do.

This article, which comes from a group in China working on methods to increase vision efficiency, proposes a new processing algorithm intended not only to speed up the process but to increase accuracy.

One of the ways they accomplish this: when the robot "scans" an object for recognition, the two scan directions are kept static and parallel along each axis. The purpose is to match what the computer sees to what it has on file, and this method, along with a formula they provide (math math math, I know), increases output.

So- why does this matter to A.I.?

On two fronts, actually.

First, progression in the ways a computer visually scans an item is critically important to producing a cognitive system. Yes, blind people exist and can think, I get that, but that is irrelevant to A.I. progression.

The code and algorithms that tell a system how to see need to be fully developed, far beyond where they are now, but the processing that tells a system what it is seeing is where the flaw is today.

I have seen others literally point a robot at something and say, "Hey Robot, that is a fish." So then the robot looks for itself, sees what it is that makes a fish, and goes from there. That's more on the right track, but the brains of these robots need to be designed to allow for variations within classes. Fish could mean any number of things. Dead fish, live fish, drawn fish, clownfish, barracuda: all are fish. The legwork is going to be in making these A.I. systems understand those classification levels.

Man and/or Woman = Human

Human and/or Woman ≠ Man

Human ≠ Man

Man and/or Woman = Mammal

Wolf = Mammal

Wolf ≠ Man

So, that whole classification system needs to be understood by a computer, for everything it encounters.

You as a human can form those groupings and understand them. A computer will have to learn that: not be told what it is looking at, but learn HOW to see what is in front of it.
