Finding whether an input column is missing

Question

I am working on a problem similar to this one (supervised, artificial data):

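import numpy as np

# Five independent input features, each uniform on [-100, 100]
x = np.random.uniform(-100, 100, 10000)
y = np.random.uniform(-100, 100, 10000)
z = np.random.uniform(-100, 100, 10000)
a = np.random.uniform(-100, 100, 10000)
b = np.random.uniform(-100, 100, 10000)

# Target: a nonlinear function of all five inputs
i = x**2/1000 + 2*x*y/1000 + 4*np.sqrt(abs(z)) + z*a + 4*a + abs(b)**1.5 - 1/((b+1)**2) * a + 10*y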

Since I am not creating the data myself, I want to make sure that my customer has provided all the relevant input features. Is there a way to find out whether the input is complete and not lacking a feature, say "a"? Obviously, if two samples had the same input but different outputs, that would be evidence of missing data, but it isn't guaranteed that any two input samples are the same. Another way I thought of would be to use an autoencoder to find the dimension of the dataset (including the output) and hope it is exactly the input dimension, but in my case it is also possible that there are redundant features. Is there any other way to check whether a function is computable from the given inputs?
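
The "same input, different output" idea can be relaxed to nearly identical inputs. The following is only a minimal sketch under stated assumptions, not an established test: the function name neighbour_gap_ratio, the probe count, and the use of standardised Euclidean distance are all hypothetical choices made for illustration. It compares how much the target differs between each probe row and its nearest neighbour in input space against how much it differs between random pairs of rows.

```python
import numpy as np

def neighbour_gap_ratio(X, target, n_probe=500, seed=0):
    """Median target gap to the nearest input neighbour, divided by the
    median target gap to a randomly chosen row (hypothetical diagnostic)."""
    rng = np.random.default_rng(seed)
    # Standardise columns so no single feature dominates the distance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    n = len(Xs)
    probe = rng.choice(n, size=min(n_probe, n), replace=False)
    nn_gaps, rand_gaps = [], []
    for p in probe:
        d = np.sum((Xs - Xs[p]) ** 2, axis=1)
        d[p] = np.inf  # exclude the probe row itself
        nn_gaps.append(abs(target[p] - target[np.argmin(d)]))
        rand_gaps.append(abs(target[p] - target[rng.integers(n)]))
    return float(np.median(nn_gaps) / np.median(rand_gaps))

# Artificial data as in the question, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
x, y, z, a, b = rng.uniform(-100, 100, (5, 10000))
i = (x**2/1000 + 2*x*y/1000 + 4*np.sqrt(abs(z)) + z*a + 4*a
     + abs(b)**1.5 - 1/((b+1)**2) * a + 10*y)

ratio_full = neighbour_gap_ratio(np.column_stack([x, y, z, a, b]), i)
ratio_without_a = neighbour_gap_ratio(np.column_stack([x, y, z, b]), i)
print("all columns:", ratio_full)
print("without a:  ", ratio_without_a)
```

A ratio close to 1 would mean that nearest neighbours in input space are no more alike in their output than random pairs, which hints that the given columns do not determine the output. This is a heuristic at best: noise in the output or sparse sampling in many dimensions also pushes the ratio up, so a low value is not a proof of completeness.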

I might not have understood it well, but for handling missing data you don't need AI, a simple dimensionality check would do the trick. Also if you miss "a" then when evaluating "i" an error will be raised — JVGD, Aug 02 '20 at 11:04
Dimensionality checks aren't "simple" if your data is high-dimensional and possibly not even continuous. Also, I'm fully aware that I would get an error, but as I'm not creating the data myself, I can't check that. The problem is rather that my customer might have forgotten to include some input columns, and there might also be some redundant or completely unnecessary ones. The question is essentially whether there is a way to check whether I can actually learn the function given my data — munichmath, Aug 28 '20 at 11:23

0 Answers