In the case that
$P$
is
$0$
on
${\mathcal{A}}_{2}$
, we shall need the following theorem, which already gives a lot of significant information about the function
$f$
. It should be thought of as analogous to the fact that the moments of a (vector) random variable can be read off the Fourier Transform of its distribution (the characteristic function) by looking at derivatives at
$0$
.
Theorem 3.8.
Suppose
$f:G={\mathbb{Z}}_{2}^{k}={\left\{0,1\right\}}^{k}\to \mathbb{R}$
is nonnegative (and not identically
$0$
) and has all its Fourier coefficients of order at most
$r$
(and at least 1) equal to
$0$
. Let
$\mu $
denote the uniform probability measure on the cube
$G$
and
$\nu $
denote the probability measure on
$G$
defined by
$\nu \left(A\right)={\sum}_{x\in A}f\left(x\right)/{\sum}_{x\in G}f\left(x\right),\text{(}A\subseteq G\text{)}.$
Let also
${X}_{1},\dots ,{X}_{k}$
denote the coordinate functions on
$G$
, which we view as random variables. Then for all
${i}_{1}<{i}_{2}<\cdots <{i}_{s}$
,
$0\le s\le r$
, we have
$${\mathbf{E}}_{\nu}({X}_{{i}_{1}}\cdots {X}_{{i}_{s}})={\mathbf{E}}_{\mu}({X}_{{i}_{1}}\cdots {X}_{{i}_{s}}).$$

Proof.
Let
$F={\sum}_{x\in G}f\left(x\right)$
. We assume for simplicity that
${i}_{1}=1,\dots ,{i}_{s}=s$
. Then, writing
$x=({x}_{1},{x}_{2},\dots ,{x}_{k})$
and
$\left[s\right]=\left\{1,\dots ,s\right\}$
, we have
$$\begin{array}{ccc}{\mathbf{E}}_{\nu}({X}_{1}\cdots {X}_{s})& =& \frac{1}{F}{\sum}_{x\in G}f\left(x\right){x}_{1}\cdots {x}_{s}\end{array}$$  
$$\begin{array}{ccc}& =& \frac{1}{F}{\sum}_{x\in G}f\left(x\right)\frac{1+(1{)}^{{x}_{1}+1}}{2}\cdots \frac{1+(1{)}^{{x}_{s}+1}}{2}\end{array}$$  
$$\begin{array}{ccc}& =& \frac{1}{{2}^{s}F}{\sum}_{x\in G}f\left(x\right){\sum}_{S\subseteq \left[s\right]}(1{)}^{\leftS\right+{\sum}_{i\in S}{x}_{i}}\end{array}$$  
$$\begin{array}{ccc}& =& \frac{\leftG\right}{{2}^{s}F}{\sum}_{S\subseteq \left[s\right]}(1{)}^{\leftS\right}\frac{1}{\leftG\right}{\sum}_{x\in G}f(x\left)\right(1{)}^{{\sum}_{i\in S}{x}_{i}}\end{array}$$  
$$\begin{array}{ccc}& =& \frac{\leftG\right}{{2}^{s}F}{\sum}_{S\subseteq \left[s\right]}(1{)}^{\leftS\right}\widehat{f}(S)\end{array}$$  
$$\begin{array}{ccc}& =& \frac{\leftG\right}{{2}^{s}F}\widehat{f}\left(0\right)\text{(by the vanishing of}\widehat{f}\text{)}\end{array}$$  
$$\begin{array}{ccc}& =& {2}^{s}\end{array}$$  
$$\begin{array}{ccc}& =& {\mathbf{E}}_{\mu}({X}_{1}\cdots {X}_{s})\end{array}$$  
Remarks. 1. For functions
$f:\{0,1{\}}^{k}\to \{0,1\}$
, the above theorem follows directly from the definition of
$t$
nullity in Section 3.1 . However, as we shall see in the proof of Lemma 3.10 we need to apply this theorem for functions whose range is not
$\{0,1\}$
.
2. If the nonnegative function
$f$
is symmetric then the identity of moments up to order
$r$
with those of the uniform distribution (
$r$
wise independence) and the vanishing of the nonconstant Fourier coefficiens of weight up to
$r$
are equivalent. This can be proved by induction on
$r$
. We do not use this here.
Corollary 3.9.
Under the assumptions and definitions of Theorem 3.8 the random variable
$S={X}_{1}+\cdots +{X}_{k}$
has the same power moments under the probability measures
$\mu $
and
$\nu $
, up to order
$r$
.

Proof.
The power
${S}^{s}$
,
$s\le r$
, can be written as a sum of terms of the type
${X}_{{i}_{1}}\cdots {X}_{{i}_{t}}$
, for
$t\le s$
.
One uses the fact that
${X}_{j}^{2}={X}_{j}$
.
Lemma 3.10.
If
$P$
is
$0$
on
${\mathcal{A}}_{2}$
, then
$f\in \{0,1\}$
.

Proof.
Suppose the polynomial
$P$
is constantly equal to
$0$
on the set
${\mathcal{A}}_{2}$
and that
$f\notin \{0,1\}$
. The sequence
${f}_{j}$
is constant in each of the three intervals of
${\mathcal{A}}_{2}$
. By possibly considering
$1f$
(whose Fourier coefficients vanish exactly where those of
$f$
do), we may assume that
${f}_{j}=0$
on the middle interval
$(a,b)$
. Define the nonnegative function
$g:G\to \mathbb{R}$
by
$$g({x}_{1},\dots ,{x}_{k})=f({x}_{1},\dots ,{x}_{k})+f(1{x}_{1},\dots ,1{x}_{k}),$$
and observe that the Fourier coefficients of
$g$
of weight at most
$kN$
vanish. Let
$\tau $
be the distribution of the random variable
$S={X}_{1}+\cdots +{X}_{k}$
under the measure induced by
$g$
on
$G$
(each vertex
$x\in G$
has probability proportional to
$g\left(x\right)$
). Note that this is a well defined probability distribution since we assumed that
$f$
and
$1f$
are not the
$0$
function. Clearly
$\tau $
is symmetric about
$k/2$
and has no mass in
$(a,b)$
, since both
$f({x}_{1},\dots ,{x}_{k})$
and
$f(1{x}_{1},\dots ,1{x}_{k})$
are
$0$
when
${x}_{1}+\cdots +{x}_{k}\in (a,b)$
. The
$s$
th moment with respect to the measure
$\tau $
of the variable
$S$
in Corollary 3.9 is the expression
$$M(\tau ,s)=\frac{1}{F}{\sum}_{j}{g}_{j}\left(\genfrac{}{}{0ex}{}{k}{j}\right){j}^{s},$$
where again
$F={\sum}_{j}{g}_{j}\left(\genfrac{}{}{0ex}{}{k}{j}\right)$
. By Corollary 3.9 this must equal the
$s$
th moment with respect to the binomial measure
$\mu $
, which is the quantity
$$M(\mu ,s)={2}^{k}{\sum}_{j}\left(\genfrac{}{}{0ex}{}{k}{j}\right){j}^{s}.$$
But the variance of
$S$
under
$\mu $
is
$$M(\mu ,2)M(\mu ,1{)}^{2}=k,$$
since under
$\mu $
the random variables
${X}_{1},\dots ,{X}_{k}$
are independent, while the variance of
$S$
under
$\tau $
is
$$M(\tau ,2)M(\tau ,1{)}^{2}\ge C{\epsilon}^{2}{k}^{2},$$
as half the mass of
$\tau $
sits to the left of
$\frac{1\epsilon}{2}k$
and half to the right of
$\frac{1+\epsilon}{2}k$
. These orders of magnitude are different whenever
$\epsilon \ge C/\sqrt{k}$
, which is true in our case as
$\epsilon \ge C/logk$
. This contradiction proves that
$P$
cannot equal
$0$
on
${\mathcal{A}}_{2}$
.
Extending
${\mathcal{A}}_{2}$
to
$[0,k1]$
.
The rest of the proof goes as follows. By Lemma 3.10 , we may assume that
$P\left(l\right)=1$
or
$1$
for
$l\in {\mathcal{A}}_{2}$
. Without loss of generality, assume
$P$
is
$1$
on
${\mathcal{A}}_{2}$
. We apply Theorem 3.6 for
$m=4,8,16,\dots $
successively and each time we choose a prime
$r$
such that
$mrN$
is minimized. Theorem 3.6 gives for all
$\nu \in \mathbb{Z}$
$$\begin{array}{c}P\left(\nu \right)mP(\nu +r)+\left(\genfrac{}{}{0ex}{}{m}{2}\right)P(\nu +2r)\cdots +P(\nu +mr)=0modr.\end{array}$$ 
(11)

When
$\nu \in [0,kN]$
the numbers
$\nu +lr$
for even
$l$
in 11 are in the set
${\mathcal{A}}_{m/2}$
and therefore the corresponding
$P$
values are all
$1$
, by induction on
$m$
. In order to deduce that 11 holds as an identity of integers (not residue classes) it is enough to guarantee that the sum of the absolute values of all terms is less than
$r$
. This amounts to the inequality
${2}^{m}<r$
. Given that
$mr\sim k$
this is true if we can guarantee that
$$\begin{array}{c}m\le {c}_{1}logk,\end{array}$$ 
(12)

for some small enough constant
${c}_{1}$
. Therefore, as long as
$m$
satisfies the bound 12 , we have that, for
$\nu \in [0,kN]$
,
$$\begin{array}{c}P\left(\nu \right)mP(\nu +r)+\left(\genfrac{}{}{0ex}{}{m}{2}\right)P(\nu +2r)\cdots +P(\nu +mr)=0.\end{array}$$ 
(13)

Since the total weights of the positive and negative terms in 13 are the same, it follows that the
$P(\nu +lr)$
terms corresponding to odd
$l$
are also
$1$
.
Each time we perform this operation we deduce that
$P$
is
$1$
on a collection of intervals
${\mathcal{A}}_{m}$
which consists of
${\mathcal{A}}_{m/2}$
and one interval of length
$\epsilon k$
in the middle of the gap between any two succesive intervals of
${\mathcal{A}}_{m/2}$
. So
${\mathcal{A}}_{m}$
has
$m+1$
disjoint equispaced intervals of length
$\epsilon k$
. We apply this operation until we have
$\epsilon m\sim 1$
, which implies that we have covered the whole interval
$[0,k1]$
with our set
${\mathcal{A}}_{m}$
. We need to make sure that 12 still holds then. Since
$\epsilon m\sim 1$
this is achieved by setting
$\epsilon =C/logk$
, for a large enough constant
$C$
. At the end of this process, there could still be some very small possibly uncovered intervals of size
$o\left(\epsilon k\right)$
. However since we have already shown that
$P\left(l\right)=1$
on a set of
$ko\left(\epsilon k\right)$
entries, we can use the fact that
$P$
has degree at most
$N1$
to obtain that
$P\left(l\right)=1$
on the whole interval
$[0,k1]$
.
This concludes the proof of the Theorem
3.3 , which implies:
Corollary 3.11.
The class of symmetric
$k$
juntas can be learned exactly under the uniform distribution with confidence
$1\delta $
in time
${n}^{O(k/logk)}\cdot poly({2}^{k},n,log(1/\delta \left)\right)$
.
4 Discussion
The main open question is to obtain tight upper and lower bounds on the running time of the Fourierbased algorithm for symmetric juntas. It may even be that for large
$k$
, every symmetric function has a nonzero Fourier coefficient of constant order.
It should also be noted that in the case of balanced symmetric functions, i.e., symmetric functions with
$Pr\left[f\right(x)=1]=1/2$
, a bound of
$O\left({k}^{0.548}\right)$
follows from [
?]
(see [
?]
). Hence to improve our result, one may focus on finding new techniques for unbalanced functions.